Search configuration
This page covers how to customize search behavior for your model using oarepo-model’s preset system and customizations.
Presets
The oarepo-model framework uses presets to dynamically generate search configuration at build time. These presets scan your model schema and create the necessary service classes, facets, and OpenSearch mappings.
| Preset | Generated | Description |
|---|---|---|
| RecordSearchOptionsPreset | RecordSearchOptions class | Adds grouped facets (GroupedFacetsParam) and query parsing with SearchQueryValidator |
| RecordFacetsPreset | RecordFacets dict | Generates facets from the model’s record fields |
| MetadataFacetsPreset | facet modules | Generates facets from the model’s metadata fields |
| RecordMappingPreset | OpenSearch mapping | Creates the JSON mapping for the index |
How Facets Are Generated
The get_facets() function scans your model’s schema types through the type registry:
- Schema Type Discovery - Each datatype in the type registry implements get_facet()
- Facet Building - build_facet() converts facet definitions into facet objects
- Module Generation - Facets are added to the facets module with AddToModule
- Dictionary Collection - All facets are collected into the RecordFacets dictionary
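The steps above can be sketched in plain Python. This is a simplified illustration and not oarepo-model's actual code: the `KeywordType` and `FulltextType` classes, the `TYPE_REGISTRY` dict, the shape of the facet definition, the `TermsFacet` class, and the signatures of `get_facet()`, `build_facet()`, and `get_facets()` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


class KeywordType:
    """Hypothetical datatype whose values can be aggregated directly."""

    def get_facet(self, path: str) -> Optional[Dict[str, Any]]:
        return {"facet": "terms", "field": path}


class FulltextType:
    """Hypothetical datatype that does not offer a facet in this sketch."""

    def get_facet(self, path: str) -> Optional[Dict[str, Any]]:
        return None


# Assumed stand-in for the type registry: type name -> datatype instance.
TYPE_REGISTRY = {"keyword": KeywordType(), "fulltext": FulltextType()}


@dataclass(frozen=True)
class TermsFacet:
    """Hypothetical facet object produced from a facet definition."""

    field: str


def build_facet(definition: Dict[str, Any]) -> TermsFacet:
    # Facet Building: turn a facet definition into a facet object.
    return TermsFacet(field=definition["field"])


def get_facets(schema: Dict[str, str]) -> Dict[str, TermsFacet]:
    record_facets: Dict[str, TermsFacet] = {}
    for path, type_name in schema.items():
        datatype = TYPE_REGISTRY[type_name]      # Schema Type Discovery
        definition = datatype.get_facet(path)
        if definition is None:
            continue                             # this type is not facetable
        # Dictionary Collection: key the facet by a flattened field name.
        record_facets[path.replace(".", "_")] = build_facet(definition)
    return record_facets


facets = get_facets(
    {"metadata.publisher": "keyword", "metadata.abstract": "fulltext"}
)
```

In this sketch only the keyword field ends up in the resulting dictionary. The Module Generation step (AddToModule) is omitted because it depends on the framework's build machinery.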
How Mappings Are Generated
The get_mapping() function creates OpenSearch mappings from your model metadata:
- Type Resolution - Datatype is retrieved from the type registry
- Mapping Creation - create_mapping() generates base mapping from the schema
- Type Removal - The top-level "type" key is removed from the mapping
- Merge with Defaults - Final mapping is merged with default fields (id, created, pid, etc.)
- File Output - Mapping is written to mappings/os-v2/<base_name>/metadata-v<version>.json
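As a rough illustration of this pipeline, the following sketch walks through the same steps with plain dictionaries. It is not the framework's implementation: the bodies of `create_mapping()` and `get_mapping()`, the `deep_merge()` helper, the abbreviated `DEFAULT_MAPPING`, and the sample values `mymodel` and `1.0.0` are assumptions for the example, and the sketch returns the target path instead of writing a file.

```python
import copy
from typing import Any, Dict, Tuple

# Abbreviated, assumed subset of the default fields.
DEFAULT_MAPPING: Dict[str, Any] = {
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "id": {"type": "keyword"},
            "created": {"type": "date"},
        },
    }
}


def create_mapping(metadata_fields: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical stand-in: build an object mapping for a record schema."""
    return {
        "type": "object",
        "properties": {
            "metadata": {
                "properties": {
                    name: {"type": field_type}
                    for name, field_type in metadata_fields.items()
                }
            }
        },
    }


def deep_merge(base: Dict[str, Any], patch: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge patch into a copy of base; patch wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def get_mapping(
    metadata_fields: Dict[str, str], base_name: str, version: str
) -> Tuple[str, Dict[str, Any]]:
    mapping = create_mapping(metadata_fields)   # Mapping Creation
    mapping.pop("type", None)                   # Type Removal
    merged = deep_merge(DEFAULT_MAPPING, {"mappings": mapping})  # Merge with Defaults
    # File Output: only the target path is computed here.
    path = f"mappings/os-v2/{base_name}/metadata-v{version}.json"
    return path, merged


path, merged = get_mapping({"title": "text"}, "mymodel", "1.0.0")
```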
The merged mapping includes base fields:
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "$schema": {"type": "keyword"},
      "id": {"type": "keyword"},
      "created": {"type": "date"},
      "updated": {"type": "date"},
      "expires_at": {"type": "date"},
      "indexed_at": {"type": "date"},
      "uuid": {"type": "keyword"},
      "version_id": {"type": "integer"},
      "pid": {
        "properties": {
          "obj_type": {"type": "keyword", "index": false},
          "pid_type": {"type": "keyword", "index": false},
          "pk": {"type": "long", "index": false},
          "status": {"type": "keyword", "index": false}
        }
      },
      "metadata": { /* your model's fields */ }
    }
  }
}
Customizations
Add these customizations in your model’s model/model_config.py to modify search behavior without overriding base classes.
Basic Search Customizations
from oarepo_model.customizations import (
    SetDefaultSearchFields,
    SetIndexTotalFieldsLimit,
    SetIndexNestedFieldsLimit,
)

class ModelConfig:
    customizations = [
        # Set default text search fields for Full Text Search
        SetDefaultSearchFields("metadata.title", "metadata.description"),
        # Increase field limits for complex metadata models
        SetIndexTotalFieldsLimit(5000),
        SetIndexNestedFieldsLimit(500),
    ]
Custom Analyzers and Multi-Field Mappings
For advanced search scenarios like people names with diacritics, you can add custom analyzers and multi-field mappings:
from oarepo_model.customizations import (
    PatchIndexSettings,
    PatchIndexPropertyMapping,
    AddFacetGroup,
)

customizations = [
    # Define custom analyzers for name search.
    # Analyzers and tokenizers both sit under "analysis",
    # following the OpenSearch index-settings layout.
    PatchIndexSettings({
        "analysis": {
            "analyzer": {
                "people_analyzer": {
                    "tokenizer": "people_tokenizer",
                },
                "asciifolded_people_analyzer": {
                    "tokenizer": "people_tokenizer",
                    "filter": ["asciifolding"],
                },
            },
            "tokenizer": {
                "people_tokenizer": {
                    "type": "pattern",
                    "pattern": "\\s*[,.]\\s*",
                },
            },
        },
    }),
    # Add multi-field mappings for creator/contributor names
    PatchIndexPropertyMapping("metadata.creators.person_or_org.name", {
        "type": "text",
        "fields": {
            "_search": {
                "type": "text",
                "analyzer": "people_analyzer",
            },
            "_ascii_search": {
                "type": "text",
                "analyzer": "asciifolded_people_analyzer",
            },
        },
    }),
    # Add facet groups for organizing filters
    AddFacetGroup(
        "default",
        ["metadata.publisher", "metadata.resource_type", "metadata.languages"],
    ),
]
This allows:
- Searching for names with diacritics (e.g., “Černý”) while also matching ASCII-folded versions (e.g., “Cerny”)
- Tokenizing names by pattern (\s*[,.]\s*) to handle different name formats
- Organizing facets into groups for a better user experience
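To get a feel for what the two analyzers produce, the tokenization and folding can be approximated in plain Python. This is only an emulation: the helper names `people_tokenize()` and `ascii_fold()` are made up for the example, and Unicode decomposition covers accented letters but not every character that the OpenSearch asciifolding filter handles.

```python
import re
import unicodedata
from typing import List

# Same pattern as the people_tokenizer above: split on "," or "."
# together with any surrounding whitespace.
PEOPLE_PATTERN = re.compile(r"\s*[,.]\s*")


def people_tokenize(name: str) -> List[str]:
    """Approximate the pattern tokenizer: split on the pattern, drop empty tokens."""
    return [token for token in PEOPLE_PATTERN.split(name) if token]


def ascii_fold(token: str) -> str:
    """Approximate asciifolding by stripping combining marks after NFKD."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


tokens = people_tokenize("Černý, Jan")            # ["Černý", "Jan"]
folded = [ascii_fold(token) for token in tokens]  # ["Cerny", "Jan"]
```

Whitespace alone is not a separator for this pattern, so a name written as "Jan Černý" stays a single token. Neither analyzer includes a lowercase filter, so matching on these sub-fields is likely case-sensitive unless one is added.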
Available Search Customizations
| Customization | Description |
|---|---|
| SetDefaultSearchFields(*search_fields) | Specifies default text search fields for Full Text Search |
| PatchIndexSettings(settings) | Modifies OpenSearch index settings (analyzers, tokenizers, filters, etc.) |
| SetIndexTotalFieldsLimit(limit) | Sets the maximum total number of fields in the index mapping |
| SetIndexNestedFieldsLimit(limit) | Sets the maximum number of nested object fields |
| PatchIndexPropertyMapping(field, mapping) | Modifies or adds OpenSearch mapping for a specific field |
| AddFacetGroup(name, facets, exists_ok=False) | Adds a facet group for organizing related filters |
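To show how a dotted field path such as the one passed to PatchIndexPropertyMapping relates to the nested "properties" structure of an OpenSearch mapping, here is a small sketch. The `patch_property_mapping()` function is hypothetical and is not the library's implementation; in particular, the shallow update at the leaf is an assumption, and the real customization may merge differently.

```python
from typing import Any, Dict


def patch_property_mapping(
    mapping: Dict[str, Any], field: str, patch: Dict[str, Any]
) -> Dict[str, Any]:
    """Walk (or create) the nested "properties" path and update the leaf in place."""
    node = mapping.setdefault("mappings", {})
    for part in field.split("."):
        # Each path segment lives under the parent's "properties" key.
        node = node.setdefault("properties", {}).setdefault(part, {})
    node.update(patch)  # shallow update at the leaf (assumption)
    return mapping


mapping: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "metadata": {"properties": {"title": {"type": "text"}}},
        }
    }
}

# Modify an existing field: add a keyword sub-field to metadata.title.
patch_property_mapping(
    mapping, "metadata.title", {"fields": {"keyword": {"type": "keyword"}}}
)

# Add a field that is not in the mapping yet.
patch_property_mapping(
    mapping, "metadata.creators.person_or_org.name", {"type": "text"}
)
```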
Facet Configuration in Model Schema
Facets are automatically generated from fields that define facet behavior in your model schema. For vocabulary fields and custom types, facets are built by calling get_facet() on the datatype.
Example: Adding a Facet to a Field
To make a field facetable, the datatype must implement the get_facet() method that returns facet configuration. Most built-in types already support facets.
publishers:
  type: vocabulary
  vocabulary-type: institutions
  label:
    en: Publisher
  facets:
    aggregation: publisherfacet