Skip to Content
CustomizeModel backendMetadata reference

Data types

This file describes the data types supported by the model builder for defining record metadata schemas.

Data typeShort description
booleanBoolean values (true/false)
int32-bit integer number
long64-bit integer number
floatSingle precision floating-point number
doubleDouble precision floating-point number
keywordKeyword field for exact matching
fulltextFull-text searchable field
fulltext+keywordFull-text field with keyword validation
i18nInternationalized text with language-value pairs
multilingualArray of language-value pairs for multilingual content
i18ndictSimple multilingual dictionary (language code keys)
dateBasic date values
datetimeDate and time values
timeTime values
edtf-timeEDTF (Extended Date/Time Format) time values
edtfEDTF (Extended Date/Time Format) values
edtf-intervalEDTF intervals
objectStructured object with defined properties
nestedNested object for complex structures
arrayArray/list of items
dynamic-objectObject with dynamic properties
polymorphicDiscriminated union type
pid-relationRelation to another record using a PID
vocabularyReference to a controlled vocabulary

Boolean data types

boolean

Data type for storing true/false values. Essential for research metadata like peer review status, open access availability, data availability, embargo status, and publication flags. In the UI, boolean fields are typically rendered as checkboxes. The data type includes internationalization support, automatically translating boolean values to localized “true”/“false” text for display purposes. The OpenSearch mapping stores boolean values efficiently and supports boolean queries for filtering and aggregations in research discovery interfaces.

PropertyDescription
source codeboolean.py 
jsonschema typeboolean
mappingboolean
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Boolean
ui_marshmallow_field_classFormatBoolean

Example:

peer_reviewed: type: boolean open_access: type: boolean

Valid input json example:

{ "peer_reviewed": true, "open_access": false }

Sample search queries:

QueryDescription
peer_reviewed:trueSearch for peer-reviewed documents
open_access:falseSearch for non-open access documents
peer_reviewed:true AND open_access:trueSearch for peer-reviewed open access documents

Numeric data types

int

Data type for 32-bit signed integers, supporting values from -2,147,483,648 to 2,147,483,647. Perfect for citation counts, sample sizes, publication years, version numbers, dataset entry counts, and other whole number values within this range. The type includes built-in validation for range constraints and supports localized number formatting in the UI. OpenSearch stores integers efficiently and provides fast numerical queries, aggregations, and sorting for research metrics and filtering. Use strict_validation to ensure the input is exactly an integer and not a string representation.

PropertyDescription
source codenumbers.py 
jsonschema typeinteger
mappinginteger
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Integer
ui_marshmallow_field_classFormatNumber
min_inclusiveMinimum allowed value (inclusive)
max_inclusiveMaximum allowed value (inclusive)
min_exclusiveMinimum allowed value (exclusive)
max_exclusiveMaximum allowed value (exclusive)
strict_validationMake sure that the value is exactly an integer, not a string containing an integer

Example:

citation_count: type: int min_inclusive: 0 sample_size: type: int min_inclusive: 1 max_inclusive: 1000000

Valid input json example:

{ "citation_count": 42, "sample_size": 1500 }

Sample search queries:

QueryDescription
citation_count:[10 TO 100]Search for documents with citation_count between 10 and 100
citation_count:10Search for documents with citation_count exactly 10

long

Data type for 64-bit signed integers, supporting values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Essential for large numerical values like timestamps, file sizes in bytes, unique identifiers, or any integer that exceeds the 32-bit range. Uses the same marshmallow Integer field but maps to OpenSearch’s long type for extended range support. Includes the same validation and formatting features as the int type but with a much larger value range.

PropertyDescription
source codenumbers.py 
jsonschema typeinteger
mappinglong
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Integer
ui_marshmallow_field_classFormatNumber
min_inclusiveMinimum allowed value (inclusive)
max_inclusiveMaximum allowed value (inclusive)
min_exclusiveMinimum allowed value (exclusive)
max_exclusiveMaximum allowed value (exclusive)
strict_validationMake sure that the value is exactly an integer, not a string containing an integer

Example:

file_size_bytes: type: long min_inclusive: 0 dataset_entries: type: long min_inclusive: 1

Valid input json example:

{ "file_size_bytes": 2147483648, "dataset_entries": 5000000 }

Sample search queries:

QueryDescription
file_size_bytes:[1048576 TO 10485760]Search for files between 1MB and 10MB
dataset_entries:>=1000000Search for datasets with at least 1 million entries
file_size_bytes:>2147483647Search for files larger than 32-bit integer limit

float

Data type for single precision floating-point numbers (32-bit IEEE 754). Supports decimal values with approximately 7 decimal digits of precision, ranging from -3.4×10³⁸ to 3.4×10³⁸. Ideal for measurements, prices, ratings, percentages, and other decimal values where moderate precision is sufficient. The type provides localized number formatting and supports range validation. OpenSearch stores floats efficiently for numerical operations, but be aware of potential precision limitations for financial calculations or high-precision scientific data.

PropertyDescription
source codenumbers.py 
jsonschema typenumber
mappingfloat
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Float
ui_marshmallow_field_classFormatNumber
min_inclusiveMinimum allowed value (inclusive)
max_inclusiveMaximum allowed value (inclusive)
min_exclusiveMinimum allowed value (exclusive)
max_exclusiveMaximum allowed value (exclusive)
strict_validationMake sure that the value is exactly a float, not a string containing a float

Example:

impact_factor: type: float min_inclusive: 0.0 max_inclusive: 100.0 confidence_level: type: float min_inclusive: 0.0 max_inclusive: 1.0

Valid input json example:

{ "impact_factor": 3.247, "confidence_level": 0.95 }

Sample search queries:

QueryDescription
impact_factor:[2.0 TO 5.0]Search for publications with impact factor between 2.0 and 5.0
confidence_level:>=0.95Search for results with confidence level 95% or higher
impact_factor:>10.0Search for high-impact publications

double

Data type for double precision floating-point numbers (64-bit IEEE 754). Provides approximately 15-17 decimal digits of precision with a range from -1.8×10³⁰⁸ to 1.8×10³⁰⁸. Essential for high-precision calculations, scientific measurements, geographic coordinates, financial calculations requiring precision, and any decimal values where single precision is insufficient. Uses the same marshmallow Float field but maps to OpenSearch’s double type for extended precision and range. Recommended for most decimal number use cases due to its superior precision.

PropertyDescription
source codenumbers.py 
jsonschema typenumber
mappingdouble
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Float
ui_marshmallow_field_classFormatNumber
min_inclusiveMinimum allowed value (inclusive)
max_inclusiveMaximum allowed value (inclusive)
min_exclusiveMinimum allowed value (exclusive)
max_exclusiveMaximum allowed value (exclusive)
strict_validationMake sure that the value is exactly a float, not a string containing a float

Example:

latitude: type: double min_inclusive: -90.0 max_inclusive: 90.0 longitude: type: double min_inclusive: -180.0 max_inclusive: 180.0 measurement_value: type: double

Valid input json example:

{ "latitude": 49.2033737261832, "longitude": 16.59716724451779, "measurement_value": 123.456789012345 }

Sample search queries:

QueryDescription
latitude:[49.0 TO 50.0]Search for locations in latitude range
longitude:[16.0 TO 17.0] AND latitude:[49.0 TO 50.0]Search for locations in specific geographic area
measurement_value:>100.0Search for measurements above 100.0

Text data types

keyword

Data type for exact-match text fields that are not analyzed (not broken into tokens). Perfect for IDs, status values, categories, tags, and structured data where you need precise matching and efficient aggregations. OpenSearch stores keywords as-is, enabling fast exact searches, sorting, and faceting. Limited to 256 characters by default (ignore_above: 256). Supports pattern validation with regular expressions, enumerated value lists, and length constraints. Use this for dropdown options, and any text that should match exactly.

PropertyDescription
source codestrings.py 
jsonschema typestring
mappingkeyword with ignore_above: 256
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.String
ui_marshmallow_field_classno special serialization for UI
min_lengthMinimum string length
max_lengthMaximum string length
enumList of allowed values
patternRegular expression pattern for validation

Example:

publication_status: type: keyword enum: ["draft", "submitted", "accepted", "published", "retracted"] resource_type: type: keyword enum: ["dataset", "publication", "software", "image", "video"] subject_classification: type: keyword max_length: 100

Valid input json example:

{ "publication_status": "published", "resource_type": "dataset", "subject_classification": "Computer Science" }

Sample search queries:

QueryDescription
publication_status:publishedSearch for published documents
resource_type:(dataset OR publication)Search for datasets or publications
subject_classification:"Computer Science"Exact match for subject classification
publication_status:published AND resource_type:datasetSearch for published datasets

fulltext

Data type for analyzed text content that supports full-text search capabilities. OpenSearch breaks the text into tokens using analyzers, enabling sophisticated search features like stemming, synonyms, and relevance scoring. Ideal for descriptions, abstracts, comments, article content, and any text where users need to search within the content rather than match exactly. Unlike keyword fields, fulltext fields are optimized for search relevance but cannot be used efficiently for sorting or aggregations. No character limit by default.

PropertyDescription
source codestrings.py 
jsonschema typestring
mappingtext
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.String
ui_marshmallow_field_classno special serialization for UI
min_lengthMinimum string length
max_lengthMaximum string length
enumList of allowed values
patternRegular expression pattern for validation

Example:

abstract: type: fulltext min_length: 50 max_length: 5000 research_methods: type: fulltext technical_info: type: fulltext

Valid input json example:

{ "abstract": "This research investigates the impact of machine learning algorithms on data processing efficiency in large-scale distributed systems.", "research_methods": "We employed a mixed-methods approach combining quantitative analysis and qualitative interviews.", "technical_info": "The system was implemented using Python 3.9, TensorFlow 2.8, and deployed on Kubernetes clusters." }

Sample search queries:

QueryDescription
abstract:"machine learning"Search for documents mentioning machine learning in abstract
research_methods:(quantitative OR qualitative)Search for documents using specific research methods
technical_info:PythonSearch for documents mentioning Python in technical info
abstract:algorithms AND technical_info:TensorFlowSearch combining abstract and technical criteria

fulltext+keyword

Data type that combines both full-text search and exact-match capabilities in a single field. OpenSearch creates a multi-field mapping with the main field analyzed for full-text search and a .keyword sub-field for exact matching, sorting, and aggregations. Perfect for titles, names, and other text that needs both search functionality and precise filtering/faceting. This is the most versatile text type - use it for fields like article titles, author names, or any text where you want both search and exact-match capabilities. The keyword sub-field respects the 256-character limit.

PropertyDescription
source codestrings.py 
jsonschema typestring
mappingtext with keyword sub-field (ignore_above: 256)
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.String
ui_marshmallow_field_classno special serialization for UI
min_lengthMinimum string length
max_lengthMaximum string length
enumList of allowed values
patternRegular expression pattern for validation

Example:

title: type: fulltext+keyword required: true min_length: 5 max_length: 500 journal_name: type: fulltext+keyword required: true author_name: type: fulltext+keyword

Valid input json example:

{ "title": "Machine Learning Applications in Bioinformatics: A Comprehensive Survey", "journal_name": "Journal of Computational Biology", "author_name": "Dr. Jane Smith" }

Sample search queries:

QueryDescription
title:"Machine Learning"Full-text search in titles
title.keyword:"Machine Learning Applications in Bioinformatics: A Comprehensive Survey"Exact title match using keyword subfield
journal_name:"Journal of Computational Biology"Search for specific journal
author_name:SmithSearch for authors with surname Smith
title:bioinformatics AND journal_name:"Computational Biology"Combined search across fields

i18n

Data type for internationalized text fields that store language-value pairs as nested objects. Each entry contains a language code and corresponding text value, stored in OpenSearch as nested documents for independent querying. Supports customizable field names for language and value properties. Ideal for structured multilingual content where you need to query specific language-text combinations. Automatically creates nested facets for language-based filtering and provides specialized UI fields for multilingual input and display.

PropertyDescription
source codemultilingual.py 
jsonschema typeobject with language and value properties
mappingnested with language (keyword) and value (text+keyword) fields
Property in YAML schemaDescription
marshmallow_field_classI18nStrField
ui_marshmallow_field_classI18nStrUIField
multilingual.lang_nameCustom name for language field (defaults to “lang”)
multilingual.value_nameCustom name for value field (defaults to “value”)

Example:

title_multilingual: type: i18n multilingual: lang_name: "language" value_name: "text" abstract_i18n: type: i18n keywords_i18n: type: i18n

Valid input json example:

{ "title_multilingual": { "language": "en", "text": "Machine Learning in Bioinformatics" }, "abstract_i18n": { "lang": "en", "value": "This paper presents a comprehensive survey of machine learning applications in bioinformatics." }, "keywords_i18n": { "lang": "en", "value": "machine learning, bioinformatics, computational biology" } }

Sample search queries:

QueryDescription
title_multilingual.language:enSearch for documents with English titles
abstract_i18n.value:"machine learning"Search for machine learning in English abstracts
title_multilingual.language:en AND title_multilingual.text:bioinformaticsSearch for English titles containing bioinformatics
keywords_i18n.lang:en AND keywords_i18n.value:biologySearch for English keywords containing biology

multilingual

Data type for arrays of language-value pairs, enabling multiple translations per field. Similar to i18n but stores content as an array of nested objects rather than a single nested object. Perfect for fields that may have multiple translations or regional variants. Uses the same nested mapping as i18n but with array semantics. Provides array-based UI components for managing multiple language entries and supports the same faceting capabilities.

PropertyDescription
source codemultilingual.py 
jsonschema typearray of objects with language and value properties
mappingnested with language (keyword) and value (text+keyword) fields
Property in YAML schemaDescription
marshmallow_field_classMultilingualField
ui_marshmallow_field_classMultilingualUIField
multilingual.lang_nameCustom name for language field (defaults to “lang”)
multilingual.value_nameCustom name for value field (defaults to “value”)

Example:

description: type: multilingual additional_titles: type: multilingual notes: type: multilingual

Valid input json example:

{ "description": [ { "lang": "en", "value": "A comprehensive dataset for machine learning research" }, { "lang": "cs", "value": "Komplexní datová sada pro výzkum strojového učení" } ], "additional_titles": [ { "lang": "de", "value": "Maschinelles Lernen in der Bioinformatik" } ], "notes": [ { "lang": "en", "value": "Updated methodology section" } ] }

Sample search queries:

QueryDescription
description.lang:enSearch for documents with English descriptions
description.value:"machine learning"Search for machine learning in description values
description.lang:en AND description.value:datasetSearch for English descriptions containing dataset
additional_titles.lang:de AND additional_titles.value:BioinformatikSearch for German titles containing Bioinformatik

i18ndict

Data type for simple multilingual dictionaries where language codes are direct object keys mapping to text values (e.g., {"en": "English text", "fi": "Finnish text"}). Most straightforward multilingual format but with dynamic structure since language codes are property names. Uses OpenSearch object mapping with dynamic field creation. Ideal for simple translations where you don’t need structured querying capabilities and prefer the compact key-value format.

PropertyDescription
source codemultilingual.py 
jsonschema typeobject with additionalProperties: string
mappingobject with dynamic: true
Property in YAML schemaDescription
marshmallow_field_classi18n_strings (from invenio_vocabularies)
ui_marshmallow_field_class(no UI transformation - returns empty dict)

Example:

title_translations: type: i18ndict abstract_translations: type: i18ndict subject_headings: type: i18ndict

Valid input json example:

{ "title_translations": { "en": "Machine Learning in Bioinformatics", "cs": "Strojové učení v bioinformatice", "de": "Maschinelles Lernen in der Bioinformatik" }, "abstract_translations": { "en": "This paper presents a comprehensive survey", "cs": "Tento článek představuje komplexní přehled" }, "subject_headings": { "en": "Computer Science", "cs": "Informatika" } }

Sample search queries:

QueryDescription
title_translations.en:"Machine Learning"Search English title translations
title_translations.cs:bioinformaticeSearch Czech title translations
abstract_translations.en:surveySearch English abstract translations
subject_headings.*:"Computer Science"Search subject headings in any language

Date and time data types

date

Data type for date-only values (year-month-day) without time information. Stores dates in ISO 8601 format (YYYY-MM-DD) and supports various input formats through OpenSearch’s date parsing. Perfect for publication dates, birth dates, deadlines, and any date-centric information. The UI provides localized date formatting in multiple styles (long, medium, short, full) for different display contexts. Supports date range validation and efficient date-based queries, sorting, and aggregations in OpenSearch.

PropertyDescription
source codedate.py 
jsonschema typestring with format: date
mappingdate with format: basic_date||strict_date
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Date
ui_marshmallow_field_classmarshmallow_utils.fields.FormatDate
min_dateMinimum allowed date
max_dateMaximum allowed date

Example:

publication_date: type: date min_date: "1900-01-01" max_date: "2030-12-31" embargo_end_date: type: date data_collection_start: type: date

Valid input json example:

{ "publication_date": "2023-03-15", "embargo_end_date": "2024-03-15", "data_collection_start": "2022-06-01" }

Sample search queries:

QueryDescription
publication_date:[2023-01-01 TO 2023-12-31]Search for documents published in 2023
embargo_end_date:<=2024-12-31Search for embargoes ending by end of 2024
data_collection_start:>=2022-01-01Search for data collected from 2022 onwards
publication_date:"2023-03-15"Search for documents published on specific date

datetime

Data type for complete date and time information including timezone support. Stores timestamps in ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ) with support for various input formats including milliseconds and timezone variations. Essential for created/modified timestamps, event scheduling, logging, and any time-sensitive data. OpenSearch automatically handles timezone conversions and provides powerful time-based queries, range filtering, and time-series aggregations. The UI offers localized datetime formatting for different presentation needs.

PropertyDescription
source codedate.py 
jsonschema typestring with format: date-time
mappingdate with multiple datetime formats
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.DateTime
ui_marshmallow_field_classmarshmallow_utils.fields.FormatDatetime
min_datetimeMinimum allowed datetime
max_datetimeMaximum allowed datetime

Example:

record_created: type: datetime last_modified: type: datetime submission_timestamp: type: datetime min_datetime: "2020-01-01T00:00:00Z"

Valid input json example:

{ "record_created": "2023-03-15T10:30:00Z", "last_modified": "2023-03-20T14:45:30.123Z", "submission_timestamp": "2023-03-15T09:15:22Z" }

Sample search queries:

QueryDescription
record_created:[2023-03-15T00:00:00Z TO 2023-03-15T23:59:59Z]Search for records created on specific day
last_modified:>=2023-03-20T00:00:00ZSearch for records modified from specific date
submission_timestamp:[2023-03-15T09:00:00Z TO 2023-03-15T10:00:00Z]Search for submissions in specific hour range
record_created:>=now-7dSearch for records created in last 7 days

time

Data type for time-only values (hours, minutes, seconds) without date information. Stores time in ISO 8601 time format (HH:MM:SS) with support for various input formats including milliseconds. Ideal for opening hours, duration timestamps, recurring event times, and any time-of-day information. OpenSearch maps this as a date field with time-specific formatting, enabling time-based queries and aggregations. The UI provides localized time formatting and supports time range validation for business rules.

PropertyDescription
source codedate.py 
jsonschema typestring with format: time
mappingdate with time formats
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Time
ui_marshmallow_field_classmarshmallow_utils.fields.FormatTime
min_timeMinimum allowed time
max_timeMaximum allowed time

Example:

experiment_start_time: type: time data_collection_time: type: time embargo_lift_time: type: time min_time: "00:00:00" max_time: "23:59:59"

Valid input json example:

{ "experiment_start_time": "09:30:00", "data_collection_time": "14:15:30.500", "embargo_lift_time": "00:00:01" }

Sample search queries:

QueryDescription
experiment_start_time:[09:00:00 TO 10:00:00]Search for experiments starting between 9-10 AM
data_collection_time:>=14:00:00Search for afternoon data collection
embargo_lift_time:"00:00:01"Search for embargoes lifting at midnight
experiment_start_time:<12:00:00Search for morning experiments

edtf-time

Data type for Extended Date/Time Format (EDTF) supporting flexible and imprecise date/time representations. EDTF handles uncertain dates, approximate dates, date ranges, and incomplete dates commonly found in historical, cultural, or scientific contexts. Examples include “1984?”, “1984~”, “1984/1985”, “198X”. Uses cached validation for performance and supports both standard dates and EDTF-specific notations. Essential for digital humanities, archives, museums, and any domain dealing with imprecise temporal information.

PropertyDescription
source codedate.py 
jsonschema typestring with format: date-time
mappingdate with EDTF datetime formats
Property in YAML schemaDescription
marshmallow_field_classmarshmallow_utils.fields.edtfdatestring.EDTFDateTimeString
ui_marshmallow_field_classmarshmallow_utils.fields.FormatEDTF

Example:

manuscript_date: type: edtf-time historical_event_date: type: edtf-time archaeological_dating: type: edtf-time

Valid input json example:

{ "manuscript_date": "1984?", "historical_event_date": "1945-05~", "archaeological_dating": "198X" }

Sample search queries:

QueryDescription
manuscript_date:"1984?"Search for manuscripts with uncertain 1984 dating
historical_event_date:"1945-05~"Search for events approximately in May 1945
archaeological_dating:198*Search for archaeological finds from 1980s
manuscript_date:[1980 TO 1990]Search for manuscripts from 1980s decade

edtf

Data type for Extended Date/Time Format (EDTF) focused on date values without time components. Supports uncertain dates (“1984?”), approximate dates (“1984~”), date ranges (“1984/1985”), decades (“198X”), centuries (“19XX”), and other flexible date representations. Ideal for historical records, archaeological data, manuscript dating, and any scholarly work requiring nuanced temporal expressions. The implementation optimizes for common date formats while supporting the full EDTF specification for complex cases.

PropertyDescription
source codedate.py 
jsonschema typestring with format: date
mappingdate with EDTF date formats
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.String
ui_marshmallow_field_classmarshmallow_utils.fields.FormatEDTF

Example:

artifact_dating: type: edtf cultural_period: type: edtf specimen_collection_date: type: edtf

Valid input json example:

{ "artifact_dating": "1450~", "cultural_period": "15XX", "specimen_collection_date": "2023-06?" }

Sample search queries:

QueryDescription
artifact_dating:"1450~"Search for artifacts dated approximately 1450
cultural_period:"15XX"Search for 15th century cultural periods
specimen_collection_date:"2023-06?"Search for specimens possibly collected in June 2023
cultural_period:1[45]*Search for 14th or 15th century periods

edtf-interval

Data type for EDTF interval representations, specifically designed for date ranges and temporal spans. Supports complex interval notations like “1984/1985”, “1984-01/1985-12”, and open-ended intervals (“1984/..”). Maps to OpenSearch’s date_range field type, enabling efficient range queries and interval-based aggregations. Perfect for project durations, historical periods, employment terms, and any time spans that may have uncertain or flexible boundaries typical in scholarly and archival contexts.

PropertyDescription
source codedate.py 
jsonschema typestring with format: date
mappingdate_range with EDTF date formats
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.String
ui_marshmallow_field_classmarshmallow_utils.fields.FormatEDTF

Example:

research_period: type: edtf-interval data_collection_period: type: edtf-interval funding_period: type: edtf-interval

Valid input json example:

{ "research_period": "2022/2024", "data_collection_period": "2022-06/2023-12", "funding_period": "2022/.." }

Sample search queries:

QueryDescription
research_period:"2022/2024"Search for research conducted from 2022 to 2024
data_collection_period:"2022-06/2023-12"Search for data collected in specific period
funding_period:"2022/.."Search for funding starting 2022 with open end
research_period:[2022 TO 2025]Search for research periods overlapping with range

Composite data types

object

Data type for structured objects with defined properties, creating nested document structures. Each object requires a properties definition specifying the nested fields and their types. OpenSearch stores objects as flattened structures, making them searchable but not independently queryable (use nested for that). Automatically generates marshmallow schemas for validation and serialization. Perfect for addresses, contact information, metadata structures, and any logical grouping of related fields. The dynamic: strict mapping prevents unexpected fields from being indexed.

PropertyDescription
source codecollections.py 
jsonschema typeobject
mappingobject with dynamic: strict
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Nested
ui_marshmallow_field_classmarshmallow.fields.Nested (with UI schema)
propertiesDictionary of nested field definitions
marshmallow_schema_classCustom Marshmallow schema class (optional)
ui_marshmallow_schema_classCustom UI Marshmallow schema class (optional)

Example:

author: type: object properties: name: type: fulltext+keyword required: true orcid: type: keyword affiliation: type: fulltext+keyword funding_info: type: object properties: agency: type: fulltext+keyword grant_number: type: keyword amount: type: double

Valid input json example:

{ "author": { "name": "Dr. Jane Smith", "orcid": "0000-0002-1825-0097", "affiliation": "Massachusetts Institute of Technology" }, "funding_info": { "agency": "National Science Foundation", "grant_number": "NSF-2023-1234", "amount": 250000.00 } }

Sample search queries:

QueryDescription
author.name:"Jane Smith"Search for documents by specific author
author.orcid:"0000-0002-1825-0097"Search by ORCID identifier
author.affiliation:"Massachusetts Institute of Technology"Search by author affiliation
funding_info.agency:"National Science Foundation"Search by funding agency
funding_info.amount:[100000 TO 500000]Search by funding amount range

nested

Data type for nested objects that maintain their structure independently in OpenSearch, unlike regular objects which are flattened. Each nested object is indexed as a separate document, enabling complex queries that preserve the relationship between nested fields. Essential for arrays of objects where you need to query specific combinations within individual array items (e.g., “authors where role=editor AND name=Smith”). More resource-intensive than regular objects but provides powerful querying capabilities. Supports specialized nested faceting and aggregations.

PropertyDescription
source codecollections.py 
jsonschema typeobject
mappingnested
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Nested
ui_marshmallow_field_classmarshmallow.fields.Nested (with UI schema)
propertiesDictionary of nested field definitions
marshmallow_schema_classCustom Marshmallow schema class (optional)
ui_marshmallow_schema_classCustom UI Marshmallow schema class (optional)

Example:

contributors: type: nested properties: name: type: fulltext+keyword role: type: keyword affiliation: type: fulltext+keyword measurements: type: nested properties: parameter: type: keyword value: type: double unit: type: keyword uncertainty: type: double

Valid input json example:

{ "contributors": { "name": "Prof. John Doe", "role": "supervisor", "affiliation": "Harvard University" }, "measurements": { "parameter": "temperature", "value": 23.5, "unit": "celsius", "uncertainty": 0.1 } }

Sample search queries:

QueryDescription
contributors:(name:"John Doe" AND role:supervisor)Search for specific contributor with role (preserves nested relationship)
measurements:(parameter:temperature AND value:[20 TO 25])Search for temperature measurements in range
contributors.affiliation:"Harvard University"Search by contributor affiliation
measurements:(unit:celsius AND uncertainty:<=0.5)Search for precise celsius measurements

array

Data type for ordered collections of homogeneous items, where each item must conform to the same schema defined in the items property. OpenSearch doesn’t store arrays as a separate mapping type; instead, it inherits the mapping from the item type and stores multiple values. Supports validation for minimum/maximum item counts and uniqueness constraints. Essential for lists like tags, authors, keywords, or any repeating elements. The UI automatically handles array item management and validation for each element.

PropertyDescription
source codecollections.py 
jsonschema typearray
mapping(inherits from items type)
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.List
ui_marshmallow_field_classmarshmallow.fields.List (with item UI field)
itemsDefinition of array item type
min_itemsMinimum number of items
max_itemsMaximum number of items
unique_itemsWhether items must be unique

Example:

keywords: type: array items: type: keyword min_items: 1 max_items: 50 unique_items: true subject_classifications: type: array items: type: keyword unique_items: true related_identifiers: type: array items: type: keyword

Valid input json example:

{ "keywords": [ "machine learning", "bioinformatics", "computational biology", "data analysis" ], "subject_classifications": [ "Computer Science", "Biology", "Statistics" ], "related_identifiers": [ "10.1038/nature12373", "10.1126/science.1234567", "arXiv:2023.1234" ] }

Sample search queries:

QueryDescription
keywords:"machine learning"Search for documents with specific keyword
keywords:("machine learning" AND bioinformatics)Search for documents with multiple keywords
subject_classifications:"Computer Science"Search by subject classification
related_identifiers:"10.1038/nature12373"Search by related DOI identifier
keywords:machine*Search for keywords starting with “machine”

dynamic-object

Data type for objects with unpredictable or variable property names, such as multilingual content keyed by language codes (e.g., {“en”: “English text”, “fi”: “Finnish text”}). Uses a permissive schema that accepts any properties and maps to OpenSearch with dynamic: true, allowing arbitrary fields to be indexed automatically. Perfect for internationalization, user-defined metadata, flexible configuration objects, and any scenario where the structure cannot be predefined. No UI transformation is applied since the structure is inherently dynamic.

PropertyDescription
source codecollections.py 
jsonschema typeobject with additionalProperties: true
mappingobject with dynamic: true
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Nested (with PermissiveSchema)
ui_marshmallow_field_class(no UI transformation)

Example:

custom_metadata: type: dynamic-object instrument_settings: type: dynamic-object user_defined_fields: type: dynamic-object

Valid input json example:

{ "custom_metadata": { "experiment_id": "EXP-2023-001", "protocol_version": "v2.1", "lab_conditions": { "temperature": 22.5, "humidity": 45 } }, "instrument_settings": { "microscope_magnification": "40x", "laser_power": 75, "scan_speed": "medium" }, "user_defined_fields": { "internal_code": "BIO-ML-2023", "project_phase": "pilot_study" } }

Sample search queries:

QueryDescription
custom_metadata.experiment_id:"EXP-2023-001"Search by experiment ID
custom_metadata.protocol_version:v2*Search by protocol version pattern
instrument_settings.microscope_magnification:"40x"Search by instrument setting
user_defined_fields.project_phase:pilot_studySearch by user-defined field
custom_metadata.lab_conditions.temperature:[20 TO 25]Search by nested dynamic field value

polymorphic

Data type for discriminated union types where a field can represent different object types based on a discriminator field (typically “type”). Each variant in the oneof array specifies both a discriminator value and its corresponding schema. At runtime, the discriminator field determines which schema to use for validation and serialization. OpenSearch merges all possible properties into a single object mapping. Perfect for modeling heterogeneous entities like different types of creators (person vs organization), various content types, or flexible configuration options where the structure depends on a type field.

PropertyDescription
source codepolymorphic.py 
jsonschema typeoneOf with discriminator constraints
mappingobject with merged properties
Property in YAML schemaDescription
marshmallow_field_classPolymorphicField
ui_marshmallow_field_classPolymorphicField
discriminatorField name used to determine schema variant (defaults to “type”)
oneofArray of schema variants with discriminator values

Example:

creator: type: polymorphic discriminator: creator_type oneof: - discriminator: person type: object properties: name: type: fulltext+keyword orcid: type: keyword affiliation: type: fulltext+keyword - discriminator: organization type: object properties: name: type: fulltext+keyword ror_id: type: keyword country: type: keyword resource: type: polymorphic discriminator: resource_type oneof: - discriminator: dataset type: object properties: format: type: keyword size_bytes: type: long - discriminator: publication type: object properties: journal: type: fulltext+keyword doi: type: keyword

Valid input json example:

{ "creator": { "creator_type": "person", "name": "Dr. Jane Smith", "orcid": "0000-0002-1825-0097", "affiliation": "MIT" }, "resource": { "resource_type": "dataset", "format": "CSV", "size_bytes": 1048576 } }

Sample search queries:

QueryDescription
creator.creator_type:personSearch for resources created by persons
creator.creator_type:person AND creator.name:"Jane Smith"Search for resources by specific person
resource.resource_type:datasetSearch for dataset resources
resource.resource_type:dataset AND resource.format:CSVSearch for CSV datasets
creator.orcid:"0000-0002-1825-0097"Search by creator ORCID (works for person type)

Vocabularies and relations

pid-relation

Data type for creating relationships between records using Persistent Identifiers (PIDs). Stores selected fields from related records locally while maintaining a reference to the source record via its PID. The keys property specifies which fields to cache locally (e.g., id, title), enabling efficient searches and display without requiring joins. Supports automatic resolution, caching, and updating of related record data. Essential for modeling relationships like citations, dependencies, or any cross-record references where you need both performance and data consistency.

PropertyDescription
source coderelations.py 
jsonschema typeobject (inherited from ObjectDataType)
mappingobject (inherited from ObjectDataType)
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Nested
ui_marshmallow_field_classmarshmallow.fields.Nested (with UI schema from ObjectDataType)
keysList of keys to include in the relation (e.g., ["id", "metadata.title"])
record_clsTarget record class (e.g., "my_other_model.records:record")
pid_fieldPID field getter function or PID field instance
cache_keyOptional cache key for caching the resolved record
relation_field_kwargsAdditional kwargs for the relation field

Example:

cited_publication: type: pid-relation keys: ["id", "metadata.title", "metadata.doi"] record_cls: "publications.records:PublicationRecord" related_dataset: type: pid-relation keys: ["id", "metadata.title", "metadata.resource_type"] record_cls: "datasets.records:DatasetRecord" parent_collection: type: pid-relation keys: ["id", "metadata.title"] record_cls: "collections.records:CollectionRecord"

Valid input json example:

{ "cited_publication": { "id": "pub-12345", "metadata": { "title": "Advanced Machine Learning Techniques", "doi": "10.1038/nature12373" } }, "related_dataset": { "id": "ds-67890", "metadata": { "title": "Bioinformatics Training Data", "resource_type": "dataset" } }, "parent_collection": { "id": "col-11111", "metadata": { "title": "Machine Learning Research Collection" } } }

Sample search queries:

QueryDescription
cited_publication.id:"pub-12345"Search by cited publication ID
cited_publication.metadata.title:"Machine Learning"Search by cited publication title
cited_publication.metadata.doi:"10.1038/nature12373"Search by cited publication DOI
related_dataset.metadata.resource_type:datasetSearch for documents with related datasets
parent_collection.metadata.title:*Research*Search by parent collection title pattern

vocabulary

Data type for references to controlled vocabularies, extending pid-relation with vocabulary-specific functionality. Automatically configures appropriate fields and schemas based on the vocabulary-type (languages, affiliations, funders, awards, subjects). Each vocabulary type has predefined field sets (e.g., affiliations include identifiers and name fields). Provides seamless integration with Invenio’s vocabulary system, automatic PID field resolution, and specialized marshmallow schemas. Essential for maintaining data consistency and enabling faceted search across standardized terms and entities.

PropertyDescription
source codevocabularies.py 
jsonschema typeobject (inherited from PIDRelation)
mappingobject (inherited from PIDRelation)
Property in YAML schemaDescription
marshmallow_field_classmarshmallow.fields.Nested
ui_marshmallow_field_classmarshmallow.fields.Nested (with UI schema from ObjectDataType)
vocabulary-typeType of vocabulary (e.g., "languages", "affiliations", "funders", "awards", "subjects")
keysList of keys to include (auto-populated based on vocabulary type if not specified)
record_cls(inherited from pid-relation, auto-determined from vocabulary-type)
pid_field(inherited from pid-relation, auto-determined from vocabulary-type)
cache_key(inherited from pid-relation, defaults to vocabulary-type if not specified)

Example:

language: type: vocabulary vocabulary-type: languages author_affiliation: type: vocabulary vocabulary-type: affiliations research_subject: type: vocabulary vocabulary-type: subjects funding_agency: type: vocabulary vocabulary-type: funders related_award: type: vocabulary vocabulary-type: awards

Valid input json example:

{ "language": { "id": "eng", "title": { "en": "English", "cs": "Angličtina" } }, "author_affiliation": { "id": "mit", "title": { "en": "Massachusetts Institute of Technology" }, "identifiers": { "ror": "042nb2s44" } }, "research_subject": { "id": "computer-science", "title": { "en": "Computer Science" } }, "funding_agency": { "id": "nsf", "title": { "en": "National Science Foundation" } }, "related_award": { "id": "nsf-2023-1234", "title": { "en": "Machine Learning for Scientific Discovery" } } }

Sample search queries:

QueryDescription
language.id:engSearch for English language documents
author_affiliation.id:mitSearch by affiliation ID
author_affiliation.title.en:"Massachusetts Institute of Technology"Search by affiliation name
author_affiliation.identifiers.ror:"042nb2s44"Search by ROR identifier
research_subject.title.en:"Computer Science"Search by subject title
funding_agency.id:nsf AND related_award.id:nsf*Search for NSF funding and awards
Last updated on