Initial data for vocabularies
You have two options for adding vocabularies to your repository:
- Put the required vocabulary files into the `app_data` folder inside your repository
- Create a Python library with the vocabulary data and install it into your repository

Using app_data folder
Files in the `app_data` folder are automatically detected and loaded during the
`./run.sh services setup` command and have the highest priority. The structure of the
`app_data` folder is as follows:
app_data
├── vocabularies              # folder containing vocabulary files
│   ├── languages.yaml        # vocabulary file for 'languages' vocabulary
│   └── names.yaml            # vocabulary file for 'names' vocabulary
└── vocabularies.yaml         # catalogue file defining the vocabularies to be imported

Catalogue file
The vocabularies that will be imported are defined in a fixture catalogue file. The file is located at
`app_data/vocabularies.yaml` and contains a list of vocabularies to be imported.
```yaml
languages:
  pid-type: v-lng
  data-file: vocabularies/languages.yaml
names:
  pid-type: names
  data-file: vocabularies/names.yaml
```

The `languages:` key is the vocabulary identifier. It must be unique within the `vocabularies.yaml` file.
The next line, `pid-type: v-lng`, specifies the PID type to be used for the vocabulary (it must be unique within the repository and at most 6 characters long). The `data-file:` line specifies the path to the vocabulary file, relative to the `app_data` folder.

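The constraints above can be checked mechanically before running the setup command. The following is a minimal sketch, not part of the repository tooling: `check_catalogue` is a hypothetical helper that takes the catalogue already parsed into a Python dict (for example by a YAML library of your choice). It can only see duplicates inside one catalogue, not PID types already registered elsewhere in the repository.

```python
from pathlib import PurePosixPath

MAX_PID_TYPE_LENGTH = 6  # limit stated in the text above


def check_catalogue(catalogue):
    """Return a list of problems found in a parsed catalogue (hypothetical helper)."""
    problems = []
    seen_pid_types = {}  # pid-type -> vocabulary identifier that first used it
    for vocab_id, entry in catalogue.items():
        pid_type = entry.get("pid-type")
        data_file = entry.get("data-file")

        if not pid_type:
            problems.append(f"{vocab_id}: missing pid-type")
        else:
            if len(pid_type) > MAX_PID_TYPE_LENGTH:
                problems.append(f"{vocab_id}: pid-type '{pid_type}' is too long")
            if pid_type in seen_pid_types:
                first = seen_pid_types[pid_type]
                problems.append(f"{vocab_id}: pid-type '{pid_type}' already used by '{first}'")
            else:
                seen_pid_types[pid_type] = vocab_id

        if not data_file:
            problems.append(f"{vocab_id}: missing data-file")
        elif PurePosixPath(data_file).is_absolute():
            # data-file is expected to be relative to the app_data folder
            problems.append(f"{vocab_id}: data-file must be a relative path")
    return problems


# The catalogue from the example above, written as the dict a YAML parser would return.
catalogue = {
    "languages": {"pid-type": "v-lng", "data-file": "vocabularies/languages.yaml"},
    "names": {"pid-type": "names", "data-file": "vocabularies/names.yaml"},
}
print(check_catalogue(catalogue))  # prints []
```

Duplicate vocabulary identifiers are not checked here because a parsed dict cannot hold the same key twice; catching those would require inspecting the raw file.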
Vocabulary file
A vocabulary file is a YAML document containing the vocabulary records.
# located at app_data/vocabularies/languages.yaml
- id: eng
  props:
    alpha_2: en
  tags:
    - individual
    - living
  title:
    en: English
- id: ces
  props:
    alpha_2: cs
  tags:
    - individual
    - living
  title:
    en: Czech

Each member of the array represents a single vocabulary item. The first line specifies the id of the vocabulary item, which must be unique within the particular vocabulary (`languages` in this case). The last section specifies the title of the vocabulary item in different languages. You might add text props, tags or custom fields here as well. See Invenio vocabularies schema for more information.

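The uniqueness rule for item ids is easy to verify once the file has been parsed. The sketch below is illustrative only: `check_records` is a hypothetical helper working on the list of dicts a YAML parser would produce, and the check that `title` is a language-to-text mapping is an assumption drawn from the example above rather than from the Invenio schema.

```python
def check_records(records):
    """Return a list of problems found in parsed vocabulary records (hypothetical helper)."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        record_id = record.get("id")
        if record_id is None:
            problems.append(f"record {index}: missing id")
        elif record_id in seen_ids:
            problems.append(f"record {index}: duplicate id '{record_id}'")
        else:
            seen_ids.add(record_id)

        # Assumption: when a title is present, it maps language codes to text.
        if "title" in record and not isinstance(record["title"], dict):
            problems.append(f"record {index}: title should map language codes to text")
    return problems


# The two records from the languages example above.
languages = [
    {"id": "eng", "props": {"alpha_2": "en"},
     "tags": ["individual", "living"], "title": {"en": "English"}},
    {"id": "ces", "props": {"alpha_2": "cs"},
     "tags": ["individual", "living"], "title": {"en": "Czech"}},
]
print(check_records(languages))  # prints []
```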
Using specific python package
TODO

Development tips
For development purposes, you might want to speed up the initial data loading by reducing
the number of vocabulary terms loaded. You can do so by listing the vocabularies in the
`app_data/vocabularies.yaml` catalogue and pointing them at data files that contain only the terms you need.
Vocabularies not listed in the catalogue will be loaded from the default Invenio RDM vocabularies package.
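A reduced set of terms can be produced by filtering an already-parsed list of records down to the ids you need. This is a minimal sketch; `keep_terms` is a hypothetical helper and the sample records are made up for illustration. Writing the result back to a data file is left to whichever YAML library you use.

```python
def keep_terms(records, wanted_ids):
    """Return only the records whose id is in wanted_ids, preserving order."""
    wanted = set(wanted_ids)
    return [record for record in records if record.get("id") in wanted]


# Illustrative input: a larger vocabulary parsed into a list of dicts.
full_vocabulary = [
    {"id": "eng", "title": {"en": "English"}},
    {"id": "ces", "title": {"en": "Czech"}},
    {"id": "deu", "title": {"en": "German"}},
]
reduced = keep_terms(full_vocabulary, ["eng", "ces"])
print([record["id"] for record in reduced])  # prints ['eng', 'ces']
```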
Suggested changes:
# app_data/vocabularies.yaml
languages:
  pid-type: v-lng
  data-file: vocabularies/languages.yaml
names:
  pid-type: names
  data-file: vocabularies/names.yaml
licenses:
  pid-type: v-lic
  data-file: vocabularies/licenses.csv
code:developmentStatus:
  pid-type: v-devs
  data-file: vocabularies/development_status.yaml
code:programmingLanguages:
  pid-type: v-prg
  data-file: vocabularies/programming_languages.yaml

# app_data/vocabularies/languages.yaml
- id: eng
  props:
    alpha_2: en
  tags:
    - individual
    - living
  title:
    en: English
- id: ces
  props:
    alpha_2: cs
  tags:
    - individual
    - living
  title:
    en: Czech

# app_data/vocabularies/names.yaml
- affiliations:
    - name: Northwestern University
  family_name: Riesbeck
  given_name: Christopher
  id: 0000-0001-7673-1000
  identifiers:
    - identifier: 0000-0001-7673-1000
      scheme: orcid
- affiliations:
    - name: Helmholtz-Zentrum Dresden-Rossendorf
  family_name: Hauser
  given_name: Sandra
  id: 0000-0001-8206-6000
  identifiers:
    - identifier: 0000-0001-8206-6000
      scheme: orcid

# app_data/vocabularies/licenses.csv
id,title__en,description__en,icon,tags,props__url,props__scheme,props__osi_approved
cc-by-4.0,Creative Commons Attribution 4.0 International,The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.,cc-by-icon,"recommended,all,data",https://creativecommons.org/licenses/by/4.0/,spdx,
cc-by-nc-4.0,Creative Commons Attribution Non Commercial 4.0 International,,cc-by-nc-icon,"all,data",https://creativecommons.org/licenses/by-nc/4.0/,spdx,
cc-by-nc-nd-4.0,Creative Commons Attribution Non Commercial No Derivatives 4.0 International,,cc-by-nc-nd-icon,"all,data",https://creativecommons.org/licenses/by-nc-nd/4.0/,spdx,
cc-by-nc-sa-4.0,Creative Commons Attribution Non Commercial Share Alike 4.0 International,,cc-by-nc-sa-icon,"all,data",https://creativecommons.org/licenses/by-nc-sa/4.0/,spdx,
cc-by-nd-4.0,Creative Commons Attribution No Derivatives 4.0 International,,cc-by-nd-icon,"all,data",https://creativecommons.org/licenses/by-nd/4.0/,spdx,
cc-by-sa-4.0,Creative Commons Attribution Share Alike 4.0 International,Permits almost any use subject to providing credit and license notice. Frequently used for media assets and educational materials. The most common license for Open Access scientific publications. Not recommended for software.,cc-by-sa-icon,"recommended,all,data",https://creativecommons.org/licenses/by-sa/4.0/,spdx,
cc0-1.0,Creative Commons Zero v1.0 Universal,CC0 waives copyright interest in a work you've created and dedicates it to the world-wide public domain. Use CC0 to opt out of copyright entirely and ensure your work has the widest reach.,cc-cc0-icon,"recommended,all,data,software",https://creativecommons.org/publicdomain/zero/1.0/,spdx,

# app_data/vocabularies/development_status.yaml
[]

# app_data/vocabularies/programming_languages.yaml
[]

Not using RDM vocabularies at all
TODO

See also
Invenio RDM documentation on vocabularies