Purpose of this page
This document specifies guidelines for resource depositors who wish to archive their resources in the repository of the SAW Leipzig / CLARIN Center Leipzig. Depositors are asked to read this document carefully and to check whether they can fulfill all criteria stated below. The repository offers support to depositors for all steps described here. If you have questions, please contact us via email@example.com.
The repository accepts digital resources (including data and tools) for deposit on its servers and follows a defined workflow for their deposition.
The repository focuses on written text corpora, reference corpora, general lexical resources and linguistic resources for lesser-resourced languages. Resources from these fields are preferred. However, the repository might also accept language-related resources from other fields as long as they are of high scientific value for the respective communities. In any other case, the repository is glad to help find an adequate archiving facility at another institution.
The repository will only accept resources
- that are the result of research projects,
- that come with exhaustive metadata,
- that follow an established and standardized data format, or whose data format is extensively documented,
- for which information on their creation and on their legal situation is available.
In particular:
- metadata has to be provided in CMDI or at least in Dublin Core,
- it is recommended to use formats listed in the CLARIN standard recommendations,
- if no recommended format is used, exhaustive documentation of the data has to be provided,
- only data that is freely available to everyone, or that comes with a limited license that permits use by people working in research institutions, will be added to the repository. Access to metadata must not be limited in any way.
We additionally encourage resource depositors to:
- provide references to publications on the resource in the metadata and/or make those publications part of the resource bundle archived in the repository.
- provide a list of usage scenarios for which the resource is intended to be used.
Metadata has to be provided in CMDI or at least in Dublin Core. Exhaustive documentation is available on how to create CMDI-compliant metadata profiles and instances. Metadata is checked for compliance with CMDI standards as follows:
- Is the XML metadata well-formed and valid?
- Are the CMDI components and profiles used stored in the Component Registry and publicly available?
- Are the data categories/concepts used in those components/profiles present in the CLARIN Concept Registry or similar registries?
- Do the provided CMDI records contain sufficient and consistent information (e.g. a consistent specification of the data producer's “name”) to meet the needs of search platforms like the CLARIN VLO? Please also refer to our metadata requirements for resource deposition.
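Depositors may want to run the first of these checks locally before submitting. The following is a minimal sketch, not the repository's actual validation pipeline: it tests well-formedness only, using Python's standard library. The function name `is_well_formed` and the sample records are illustrative assumptions. Validity against a CMDI profile schema additionally requires an XML Schema validator, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML.

    This checks well-formedness only; it does NOT validate the record
    against a CMDI profile schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample records, for illustration only.
good = "<record><title>Example corpus</title></record>"
bad = "<record><title>Example corpus</record>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```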
The metadata has to contain information on the data depositor and/or producer (name and URL of the person/institution, contact information) and a statement on the legal status of the resource.
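As an illustration of this requirement for Dublin Core records, the sketch below reports which expected elements are absent or empty. It assumes, hypothetically, that depositor/producer information is carried in `dc:creator` and the legal statement in `dc:rights`; the repository's actual mapping may differ, and the wrapping `<metadata>` element and the function name `missing_fields` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def missing_fields(xml_text, required=("creator", "rights")):
    """Return the names of required DC elements that are absent or empty.

    The choice of 'creator' and 'rights' is an assumption made for this
    sketch, not a rule stated by the repository.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for name in required:
        elements = root.findall(f".//{{{DC_NS}}}{name}")
        if not any((e.text or "").strip() for e in elements):
            missing.append(name)
    return missing


# Hypothetical record that lacks a rights statement.
sample = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Example corpus</dc:title>
  <dc:creator>Example Institute, https://example.org</dc:creator>
</metadata>"""

print(missing_fields(sample))  # ['rights']
```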
The data depositor agrees to make this metadata publicly and freely available, without restriction, via technical interfaces of the repository such as OAI-PMH, and to allow the computation, reuse and redistribution of this metadata by third parties.
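To show what public availability via OAI-PMH means in practice, here is a minimal sketch that builds harvesting request URLs. The base URL is a hypothetical placeholder, not the repository's real endpoint, and the helper `oai_request_url` is invented for this example. `Identify` and `ListRecords` are standard OAI-PMH verbs, and `oai_dc` is the metadata prefix every OAI-PMH repository must support.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the repository's actual OAI-PMH base URL.
BASE_URL = "https://repo.example.org/oai"


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


identify_url = oai_request_url(BASE_URL, "Identify")
records_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(records_url)
```

Any third party can issue such requests without authentication, which is why the metadata itself must not contain information that is subject to access restrictions.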
Data & Formats
It is recommended to use formats listed in the CLARIN standard recommendations (see for example the Standards document or the Standards and Formats section on the CLARIN web page).
At the moment, the CLARIN-D Centre Leipzig actively uses and endorses the following formats:
- For documentation: DCMI, CMDI, PDF/A, XHTML/HTML5
- For data serialization: plain text, CoNLL-X/U, formats based on the TEI guidelines, RDF (RDF/XML, Turtle) on the basis of standardised ontologies like OntoLex-Lemon and LexInfo
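As an example of how one of these serialization formats is structured, the sketch below reads CoNLL-U data into Python dictionaries. It is a simplified reader written for illustration (the function name `parse_conllu` and the sample sentence are assumptions): it relies on the ten tab-separated columns defined by CoNLL-U, skips comment lines, and treats blank lines as sentence boundaries, but does not interpret multiword tokens or empty nodes specially.

```python
# The ten columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = ("id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc")


def parse_conllu(text):
    """Parse CoNLL-U text into a list of sentences (lists of token dicts)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment, e.g. "# text = ..."; ignored here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical two-token sentence, built with explicit tab separators.
rows = [
    ["1", "Hello", "hello", "INTJ", "_", "_", "0", "root", "_", "_"],
    ["2", "world", "world", "NOUN", "_", "_", "1", "vocative", "_", "_"],
]
sample = "# text = Hello world\n" + "\n".join("\t".join(r) for r in rows) + "\n\n"

sentences = parse_conllu(sample)
print(len(sentences), [token["form"] for token in sentences[0]])
```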
If no recommended or otherwise well-known and documented format is used, exhaustive documentation of the syntax and semantics of the data has to be provided (e.g. for database dumps: names of tables and columns, specifications and examples of the contents of each column, and examples of how to retrieve different types of data). This documentation (English, PDF) is stored in the repository along with the data and metadata and is provided to everyone who wishes to download or access the resource.
Only data that is freely available to everyone, or that comes with a limited license that permits use by people working in research institutions, will be added to the repository. Access to metadata must not be limited in any way.
In the future, the repository might support technical access restrictions that limit data access to users working in research institutions, by using Shibboleth and accepting log-ins only from users within the CLARIN-AAI / DFN-AAI.
If the privacy of subjects is a concern, this must be addressed by contracts signed by those subjects (e.g. interviewees explicitly state that the data may be provided freely for research and teaching purposes).
The procedure for depositing resources in the repository consists of the following steps:
- signing the depositor's agreement (or, at a first stage, stating the intention to do so if the request is accepted by the repository)
- filling out the resource deposition request form
- mailing these documents to firstname.lastname@example.org
The following documents contain all relevant information in more detail. If you have any questions, send us an email at email@example.com.