Purpose of this page
This document specifies guidelines for resource depositors who wish to archive their resources in the repository of the SAW Leipzig / CLARIN Center Leipzig. Resource depositors are asked to read this document carefully and to check whether they can fulfill all criteria stated below. The repository offers support to resource depositors for all steps mentioned. If you have questions, please contact us via clarin@saw-leipzig.de.
The repository accepts digital resources (including data and tools) for deposit on its servers and follows a defined workflow for their deposition.
Accepted Resources
The repository focuses on written text corpora, reference corpora, general lexical resources and linguistic resources for less-resourced languages. Resources from these fields are given preference. The repository might also accept language-related resources from other fields, as long as they are of high scientific value for the respective communities. In any other case, the repository is glad to help find an adequate archiving facility at another institution.
The repository will only accept a resource that
- is the result of research projects,
- comes with exhaustive metadata,
- follows an established and standardized data format, or uses a data format for which extensive documentation is available,
- has information on its creation and on its legal situation available.
Furthermore:
- metadata has to be provided in CMDI or at least in Dublin Core,
- data formats used in the submission should follow the SAW format recommendations,
- if no recommended format is used, an exhaustive documentation of the data has to be provided,
- only data that is freely available to everyone, or that comes with a limited license which allows use by people working in research institutions, will be added to the repository. Access to metadata must not be limited in any way.
We additionally encourage resource depositors to:
- provide references to publications on the resource in the metadata and/or make those publications part of the resource bundle archived in the repository,
- provide a list of usage scenarios for which the resource is intended.
Metadata has to be provided in CMDI or at least in Dublin Core. There is exhaustive documentation available on how to create CMDI-compliant metadata profiles and instances. Metadata is checked for compliance with CMDI standards in the following way:
- Is the XML metadata well-formed and valid?
- Are the CMDI components and profiles used stored in the Component Registry and publicly available?
- Are the data categories/concepts used in those components/profiles present in the CLARIN Concept Registry or similar registries?
- Do the provided CMDI records contain enough and consistent information (e.g. consistent specification of the data producer's “name”) according to the needs of search platforms like the CLARIN VLO? Please also refer to our metadata requirements for resource deposition.
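The first check in the list above can be tried locally before submission. The following is a minimal sketch using only the Python standard library; the function name and the sample records are hypothetical. It tests well-formedness only. Validity against a CMDI profile schema requires a schema-aware validator and is not covered here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML (no schema validation is done)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample records, not real CMDI instances.
good = "<record><title>Example corpus</title></record>"
bad = "<record><title>Example corpus</record>"  # <title> is never closed

print(is_well_formed(good), is_well_formed(bad))
```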
The metadata has to contain information on the data depositor and/or producer (name and URL of the person/institution, contact information) and a statement on the legal status of the resource.
The data depositor agrees to make this metadata publicly and freely available via technical interfaces of the repository such as OAI-PMH, and to allow the computation, reuse and redistribution of this metadata by third parties.
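As an illustration of the required depositor and legal information, the sketch below builds a small Dublin Core record and reports which required elements are missing. The mapping of these requirements to the Dublin Core elements `creator` and `rights`, the wrapper element `metadata`, and all sample values are assumptions made for this example; the repository's metadata requirements remain authoritative.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping: producer/depositor -> dc:creator, legal status -> dc:rights.
REQUIRED = ("creator", "rights")


def build_record(fields: dict) -> ET.Element:
    """Build a flat record with one Dublin Core element per dictionary entry."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


def missing_fields(record: ET.Element) -> list:
    """Return the required element names that are absent or empty."""
    missing = []
    for name in REQUIRED:
        text = record.findtext(f"{{{DC_NS}}}{name}", default="") or ""
        if not text.strip():
            missing.append(name)
    return missing


# Placeholder values; no legal status statement is given on purpose.
record = build_record({
    "title": "Example corpus",
    "creator": "Example Institute, https://example.org, contact@example.org",
})
print(missing_fields(record))
```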
It is recommended to use formats as listed in the SAW section of the Standards Information System.
At the moment, the CLARIN-D Centre Leipzig actively uses and endorses the following formats:
- For documentation: DCMI, CMDI, PDF/A, XHTML/HTML5
- For data serialization: plain text, CoNLL-X/U, formats based on the TEI guidelines, RDF (RDF/XML, Turtle) on the basis of standardised ontologies like OntoLex-Lemon and LexInfo
In case no recommended or well-known and documented format is used, exhaustive documentation of the syntax and semantics of the data has to be provided (e.g. for database dumps: names of tables and columns; specifications and examples of the contents of each column; examples of how to retrieve different types of data). This documentation (English, PDF) is stored in the repository along with the data and metadata and is provided to everyone who wishes to download or access the resource.
Access Rights
Only data that is freely available to everyone, or that comes with a limited license which allows use by people working in research institutions, will be added to the repository. Access to metadata must not be limited in any way.
In the future, technical access restrictions might be supported to limit access to users working in research institutions, by using Shibboleth and only accepting log-ins from users inside the CLARIN-AAI / DFN-AAI.
If the privacy of subjects is a concern, this needs to be addressed by contracts signed by those subjects (e.g. interviewed people explicitly state that the data may be provided freely for research and teaching purposes).
Depositing Procedure
The procedure for depositing resources in the repository consists of the following steps:
- Submission of a completed Request form for resource deposition (RDRF) to clarin@saw-leipzig.de by the depositor, with possible rounds of feedback and resubmission.
- Once the RDRF has been accepted, acknowledgment and signing of the depositor's agreement by the depositor.
- Preparation of a Submission Information Package (SIP) by the depositor, consisting of:
  - the signed depositor's agreement,
  - metadata corresponding to the submitted resource,
  - an archive file containing the data and adhering to the BagIt format.
- Appraisal and verification of the SIP by the data center, with possible rounds of feedback and resubmission.
- Preparation and ingestion of the Archival Information Package (AIP), consisting of the BagIt archive file and the resource's metadata.
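To give an idea of what a BagIt-structured submission looks like, here is a minimal sketch that writes a bag with the Python standard library and re-checks its checksums. It assumes BagIt version 1.0 with a SHA-256 payload manifest; the function names and file contents are hypothetical, and the page does not state which BagIt version, profile or archive container the repository expects, so please confirm those details with the repository.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> None:
    """Write payload files under data/ together with the required BagIt tag files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration: BagIt version and encoding of the tag files.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # Payload manifest: one "<checksum> <path>" line per payload file.
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "example-bag"
    make_bag(bag, {"corpus.txt": b"Example sentence.\n"})
    print(sorted(p.name for p in bag.iterdir()), verify_bag(bag))
```

The bag directory would then be packed into a single archive file for submission. Dedicated BagIt tooling also handles optional tag files such as bag-info.txt, which this sketch omits.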
Additional information concerning the depositing and ingest workflows can be found under Workflows.
Documents
The following documents contain all relevant information in more detail. If you have any questions, send us an email at clarin@saw-leipzig.de.