Purpose of this page
This document specifies the preservation policy of the repository. If you have any questions, send an email to clarin@saw-leipzig.de.
Data Storage
Data is stored on a RAID system and all contents are regularly copied to separate hardware. The deterioration of storage media is monitored continuously via Icinga probes (e.g. using S.M.A.R.T. – Self-Monitoring, Analysis and Reporting Technology – data). Access to the archive system is limited to a small group of people. Write access to data storage systems running on the archive machine is limited to the machine itself and dedicated personnel.
Data Preservation
Depositors are encouraged to use standardized formats (UTF-8 encoding, documented XML formats, ...) when submitting their data for deposit. If custom or proprietary formats are used, exhaustive documentation must be provided. The repository will check twice a year:
- whether the data that was stored is unchanged (via checksums)
- whether the deposited data needs to be updated because the format used has become obsolete
- whether an update of the available metadata is necessary
If an update is necessary, the depositor is contacted and asked to provide an updated version. In some cases the center may also decide to perform the update itself; the original depositor will then be informed, provided the person or institution can still be reached.
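The checksum comparison in the first check could look roughly like the following sketch. It is an illustration only, not the repository's actual tooling: the function names, the use of SHA-512, and the manifest layout (relative path mapped to expected hex digest) are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs.

    `manifest` is a hypothetical mapping of relative path -> expected
    hex digest, recorded when the data was deposited.
    """
    changed = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha512_of(target) != expected.lower():
            changed.append(relative_path)
    return changed
```

An empty result means every file listed in the manifest is still present and unchanged; any returned path would need to be restored from a backup and investigated.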
Backup
Two independent backup strategies are used to increase redundancy and to allow separate fallback recoveries:
- backups of the relevant virtual machines are created via dedicated mechanisms of our virtualisation solution every week, always preserving the five most recent versions.
- backups of the underlying OCFL repository are created every month, always storing the three most recent versions. From the contents of this OCFL repository, the entirety of the repository can also be recreated.
In addition to these rolling backups, semiannual backups are preserved as long-term versions for both strategies, so that older snapshots can be restored.
Backups are held on hardware that is located at sites separate from the live system and is monitored for deterioration.
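The retention rule described in this section (a fixed number of recent versions plus semiannual long-term versions) can be sketched as follows. This is a simplified illustration, not the mechanism actually used by the virtualisation solution or the OCFL backup. The function name is hypothetical, and treating the earliest backup of each half-year as the long-term version is an assumption made for this example.

```python
from datetime import date


def backups_to_keep(backup_dates: list[date], keep_recent: int) -> set[date]:
    """Select the most recent backups plus one long-term backup per half-year."""
    ordered = sorted(backup_dates, reverse=True)  # newest first
    keep = set(ordered[:keep_recent])

    # Assumption: the earliest backup in each half-year is the long-term version.
    long_term: dict[tuple[int, int], date] = {}
    for day in ordered:
        half = 1 if day.month <= 6 else 2
        # Iterating newest first, so the last value stored per key is the earliest.
        long_term[(day.year, half)] = day

    return keep | set(long_term.values())
```

Under these assumptions, the weekly virtual machine backups would use keep_recent=5 and the monthly OCFL backups keep_recent=3; every date not in the returned set is eligible for deletion.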
Software Stack Preservation
The center uses widely adopted open source software stacks (Linux, Fedora Repository, Apache Tomcat, MariaDB) to provide all repository services (data storage, OAI-PMH, ...). This maximizes the probability of long-term support (updates, security fixes) for the tools being used and makes it easier to run installations of these software stacks independently of the underlying hardware and/or operating system.
The update status of installed software is monitored automatically; available updates are installed by dedicated personnel.
The repository will check at least twice a year:
- whether major updates of software components are available and necessary
- whether software components are still actively developed and updated, or have been abandoned so that switching to alternatives should be considered
- whether access to the data/metadata stored in the repository should be provided via additional interfaces (or updated versions of existing ones)