Purpose of this page
This document specifies the preservation policy of the repository. If you have any questions, please email us
at clarin@saw-leipzig.de.
Data Storage
Data is stored on a RAID system and all contents are regularly copied to separate hardware. The deterioration of
storage media is monitored continuously via Icinga probes (e.g. using S.M.A.R.T. – Self-Monitoring, Analysis and
Reporting Technology – data). Access to the archive system is limited to a small group of people. Write access
to data storage systems running on the archive machine is limited to the machine itself and dedicated personnel.
Data Preservation
Depositors are encouraged to use standardized formats (UTF-8 encoding, documented XML formats, ...) when
depositing their data. If custom or proprietary formats are used, exhaustive documentation
must be provided. The repository will check twice a year:
- whether the data that was stored is unchanged (via checksums)
- whether an update of the deposited data due to the obsolescence of the used format is necessary
- whether an update of the available metadata is necessary
If an update is necessary, the depositor is contacted and asked to provide an updated version. In some
cases the center may also decide to perform the update itself. The original depositor will then be
informed, provided the person or institution can still be reached.
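The checksum check listed above can be illustrated with a minimal sketch. It assumes a manifest that maps relative file paths to SHA-256 digests recorded at deposit time. The digest algorithm, the manifest layout and the function names `sha256_of` and `find_changed_files` are assumptions for illustration, not a description of the repository's actual tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs.

    `manifest` is a hypothetical mapping of relative path -> expected digest.
    """
    changed = []
    for relative_path, expected in sorted(manifest.items()):
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(relative_path)
    return changed
```

An empty result means every file listed in the manifest is present and unchanged. Any returned path would need to be investigated and, if necessary, restored from a backup.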
Backup
Two independent backup strategies are used to increase redundancy and enable separate fallback recoveries:
- backups of the relevant virtual machines are created every week via dedicated mechanisms of our
virtualisation solution, always preserving the five most recent versions.
- backups of the underlying OCFL repository are created every month, always storing the three most recent
versions. The entire repository can also be recreated from the contents of this OCFL repository.
In addition to these rolling backup versions, semiannual backups are preserved as long-term versions for both
strategies, so that older snapshots can be restored.
Backups are held on hardware that is located separately from the live system and is
monitored for deterioration.
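The retention rules described in this section (keep the N most recent versions, plus semiannual long-term versions) can be sketched as a small selection function. The policy does not state which backup becomes the semiannual version, so this sketch assumes it is the first backup taken in each half-year. The function name `select_backups_to_keep` is hypothetical.

```python
from datetime import date


def select_backups_to_keep(backup_dates: list[date], keep_recent: int) -> set[date]:
    """Keep the `keep_recent` newest backups plus one backup per half-year.

    Assumption: the long-term version is the first backup of each half-year.
    """
    ordered = sorted(backup_dates)
    # Guard against keep_recent == 0, because ordered[-0:] would be the whole list.
    keep = set(ordered[-keep_recent:]) if keep_recent > 0 else set()
    seen_half_years = set()
    for day in ordered:
        half_year = (day.year, 1 if day.month <= 6 else 2)
        if half_year not in seen_half_years:
            seen_half_years.add(half_year)
            keep.add(day)
    return keep
```

Under these assumptions, the weekly virtual machine backups would use `keep_recent=5` and the monthly OCFL backups `keep_recent=3`. Every backup not in the returned set could be discarded.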
Software Stack Preservation
The center utilizes widely used open source software stacks (Linux, Fedora Repository, Apache Tomcat, MariaDB)
to facilitate all repository services (data storage, OAI-PMH, ...). This maximizes the probability of long-term
support (updates, security fixes) for the tools being used and improves the ability to run installations of
these software stacks independently of the underlying hardware and/or operating system.
The update status of installed software is monitored automatically; available updates are installed by dedicated
personnel.
The repository will check at least twice a year:
- whether major updates of software components are available and necessary
- whether software components are still actively developed, or have been abandoned so that switching to
alternatives should be considered
- whether access to the data/metadata stored in the repository should be made available via
additional interfaces (or updated versions of existing ones)