Wortschatz Leipzig

Deutscher Wortschatz / Leipzig Corpora Collection

The Leipzig Corpora Collection collects and processes documents available from the Internet. The results are corpus-based dictionaries for more than 250 languages that provide statistical information, example sentences, and links to related words for every word. CLARIN-D Centre Leipzig makes available corpora of the Leipzig Corpora Collection / Project 'Deutscher Wortschatz', based on newspaper, Wikipedia and Web text. The data can be discovered and accessed using CLARIN's central search facilities, the Virtual Language Observatory and Federated Content Search. Furthermore, several web services are provided for running NLP tasks within the CLARIN research infrastructure.


European Research Infrastructure Consortium

CLARIN makes digital language resources available to scholars, researchers, students and citizen-scientists from all disciplines, especially in the humanities and social sciences, through single sign-on access. CLARIN offers long-term solutions and technology services for deploying, connecting, analyzing and sustaining digital language data and tools. CLARIN-D Centre Leipzig is part of CLARIN and closely collaborates with the respective ERIC, especially on a technical level. The Leipzig Centre is involved in the development of the central search systems Virtual Language Observatory (VLO) and Federated Content Search (FCS) and runs the monitoring of the CLARIN infrastructure.

Serbski institut / Sorbian Institute

The Serbski institut / Sorbian Institute is a research facility based in Bautzen with a branch office in Cottbus. The Sorbian Institute conducts research on the language, history and culture of the Sorbs (Wends) in Upper and Lower Lusatia. It collects and archives the necessary resources, prepares them for research and makes them available to the public. In addition, the Institute’s interdisciplinary research focuses on the current situation and specific characteristics of the Sorbs and on comparison with other small languages and cultures in Europe. The three departments (cultural studies, linguistics, and regional development and protection of minorities) organise the core work of the institute at both sites. Each department develops its own specific tasks from an interdisciplinary perspective and with a wide range of cooperation partners.

Verba Alpina

LMU München

The project selectively and analytically indexes the Alpine space, which is highly fragmented in terms of language and dialects, in its cultural and linguistic cohesiveness. It thereby overcomes the traditional restriction to current political units (nation states). The selected subject areas concern nature, cultural history and the cultural present. CLARIN-D Centre Leipzig collaborates with Verba Alpina to offer access to data and services developed in the project.

SFB 1199

Processes of Spatialization under the Global Condition

The Collaborative Research Centre (SFB) addresses what characterizes spaces made by people, how they relate to one another, and whether the resulting spatial orders are becoming increasingly complex in the context of globalization processes. SFB 1199 and CLARIN-D Centre Leipzig collaborate to make the resources developed in the SFB available to the scientific communities.

Open Source International Arabic News (OSIAN)

The Open Source International Arabic News (OSIAN) corpus has been collected from international Arabic news websites such as CNN, DW, RT and Aljazeera, among others. The OSIAN corpus consists of 477,556 articles comprising 2,861,944 sentences and roughly 157 million words. It is encoded in XML. Each article is annotated with metadata giving its web location and the date of its extraction, and each word is annotated with its lemma and part of speech. CLARIN-D Centre Leipzig supported the authors in crawling and processing the corpus and offers free access to the data through its repository.

Canonical Text Services (CTS)

The Canonical Text Services (CTS) protocol defines services for identifying and retrieving fragments of texts. CLARIN-D Centre Leipzig hosts an instance of a CTS repository containing several digital text collections.
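
To illustrate how a text fragment is identified and requested, the following Python sketch splits a CTS URN into its parts and builds a GetPassage request URL. It is a minimal illustration under stated assumptions, not the Leipzig implementation: the base URL https://example.org/cts is a placeholder, the example URN is a commonly cited one for Homer's Iliad and is not taken from the Leipzig repository, and the helper names parse_cts_urn and build_get_passage_url are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlencode


@dataclass
class CtsUrn:
    namespace: str            # e.g. "greekLit"
    work: List[str]           # textgroup, work, and optionally version/exemplar
    passage: Optional[str]    # e.g. "1.1-1.10", or None for the whole text


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN of the form urn:cts:<namespace>:<work>[:<passage>]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError("not a CTS URN: " + urn)
    namespace = parts[2]
    work = parts[3].split(".")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, work, passage)


def build_get_passage_url(base_url: str, urn: str) -> str:
    """Build a GetPassage request URL; base_url is a placeholder endpoint."""
    parse_cts_urn(urn)  # raises ValueError if the URN is malformed
    return base_url + "?" + urlencode({"request": "GetPassage", "urn": urn})


if __name__ == "__main__":
    example = "urn:cts:greekLit:tlg0012.tlg001:1.1-1.10"
    print(parse_cts_urn(example))
    print(build_get_passage_url("https://example.org/cts", example))
```

The URN is validated before the URL is built so that malformed identifiers fail locally rather than at the server.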

Digital Muqtabas

The Digital Muqtabas is the digital edition (in TEI XML format) of the Arabic monthly journal al-Muqtabas, published by Muḥammad Kurd ʿAlī in Cairo and Damascus between 1906 and 1917/18. The Leipzig Repository hosts the Digital Muqtabas on its CTS server and makes data and metadata available to interested researchers.