Several types of data beyond traditional publications are used and created throughout the CRC. We are committed to publishing our research data and software under an Open Access license wherever possible. By sharing our resources, we want to enable reproducibility and re-use by the research community.
BeDiaCo – Berlin Dialogue Corpus
The corpus consists of acoustic recordings of spontaneous dialogues between native speakers of German. It includes both task-free and task-based parts as well as additional read word lists.
Available from: https://rs.cms.hu-berlin.de/phon
Cite as: Belz, Malte & Mooshammer, Christine (2020): Berlin Dialogue Corpus (BeDiaCo). Version 1.0. Humboldt-Universität zu Berlin. DOI: 10.18452/21361
ANNIS
A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.
Available from: https://corpus-tools.org/annis/
Cite as: Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 31(1). URL: http://dsh.oxfordjournals.org/content/31/1/118
Pepper
A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.
Available from: https://corpus-tools.org/pepper/
Cite as: Zipser, Florian & Romary, Laurent (2010): A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL: http://hal.archives-ouvertes.fr/inria-00527799/en/