Several types of data beyond traditional publications are used and created throughout the CRC. We are committed to publishing our research data and software under an Open Access license wherever possible. By sharing these resources, we aim to enable reproducibility and re-use by the research community.
Corpora
BeDiaCo – Berlin Dialogue Corpus
The corpus consists of acoustic recordings of spontaneous dialogues between native speakers of German, including both task-free and task-based parts, as well as additional read word lists.
Available from: https://rs.cms.hu-berlin.de/phon
Documentation: http://doi.org/10.5281/zenodo.4593351
Cite as: Malte Belz & Christine Mooshammer (2021): Berlin Dialogue Corpus (BeDiaCo). Version 2.0. Humboldt-Universität zu Berlin. DOI: 10.5281/zenodo.4593351
Software
ANNIS
A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.
Available from: https://corpus-tools.org/annis/
Documentation: https://corpus-tools.org/annis/documentation.html
Cite as: Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 31(1), 118–139. http://dsh.oxfordjournals.org/content/31/1/118
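To make "multilayer" concrete: a multilayer corpus stores several independent annotation layers over the same tokens, and a search combines conditions from different layers (for example, a part-of-speech tag on a token and a phrase label on a span covering it). The Python sketch that follows is a toy illustration under our own assumptions; the classes `Span` and `MultilayerDoc`, the `find` method and the layer names `pos` and `cat` are hypothetical and do not reflect ANNIS's internal data model or its query language.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """A labelled span over a contiguous token range (inclusive indices)."""
    start: int
    end: int
    label: str


@dataclass
class MultilayerDoc:
    """Hypothetical toy model: one token sequence plus independent layers."""
    tokens: list[str]
    # Token-level layers: exactly one value per token, e.g. part of speech.
    token_layers: dict[str, list[str]] = field(default_factory=dict)
    # Span-level layers: labelled ranges of tokens, e.g. phrase categories.
    span_layers: dict[str, list[Span]] = field(default_factory=dict)

    def find(self, token_layer: str, value: str, span_layer: str, label: str) -> list[int]:
        """Indices of tokens annotated `value` on `token_layer` that lie
        inside a span labelled `label` on `span_layer`."""
        values = self.token_layers[token_layer]
        hits = set()
        for span in self.span_layers.get(span_layer, []):
            if span.label == label:
                # Check every token covered by this span on the token layer.
                hits.update(i for i in range(span.start, span.end + 1) if values[i] == value)
        return sorted(hits)


doc = MultilayerDoc(
    tokens=["der", "Hund", "bellt", "laut"],
    token_layers={"pos": ["ART", "NN", "VVFIN", "ADJD"]},
    span_layers={"cat": [Span(0, 1, "NP"), Span(2, 3, "VP")]},
)

print(doc.find("pos", "NN", "cat", "NP"))     # [1], i.e. "Hund"
print(doc.find("pos", "VVFIN", "cat", "NP"))  # []
```

Keeping each layer separate means a new kind of annotation can be added without touching the existing ones, which is the property that makes cross-layer queries possible in the first place.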
Pepper
A highly extensible platform for converting and manipulating linguistic data across an open-ended set of formats. Pepper can be used as a stand-alone command-line tool or integrated into other software via its API.
Available from: https://corpus-tools.org/pepper/
Documentation: https://corpus-tools.org/pepper/userGuide.html
Cite as: F. Zipser & L. Romary (2010). A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL: http://hal.archives-ouvertes.fr/inria-00527799/en/
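The cited paper describes a model-oriented approach: rather than writing a converter for every pair of formats, each format is mapped to and from one common intermediate model (in Pepper, the Salt meta-model). The Python sketch that follows illustrates only that idea under our own assumptions. The two toy formats (`tsv` and `slash`), the functions `import_tsv`, `import_slash`, `export_tsv`, `export_slash` and `convert`, and the list-of-pairs intermediate model are all hypothetical; they are not Pepper's modules or API.

```python
from __future__ import annotations

# Hypothetical intermediate model (a stand-in for a real meta-model such as Salt):
# a document is a list of (token, {annotation_name: value}) pairs.


def import_tsv(text: str) -> list[tuple[str, dict[str, str]]]:
    """Read one 'token<TAB>pos' pair per line into the intermediate model."""
    doc = []
    for line in text.strip().splitlines():
        token, pos = line.split("\t")
        doc.append((token, {"pos": pos}))
    return doc


def import_slash(text: str) -> list[tuple[str, dict[str, str]]]:
    """Read whitespace-separated 'token/pos' items into the intermediate model."""
    doc = []
    for item in text.split():
        # Split on the last slash so tokens may themselves contain '/'.
        token, pos = item.rsplit("/", 1)
        doc.append((token, {"pos": pos}))
    return doc


def export_tsv(doc: list[tuple[str, dict[str, str]]]) -> str:
    """Write the intermediate model as one 'token<TAB>pos' line per token."""
    return "\n".join(f"{token}\t{ann['pos']}" for token, ann in doc)


def export_slash(doc: list[tuple[str, dict[str, str]]]) -> str:
    """Write the intermediate model as space-separated 'token/pos' items."""
    return " ".join(f"{token}/{ann['pos']}" for token, ann in doc)


# Each format registers only its own importer and exporter.
IMPORTERS = {"tsv": import_tsv, "slash": import_slash}
EXPORTERS = {"tsv": export_tsv, "slash": export_slash}


def convert(text: str, source: str, target: str) -> str:
    """Convert by importing into the intermediate model, then exporting from it."""
    return EXPORTERS[target](IMPORTERS[source](text))


print(convert("der\tART\nHund\tNN", "tsv", "slash"))  # der/ART Hund/NN
```

Adding a further format means writing one importer and one exporter for it. With N formats this needs 2N mapping functions instead of N(N-1) direct converters between every ordered pair of formats.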