Resources (re-)used by the CRC

Several types of data beyond traditional publications are used and created throughout the CRC. We are committed to publishing our research data and software under an Open Access license wherever possible.
We also list research data and research software produced by others here in order to acknowledge their work appropriately. By sharing our resources, we want to enable reproducibility and re-use by the research community.


A Grammatically Annotated Corpus of the Old Latvian Postil of Georg Mancelius

Type: Corpus
Status: created
Details: [ViVo] [DOI] [URL]

This grammatically annotated corpus aims to facilitate linguistic research on Old Latvian based on the Postil of Georg Mancelius from the year 1654. The corpus is divided into two subcorpora, "pericopes" and "homilies", to make register-related research easier.

The pericopes were annotated using SIL Toolbox and converted with the conversion tool PEPPER for use in the search tool ANNIS.

Three formats are provided in this release: 1. the Toolbox files, 2. the transitional Excel files and 3. a zipped folder to be imported into ANNIS.

Created in the project B02, Emergence and change of registers: The case of Lithuanian and Latvian of the CRC 1412 "Register" (funded by the Deutsche Forschungsgemeinschaft: DFG, German Research Foundation: 416591334).

Used by: B02

BeDiaCo - Berlin Dialogue Corpus

Type: Corpus
Status: re-used
Details: [ViVo] [URL]

The corpus consists of acoustic recordings of spontaneous dialogues between native speakers of German, with both task-free and task-based parts and additional read word lists.
Malte Belz, Christine Mooshammer, Alina Zöllner, and Lea‑Sophie Adam. Berlin Dialogue Corpus (BeDiaCo): Version 2, 2021.

Used by: C06

BeMeCo v1

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Alina Zöllner, Christine Mooshammer, and Silke Hamann. Berlin Menutask Corpus (BeMeCo): Version 1, 2021.
Used by: C06

BiNoKo V. 1.0 Birgitta-Notker-Korpus

Type: Corpus
Status: created
Details: [ViVo] [DOI] [URL]

The Birgitta-Notker-Korpus (BiNoKo) is a resource dedicated to comparative research on historical registers. The corpus comprises two sources: The Old High German Book of Psalms by Notker III of Saint Gall and the Old Swedish Revelations of Birgitta of Sweden. The subcorpus of Birgitta's Revelations and the subcorpus of Notker's Psalms are available as separate zip files. The corpus format is ANNIS. For local installation, use ANNIS Desktop. The documentation for ANNIS can be found here:

The guidelines (see 'related identifiers') are published in REALIS 2/3 and include information about the corpus design, annotation layers, metadata, and annotation principles.

Used by: B04

Bislama Spoken Corpus

Type: Corpus
Status: created
Details: [ViVo]

Used by: A02

Bonner Totenbuchprojekt

Type: Corpus
Status: used
Details: [ViVo] [URL]
For more than 1500 years, the Book of the Dead served in ancient Egypt as a store of knowledge for the deceased, which was placed in the tomb with them in written form.
Used by: B03

British National Corpus (BNC)

Type: Corpus
Status: used
Details: [ViVo] [URL]
The British National Corpus (BNC) was originally created by Oxford University Press in the 1980s and early 1990s. It contains 100 million words of text from a wide range of genres (e.g. spoken, fiction, magazines, newspapers, and academic).
Used by: A05


Corpus of Ancient Egyptian Multimodal Communication (CAEMmCom)

Type: Corpus
Status: used
Details: [ViVo] [URL]

Corpus of Ancient Egyptian Multimodal Communication. 

CAEMmCom – Corpus of Ancient Egyptian Multimodal Communication: Getting Started [pdf]

2020 Multimodale graphische Kommunikation im pharaonischen Ägypten: Entwurf einer Analysemethode, Lingua Aegyptia 28: 81-116.
2020 (with Rebecca Döhl and Jens-Martin Loebel) CaeMmCom – Corpus altaegyptischer multimodaler Communication. Der Aufbau einer multimodalen Datensammlung altägyptischer Kommunikate, Zeitschrift für digitale Geisteswissenschaft, [open access].

Used by: B03


Collective Corpus of Yucatec Maya (CoCoYum)

Type: Corpus
Status: created
Details: [ViVo] [URL]
Language: Yucatec Maya
Size: 159.00 tokens
Description: natural language production (spoken) and elicited data
Features: morpheme, glosses, translations, comments
Access: CC-BY-NC-ND

The Collective Corpus of Yucatec Maya (CoCoYum) is a collection of data on the Yucatec Mayan language contributed by various researchers. It contains transcriptions of recordings (e.g. storytelling, dialogue, public events), written data, and elicited data. The corpus will grow over time as new data are collected and as further researchers add their data.
Used by: A06

Corpus of Non-Native Addressee Register (CoNNAR). Version 1

Type: Corpus
Status: created
Details: [ViVo] [URL]

Used by: C06

Czech corpus Koditex

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
A synchronic, representative and reference 9‑million‑word corpus (excl. punctuation) compiled for the purpose of conducting a multidimensional analysis (MDA) of Czech.

Zasina, A. J. – Lukeš, D. – Komrsková, Z. – Poukarová, P. – Řehořková, A.: Koditex: A corpus of diversified texts. Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague 2018. Available at WWW:
Used by: A03

DNam corpus + DNam Wenker corpus

Type: Corpus
Status: used
Details: [ViVo] [DOI] [URL]
The corpus "German in Namibia" ("Deutsch in Namibia", DNam) was created between 2016 and 2021 in the DFG project "NamDeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias" ("NamDeutsch: The Dynamics of German in Namibia's Multilingual Context"; WI 2155/9-1 and SI 750/4-1). The project was directed by Heike Wiese and Horst Simon in cooperation with Marianne Zappen-Thomson and was based at the University of Potsdam (until 2019), HU Berlin (since 2019), FU Berlin, and UNAM Windhoek.

Article PDF
Used by: C07


Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Access: (free access for academic use)
Engines: NoSkE and RStudio Server
Used by: A04, B01


Type: Corpus
Details: [ViVo]

Used by: B01

Eye-Tracking Corpus

Type: Corpus
Status: created
Details: [ViVo]

Used by: C03

FOLK excerpt

Type: Corpus
Status: used
Details: [ViVo]
Language: German
Size: 194,716 tokens
Description: conversations in various situations
Features: rich metadata, lemma, POS, speech unit segmentation, some dependencies
Access: internal
Used by: A06

Falko Corpus

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
L1- and L2-authored argumentative essays collected in a controlled setting.
Further information about the Falko project:
Used by: C04

GeWISS excerpt

Type: Corpus
Status: used
Details: [ViVo] [URL]

GeWiss is a research project on spoken academic language. It provides a multilingual (German/English/Polish/Italian) corpus of audio recordings and transcriptions of academic communication as an empirical foundation for comparative research.

To this end, the GeWiss corpus focusses on two main genres of spoken academic language:

  • talks including discussions, and
  • oral exams,

and it explicitly distinguishes between L1 and L2 subcorpora. The corpus is enlarged and developed continuously.

Used by: A06


GermaNet

Type: Corpus
Status: used
Details: [ViVo] [URL]
GermaNet is a lexical-semantic word net that relates German nouns, verbs, and adjectives to one another semantically by grouping lexical units that express the same concept into synsets and by defining semantic relations between these synsets. GermaNet has much in common with the English WordNet® and can be viewed as an online thesaurus or as a lightweight ontology.
Used by: A01

GermaParl Corpus of Plenary Protocols

Type: Corpus
Status: used
Details: [ViVo] [DOI] [URL]
The GermaParl Corpus has been prepared in the PolMine Project and comprises all protocols of plenary sessions in the German Bundestag (1996-2016). This version of the corpus is based on plain text documents issued by the German Bundestag. For the period between 2008 and 2010, txt files are not available; to fill the gap, PDF documents were processed. As part of the corpus preparation pipeline, the data were linguistically annotated (using the TreeTagger) and imported into the Corpus Workbench (CWB). See the GermaParl documentation website for further information.
Used by: A01

Icelandic Parsed Historical Corpus (IcePaHC)

Type: Corpus
Status: used
Details: [ViVo] [URL]
The Icelandic Parsed Historical Corpus (IcePaHC) is a project that has built a diachronic corpus with samples of written Icelandic from all periods from the 12th century to modern times. The corpus is mostly compatible with the corpora of historical English developed at UPenn. For historical texts spelling is modernized for phonological change.
Used by: A05

Kobalt_RST: Die Annotation von rhetorischen Strukturen im Kobalt-DaF-Korpus

Type: Corpus
Status: created
Details: [ViVo] [DOI]

The Kobalt-DaF corpus is a systematically collected and deeply annotated corpus of learner German. It contains 80 argumentative texts in German written by German L1 speakers and by learners of German with various L1s. This repository makes an additional annotation of the Kobalt-DaF corpus for rhetorical structures freely available. It contains: (1) a description of the annotation process (annotation framework, guidelines, and procedure); (2) the annotated rs3 files.

*Version note: So far, only the texts by the Chinese learners of German and by the German L1 speakers (40 texts in total) are available. The annotation of the remaining texts will follow soon.

*The annotation work was funded by the Chinese Scholarship Council and the Deutsche Forschungsgemeinschaft (DFG), SFB 1412, 416591334.

Used by: C04

Lang*Reg: A multi-lingual corpus of intra-individual variation across situations

Type: Corpus
Status: created
Details: [ViVo] [DOI]
Language: German, Persian, Yucatec Maya, Kurdish, Javanese
Size: 36 hours
Description: same speakers varied by mode, acquaintance, professionalism, and expertise
Features: transcription, syntactic segmentation, normalization, token, glossing or POS-tags, some syntax
Access: transcription or annotation in progress; CC-BY-NC-ND
Used by: A06

Lithuanian Corpus

Type: Corpus
Status: re-used
Details: [ViVo]
The Old Lithuanian corpus is the postil of Jonas Bretkūnas published as a facsimile edition by Ona Aleknavičienė (Jono Bretkūno Postilė, parengė Ona Aleknavičienė. Vilnius: Lietuvių kalbos institutas, 2005. ISBN 9986-668-96-4).
The text files used in the research were generated from the facsimiles.
Used by: B02

Morisien Spoken Corpus

Type: Corpus
Status: created
Details: [ViVo]

Used by: A02

Online production experiment on Imprecision

Type: Corpus
Status: created
Details: [ViVo]

Used by: A05

Penn-Helsinki Corpus of Early Modern English

Type: Corpus
Details: [ViVo]

Used by: B01

Penn-Helsinki Corpus of Middle English

Type: Corpus
Details: [ViVo]

Used by: B01

Potsdam Commentary Corpus

Type: Corpus
Status: used
Details: [ViVo] [URL]
The Potsdam Commentary Corpus (PCC) is a corpus of 220 German newspaper commentaries (2,900 sentences, 44,000 tokens) taken from the online issues of the Märkische Allgemeine Zeitung (MAZ subcorpus) and Tagesspiegel (ProCon subcorpus). It is annotated with a range of different types of linguistic information.

[Bourgonje & Stede 2020] Bourgonje, Peter and Stede, Manfred (2020). The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing. Proc. of the Language Resources and Evaluation Conference (LREC), Marseille.
Used by: A01

PreCOXX25: Register-annotated German webcorpus

Type: Corpus
Status: re-used
Details: [ViVo]
Access: (free access for academic use)
Engines: NoSkE and RStudio Server
Used by: A04

Prestudy "situational context" in Czech

Type: Corpus
Status: re-used
Details: [ViVo]
Ibex farm project (Zehr and Schwarz, 2018)
Used by: A03

Ramsès Project

Type: Corpus
Status: used
Details: [ViVo] [URL]
Morphologically annotated, lemmatized text corpus of Late Egyptian texts (c. 1550 – 1000 BCE) by the University of Liège.
Used by: B03


ReFlexAE

Type: Corpus
Status: created
Details: [ViVo]
The ReFlexAE corpus (Register Flexibility in Academic Education) is a longitudinal corpus of written grammatical explanations built to investigate late register development in the context of higher education. The data are collected through a longitudinal written elicitation study with German L1 students enrolled in programs for primary school teachers. According to a repeated measures design, the longitudinal written study elicits data at three time points: before and after linguistic courses and before graduation. Each participant completes the same test battery comprising four written elicitation tasks, a grammar test, a demographic questionnaire and standard psychological questionnaires assessing personal traits and motivation for learning.
Used by: C05

Russian National Corpus

Type: Corpus
Status: used
Details: [ViVo] [URL]

The Russian National Corpus is a representative collection of texts in Russian, comprising about 1.5 billion tokens and complemented by linguistic annotation and search tools.

Used by: A03

SENIE Corpus

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Latvian texts provided by the SENIE project of the University of Latvia.

Andronova, Everita (2007). The Corpus of Early Written Latvian: current state and future tasks. Proceedings of the Corpus Linguistics Conference. CL2007. University of Birmingham, UK. 27-30 July 2007. Edited by Matthew Davies, Paul Rayson, Susan Hunston, Pernilla Danielsson. ISSN 1747-9398.
Used by: B02

Simulated Zoom-Corpus

Type: Corpus
Status: created
Details: [ViVo]
Simulated Zoom interaction with choreographed videos (varying the interlocutor persona [formality] and the topic/at-stakeness). Simultaneous laboratory recordings of audio and video.
Used by: C02

The grammatically annotated corpus of the pericopes of the Old Lithuanian Postil of Jonas Bretkūnas

Type: Corpus
Status: created
Details: [ViVo] [DOI] [URL]

This grammatically annotated corpus aims to facilitate linguistic research on Old Lithuanian based on the Postil of Jonas Bretkūnas from the year 1591. The corpus is divided into two subcorpora, "pericopes" and "homilies", to make register-related research easier.

The pericopes were annotated using SIL Toolbox and converted with the conversion tool PEPPER for use in the search tool ANNIS.

Three formats are provided in this release: 1. the Toolbox files, 2. the transitional Excel files and 3. a zipped folder to be imported into ANNIS.

Created in the project B02, Emergence and change of registers: The case of Lithuanian and Latvian of the CRC 1412 "Register" (funded by the Deutsche Forschungsgemeinschaft: DFG, German Research Foundation: 416591334).

Used by: B02

Thesaurus Linguae Aegyptiae (TLA)

Type: Corpus
Status: used
Details: [ViVo] [URL]
Digital text corpus of Ancient Egyptian and Demotic, morphosyntactically annotated and lemmatized. It is the largest corpus of Egyptian texts of different types and periods (c. 2500 BCE – 450 CE).
Used by: B03

WroDiaCo v2

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Sarah Wesolek, Malte Belz, and Christine Mooshammer. Wroclaw Dialogue Corpus (WroDiaCo): Version 2, 2020.
Used by: C06

sgs corpus

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Language: Persian
Size: 26 h
Description: free spoken dialogues with interviewer on fictive crime scenario
Features: social metadata, syntax
Access: internal
Used by: A06


Big Five Inventory (BFI-10)

Type: Document
Status: used
Details: [ViVo] [DOI] [URL]
The BFI-10 is a highly economical scale for assessing personality according to the five-factor model. The scale is easy to administer in different survey modes. The empirical evidence from the validation studies suggests that the BFI-10 permits not only an economical but also a reliable and valid assessment of the Big Five. It allows a rough measurement of the individual personality structure of adult respondents from the German-speaking general population.

Rammstedt, B., Kemper, C. J., Klein, M. C., Beierlein, C., & Kovaleva, A. (2014). Big Five Inventory (BFI-10). Zusammenstellung sozialwissenschaftlicher Items und Skalen (ZIS).

Used by: A07, C05

Depressions-Angst-Stress Skalen (DASS 21)

Type: Document
Status: used
Details: [ViVo] [DOI] [URL]
The DASS are suited to assessing the burden of depression, anxiety, and stress without confounding somatic factors such as chronic pain problems. The scales can, however, also be used with clients who have no somatic complaints. The DASS are available in a short version with 21 items and a long version with 42 items, each in German and English.
© Lovibond, P.F., Lovibond, S.H., Nilges, P. & Essau, C.
Used by: A07

Epidemic - Pandemic Impacts Inventory (EPII)

Type: Document
Status: used
Details: [ViVo] [URL]
The EPII is a tool designed to assess tangible impacts of epidemics and pandemics across personal and social life domains.
Used by: A07

Interpersonal Reactivity Index (IRI)

Type: Document
Status: used
Details: [ViVo] [DOI]
Davis, M. (1983). Measuring individual differences in empathy: Evidence for a multidimensional approach. Journal of Personality and Social Psychology, 44, 1114–1126.
Used by: C05

Skalen zur motivationalen Regulation beim Lernen im Studium (SMR-LS)

Type: Document
Status: used
Details: [ViVo] [DOI] [URL]

Used by: C05

Experimental research data

Experimental data: Addressee identification study

Type: Experimental research data
Status: created
Details: [ViVo]

Rating data (on a 9-point scale) on the probable addressee of spoken texts in three conditions: a) entirely Standard German, b) with Namibian-specific lexical features, and c) with non-standard grammatical features. The open-guise method was used to collect the data.

Participants: adults and adolescents in Namibia
Used by: C07

Experimental data: Newspaper correction study

Type: Experimental research data
Status: created
Details: [ViVo]

Data containing corrections of Namibian-German vs. Standard German features (lexical, morpho-syntactic, and grammatical) presented in a written mock newspaper article.
Participants: Adults and adolescents in Namibia and Germany

Used by: C07

Experimental data: speaker evaluation study

Type: Experimental research data
Status: created
Details: [ViVo]

Ratings (on a 9-point scale) of social meaning (competence and solidarity assessments) and inferences (origin, place of residence) regarding speakers of spoken texts in three conditions: a) entirely Standard German, b) with Namibian-specific lexical features, and c) with non-standard grammatical features. The data were collected using the open-guise method.
Participants: adults and adolescents in Namibia

Used by: C07


Stanford Log-linear Part-Of-Speech Tagger

Type: Software publication
Status: used
Details: [ViVo] [URL]

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech taggers described in these papers (if citing just one paper, cite the 2003 one):

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-259.
Used by: A01, INF


ANNIS

Type: Software publication
Status: used
Details: [ViVo] [DOI] [URL]

A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.

Available from:
Cite as: Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 2016 (31).

Used by: A03, A06, B04, C04, C06, INF


Type: Software publication
Status: used
Details: [ViVo] [URL]
A freeware corpus analysis toolkit for concordancing and text analysis.

Used by: B02


Type: Software publication
Details: [ViVo]

Used by: B01


ELAN

Type: Software publication
Status: used
Details: [ViVo] [DOI] [URL]
ELAN is computer software, a professional tool to manually and semi-automatically annotate and transcribe audio or video recordings. It has a tier-based data model that supports multi-level, multi-participant annotation of time-based media.

Max Planck Institute for Psycholinguistics, The Language Archive, Nijmegen, The Netherlands

Lausberg, H., & Sloetjes, H. (2009). Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, Instruments, & Computers, 41(3), 841-849. doi:10.3758/BRM.41.3.841.
Used by: A06


EXMARaLDA

Type: Software publication
Status: used
Details: [ViVo] [URL]
EXMARaLDA is a system for working with oral corpora on a computer. It consists of a transcription and annotation tool (Partitur-Editor), a tool for managing corpora (Corpus-Manager) and a query and analysis tool (EXAKT). Further parts of EXMARaLDA are FOLKER and OrthoNormal, which were both developed in and for the FOLK project.
Schmidt T and Wörner K (2014), "EXMARaLDA", in Handbook on Corpus Phonology, pp. 402-419. Oxford University Press.
Used by: A06, C05

Field Linguist's Toolbox

Type: Software publication
Status: used
Details: [ViVo] [URL]
Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data. Toolbox is free to download and use.
Used by: A06, B02


Type: Software publication
Status: used
Details: [ViVo] [URL]
In this paper, we present SALT, a framework for mapping heterogeneous linguistic formats onto one another based on a model-based approach, i.e. independently of the actual formats in which the corresponding linguistic data is being expressed. While we describe the underlying concept of this framework, we identify how it echoes past and ongoing standardisation activities within ISO committee TC 37/SC 4, and in particular, the possible conceptual equivalences with ISO CD 24612 (LAF) combined with ISO 24610-1 (FSR), as well as the possible role of the central data category registry (ISOCat), currently under deployment. We thus show the adequacy of our methodology and its capacity to integrate a wide range of possible linguistic annotation models.
Used by: A03


INCEpTION

Type: Software publication
Status: used
Details: [ViVo] [URL]
We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software.

INF is hosting INCEpTION at (Intranet only)
Used by: A01, A06, B04, INF

PennController for Internet Based Experiments (IBEX)

Type: Software publication
Status: used
Details: [ViVo] [URL]
PennController for Internet Based Experiments (“PennController” or “PCIbex” for short) provides the tools to build and run online experiments, from familiar paradigms like self-paced reading to completely custom-designed paradigms.
Used by: A03, A06, A07, C03


Pepper

Type: Software publication
Status: used
Details: [ViVo] [URL]

A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.

Available from:
Cite as: F. Zipser & L. Romary (2010). A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL:

Used by: A01, A06, INF

RStudio Server

Type: Software publication
Status: used
Details: [ViVo] [URL]

RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

Used by: A04, A07, C03, INF


Textstat

Type: Software publication
Status: used
Details: [ViVo] [URL]
Textstat is an easy to use library to calculate statistics from text. It helps determine readability, complexity, and grade level.
Used by: B02


Type: Software publication
Status: created
Details: [ViVo] [URL]

This project provides tools to read and write files in the Toolbox format.

Used by: B02
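
For orientation only: Toolbox files are plain text in which each field sits on its own line and is introduced by a backslash marker. The sketch below is not the project's tool. It is a minimal Python illustration that assumes a hypothetical record marker `\ref` and made-up field markers and sample data. It also ignores details such as repeated markers within one record.

```python
from typing import Dict, List, Optional


def parse_toolbox(text: str, record_marker: str = "ref") -> List[Dict[str, str]]:
    """Split Toolbox-style text into records (one marker -> value dict each).

    Assumption: every record starts with `\\<record_marker>`; real files may
    use a different record marker and carry a header line.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_marker: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate fields visually
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            # A new record marker closes the record collected so far.
            if marker == record_marker and current:
                records.append(current)
                current = {}
            current[marker] = value.strip()
            last_marker = marker
        elif last_marker is not None:
            # Line without a marker: treat it as a continuation of the last field.
            current[last_marker] += " " + line
    if current:
        records.append(current)
    return records


# Made-up sample data, for illustration only.
SAMPLE = "\n".join([
    "\\ref 001",
    "\\tx word1 word2",
    "\\ge gloss1 gloss2",
    "",
    "\\ref 002",
    "\\tx word3",
    "\\ge gloss3",
])

records = parse_toolbox(SAMPLE)
for record in records:
    print(record["ref"], "|", record["tx"], "|", record["ge"])
```

Each record ends up as a dictionary keyed by marker name, so the fields can be inspected individually or written back out line by line.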

Tools rPraat and mPraat - Interfacing Phonetic Analyses with Signal Processing

Type: Software publication
Status: used
Details: [ViVo] [DOI]
The paper presents the rPraat package for R/mPraat toolbox for Matlab which constitutes an interface between the most popular software for phonetic analyses, Praat, and the two more general programmes. The package adds on to the functionality of Praat, it is shown to be superior in terms of processing speed to other tools, while maintaining the interconnection with the data structure of R and Matlab, which provides a wide range of subsequent processing possibilities. The use of the proposed tool is demonstrated on a comparison of real speech data with synthetic speech generated by means of dynamic unit selection.
Used by: A03, A06, C06


emuR

Type: Software publication
Status: used
Details: [ViVo] [URL]
Raphael Winkelmann, Klaus Jaensch, Steve Cassidy, and Jonathan Harrington. emuR: Main Package of the EMU Speech Database Management System, 2018.
Used by: C06, INF

flairNLP / flair

Type: Software publication
Status: used
Details: [ViVo] [URL]
A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends.

Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1638–1649, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Used by: INF, MGK, Z


spaCy

Type: Software publication
Status: used
Details: [ViVo] [URL]
spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
Used by: INF