Dr. Jozina Vander Klok
Institut für deutsche Sprache und Linguistik
Projects
A06
Modeling register variation across languages
Contact
jozina.vander.klok@hu-berlin.de
Publications and Presentations
Lehmann, Nico; Mortezapour, Vahid; Vander Klok, Jozina; Farokhnejad, Zahra; Müller, David; Verhoeven, Elisabeth; Adli, Aria (2025) Lang*Reg corpus: Documenting intra-speaker variation across languages and registers. In: Language Documentation & Conservation
Adli, Aria; Verhoeven, Elisabeth; Lehmann, Nico; Mortezapour, Vahid; Vander Klok, Jozina (2024) Lang*Reg: A multi-lingual corpus of intra-speaker variation across situations
The Lang*Reg corpus records intra-speaker variation across languages and different situational-functional contexts, presumed to result in different registers. It has been prepared in the SFB1412 Register with data collections taking place in 2021-2022 for the following languages included in this version: German, Persian, Kurdish, Javanese. The data sets for each language comprise the speech of the same language users in a variety of spoken conversations and one written interaction. A minimum of 12 participants per language traversed a course of 6 situations in which they were asked to produce language in three types of activities: telling a story to a friend, talking freely with various interlocutors (friend, stranger, taxi driver) and engaging in an interview with a (university) professor. Moreover, our design included the storytelling in two modes, which allows for the comparison between spoken and written modes of the same language user.
Lang*Reg has a basic syntactic segmentation (one matrix clause and all its dependent clauses per segment). Version 0.2.0 includes the data sets with transcriptions, normalizations and tokens for each language as well as additional language-specific annotations such as glosses and syntactic annotations. We also prepared each data set for use with the browser-based search and visualization architecture ANNIS. For further language-specific morpho-syntactic and sociolinguistic annotations, refer to the respective data set description. For an overview of all data set characteristics, please see the corpus documentation in each data set.
Lüdeling, Anke; Szucsich, Luka; Zeige, Lars Erik; Adli, Aria; Alexiadou, Artemis; Belz, Malte; Bouzouita, Miriam; Dreyer, Malte; Egg, Markus; Feulner, Anna Helene; Fleischer, Jürg; Gagarina, Natalia; Hirschmann, Hagen; Jannedy, Stefanie; Knoeferle, Pia; Krause, Thomas; Kutscher, Silvia; Liu, Mingya; Lütke, Beate; Machicao y Priemer, Antonio; Meyer, Roland; Mooshammer, Christine; Müller, Stefan; Sauerland, Uli; Sauermann, Antje; Schmitt, Viola; Schumacher, Nicole; Serova, Dina; Solt, Stephanie; Vander Klok, Jozina; Verhoeven, Elisabeth; Waltereit, Richard; Weirich, Melanie (2024) Register: Language Users’ Knowledge of Situational-Functional Variation. Frame text of the Second Phase Proposal for the CRC 1412
Adli, Aria; Verhoeven, Elisabeth; Lehmann, Nico; Mortezapour, Vahid; Vander Klok, Jozina (2023) Lang*Reg: A multi-lingual corpus of intra-individual variation across situations
Language: German, Persian, Yucatec Maya, Kurdish, Javanese
Size: 36 hours
Description: same speakers varied by mode, acquaintance, professionalism, and expertise
Features: transcription, syntactic segmentation, normalization, token, glossing or POS-tags, some syntax
Access: transcription or annotation in progress; CC-BY-NC-ND
Vander Klok, Jozina; Lehmann, Nico () How People-referring Expressions in Javanese Differ (or not) Across Registers. In: International Symposium on the Languages of Java