A06
Disentangling cross-linguistic and language-specific aspects of register variation

Research question

The project aims to better understand how register knowledge relates to grammatical aspects of linguistic knowledge. It does so from a comparative perspective, addressing both the universal and the language-specific nature of register variation. We will test our hypotheses by directly comparing three typologically different languages (Persian, German, Yucatec Maya) with very different register distinctions, applying the same methods in parallel.

Languages differ greatly in their range of registers. Some languages (Persian) show more conspicuous differences between registers than others (German), although both are used in speech as well as in writing. Still other languages (Yucatec Maya) are used mainly for oral communication, with written use only in its early stages.

Research goals

First, we will investigate cross-linguistic vs. language-specific properties of registers (Research Goal 1), asking which aspects of syntactic variation are associated with registers across languages and which are language-specific.

Second, we consider the influence of differences in register diversity and of normative aspects on cross-linguistic similarities and differences in register variation (Research Goal 2).

Third, we will focus on syntactic phenomena that are related to the encoding of information structure and are in all likelihood register-sensitive, in order to disentangle the language-specific and the cross-linguistic components of register variation (Research Goal 3).

The phenomena considered for these three research goals comprise (a) word order operations that reduce syntactic compactness, such as right and left dislocation, and (b) the choice of referential expressions as pronominal or null. More specifically, we will investigate (i) whether there is a cross-linguistic association of structural devices that reduce syntactic compactness. We will compare informal spontaneous speech with formal speech and written language, expecting the existing word order variation to be more clearly restricted to positions within clause boundaries. We will also investigate (ii) to what extent the variability in the use of referential expressions differs by register within and across languages.

Since cross-linguistic studies of register variation are still rare, we will develop methods for the parallel investigation of register variation across languages, covering both language production and perception (Research Goal 4). First, we will build the Lang*Reg corpus, based on guided naturalistic (spontaneous) speech production in different situations. In a second step, the production data will be complemented by perception data collected through gradient judgement studies and a situational classification task that associates syntactic variants with specific situational contexts.

 

Staff

Project lead


Prof. Dr. Aria Adli

Romanisches Seminar
Universität zu Köln

aria.adli@uni-koeln.de

Staff


Vahid Mortezapour

Romanisches Seminar
Universität zu Köln

Alumni


Publications & Presentations

    Publications

    2023

  • Adli, Aria; Verhoeven, Elisabeth; Lehmann, Nico; Mortezapour, Vahid; Vander Klok, Jozina (2023) Lang*Reg: A multi-lingual corpus of intra-individual variation across situations [DOI] [ViVo]
    Language: German, Persian, Yucatec Maya, Kurdish, Javanese
    Size: 36 hours
    Description: same speakers varied by mode, acquaintance, professionalism, and expertise
    Features: transcription, syntactic segmentation, normalization, token, glossing or POS-tags, some syntax
    Access: transcription or annotation in progress; CC-BY-NC-ND
  • Pescuma, Valentina Nicole; Serova, Dina; Lukassek, Julia; Sauermann, Antje; Schäfer, Roland; Adli, Aria; Bildhauer, Felix; Egg, Markus; Hülk, Kristina; Ito, Aine; Jannedy, Stefanie; Kordoni, Valia; Kühnast, Milena; Kutscher, Silvia; Lange, Robert; Lehmann, Nico; Liu, Mingya; Lütke, Beate; Maquate, Katja; Mooshammer, Christine; Mortezapour, Vahid; Müller, Stefan; Norde, Muriel; Pankratz, Elizabeth; Patarroyo, Angela Giovanna; Plesca, Ana-Maria; Ronderos, Camilo R.; Rotter, Stephanie; Sauerland, Uli; Schulte, Britta; Schüppenhauer, Gediminas; Sell, Bianca Maria; Solt, Stephanie; Terada, Megumi; Tsiapou, Dimitra; Verhoeven, Elisabeth; Weirich, Melanie; Wiese, Heike; Zaruba, Kathy; Zeige, Lars Erik; Lüdeling, Anke; Knoeferle, Pia; Schnelle, Gohar  (2023) Situating language register across the ages, languages, modalities, and cultural aspects: Evidence from complementary methods In:  Frontiers in Psychology [DOI] [ViVo]
    In the present review paper by members of the collaborative research center ‘Register: Language Users’ Knowledge of Situational-Functional Variation’ (CRC 1412), we assess the pervasiveness of register phenomena across different time periods, languages, modalities, and cultures. We define ‘register’ as recurring variation in language use depending on the function of language and on the social situation. Informed by rich data, we aim to better understand and model the knowledge involved in situation- and function-based use of language register. In order to achieve this goal, we are using complementary methods and measures. In the review, we start by clarifying the concept of ‘register’, by reviewing the state of the art, and by setting out our methods and modeling goals. Against this background, we discuss three key challenges, two at the methodological level and one at the theoretical level: 1. To better uncover registers in text and spoken corpora, we propose changes to established analytical approaches. 2. To tease apart between-subject variability from the linguistic variability at issue (intra-individual situation-based register variability), we use within-subject designs and the modeling of individuals’ social, language, and educational background. 3. We highlight a gap in cognitive modeling, viz. modeling the mental representations of register (processing), and present our first attempts at filling this gap. We argue that the targeted use of multiple complementary methods and measures supports investigating the pervasiveness of register phenomena and yields comprehensive insights into the cross-methodological robustness of register-related language variability. These comprehensive insights in turn provide a solid foundation for associated cognitive modeling.
  • Lehmann, Nico; Serova, Dina; Lukassek, Julia; Döring, Sophia; Goymann, Frank; Lüdeling, Anke; Akbari, Roodabeh  (2023) Guidelines for the annotation of parameters of narration. In:  REALIS: Register Aspects of Language in Situation [DOI] [ViVo]
    The present guidelines describe the annotation of narrative phenomena on the clause level, using a combination of ideas and methods from linguistics and literary studies. The main categories marking the discourse strategy “narration” in stretches of text have been narrowed down to mediacy, i.e. involving a narrator, and sequentiality of events. This document specifies how to define mediacy, and in turn determine whether a narrator is present, as well as how to identify events and their sequential ordering. Lastly, a functional layer annotation is proposed which allows researchers to compare different types of narrative instances. This offers a basis for investigating a potential narrative register which is said to be important for many kinds of register studies.
  • Varaschin, Giuseppe; Culicover, Peter W.; Winkler, Susanne  (2023) In pursuit of Condition C: (Non-)coreference in grammar, discourse and processing In:  Information Structure and Discourse in Generative Grammar [ViVo]
    2022

  • Lehmann, Nico; Verhoeven, Elisabeth  (2022) Discourse-Independent Variation in V-Initial Constituent Order: The Yucatec Mayan Preverbal Domain Revisited In:  ProcLingEvi2020, Universität Tübingen [DOI] [ViVo]
    Contribution to Linguistic Evidence 2020
  • Adli, Aria  (2022) Coherence and implicational hierarchies in the speech of the very old In:  The Coherence of Linguistic Communities Orderly Heterogeneity and Social Meaning [ViVo]
    2021

  • Machicao y Priemer, Antonio; Müller, Stefan  (2021) NPs in German: Locality, theta roles, possessives, and genitive arguments In:  Glossa: a journal of general linguistics [DOI] [ViVo]
    Since Abney (1987), the DP-analysis has been the standard analysis for nominal complexes, but in the last decade, the NP analysis has experienced a revival. In this spirit, we provide an NP analysis for German nominal complexes in HPSG. Our analysis deals with the fact that relational nouns assign case and theta role to their arguments. We develop an analysis in line with selectional localism (Sag 2012: 149), accounting for the asymmetry between prenominal and postnominal genitives, as well as for the complementarity between higher arguments and possessives, providing a syntactic and semantic analysis.
    2020

  • Alexiadou, Artemis; Lüdeling, Anke; Adli, Aria; Donhauser, Karin; Dreyer, Malte; Egg, Markus; Feulner, Anna Helene; Gagarina, Natalia; Hock, Wolfgang; Jannedy, Stefanie; Kammerzell, Frank; Knoeferle, Pia; Krause, Thomas; Krifka, Manfred; Kutscher, Silvia; Lütke, Beate; McFadden, Thomas; Meyer, Roland; Mooshammer, Christine; Müller, Stefan; Maquate, Katja; Norde, Muriel; Sauerland, Uli; Szucsich, Luka; Verhoeven, Elisabeth; Waltereit, Richard; Wolfsgruber, Anne; Zeige, Lars Erik  (2020) Register: Language Users’ Knowledge of Situational-Functional Variation In:  REALIS: Register Aspects of Language in Situation [DOI] [ViVo]
    The Collaborative Research Center 1412 “Register: Language Users’ Knowledge of Situational-Functional Variation” (CRC 1412) investigates the role of register in language, focusing in particular on what constitutes a language user’s register knowledge and which situational-functional factors determine a user’s choices. The following paper is an extract from the frame text of the proposal for the CRC 1412, which was submitted to the Deutsche Forschungsgemeinschaft in 2019, followed by a successful onsite evaluation that took place in 2019. The CRC 1412 then started its work on January 1, 2020. The theoretical part of the frame text gives an extensive overview of the theoretical and empirical perspectives on register knowledge from the viewpoint of 2019. Due to the high collaborative effort of all PIs involved, the frame text is unique in its scope on register research, encompassing register-relevant aspects from variationist approaches, psycholinguistics, grammatical theory, acquisition theory, historical linguistics, phonology, phonetics, typology, corpus linguistics, and computational linguistics, as well as qualitative and quantitative modeling. Although our positions and hypotheses since its submission have developed further, the frame text is still a vital resource as a compilation of state-of-the-art register research and a documentation of the start of the CRC 1412. The theoretical part without administrative components therefore presents an ideal starter publication to kick off the CRC’s publication series REALIS. For an overview of the projects and more information on the CRC, see https://sfb1412.hu-berlin.de/.
  • Machicao y Priemer, Antonio; Fritz-Huechante, Paola  (2020) Boundaries at play In:  Interfaces in Romance [DOI] [ViVo]
    Summary In this paper, we model the left-bounded state reading and the true reflexive reading of the se clitic in the Spanish psychological domain. We argue that a lexical analysis of se provides us with a more accurate description of the different classes of psychological verbs that occur with the clitic. We provide a unified analysis where the use of the two readings of se are modeled by means of lexical rules. We take the morphologically simple but semantically more complex basic items (e.g. asustar ‘frighten’) as input of the lexical rules, getting as the output a morphologically more complex but semantically simpler verb (e.g asustarse ‘get frightened’). The analysis for psych verbs correctly allows only those verbs assigning accusative to the experiencer or the stimulus to combine with se, hence preventing dative verbs from entering the lexical rules. The analysis also demonstrates how to account for punctual and non-punctual readings of psych verbs with se incorporating ‘boundaries’ into the type hierarchy of eventualities.
    2018

  • Verhoeven, Elisabeth; Lehmann, Nico  (2018) Self-embedding and complexity in oral registers In:  Glossa: a journal of general linguistics [DOI] [ViVo]
    This article reports the results of a study on the self-embedding depth of nominal, verbal and clausal projections in spoken corpora of German. We compared two spoken registers featuring public and non-public (i.e. private) conversation by measuring the depth of self-embedding in C, V, and N projections. The findings confirm the hypothesis that the familiarity of the speech situation (public vs. non-public speech) has a significant impact on complexity in terms of self-embedding: speakers use more self-embedding in public speech production in different syntactic projections. In addition, we examined previous assumptions about the differences between right, left, and center embedding in C projections. The results confirm a preference against center embedding in non-public texts, which reflects the complexity of center embedding. Finally, we find evidence that the depth of self-embedding in V and C projections is correlated. This finding suggests that self-embedding depth is part of a general strategy, i.e., speakers select more or less complex structures (of different types) depending on factors of the speech situation.
    Presentations

    2023

  • Haig, Geoffrey  (2023) Which domains of morphosyntax are sensitive to register variation? Thoughts from Iranian languages. In:  Humboldt-Universität zu Berlin: Kolloquium Syntax und Semantik (2023) [ViVo]
  • Sailer, Manfred  (2023) Explicit or redundant: The social meaning of multiple exponence In:  Humboldt-Universität zu Berlin: Kolloquium Syntax und Semantik (2023) [ViVo]
  • Sailer, Manfred  (2023) Explicit or redundant: The social meaning of multiple exponence In:  Kolloquium SFB1412 (2023) [ViVo]
    2022

  • Engel, Eric; Adli, Aria  (2022) Complexity and fluency at the end of the life span In:  Kolloquium SFB1412 (2022) [ViVo]
  • Farokhnejad, Zahra  (2022) A general outlook of Kurdish register data: focusing on Code-switching and post-predicate constituents In:   Humboldt-Universität zu Berlin: Kolloquium Syntax und Semantik (2022) [ViVo]
  • Lehmann, Nico  (2022) Register and the function puzzle: Why register competence is not the whole story In:  Kolloquium SFB1412 (2022) [ViVo]
  • Varaschin, Giuseppe; Machicao y Priemer, Antonio  (2022) Agreement mismatches and register-driven variation in Brazilian Portuguese In:  Oberseminar Syntax and Semantics, Institut für England- und Amerikastudien, Goethe-Universität Frankfurt am Main [ViVo]
    2021

  • 2020

Resources (re-)used by A06


Corpora


CoCoYum

Type: Corpus
Status: created
Details: [ViVo] [URL]
Language: Yucatec Maya
Size: 159.00 tokens
Description: natural language production (spoken) and elicited data
Features: morpheme, glosses, translations, comments
Access: CC-BY-NC-ND

The Collective Corpus of Yucatec Maya (CoCoYum) is a collection of data on the Yucatec Maya language contributed by various researchers. It contains transcriptions of recordings (e.g. storytelling, dialogue, public events), written data, and elicited data. The corpus will be enlarged over time as new data are collected and as further researchers contribute their data.
Used by: A06

FOLK excerpt

Type: Corpus
Status: used
Details: [ViVo]
Language: German
Size: 194,716 tokens
Description: conversations in various situations
Features: rich metadata, lemma, POS, speech unit segmentation, some dependencies
Access: internal
Used by: A06

GeWiss excerpt

Type: Corpus
Status: used
Details: [ViVo] [URL]

GeWiss is a research project in spoken academic language. It provides a multilingual (German/English/Polish/Italian) corpus of audio recordings and transcriptions of academic communications, as an empirical foundation for comparative research.

To this end, the GeWiss corpus focusses on two main genres of spoken academic language:

  • talks including discussions, and
  • oral exams,

and it explicitly distinguishes between L1 and L2 subcorpora. The corpus is enlarged and developed continuously.


Used by: A06

Lang*Reg: A multi-lingual corpus of intra-individual variation across situations

Type: Corpus
Status: created
Details: [ViVo] [DOI]
Language: German, Persian, Yucatec Maya, Kurdish, Javanese
Size: 36 hours
Description: same speakers varied by mode, acquaintance, professionalism, and expertise
Features: transcription, syntactic segmentation, normalization, token, glossing or POS-tags, some syntax
Access: transcription or annotation in progress; CC-BY-NC-ND
Used by: A06

sgs corpus

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
Language: Persian
Size: 26 h
Description: free spoken dialogues with interviewer on fictive crime scenario
Features: social metadata, syntax
Access: internal
Used by: A06

Software


ANNIS3

Type: Software publication
Status: used
Details: [ViVo] [DOI] [URL]

A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.

Available from: https://corpus-tools.org/annis/
Documentation: https://corpus-tools.org/annis/documentation.html
Cite as: Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. In: Digital Scholarship in the Humanities 2016 (31). http://dsh.oxfordjournals.org/content/31/1/118


Used by: A03, A06, B04, C04, C06, INF

ELAN

Type: Software publication
Status: used
Details: [ViVo] [DOI] [URL]
ELAN is computer software, a professional tool to manually and semi-automatically annotate and transcribe audio or video recordings. It has a tier-based data model that supports multi-level, multi-participant annotation of time-based media.

Max Planck Institute for Psycholinguistics, The Language Archive, Nijmegen, The Netherlands

Lausberg, H., & Sloetjes, H. (2009). Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, Instruments, & Computers, 41(3), 841-849. doi:10.3758/BRM.41.3.841.
Used by: A06

EXMARaLDA

Type: Software publication
Status: used
Details: [ViVo] [URL]
EXMARaLDA is a system for working with oral corpora on a computer. It consists of a transcription and annotation tool (Partitur-Editor), a tool for managing corpora (Corpus-Manager) and a query and analysis tool (EXAKT). Further parts of EXMARaLDA are FOLKER and OrthoNormal, which were both developed in and for the FOLK project.
Schmidt T and Wörner K (2014), "EXMARaLDA", In Handbook on Corpus Phonology, pp. 402-419. Oxford University Press.
Used by: A06, C05

Field Linguist's Toolbox

Type: Software publication
Status: used
Details: [ViVo] [URL]
Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data. Toolbox is free to download and use.
Used by: A06, B02

INCEpTION

Type: Software publication
Status: used
Details: [ViVo] [URL]
We introduce INCEpTION, a new annotation platform for tasks including interactive and semantic annotation (e.g., concept linking, fact linking, knowledge base population, semantic frame annotation). These tasks are very time consuming and demanding for annotators, especially when knowledge bases are used. We address these issues by developing an annotation platform that incorporates machine learning capabilities which actively assist and guide annotators. The platform is both generic and modular. It targets a range of research domains in need of semantic annotation, such as digital humanities, bioinformatics, or linguistics. INCEpTION is publicly available as open-source software.

INF is hosting INCEpTION at https://inception.sfb1412.hu-berlin.de (Intranet only)
Used by: A01, A06, B04, INF

PennController for Internet Based Experiments (IBEX)

Type: Software publication
Status: used
Details: [ViVo] [URL]
PennController for Internet Based Experiments (“PennController” or “PCIbex” for short) provides the tools to build and run online experiments, from familiar paradigms like self-paced reading to completely custom-designed paradigms.
Used by: A03, A06, A07, C03

Pepper

Type: Software publication
Status: used
Details: [ViVo] [URL]

A highly extensible platform for conversion and manipulation of linguistic data between an unbound set of formats. Pepper can be used stand-alone as a command line interface, or be integrated as an API into other software products.

Available from: https://corpus-tools.org/pepper/
Documentation: https://corpus-tools.org/pepper/userGuide.html
Cite as: F. Zipser & L. Romary (2010). A model oriented approach to the mapping of annotation formats using standards. In: Proceedings of the Workshop on Language Resource and Language Technology Standards, LREC 2010. Malta. URL: http://hal.archives-ouvertes.fr/inria-00527799/en/


Used by: A01, A06, INF

Tools rPraat and mPraat - Interfacing Phonetic Analyses with Signal Processing

Type: Software publication
Status: used
Details: [ViVo] [DOI]
The paper presents the rPraat package for R/mPraat toolbox for Matlab which constitutes an interface between the most popular software for phonetic analyses, Praat, and the two more general programmes. The package adds on to the functionality of Praat, it is shown to be superior in terms of processing speed to other tools, while maintaining the interconnection with the data structure of R and Matlab, which provides a wide range of subsequent processing possibilities. The use of the proposed tool is demonstrated on a comparison of real speech data with synthetic speech generated by means of dynamic unit selection.
Used by: A03, A06, C06

Contact