Prof. Dr. phil. Roland Schäfer

Friedrich-Schiller-Universität Jena

Institut für Germanistische Sprachwissenschaft

Projects

A04 Building register into the architecture of language – an HPSG account

Contact

Website https://orcid.org/0000-0003-3233-7874

Postal address

I am a linguist, and my research focuses on German morphosyntax and the grammar-graphemics interface of written German (including non-standard varieties). The Sprach- und literaturwissenschaftliche Fakultät of Humboldt-Universität zu Berlin awarded me the double venia legendi for German Linguistics and General Linguistics, and I am a member of the faculty as a Privatdozent.

My linguistic research is cognitively oriented, theory-driven, and at the same time strongly empirical. I use corpus-linguistic and experimental methods. I also have a strong interest in statistical methods, epistemology, and analysis techniques for large data sets. I was the lead developer of the very large and freely available COW web corpora (webcorpora.org). From 2015 to 2018, I led my own DFG project on grammar in the German WWW at Freie Universität Berlin. In addition, I held a substitute professorship (Vertretungsprofessur) for German Grammar at Freie Universität Berlin in 2016 and from 2018 to 2019.

Last but not least, I am interested in the interface between linguistics as a discipline and its didactics, and in aspects of teacher training degree programmes. I am particularly fascinated by the role of explicit linguistic knowledge in the late acquisition of academic language (Bildungssprache) and in the development of register awareness. I have broad teaching experience in German and English linguistics as well as in theoretical/general linguistics and applied computational linguistics.

Publications and presentations

    Publications

  • Schäfer, Roland  (2024) Between syntax and morphology: German noun+verb units  In: Glossa: a journal of general linguistics [DOI] [ViVo]
    We show that graphemic variation—at least in some writing systems—can be analysed in terms of grammatical variation given a usage-based probabilistic view of the grammar-graphemics interface. Concretely, we examine a type of noun+verb unit in German, which can be written as one word or two. We argue that the variation in writing is rooted in the units’ ambiguous status in between morphology (one word) and syntax (two words). The major influencing factors are shown to be the semantic relation between the noun and the verb (argument or oblique relation) and the morphosyntactic context. In prototypically nominal contexts, a reinterpretation of the unit as a noun+noun compound is facilitated, which favours spelling as one word, while in prototypically verbal contexts, a syntactic realisation and consequently spelling as two words is preferred. We report the results of two large-scale corpus studies and a controlled production experiment to corroborate our analysis.
  • Pescuma, Valentina Nicole; Serova, Dina; Lukassek, Julia; Sauermann, Antje; Schäfer, Roland; Adli, Aria; Bildhauer, Felix; Egg, Markus; Hülk, Kristina; Ito, Aine; Jannedy, Stefanie; Kordoni, Valia; Kühnast, Milena; Kutscher, Silvia; Lange, Robert; Lehmann, Nico; Liu, Mingya; Lütke, Beate; Maquate, Katja; Mooshammer, Christine; Mortezapour, Vahid; Müller, Stefan; Norde, Muriel; Pankratz, Elizabeth; Patarroyo, Angela Giovanna; Plesca, Ana-Maria; Ronderos, Camilo R.; Rotter, Stephanie; Sauerland, Uli; Schulte, Britta; Schüppenhauer, Gediminas; Sell, Bianca Maria; Solt, Stephanie; Terada, Megumi; Tsiapou, Dimitra; Verhoeven, Elisabeth; Weirich, Melanie; Wiese, Heike; Zaruba, Kathy; Zeige, Lars Erik; Lüdeling, Anke; Knoeferle, Pia; Schnelle, Gohar  (2023) Situating language register across the ages, languages, modalities, and cultural aspects: Evidence from complementary methods  In: Frontiers in Psychology [DOI] [PDF] [ViVo]
    In the present review paper by members of the collaborative research center ‘Register: Language Users’ Knowledge of Situational-Functional Variation’ (CRC 1412), we assess the pervasiveness of register phenomena across different time periods, languages, modalities, and cultures. We define ‘register’ as recurring variation in language use depending on the function of language and on the social situation. Informed by rich data, we aim to better understand and model the knowledge involved in situation- and function-based use of language register. In order to achieve this goal, we are using complementary methods and measures. In the review, we start by clarifying the concept of ‘register’, by reviewing the state of the art, and by setting out our methods and modeling goals. Against this background, we discuss three key challenges, two at the methodological level and one at the theoretical level: 1. To better uncover registers in text and spoken corpora, we propose changes to established analytical approaches. 2. To tease apart between-subject variability from the linguistic variability at issue (intra-individual situation-based register variability), we use within-subject designs and the modeling of individuals’ social, language, and educational background. 3. We highlight a gap in cognitive modeling, viz. modeling the mental representations of register (processing), and present our first attempts at filling this gap. We argue that the targeted use of multiple complementary methods and measures supports investigating the pervasiveness of register phenomena and yields comprehensive insights into the cross-methodological robustness of register-related language variability. These comprehensive insights in turn provide a solid foundation for associated cognitive modeling.
    Presentations

  • Schäfer, Roland  (2020) Grammatische Variation zwischen Individuen und Situationen: Perspektiven für Linguistik und Bildungsspracherwerb  In: Humboldt-Universität zu Berlin: Kolloquium Syntax und Semantik (2020) [ViVo]
  • Schäfer, Roland; Bildhauer, Felix  (2020) Beyond Multidimensional Analysis: Probabilistic Register Induction for Large Corpora  In: Humboldt-Universität zu Berlin: Kolloquium Syntax und Semantik (2020) [ViVo]
    The analysis of the register in which a corpus document is written is prominently associated with Biber’s (1988; 1995) Multidimensional Analysis (MDA). We present an approach superficially similar to MDA but which solves three major conceptual problems of MDA by using Bayesian inference to uncover registers or, rather, potential registers. First, in Biber’s MDA, registers are associated discretely with documents, and each document can only instantiate one specific register, whereas we allow registers to be associated probabilistically with documents, and we allow mixtures of registers in single documents. Given that many linguistic phenomena are now understood as being probabilistic in nature (cf. Schäfer 2018), we suggest that this is a much more realistic assumption. Second, we assume the surface features to be associated with registers in a probabilistic manner for similar reasons. Third, we do not use a catalogue of registers assumed to exist a priori, but instead we merely infer potential registers (pregisters) via clusters of surface features. The question of which pregisters actually correspond to registers with an identifiable situational communicative setting will be dealt with in a future stage of the project using theory-driven evaluation and experimental validation. Given our assumptions about the nature of the mapping between features and pregisters and pregisters and documents, an obvious algorithm to use is Bayesian inference in the form of Latent Dirichlet Allocation (LDA; Blei et al. 2003; Blei 2012) as used in Topic Modelling. In our approach, we deal with pregisters instead of topics and with distributions of lexico-grammatical surface features instead of lexical words. The LDA algorithm otherwise performs an exactly parallel inference task. We first show how we extended the COReX feature extraction framework (Bildhauer & Schäfer in prep.) developed at FU Berlin and the IDS Mannheim in order to provide a large enough number of features for the LDA algorithm to work. We then present first results and discuss how we tuned the LDA algorithm and the feature set to lead to interpretable results. In order to be able to interpret the pregisters found by LDA, we extract the documents which most strongly instantiate the inferred pregisters. We introduce the PreCOX20 sub-corpus of the DECOW German web corpus, in which those prototypical documents are collected for further analysis w.r.t. their situational communicative setting. References: Biber, D. (1988). Variation across Speech and Writing. CUP. Biber, D. (1995). Dimensions of Register Variation: A Cross-Linguistic Comparison. CUP. Bildhauer, F. & R. Schäfer (in prep.). Automatic register annotation and alternation modelling. Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM 55(4), 77-84. Blei, D. M., A. Y. Ng & M. I. Jordan (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993-1022. Schäfer, R. (2018). Probabilistic German Morphosyntax. Habilitation thesis. HU Berlin.
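The register-induction idea in the abstract above (LDA over lexico-grammatical surface features, with pregisters in place of topics) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use COReX or DECOW. It is a small collapsed Gibbs sampler for LDA written from scratch, and the feature names, toy documents, and hyperparameters (`alpha`, `beta`, `n_iter`) are all hypothetical choices made for illustration only.

```python
import random

def lda_gibbs(docs, n_features, n_pregisters, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of feature ids, one entry per
          occurrence of a surface feature in that document.
    Returns (theta, phi): per-document distributions over pregisters and
    per-pregister distributions over features.
    """
    rng = random.Random(seed)
    K, V = n_pregisters, n_features
    doc_k = [[0] * K for _ in docs]        # occurrences in doc d assigned to k
    k_feat = [[0] * V for _ in range(K)]   # occurrences of feature v assigned to k
    k_total = [0] * K                      # all occurrences assigned to k
    assign = []
    # Random initial assignment of every feature occurrence to a pregister.
    for d, doc in enumerate(docs):
        zs = []
        for v in doc:
            k = rng.randrange(K)
            zs.append(k)
            doc_k[d][k] += 1
            k_feat[k][v] += 1
            k_total[k] += 1
        assign.append(zs)
    # Resample each assignment conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = assign[d][i]
                doc_k[d][k] -= 1
                k_feat[k][v] -= 1
                k_total[k] -= 1
                weights = [
                    (doc_k[d][j] + alpha)
                    * (k_feat[j][v] + beta) / (k_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_k[d][k] += 1
                k_feat[k][v] += 1
                k_total[k] += 1
    theta = [[(doc_k[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(k_feat[k][v] + beta) / (k_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi

# Hypothetical surface features and toy documents (not real corpus data).
FEATURES = ["passive", "nominalisation", "second_person_pronoun", "emoticon"]
docs = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [2, 3, 3, 2, 2],
    [3, 2, 3, 3],
    [0, 1, 2, 3, 0, 3],   # a document mixing both kinds of features
]
N_PREGISTERS = 2
theta, phi = lda_gibbs(docs, n_features=len(FEATURES), n_pregisters=N_PREGISTERS)

# For each pregister, pick the document that instantiates it most strongly.
prototypes = [max(range(len(docs)), key=lambda d: theta[d][k])
              for k in range(N_PREGISTERS)]
for k, d in enumerate(prototypes):
    print(f"pregister {k}: prototypical document {d}, weight {theta[d][k]:.2f}")
```

Each row of `theta` is a probability distribution over pregisters, so a single document can mix several pregisters rather than being assigned to exactly one. Each row of `phi` likewise ties surface features to a pregister probabilistically. Picking the document with the highest weight per pregister loosely mirrors the extraction of prototypical documents that the abstract describes for PreCOX20.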