INF
Data management and statistical analysis

In close cooperation with the Computer- und Medienservice of Humboldt-Universität, the INF project provides technical and statistical support for all members of the CRC. In particular, INF will advise on all questions of data collection, processing, preparation, storage, and analysis. In addition to these service tasks, INF will further develop the existing infrastructure to meet the needs of the CRC. The multifactorial methods often used for register analysis do not always describe the data adequately and need to be modified and complemented by other methods.

Materials and templates

A template for Beamer presentations with LaTeX is available in a public git repository and as an Overleaf project that logged-in users can copy.

A template for conference posters with LaTeX is likewise available in a public git repository and as an Overleaf project that logged-in users can copy.

The template for submissions to REALIS is likewise available in a public git repository and as an Overleaf project that logged-in users can copy.

 

Team

Project leads

Malte Dreyer

Computer- und Medienservice (CMS)
Humboldt-Universität zu Berlin


Dr. Thomas Krause

Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin


Prof. Dr. phil. Anke Lüdeling

Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin

Staff members

Felix Golcher

Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin

felix.golcher@hu-berlin.de

André Renis

Computer- und Medienservice (CMS)
Humboldt-Universität zu Berlin

andre.renis@hu-berlin.de

Student assistants

Arsenii Gulevich

Sprach- und literaturwissenschaftliche Fakultät
Humboldt-Universität zu Berlin

gulevica@informatik.hu-berlin.de

Publications & Presentations

    Publications

    2022

  • Pescuma, Valentina Nicole; Serova, Dina; Lukassek, Julia; Sauermann, Antje; Schäfer, Roland; Adli, Aria; Bildhauer, Felix; Egg, Markus; Hülk, Kristina; Ito, Aine; Jannedy, Stefanie; Kordoni, Valia; Kühnast, Milena; Kutscher, Silvia; Lange, Robert; Lehmann, Nico; Liu, Mingya; Lütke, Beate; Maquate, Katja; Mooshammer, Christine; Mortezapour, Vahid; Müller, Stefan; Norde, Muriel; Pankratz, Elizabeth; Patarroyo, Angela Giovanna; Plesca, Ana-Maria; Ronderos, Camilo R.; Rotter, Stephanie; Sauerland, Uli; Schulte, Britta; Schüppenhauer, Gediminas; Sell, Bianca Maria; Solt, Stephanie; Terada, Megumi; Tsiapou, Dimitra; Verhoeven, Elisabeth; Weirich, Melanie; Wiese, Heike; Zaruba, Kathy; Zeige, Lars Erik; Lüdeling, Anke; Knoeferle, Pia (2022) Situating language register across the ages, languages, modalities, and cultural aspects: Evidence from complementary methods In: Frontiers in Psychology
    In the present review paper by members of the collaborative research center ‘Register: Language Users’ Knowledge of Situational-Functional Variation’ (CRC 1412), we assess the pervasiveness of register phenomena across different time periods, languages, modalities, and cultures. We define ‘register’ as recurring variation in language use depending on the function of language and on the social situation. Informed by rich data, we aim to better understand and model the knowledge involved in situation- and function-based use of language register. In order to achieve this goal, we are using complementary methods and measures. In the review, we start by clarifying the concept of ‘register’, by reviewing the state of the art, and by setting out our methods and modeling goals. Against this background, we discuss three key challenges, two at the methodological level and one at the theoretical level: 1. To better uncover registers in text and spoken corpora, we propose changes to established analytical approaches. 2. To tease apart between-subject variability from the linguistic variability at issue (intra-individual situation-based register variability), we use within-subject designs and the modeling of individuals’ social, language, and educational background. 3. We highlight a gap in cognitive modeling, viz. modeling the mental representations of register (processing), and present our first attempts at filling this gap. We argue that the targeted use of multiple complementary methods and measures supports investigating the pervasiveness of register phenomena and yields comprehensive insights into the cross-methodological robustness of register-related language variability. These comprehensive insights in turn provide a solid foundation for associated cognitive modeling.
    2021

  • Lüdeling, Anke; Hirschmann, Hagen; Shadrova, Anna; Wan, Shujun (2021) Tiefe Analyse von Lernerkorpora In: Deutsch in Europa
    The language of learners of a foreign language differs from the language of native speakers on all linguistic levels. For several decades, learner corpora have been built in order to analyze learner language quantitatively and qualitatively. Here, using three case studies (on modification, co-selection, and rhetorical structures), we argue for linguistically informed, deep modeling and annotation of phenomena, and for formal and quantitative modeling suited to the phenomenon in question. We discuss the trade-off between deep, multi-layer analysis on the one hand and the amounts of data required by certain quantitative methods on the other, and show that medium-sized corpora (like most learner corpora) allow interesting insights that large, more shallowly annotated corpora would not permit in this way.
  • Shadrova, Anna; Lindscheid, Pia; Lukassek, Julia; Lüdeling, Anke; Schneider, Sarah (2021) A Challenge for Contrastive L1/L2 Corpus Studies: Large Inter- and Intra-Individual Variation Across Morphological, but Not Global Syntactic Categories in Task-Based Corpus Data of a Homogeneous L1 German Group In: Frontiers in Psychology
    In this paper, we present corpus data that questions the concept of native speaker homogeneity as it is presumed in many studies using native speakers (L1) as a control group for learner data (L2), especially in corpus contexts. Usage-based research on second and foreign language acquisition often investigates quantitative differences between learners, and usually a group of native speakers serves as a control group, but often without elaborating on differences within this group to the same extent. We examine inter-personal differences using data from two well-controlled German native speaker corpora collected as control groups in the context of second and foreign language research. Our results suggest that certain linguistic aspects vary to an extent in the native speaker data that undermines general statements about quantitative expectations in L1. However, we also find differences between phenomena: while morphological and syntactic sub-classes of verbs and nouns show great variability in their distribution in native speaker writing, other, coarser categories, like parts of speech, or types of syntactic dependencies, behave more predictably and homogeneously. Our results highlight the necessity of accounting for inter-individual variance in native speakers where L1 is used as a target ideal for L2. They also raise theoretical questions concerning a) explanations for the divergence between phenomena, b) the role of frequency distributions of morphosyntactic phenomena in usage-based linguistic frameworks, and c) the notion of the individual adult native speaker as a general representative of the target language in language acquisition studies or language in general.
    2020

  • Guescini, Rolf Borgen; Krause, Thomas; Odebrecht, Carolin; Schulz, Konstantin (2020) Laudatio Repository - Long-term Access and Usage of Deeply Annotated Information Docker Images
    These are the dockerized images of the Laudatio Repository software, which is described as follows: The management and archiving of digital research data is an overlapping field for linguistics, library and information science (LIS) and computer science: The departments of Corpus Linguistics and the Computer and Media Service (CMS) at Humboldt-Universität zu Berlin and The National Institute for Research in Computer Science and Control (INRIA France) are project partners cooperating with the Berlin School of Library and Information Science (BSLIS). LAUDATIO has developed an open access research data repository for historical corpora. For the access and (re-)use of historical corpora, the LAUDATIO repository uses a flexible and appropriate documentation schema with a subset of TEI customized by TEI ODD. The extensive metadata schema contains information about the preparation and checking methods applied to the data, tools, formats and annotation guidelines used in the project, as well as bibliographic metadata, and information on the research context (e.g. the research project). To provide complex and comprehensive search in the annotation data, the search and visualization tool ANNIS is integrated in the LAUDATIO repository.
  • Oikonomou, Despina; Golcher, Felix; Alexiadou, Artemis (2020) Quantifier scope and information structure in Greek In: Glossa: a journal of general linguistics
    In this paper, we investigate the availability of inverse scope interpretation in doubly-quantified sentences in Greek. A rather coarse and, as we show, inaccurate empirical generalization is that languages with relatively free word order do not have inverse scope readings, since movement is always spelled-out. In Greek there is little experimental work testing inverse scope with DP-quantifiers and there is considerable disagreement among linguists regarding its availability. Our goal is two-fold: i) to contribute towards a better understanding of the empirical facts and ii) to explore the relation between inverse scope availability and the syntax and semantics of different configurations. As we show, inverse scope is generally acceptable by Greek speakers, with the exception of environments with Clitic Left Dislocation. Our data add up to recent studies in other languages which suggest that the critical factor for the (non)-availability of inverse scope is the properties of each individual construction and not a dichotomy between different types of languages.
    Presentations

    2022

  • Lüdeling, Anke (2022) Variability in Grammatical Categories and Structures: The Case of Word Formation, Ghent, Belgium In: Grammar and Corpora (GaC)
    2021

  • Lüdeling, Anke; Lukassek, Julia (2021) Zum Erwerb von Registerwissen bei Lerner:innen des Deutschen als Fremdsprache. Registerstudien in Lernerkorpora In: Colloquium Uni Gießen
  • Lüdeling, Anke; Lukassek, Julia (2021) Registerwissen und morphologische Struktur. Eine Studie zu komplexen Wörtern bei Lerner:innen des Deutschen als Fremdsprache und Muttersprachler:innen In: SPIGL
    2020

  • Lüdeling, Anke () Zum Umgang mit Variation in der Lernersprachenanalyse. Perspektiven aus und für DaF/DaZ In: LCR2022 6th Learner Corpus Research Conference, Padua, September

Contact

Felix Golcher

Humboldt-Universität zu Berlin

030 / 2093 91330

felix.golcher@hu-berlin.de