A03
"Expressive" dislocation and register in Czech vs. Russian

Project A03 investigates syntactic variation in Czech and Russian in connection with register by analyzing different types of dislocated constituents. The project will fill important gaps in prior research on syntactic register variation in the two languages. It will first approach register through a bottom-up analysis of the (non-)occurrence of selected linguistic features across text types. It will then use a matched-guise study to test experimentally whether the register-related dimensions ascribed to particular linguistic behavior are appropriate. A further step will explore the acceptability of the dislocation phenomena in a series of experiments building on the initial corpus analysis. This will be followed by theoretical modeling that makes explicit the relations between register-related, grammatical and information-structural restrictions on word order.

Members

Project leader


Members


Student assistant

Alumni


Publications & Presentations

    Publications

    2023

  • Meyer, Roland  (2023) Control Constructions In:  Grenoble, L. et al. (eds.): Brill’s Encyclopedia of Slavic Languages and Linguistics [ViVo]
  • Meyer, Roland  (2023) Raising Constructions In:  Grenoble, L. et al. (eds.): Brill’s Encyclopedia of Slavic Languages and Linguistics [ViVo]
  • Meyer, Roland  (2023) Detecting Authorship, Hands, and Corrections in Historical Manuscripts. A Mixed-Methods Approach Towards the Unpublished Writings of an 18th Century Czech Emigré Community in Berlin In:  Schneider, B. et al. (eds.). Mixed Methods in the Humanities. Digital Humanities Research, transcript. [ViVo]
  • Maquate, Katja; Buchmüller, Olga; Reul, Guendalina; Tanis-Cosgun, Esma; Knoeferle, Pia  (2023) Situational-functional settings affect evaluation of linguistic register In:  AMLaP 2023 [ViVo]
    Although real-time studies point to a tight link between social context and lexical processing, evidence regarding a) explicit sentence judgements, b) social context and grammatical processing and c) socially-situated context and sentence register processing is less clear. In three rating studies (N=32 each), we investigated the influence of the socially-situated context on sentences that (mis)matched in (in)formal register or that (mis)matched in verb-argument / subject-verb congruence. Moreover, exploratory analyses investigated whether context presentation modality (written vs. auditory vs. pictorial) affects ratings as markers of linguistic performance. Participants rated register-matching (vs. mismatching) sentences significantly higher in acceptability for pictorial contexts and significantly higher in grammaticality for auditory contexts. Crucially, the interaction between register and semantic verb-argument congruence was significant, revealing that participants rated semantically incongruent sentences as more grammatical only when the register matched with the visually-depicted context. Hence, the situational-functional context seems to influence the assessment of sentence register, which, in turn, seems to affect the perception of semantic congruence of a sentence. Moreover, a pictorial situational-functional context seems to be integrated more easily with a following sentence when participants rated the acceptability of the sentence, while an auditory context can be integrated more easily when participants rated its grammaticality.
  • Veenstra, Tonjes; Krifka, Manfred; Akbari, Roodabeh; Buchmüller, Olga; Chark, Jordan; Döring, Sophia; Golcher, Felix; Schmidt, Peter  (2023) Podcast: Sprachen aus dem Schnellkochtopf: Register in Kreols (Teil 1) [ViVo]
    Creole languages are a marvel of linguistics. These languages emerge within a few generations wherever people without a common language have to communicate with one another. Our project A02 "Speaker's choices in a creole context: Bislama and Morisien" investigates two creole languages from Melanesia and Mauritius. We talk with Manfred Krifka and Tonjes Veenstra.
  • Lüdeling, Anke; Akbari, Roodabeh; Buchmüller, Olga; Chark, Jordan; Döring, Sophia; Golcher, Felix; Schmidt, Peter  (2023) Podcast: Was ist ein Register? [ViVo]
    What do we mean when we speak of "registers" in linguistics, and why is that interesting in the first place? We show examples of how speakers switch between registers and what happens when the wrong register is chosen. In an interview, Anke Lüdeling tells us how the idea for the Collaborative Research Center came about.
    (Many thanks to Onur Özsoy, who recorded the telephone example for us, and to Andreas Nolda for the organ recordings!)
    2022

  • Wiese, Heike; Alexiadou, Artemis; Shanley, Allen; Bunk, Oliver; Gagarina, Natalia; Iefremenko, Kateryna; Martynova, Maria; Pashkova, Tatiana; Rizou, Vicky; Schroeder, Christoph; Shadrova, Anna; Szucsich, Luka; Tracy, Rosemarie; Wintai, Tsehaye; Zerbian, Sabine; Zuban, Yulia  (2022) Heritage Speakers as Part of the Native Language Continuum In:  Frontiers in Psychology [DOI] [ViVo]
    We argue for a perspective on bilingual heritage speakers as native speakers of both their languages and present results from a large-scale, cross-linguistic study that took such a perspective and approached bilinguals and monolinguals on equal grounds. We targeted comparable language use in bilingual and monolingual speakers, crucially covering broader repertoires than just formal language. A main database was the open-access RUEG corpus, which covers comparable informal vs. formal and spoken vs. written productions by adolescent and adult bilinguals with heritage-Greek, -Russian, and -Turkish in Germany and the United States and with heritage-German in the United States, and matching data from monolinguals in Germany, the United States, Greece, Russia, and Turkey. Our main results lie in three areas. (1) We found non-canonical patterns not only in bilingual, but also in monolingual speakers, including patterns that have so far been considered absent from native grammars, in domains of morphology, syntax, intonation, and pragmatics. (2) We found a degree of lexical and morphosyntactic inter-speaker variability in monolinguals that was sometimes higher than that of bilinguals, further challenging the model of the streamlined native speaker. (3) In majority language use, non-canonical patterns were dominant in spoken and/or informal registers, and this was true for monolinguals and bilinguals. In some cases, bilingual speakers were leading quantitatively. In heritage settings where the language was not part of formal schooling, we found tendencies of register leveling, presumably due to the fact that speakers had limited access to formal registers of the heritage language. Our findings thus indicate possible quantitative differences and different register distributions rather than distinct grammatical patterns in bilingual and monolingual speakers. This supports the integration of heritage speakers into the native-speaker continuum. 
    Approaching heritage speakers from this perspective helps us to better understand the empirical data and can shed light on language variation and change in native grammars. Furthermore, our findings for monolinguals lead us to reconsider the state of the art on majority languages, given recurring evidence for non-canonical patterns that deviate from what has been assumed in the literature so far, and might have been attributed to bilingualism had we not included informal and spoken registers in monolinguals and bilinguals alike.
  • Haider, Hubert; Szucsich, Luka  (2022) Slavic languages are Type 3 languages: replies In:  Theoretical Linguistics [DOI] [ViVo]
  • Haider, Hubert; Szucsich, Luka  (2022) Slavic languages – “SVO” languages without SVO qualities? In:  Theoretical Linguistics [DOI] [ViVo]
    Abstract Slavic languages are commonly classified as SVO languages, with an exceptional property, though, namely an atypically extensive variability of word order. A systematic comparison of Slavic languages with uncontroversial SVO languages reveals, however, that exceptional properties are the rule. Slavic languages are ‘exceptional’ in so many syntactic respects that SVO appears to be a typological misnomer. This fact invites a fresh look. Upon closer scrutiny, it turns out that these languages are not exceptional, but regular members of a different type. They are representative of a yet unrecognised type of clause structure organisation. The dichotomy of ‘head-final’ and ‘head-initial’ does not exhaustively cover the system space of the make-up of phrases. In addition, there arguably exists a third option (T3). This is the type of phrasal architecture in which the head of the verb phrase is directionally unconstrained. It may precede, as in VO, it may follow, as in OV, and it may be sandwiched by its arguments within the phrase. From this viewpoint, the Slavic languages cease to be exceptional. They are regular representatives of the latter type, and, crucially, their collateral syntactic properties predictably match the properties of this type.
  • Tikhonov, Aleksej  (2022) Sprachen der Exilgemeinde in Rixdorf (Berlin): Autorenidentifikation und linguistische Merkmale anhand von tschechischen Manuskripten aus dem 18./19. Jahrhundert [ViVo]
  • Demian, Christoph; Buchmüller, Olga; Meyer, Roland; Szucsich, Luka  (2022) Syntactic Complexity and Register in Russian [ViVo]
    Syntactic complexity is often thought to systematically interact with register (Biber and Gray, 2010), ultimately because both are closely connected to processing load (e.g., Liu 2008). At the same time, it is far from clear how exactly to frame syntactic complexity, and what aspect of syntactic complexity is actually most sensitive to distinctions in register. The present paper compares two basic measures of syntactic complexity as applied to a corpus of Russian: (a) a simple frequency measure of clausal subordination, and (b) a measure of internal complexity of dependency trees, the average dependency distance (Liu 2008; Proisl et al. 2019). We show that traditional preconceptions about the amount of clausal subordination per register are often unwarranted, and that frequency of clausal subordination shows a very different register profile from dependency-based complexity measures. The classification of registers in Russian is a matter of debate in itself. The traditional and still most widespread approach relies on an inventory of so-called functional styles, which are distinguished by a mixture of situational, content-based, and communicative-intentional characteristics (e.g., Warditz 2019). These found their way into tagged corpora such as the one we used, the 1.25 M token corpus known as Russkij standart (“Russian National Corpus” 2003), which has been hand-corrected for part-of-speech and grammatical tagging. (Note that functional styles are called spheres in this corpus and in the remainder.) While a methodologically sound register analysis is still in preparation, we can already use some uncontroversial register-related distinctions, such as spoken vs. written mode, fictional vs. technical prose, scientific prose vs. official announcements etc. Several methodological precautions must be mentioned: (i) We systematically excluded punctuation from token counts and from the leaves of dependency trees, which greatly improved the adequacy of our measures.
    (ii) Since text lengths differ dramatically in the spheres under consideration, we could not simply compare relative frequencies of types (Evert 2006, among many others). Instead, we approximated the frequencies of (lexical and part of speech) types by drawing a large number (50) of equally sized random samples of 4K tokens per sphere. The figures below depict the distribution of frequencies over these samples. (iii) The correctness of tagging and parsing by UDPipe (Straka and Straková 2017) was inspected systematically; about 94 % of the parses were found to be correct. First, consider the distribution of the subordinating complementizer čto ‘that’ across spheres (fig. 1). In line with Biber’s (1988) observations for English and contrary to Kožina (2011) for Russian, the spoken subcorpora, especially the oral public communication, show the highest relative frequency of this most widespread subordinator. This fact generalizes to all subordinating sentence connectors (= tagged SCONJ) (cf. fig. 2), rendering čto more or less prototypical. Second, official and business communication, technical documents and scientific prose are located at the other end of the scale, containing relatively few hypotactic structures. Fictional and private spoken communication take a medium position. We tentatively attribute this distribution to the net effect of a factor [±spoken] and a factor [±narrative], with nonfictional texts being less narrative in character than fictional texts. It is also plausible that fictional texts and public oral statements tend to contain more expressions of attitude and perception, which serve as embedding predicates for čto-clauses. Third, we observed that the frequency of čto-complementizers across spheres formed an almost perfect mirror image of the frequency of nouns (fig. 3). We attribute this to the two acting, at least partially, as complementary variants of a variable.
    As a dependency-based measure of syntactic complexity, we chose average dependency distance (Liu 2008; cf. Proisl et al. 2019 for evaluation), which averages the length of the dependency links in a sentence. Interestingly, this measure showed a profile across spheres which differed clearly from the clausal subordination measure: Here, all three spoken subcorpora are at the low end of the scale, while all written subcorpora more or less pool in the middle (fig. 4). Educational/scientific texts are among the highest in dependency-based complexity. This finding supports Biber & Gray’s (2010) conclusion that English academic writing is structurally complex, but not in the sense of frequent clausal subordination. By contrast, they found subordinate clauses to be more common in conversation, which is in line with our above finding on public oral communication in Russian. A potential problem for most dependency-based measures is their close correlation with sentence length (Proisl et al. 2019). This is especially problematic here, because sentence length varies across registers independently and is thus confounded with structural complexity. Furthermore, spoken subcorpora rely on (loosely defined) communicative units rather than on sentences delimited by punctuation. In order to strengthen our conclusions, we therefore plan to run a careful comparison of samples of equally long sentences across spheres, in order to reveal the effect of structural complexity proper.

    References
    Biber, D. (1988). Variation across speech and writing. Cambridge UP.
    Biber, D., & Gray, B. (2010). Challenging stereotypes about academic writing: Complexity, elaboration, explicitness. Journal of English for Academic Purposes, 9(1), 2–20.
    Evert, S. (2006). How random is a corpus? The library metaphor. Zeitschrift für Anglistik und Amerikanistik, 54(2), 177–190.
    Kožina, M. N. (2011). Stilistika russkogo jazyka (4th ed.). Nauka.
    Liu, H. (2008). Dependency distance as a metric of language comprehension difficulty. Journal of Cognitive Science, 9, 159–191. https://doi.org/10.17791/jcs.2008.9.2.159
    Proisl, T., Konle, L., Evert, S., & Jannidis, F. (2019). Dependenzbasierte syntaktische Komplexitätsmaße. In P. Sahle (Ed.), DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts (pp. 270–273). https://doi.org/10.5281/zenodo.4622254
    Russian national corpus: Offline disambiguated version of the corpus. (2003). http://ruscorpora.ru/new/
    Straka, M., & Straková, J. (2017). Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDpipe. Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 88–99.
    Warditz, V. (2019). Varianz im Russischen: von funktionalstilistischer zur soziolinguistischen Perspektive. Peter Lang.
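
The two measurement steps described in the Demian et al. abstract above (average dependency distance with punctuation excluded, and relative frequencies estimated from equally sized random samples) can be illustrated with a minimal sketch. This is not the authors' code. The token representation, the function names, the re-indexing of tokens after punctuation removal, and sampling individual tokens without replacement are all assumptions made for illustration; the abstract does not specify these details.

```python
import random

# Hypothetical token format: (index, upos, head), with head == 0 for the root,
# loosely following the CoNLL-U convention. Not taken from the original study.

def mean_dependency_distance(sentence):
    """Average absolute distance between each token and its head.

    Assumption: punctuation tokens are removed first and the remaining
    tokens are re-indexed, so punctuation does not inflate distances.
    The root link and links to a removed (punctuation) head are skipped.
    """
    kept = [tok for tok in sentence if tok[1] != "PUNCT"]
    new_pos = {old: new for new, (old, _, _) in enumerate(kept, start=1)}
    distances = [
        abs(new_pos[idx] - new_pos[head])
        for idx, _, head in kept
        if head != 0 and head in new_pos
    ]
    return sum(distances) / len(distances) if distances else 0.0


def sampled_relative_frequencies(tags, target, n_samples=50,
                                 sample_size=4000, seed=0):
    """Relative frequency of `target` in n_samples random samples.

    Assumption: each sample draws `sample_size` tokens without replacement
    from one sphere's tag list, so `tags` must hold at least that many items.
    """
    rng = random.Random(seed)
    return [
        rng.sample(tags, sample_size).count(target) / sample_size
        for _ in range(n_samples)
    ]


# Illustrative sentence "Ona skazala, čto pridët." with a hand-written parse.
example = [
    (1, "PRON", 2), (2, "VERB", 0), (3, "PUNCT", 5),
    (4, "SCONJ", 5), (5, "VERB", 2), (6, "PUNCT", 2),
]
print(mean_dependency_distance(example))
```

Comparing the list returned by `sampled_relative_frequencies` across spheres gives a distribution per sphere rather than a single figure, which is what makes corpora of very different sizes comparable in this kind of design.
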
    2020

  • Szucsich, Luka; Agnes, Kim; Yazhinova, U.  (2020) Areal Convergence in Eastern Central European Languages and Beyond [DOI] [ViVo]
  • Alexiadou, Artemis; Lüdeling, Anke; Adli, Aria; Donhauser, Karin; Dreyer, Malte; Egg, Markus; Feulner, Anna Helene; Gagarina, Natalia; Hock, Wolfgang; Jannedy, Stefanie; Kammerzell, Frank; Knoeferle, Pia; Krause, Thomas; Krifka, Manfred; Kutscher, Silvia; Lütke, Beate; McFadden, Thomas; Meyer, Roland; Mooshammer, Christine; Müller, Stefan; Maquate, Katja; Norde, Muriel; Sauerland, Uli; Szucsich, Luka; Verhoeven, Elisabeth; Waltereit, Richard; Wolfsgruber, Anne; Zeige, Lars Erik  (2020) Register: Language Users’ Knowledge of Situational-Functional Variation In:  REALIS: Register Aspects of Language in Situation [DOI] [ViVo]
    The Collaborative Research Center 1412 “Register: Language Users’ Knowledge of Situational-Functional Variation” (CRC 1412) investigates the role of register in language, focusing in particular on what constitutes a language user’s register knowledge and which situational-functional factors determine a user’s choices. The following paper is an extract from the frame text of the proposal for the CRC 1412, which was submitted to the Deutsche Forschungsgemeinschaft in 2019, followed by a successful onsite evaluation that took place in 2019. The CRC 1412 then started its work on January 1, 2020. The theoretical part of the frame text gives an extensive overview of the theoretical and empirical perspectives on register knowledge from the viewpoint of 2019. Due to the high collaborative effort of all PIs involved, the frame text is unique in its scope on register research, encompassing register-relevant aspects from variationist approaches, psycholinguistics, grammatical theory, acquisition theory, historical linguistics, phonology, phonetics, typology, corpus linguistics, and computational linguistics, as well as qualitative and quantitative modeling. Although our positions and hypotheses since its submission have developed further, the frame text is still a vital resource as a compilation of state-of-the-art register research and a documentation of the start of the CRC 1412. The theoretical part without administrative components therefore presents an ideal starter publication to kick off the CRC’s publication series REALIS. For an overview of the projects and more information on the CRC, see https://sfb1412.hu-berlin.de/.
  • Meyer, Roland  (2020) Die tschechischen Wenkerbögen: Deutsch und seine Kontaktsprachen in der Dokumentation der Wenker-Materialien In:  Minderheitensprachen und Sprachminderheiten [ViVo]
  • Tikhonov, Aleksej; Meyer, Roland; Müller, K.  (2020) LiViTo: Linguistic and Visual features Tool for assisted analysis of historic manuscripts In:  Proceedings of the 12th Language Resources and Evaluation Conference [ViVo]
  • Szucsich, Luka  (2020) Burgenland Croatian as a Contact Language In:  Areal Convergence in Eastern Central European Languages and Beyond [ViVo]
  • Szucsich, Luka  (2020) Die burgenlandkroatischen Wenkerbögen: Deutsch und seine Kontaktsprachen in der Dokumentation der Wenker-Materialien In:  Minderheitensprachen und Sprachminderheiten [ViVo]
  • Buchmüller, Olga  (2020) Influence of a Speaker’s Visible Social Status on the Evaluation of Morphosyntax in Native Germans [ViVo]
    Previous rating studies have shown that deviations from the standard language can result in
    lower ratings of the speaker’s social status characteristics by listeners, compared to the
    social status ratings of speakers of standard language. It was also indicated that certain
    speech characteristics led to assumptions and expectations about speaker characteristics.
    Recent EEG studies have shown that the brain processes deviations from standard language
    differently, depending on the information given to the listener about the speaker. The state of
    research raises the question of whether characteristics indicating the social status of the
    speaker can influence the listener’s evaluation of his or her language characteristics.
    This master’s thesis investigates in a rating experiment whether there can be an effect
    on the rating of sentence grammaticality depending on the high or low social status
    indications of a speaker communicated by her or his clothing and posture. Participants see a
    picture of a person who represents either high or low social status through clothing and
    posture. During the image presentation, a sentence is presented auditorily. This sentence
    has either a standard or a deviant morphosyntactic structure. Finally, the respondent is asked
    to rate the sentence in terms of grammaticality on three different scales.
    The rating data analysis reveals a low influence of the social status factor on the
    evaluation of the grammaticality of sentences in terms of acceptability in the context given.
    Sentences with deviating morphosyntax are rated slightly higher in the low rather than in the
    high social context. Sentences with standard morphosyntax, on the other hand, are rated
    higher in the high than in the low social context. The social status factor seems not to
    influence the ratings of grammaticality in terms of morphosyntax and self-use likelihood.
    From this, we can deduce that the social status factor only influences certain aspects of
    grammaticality assessment.
    This framework can be used in a further development of the study to present the
    social status factor more concisely in order to obtain more conclusive results. By measuring
    rating times, it would be possible to examine how both types of sentences are processed in
    different contexts. Developing the present study into an EEG study could
    determine whether the brain's response to processing both types of sentences differs
    depending on the context of the picture.
    Presentations

    2022

  • Buchmüller, Olga  (2022) Presentation in the framework of the seminar: Ausgewählte Sprachphänomene des Ukrainischen - Multilingualismus in der Ukraine. In:  Seminar HU Slawistik BA WiSe 2021/2022 [ViVo]
  • Buchmüller, Olga  (2022) Presentation in the framework of the seminar: Sprache und Intuition. Sprachphänomene im deutsch-slawischen Vergleich erschließen: Methoden der Korpuslinguistik. In:  Seminar HU Slawistik BA SoSe 2022  [ViVo]
  • Buchmüller, Olga  (2022) Presentation in the framework of the seminar: Sprache und Intuition. Sprachphänomene im deutsch-slawischen Vergleich erschließen: Experimentelles Arbeiten In:  Seminar HU Slawistik BA SoSe 2022  [ViVo]
  • Delucchi Danhier, Renate; Marklová, Anna  (2022) Spatial asymmetry of mental representation in Czech, German and Spanish In:  14. Deutschen Slavistiktags 2022 [ViVo]
    2021

  • Marklová, Anna  (2021) Vidět jazykem: Eye-tracking v lingvistickém výzkumu In:  Evropský den jazyků 2021 [ViVo]

Resources (re-)used by A03


Corpora


Czech corpus Koditex

Type: Corpus
Status: re-used
Details: [ViVo] [URL]
A synchronic, representative and reference 9‑million‑word corpus (excl. punctuation)
compiled for the purpose of conducting a multidimensional analysis (MDA) of Czech.

Zasina, A. J. – Lukeš, D. – Komrsková, Z. – Poukarová, P. – Řehořková, A.: Koditex: A corpus of diversified texts. Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague 2018. Available at WWW: www.korpus.cz
Used by: A03

Prestudy “situational context” in Czech

Type: Corpus
Status: re-used
Details: [ViVo]
Ibex farm project (Zehr and Schwarz, 2018)
Used by: A03

Russian National Corpus

Type: Corpus
Status: used
Details: [ViVo] [URL]

The Russian National Corpus is a representative collection of texts in Russian, comprising about 1.5 billion tokens and equipped with linguistic annotation and search tools.


Used by: A03

Software


ANNIS3

Type: Software publication
Status: used
Details: [ViVo] [DOI] [URL]

A web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.

Available from: https://corpus-tools.org/annis/
Documentation: https://corpus-tools.org/annis/documentation.html
Cite as: Krause, Thomas & Zeldes, Amir (2016): ANNIS3: A new architecture for generic corpus query and visualization. in: Digital Scholarship in the Humanities 2016 (31). http://dsh.oxfordjournals.org/content/31/1/118


Used by: A03, A06, B04, C04, C06, INF

HAL-Inria

Type: Software publication
Status: used
Details: [ViVo] [URL]
In this paper, we present SALT, a framework for mapping heterogeneous linguistic formats onto one another based on a model-based approach, i.e. independently of the actual formats in which the corresponding linguistic data is expressed. While we describe the underlying concept of this framework, we identify how it echoes past and ongoing standardisation activities within ISO committee TC 37/SC 4, and in particular, the possible conceptual equivalences with ISO CD 24612 (LAF) combined with ISO 24610-1 (FSR), as well as the possible role of the central data category registry (ISOCat), currently under deployment. We thus show the adequacy of our methodology and its capacity to integrate a wide range of possible linguistic annotation models.
Used by: A03

PennController for Internet Based Experiments (IBEX)

Type: Software publication
Status: used
Details: [ViVo] [URL]
PennController for Internet Based Experiments (“PennController” or “PCIbex” for short) provides the tools to build and run online experiments, from familiar paradigms like self-paced reading to completely custom-designed paradigms.
Used by: A03, A06, A07, C03

Tools rPraat and mPraat - Interfacing Phonetic Analyses with Signal Processing

Type: Software publication
Status: used
Details: [ViVo] [DOI]
The paper presents the rPraat package for R and the mPraat toolbox for Matlab, which constitute an interface between the most popular software for phonetic analyses, Praat, and the two more general programmes. The package adds to the functionality of Praat and is shown to be superior in terms of processing speed to other tools, while maintaining the interconnection with the data structure of R and Matlab, which provides a wide range of subsequent processing possibilities. The use of the proposed tool is demonstrated on a comparison of real speech data with synthetic speech generated by means of dynamic unit selection.
Used by: A03, A06, C06
