Dr. Thomas Krause

Humboldt-Universität zu Berlin

Institut für deutsche Sprache und Linguistik

Projects

INF Data management, modeling and exploration

Contact

Humboldt-Universität zu Berlin, Dorotheenstraße 24, 10117 Berlin

+49 (0)30 2093 9720

Website https://orcid.org/0000-0003-3731-2422

Publications & Presentations

    Publications

  • Krause, Thomas (2023) Hexatomic: An extensible, OS-independent platform for deep multi-layer linguistic annotation of corpora. In: Journal of Open Source Software [DOI] [ViVo]
  • Guescini, Rolf Borgen; Krause, Thomas; Odebrecht, Carolin (2020) Laudatio Repository - Long-term Access and Usage of Deeply Annotated Information Docker Images [DOI] [ViVo]
    These are the dockerized images of the Laudatio Repository software, which is described as follows: The management and archiving of digital research data is an overlapping field for linguistics, library and information science (LIS), and computer science. The departments of Corpus Linguistics and the Computer and Media Service (CMS) at Humboldt-Universität zu Berlin and the National Institute for Research in Computer Science and Control (INRIA France) are project partners cooperating with the Berlin School of Library and Information Science (BSLIS). LAUDATIO has developed an open access research data repository for historical corpora. For the access and (re-)use of historical corpora, the LAUDATIO repository uses a flexible and appropriate documentation schema with a subset of TEI customized by TEI ODD. The extensive metadata schema contains information about the preparation and checking methods applied to the data, the tools, formats and annotation guidelines used in the project, as well as bibliographic metadata and information on the research context (e.g. the research project). To provide complex and comprehensive search in the annotation data, the search and visualization tool ANNIS is integrated into the LAUDATIO repository.
  • Kutscher, Silvia; Alexiadou, Artemis; Adli, Aria; Donhauser, Karin; Dreyer, Malte; Egg, Markus; Feulner, Anna Helene; Gagarina, Natalia; Hock, Wolfgang; Jannedy, Stefanie; Kammerzell, Frank; Knoeferle, Pia; Krause, Thomas; Krifka, Manfred; Lüdeling, Anke; Maquate, Katja; McFadden, Thomas; Meyer, Roland; Mooshammer, Christine; Lütke, Beate; Müller, Stefan; Norde, Muriel; Sauerland, Uli; Szucsich, Luka; Verhoeven, Elisabeth; Waltereit, Richard; Wolfsgruber, Anne (2020) Register: Language Users’ Knowledge of Situational-Functional Variation. In: REALIS: Register Aspects of Language in Situation [DOI] [PDF] [ViVo]
    The Collaborative Research Center 1412 “Register: Language Users’ Knowledge of Situational-Functional Variation” (CRC 1412) investigates the role of register in language, focusing in particular on what constitutes a language user’s register knowledge and which situational-functional factors determine a user’s choices. The following paper is an extract from the frame text of the proposal for the CRC 1412, which was submitted to the Deutsche Forschungsgemeinschaft in 2019 and followed by a successful onsite evaluation in the same year. The CRC 1412 then started its work on January 1, 2020. The theoretical part of the frame text gives an extensive overview of the theoretical and empirical perspectives on register knowledge from the viewpoint of 2019. Owing to the highly collaborative effort of all PIs involved, the frame text is unique in its scope on register research, encompassing register-relevant aspects from variationist approaches, psycholinguistics, grammatical theory, acquisition theory, historical linguistics, phonology, phonetics, typology, corpus linguistics, and computational linguistics, as well as qualitative and quantitative modeling. Although our positions and hypotheses have developed further since its submission, the frame text is still a vital resource as a compilation of state-of-the-art register research and a documentation of the start of the CRC 1412. The theoretical part without administrative components therefore presents an ideal starter publication to kick off the CRC’s publication series REALIS. For an overview of the projects and more information on the CRC, see https://sfb1412.hu-berlin.de/.
    Presentations

  • Krause, Thomas (2023) The four elements of achieving research software sustainability for long tail projects. In: deRSE23 - Conference for Research Software Engineering in Germany [ViVo]
    At deRSE19 we presented Hexatomic, a project to investigate what small research software projects need to make their software more sustainable. This talk reports our results. Developing annotation software for multi-layer linguistic corpora, we found that there is a minimal infrastructure that needs to be in place to activate a potential for sustainability in the first place, and that four elements play key roles in reducing the risk of software collapse. These four elements are: a clearly defined, resourced maintainer role; multi-modal documentation to guide maintenance and use; automated tests; code review and triage processes supported by static code analysis. We have tested this by performing changes in maintainership and observing success in reviving development and maintenance activities. Our results furthermore clearly point to the necessity for RSEs to be involved in software projects as part of research projects. In this talk, we describe our research project, the minimal infrastructure as well as development, maintenance, and release and publication workflows we implemented, and how the four elements helped maintainers take up their roles.