Workshops

06/20/2024 - 06/21/2024, Mohrenstraße 40/41, room 415

Methods in Historical Corpus Building

The workshop covers aspects of corpus building (sampling, architecture, pipeline, digitization, OCR), annotation (conception, tagset design, tagging, parsing), and corpus use (search, re-use, re-annotation), with a spotlight on a number of historical languages (Old High German, Old Lithuanian, Early New High German, Belarusian) and corpora (RIDGES, Referenzkorpus Altdeutsch, PosTiMe, SLIEKKAS, Lutherkorpus).
The designated interactive slots offer plenty of opportunities to discuss your own data issues, try out software, and adapt the state-of-the-art techniques presented to your own research.

Invited Speakers

  • Dr. Loïc Boizou (Universität zu Köln)
  • Prof. Dr. Jolanta Gelumbeckaitė (Goethe-Universität Frankfurt am Main)
  • Ercong Nie (Ludwig-Maximilians-Universität München)

Program

20.06.2024 
  • 9:00 – 9:30: opening, introduction to project context (B04)
  • 9:30 – 10:15: Martin Klotz: Introduction to corpora, research questions, terminology, and corpus infrastructure
  • 10:15 – 10:45: coffee break
  • 10:45 – 11:30: Loïc Boizou: How to build an NLP pipeline on free tools for relatively under-resourced languages with available textual resources
  • 11:30 – 13:00: interactive session, focus on corpus building
  • 13:00 – 14:00: lunch break
  • 14:00 – 15:00: Ercong Nie: Corpus annotation
  • 15:00 – 15:30: coffee break
  • 15:30 – 17:00: interactive session, focus on (semi-)automatic annotation
  • 17:00 – 18:00: wrap-up
  • 19:00: conference dinner (not included)
21.06.2024
  • 9:00 – 10:30: Anke Lüdeling, Thomas Krause: Introduction to the RIDGES corpus
  • 10:30 – 11:00: coffee break
  • 11:00 – 12:00: Jolanta Gelumbeckaitė: SLIEKKAS – Developing a standard tagset for Old Lithuanian
  • 12:00 – 13:00: formation of working groups; topics of interest such as flexible corpus (re)use, finding data, Toolbox annotation…
  • 13:00 – 14:00: lunch break
  • 14:00 – 17:30: discussion, coaching, task-solving within the working groups
  • 17:30 – 18:00: wrap-up, final discussion

 

Registration

Please e-mail your name, affiliation and (if applicable) a short description of your project (research project, PhD project, student project) to Gohar Schnelle (gohar.schnelle@hu-berlin.de).

If you already have concrete ideas, you are welcome to give further information on your own data-based research, so that we can tailor the discussion to your specific issues:

  • a short characterisation of the data, such as:
    • language(s)
    • data type (e.g. texts, text length, text type, historical source [handwritten, printed] etc.)
    • data formats (e.g. spreadsheet, txt, xml etc.)
    • data size 
    • data complexity
  • software you use, or would like to use
  • specific questions, topics or problems you would like to address

 

We look forward to hearing from you!

The organising committee,
Mortimer Drach (B04)
Anna Helene Feulner (B04)
Jürg Fleischer (B04)
Martin Klotz (INF)
Thomas Krause (INF)
Gohar Schnelle (B04)
Lars Erik Zeige (B04)