9 January 2023, 14:00 to 15:30
Mohrenstraße 40, room 415, Berlin-Mitte
Genres, registers and text functions
Talk by Dr. Serge Sharoff (University of Leeds)
An important aspect of the Digital Humanities is the use of Big Data. Web corpora measuring billions of words, such as Common Crawl, provide a good window for "looking into a lot of language" simply because they offer far more data than traditional national corpora. However, Web corpora lack curated categories, even though they contain texts that vary in their function (for example, providing reference information, reporting news or expressing opinions), in their difficulty (intended for the lay public or experts, translators or language learners) and in their sociodemographic profile (for example, age or education). Interpretability of Deep Learning models is the key to understanding whether they make the right decisions for the right reasons. While topic-related text classification tasks rely on keywords, I will show a way to interpret the decisions of non-topical classification models using stylistic features.