Discourse-Analytical Research Questions with DeReKo (Part 1)

In the first part of this screencast, Simon Meier, a member of CLARIN F1, explains how the German Reference Corpus (DeReKo) can be used in COSMAS II for discourse-analytical research questions, using the integrated topic annotation feature. Part 2 (coming soon) can be found here.

As an example, Meier chooses popular-science discourse, in which formulations such as

  • heute weiß man
  • wir wissen heute

are used.

The thesis:

In popular-science discourse, understood as media-mediated expert-lay communication, wir wissen heute, heute weiß man, etc. are formulaic means of presenting knowledge and the advancement of knowledge.

The corresponding research question:

Are these formulations actually typical of popular-science discourse?

His approach:

To answer his research question and confirm the thesis, Meier draws on DeReKo, which can be accessed via the web application COSMAS II. After logging in to COSMAS II, he selects the archive w2, which contains press texts from (local) newspapers, and from it selects all public corpora.

Now the search can begin. He enters the search term, e.g. wir wissen heute, into the search mask and selects word distance /^w1, so that only this exact phrase is matched. Displaying all results yields 763 hits, initially sorted by source and alphabetically. Since Meier is interested in whether the search term occurs particularly often in popular-science discourse, he sorts by topic, in descending order of hit frequency per topic. This step is possible because every text fed into DeReKo undergoes automatic topic annotation, which estimates, on the basis of word distributions, which topic a text most likely belongs to.
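The idea behind such word-distribution-based topic annotation can be illustrated with a toy sketch in Python. The keyword lists and the scoring below are invented purely for illustration; they have nothing to do with DeReKo's actual classifier:

```python
from collections import Counter

# Invented, tiny keyword lists per topic -- illustration only,
# not the real DeReKo topic model.
TOPIC_WORDS = {
    "science":  {"forschung", "studie", "experiment", "wissen"},
    "sport":    {"spiel", "tor", "mannschaft", "trainer"},
    "politics": {"regierung", "wahl", "partei", "parlament"},
}

def guess_topic(text: str) -> str:
    """Assign the topic whose keyword set overlaps most with the text."""
    words = Counter(text.lower().split())
    scores = {
        topic: sum(words[w] for w in keywords)
        for topic, keywords in TOPIC_WORDS.items()
    }
    return max(scores, key=scores.get)

print(guess_topic("Eine neue Studie zeigt dass Forscher mehr wissen"))
# (with these toy keyword lists, this prints "science")
```

A real classifier would of course use much richer word distributions and probabilistic weighting, but the principle of estimating a topic from vocabulary overlap is the same.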

The results:

Scrolling to the end of the list, one can see that texts from the Science domain, with the subtopic Popular Science, yield the second-most hits.


WebLicht Tutorial

This video tutorial shows one of several ways to use WebLicht. WebLicht is a web application provided by CLARIN-D that allows you to build toolchains for linguistic annotation on different layers such as morphology, syntax, or named entity recognition.

To get started, you have to log in via your CLARIN account or any other university affiliation account. After clicking the Start button and selecting Easy Mode, which supplies you with a pre-defined toolchain, you can either analyze text that you type in directly or copy-paste into the corresponding window, use a sample text provided by WebLicht, or upload a text file. Now you can select your preferred layer of annotation and hit Run to get a detailed analysis for the selection you have made. It is then possible to download the complete file, or parts of it, as .csv or .xml.
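The downloaded table can then be processed with standard tools. A minimal sketch in Python, assuming a hypothetical CSV export with token and pos columns (the actual column names and layers depend on the toolchain you ran):

```python
import csv
import io

# Hypothetical CSV content standing in for a WebLicht export;
# real exports may use different columns depending on your toolchain.
data = """token,pos
The,DT
cat,NN
sleeps,VBZ
"""

with io.StringIO(data) as f:
    rows = list(csv.DictReader(f))

# Collect all tokens tagged as nouns (Penn Treebank NN* tags assumed here).
nouns = [r["token"] for r in rows if r["pos"].startswith("NN")]
print(nouns)  # ['cat']
```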

In case the pre-defined toolchain does not satisfy your needs, you can also switch to the Advanced Mode where you can build your own customized toolchain. You can always refer to the Helpdesk if you have any questions, suggestions, or problems that you want to report. 


How to use COALA

COALA is a tool that converts simple text tables into CLARIN metadata files (CMDI) for multimodal corpora. If you want to learn more about CMDI in general, we refer to this page. Like many other CLARIN tools, COALA is a free web service that can be found on the website of the Bavarian Archive for Speech Signals (BAS).

To get your CMDI files, you just have to upload your files, give your corpus a name and a title, and hit the green COALA button to convert them. Within a few seconds you can download a zipped file which contains all the metadata for your corpus.

If you encounter any problems, first check the logging messages at the top to see whether something went wrong. If you have further problems or questions, there is a detailed description of the web service along with downloadable templates showing example inputs; perhaps something is wrong with your tables. If you still can't find what you are looking for, you can also get help at the Helpdesk.


Online Perception Experiments with Percy

What is it?

Percy is a device-independent tool for running online perception experiments. Researchers can learn about spoken language by setting up an experimental design in which participants listen to audio stimuli and then give their judgments about them.

For Whom is it?

Percy is a tool for researchers who want to investigate spoken language, but it is also quite interesting for the participants, as they can judge the stimuli and manipulate them.

And the Details?

To define an experimental design, a researcher first has to decide which stimuli, input options, and questions to present to the participants. There are three options for setting up the design: (1) use the built-in editor, (2) use the default user interface, or (3) get in touch for a more advanced experimental design. It is also possible to choose from a set of experiments that have already been conducted.


How to use WebMAUS

This video tutorial about WebMAUS (the Munich AUtomatic Segmentation) explains how you can easily generate a TextGrid file that aligns an audio signal to a transcription. If you want to learn more about WebMAUS in general, click here. The procedure for obtaining the TextGrid is quite simple: you just need a text file containing the transcription and the corresponding audio file with the spoken language, and you feed them into the application via drag and drop (careful: the files need to have the same name).

After this step, a menu drops down where you can select your preferences and hit the Run button. After a few seconds, WebMAUS has created a TextGrid for you, which you can download and open in Praat along with your audio file to check where WebMAUS has segmented your recording, and then process it further.
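Since a TextGrid is a plain text file, its interval boundaries can also be read programmatically. A minimal sketch in Python, assuming Praat's default "long" TextGrid text format (short-format and binary TextGrids are not handled by this simple regular expression):

```python
import re

def read_intervals(textgrid: str):
    """Extract (xmin, xmax, label) triples from a long-format TextGrid.

    Minimal sketch: assumes the 'long' text format that Praat writes
    by default.
    """
    pattern = re.compile(
        r'intervals \[\d+\]:\s*'
        r'xmin = ([\d.]+)\s*'
        r'xmax = ([\d.]+)\s*'
        r'text = "([^"]*)"'
    )
    return [(float(a), float(b), t) for a, b, t in pattern.findall(textgrid)]

# Invented two-interval excerpt standing in for a WebMAUS segmentation tier.
sample = '''
        intervals [1]:
            xmin = 0
            xmax = 0.42
            text = "h"
        intervals [2]:
            xmin = 0.42
            xmax = 0.61
            text = "a"
'''
print(read_intervals(sample))
# [(0.0, 0.42, 'h'), (0.42, 0.61, 'a')]
```

For serious work, a dedicated TextGrid library or Praat itself is the safer route; this only shows that the format is easy to inspect.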


WebMAUS Introduction

This video tutorial gives a brief introduction to the Munich AUtomatic Segmentation, or WebMAUS. It is a tool for aligning speech signals to linguistic categories, which makes it possible, among other things, to align the audio signal of a video to its transcript. As input, WebMAUS needs the video signal and some kind of transcription of the spoken text.

To produce the actual output, the input text first needs to be normalized. With the Balloon tool, the expected pronunciation is created in SAMPA (a phonetic alphabet). In a next step, all other possible pronunciation variants are generated along with their probabilities. These variants are represented in a probabilistic graph, in which WebMAUS then searches for the path of phonetic units that was actually spoken. The outcome is a transcript of the actual pronunciation along with its segmentation.
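The core of such a search is finding the most probable path through the variant graph. A toy sketch in Python; the graph, phone labels, and probabilities are invented for illustration and are far simpler than what MAUS actually does (which also scores paths against the audio signal):

```python
import math

# Invented pronunciation-variant graph: each edge is
# (next_node, phone, probability). Illustration only.
GRAPH = {
    "start": [("mid", "a", 0.7), ("mid", "@", 0.3)],
    "mid":   [("end", "b", 0.6), ("end", "p", 0.4)],
    "end":   [],
}

def best_path(node="start"):
    """Return (log-probability, phone sequence) of the most likely path."""
    if not GRAPH[node]:
        return 0.0, []
    candidates = []
    for nxt, phone, p in GRAPH[node]:
        logp, phones = best_path(nxt)
        candidates.append((logp + math.log(p), [phone] + phones))
    return max(candidates)

logp, phones = best_path()
print(phones)  # ['a', 'b']
```

Real systems use dynamic programming (Viterbi search) over acoustic and pronunciation scores rather than this naive recursion, but the principle of picking the highest-probability path is the same.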

There is an open-source download and a web application. Use is free for all members of academic institutions in Europe.


Learning DH and Networking across Europe with CLARIN


Take 70 international young scholars in the digital humanities (DH), 11 different classes taught by experienced experts, a couple of presentations by scholars showing their work from various DH subfields, add a social program with excursions to museums and cultural sites: voilà. In the summer of 2017, "Culture & Technology" - The European Summer University in Digital Humanities (ESU) was an excellent venue for scholars to learn about and practice DH methods, expand their horizons to different research questions in the DH, and create international networks of expertise.

Existing tools and data sets were used to demonstrate use cases and to work on classroom projects based on participants' interests. CLARIN, as a major contributor to the DH infrastructure in Europe, strongly supported these activities by sponsoring classes related to the CLARIN services, which provide tools, data sets, and workflows.

ESU 2017 organizers: Elisabeth Burr and her team organized the summer school at Leipzig University

Organized by an enthusiastic team around Elisabeth Burr, the summer school, which was established at the University of Leipzig, Germany in 2009, was again co-sponsored by CLARIN, besides receiving funding from Leipzig University, the German Academic Exchange Service (DAAD), and other national and international institutions. This allowed about 70 participants from all over the world to take part in the proceedings of the summer school, including intensive courses in small groups applying DH methods and working on research questions. From Russia to the USA, with the majority of participants coming from European countries ranging from Bulgaria to France, the summer school was an international networking event for young scholars and international experts in DH.

Participants of ESU 2017 listening to a presentation on an international art project

ESU European Summer School for Digital Humanities, Leipzig 2015

The European Summer University in Digital Humanities brought together Digital Humanities students and researchers to discuss different topics and to learn about new methods. CLARIN-D, a research infrastructure for the Digital Humanities which works with language data, was part of the Summer School.

This clip shows interviews with participants, scholars and organizers of the summer school.
