CLARIN-D Blog

BAS WebServices are awarded a Google research prize of $5000

The CLARIN-D centre “Bavarian Archive for Speech Signals” (BAS) in Munich applied for a Google Research Credit Grant and received the award. As a result, more processing power for the automatic transcription of audio-visual speech data will be made available to users. We congratulate Florian Schiel and his colleagues at the BAS on winning the Google research prize, which includes $5,000 in prize money. The funding started on January 1st, 2020.

What are Google Research Credit Grants?

The US-based technology company Google has a large division called “Google Cloud Platform” (GCP), which offers several AI applications as (fee-based) web services. This division runs the programme “GCP Education”, which allows students to apply for so-called “credits” that can be used for GCP applications. Researchers, too, can apply for small grants (so-called “research credits”) with a project proposal. There is no application deadline; applications can be submitted on a rolling basis. Further information on the application process can be found here: https://edu.google.com/programs/credits/faqs/?modal_active=none#research-credits

Usage Scenarios of Google Services at the BAS

The BAS WebServices use, among other things, Google Cloud automatic speech recognition for the fully automated annotation of audio-visual data (cf. the services “ASR” and “Pipeline”). To provide users with more processing power in this area, Florian Schiel applied for a Google “research credit grant”. He decided to put a thematic focus on the development of the BAS WebServices and the integration of Google Cloud applications.

BAS users benefit from the award

Since the award started on January 1st, 2020, users of the BAS WebServices have been able to use about 1.7 million seconds of automatic transcription each …
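To put that quota into perspective, here is a quick back-of-the-envelope conversion based on the 1.7 million second figure quoted above:

```python
# Convert the quota of ~1.7 million seconds of automatic
# transcription into more familiar units.
QUOTA_SECONDS = 1_700_000  # figure quoted in the post

hours = QUOTA_SECONDS / 3600   # seconds per hour
days_of_audio = hours / 24     # continuous days of audio

print(f"{hours:.0f} hours (~{days_of_audio:.1f} days of audio)")
```

That is roughly 472 hours, or almost 20 continuous days of audio material.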

Read more

How to use COALA

https://youtu.be/yB090931YdM

COALA is a tool that converts simple text tables into CLARIN metadata files (CMDI) for multimodal corpora. If you want to learn more about CMDI in general, we refer to this page. Like many other CLARIN tools, COALA is a free web service that can be found on the website of the Bavarian Archive for Speech Signals (BAS).

To get your CMDI files, you just have to upload your files, give your corpus a name and a title, and hit the green COALA button to convert them. Within a few seconds you can download a zipped file which contains all the metadata for your corpus.

If you encounter any problems, you might first want to check the logging messages at the top to see whether something went wrong. If you have further problems or questions, there is a detailed description of the web service along with templates that you can download to see example inputs. Maybe there is something wrong with your tables? If you can't find what you are looking for, you can also get help at the Helpdesk.

Read more

How to use WebMAUS

https://youtu.be/G-TVDx5KQBs

This video tutorial about WebMAUS - the Munich AUtomatic Segmentation system - explains how you can easily generate a TextGrid file that aligns an audio signal to a transcription. If you want to learn more about WebMAUS in general, click here. The procedure for receiving the TextGrid is quite simple: you just need a text file containing the transcription and the corresponding audio file with spoken language, and you feed both into the application via drag-and-drop (careful: the two files need to have the same name).

After this step, a menu drops down where you can select your preferences and hit the 'run' button. After a few seconds, WebMAUS has created a TextGrid for you, which you can download and open in Praat along with your audio file to check where WebMAUS has segmented your recording and to process it further.
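If you need to process many recordings, the same upload can also be scripted against the BAS REST interface instead of using drag-and-drop. The sketch below builds the multipart request with the Python standard library only; the endpoint URL and the field names (`SIGNAL`, `TEXT`, `LANGUAGE`, `OUTFORMAT`) are assumptions based on the public BAS service description, so please check the current interface documentation before relying on them:

```python
import mimetypes
import uuid
from pathlib import Path

# Assumed endpoint of the basic WebMAUS service; verify against the
# current BAS WebServices interface description before use.
MAUS_URL = ("https://clarin.phonetik.uni-muenchen.de/"
            "BASWebServices/services/runMAUSBasic")

def build_multipart(audio_path: str, text_path: str, language: str = "deu-DE"):
    """Build a multipart/form-data body for a WebMAUS request (stdlib only)."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: processing language and desired output format.
    for name, value in [("LANGUAGE", language), ("OUTFORMAT", "TextGrid")]:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # File fields: the audio signal and the matching transcription
    # (remember: both files should share the same base name).
    for name, path in [("SIGNAL", audio_path), ("TEXT", text_path)]:
        filename = Path(path).name
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {ctype}\r\n\r\n'.encode()
            + Path(path).read_bytes() + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers

# Sending the request is then a single stdlib call, e.g.:
#   import urllib.request
#   body, headers = build_multipart("rec.wav", "rec.txt")
#   req = urllib.request.Request(MAUS_URL, data=body, headers=headers)
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

The builder is kept separate from the network call so that batch scripts can prepare and inspect requests before hitting the service.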

Read more

WebMAUS Introduction

https://youtu.be/7lI-gOShtFA

This video tutorial gives a brief introduction to the Munich AUtomatic Segmentation system, or WebMAUS. It is a tool that aligns speech signals to linguistic categories, which makes it possible, among other things, to align the audio signal of a video to its transcript. As input, WebMAUS needs a speech signal and some kind of transcription of the spoken text.

To get the actual output, the input text first needs to be normalized. With the Balloon tool, the expected pronunciation is created in SAMPA (a machine-readable phonetic alphabet). In the next step, all other possible pronunciation variants are generated along with their probabilities. These variants are represented in a probabilistic graph in which WebMAUS searches for the path of phonetic units that were actually spoken. The outcome is a transcript of the actual pronunciation along with its segmentation.
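The idea behind the search through that graph can be illustrated with a deliberately simplified toy sketch. This is not the actual MAUS algorithm, and the variants, prior probabilities, and acoustic scores below are invented for illustration only:

```python
# Toy illustration of variant selection: each pronunciation variant
# (a SAMPA string) has a prior probability from the variant model,
# and an acoustic score says how well it matches the recorded signal.
# The decoder picks the variant maximising prior * acoustic score.

# Hypothetical variants for German "haben", with invented priors.
variants = {
    "h a: b @ n": 0.50,   # canonical form
    "h a: b m":   0.35,   # common reduction
    "h a: m":     0.15,   # strong reduction
}

# Invented acoustic match scores (in reality derived from the signal).
acoustic = {
    "h a: b @ n": 0.20,
    "h a: b m":   0.60,
    "h a: m":     0.30,
}

def best_variant(priors, scores):
    """Return the variant with the highest combined probability."""
    return max(priors, key=lambda v: priors[v] * scores[v])

print(best_variant(variants, acoustic))  # here the common reduction wins
```

In the real system this decision is made jointly over whole utterances, so the chosen path also has to be consistent with the neighbouring words and their segment boundaries.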

There is an open-source download as well as the web application. Usage is free for all members of academic institutions in Europe.

Read more