Multiple-Level Annotation for a Gǝʿǝz (Classical Ethiopic) Diachronic Corpus

Cristina Vertan University of Hamburg

Classical Ethiopic (Gǝʿǝz) plays an important role within the literatures of Eastern Christianity. Documents in Gǝʿǝz are sometimes the sole extant witnesses to lost Greek and/or Arabic texts. Thus, besides their intrinsic significance, Gǝʿǝz texts are also of broader importance.
Basic linguistic resources and tools enabling a computer-aided investigation of Gǝʿǝz, its particularities and its changes over time, are still missing.
The particular writing system (a syllabary of its own), the complex non-concatenative morphology and various syntactic specificities make it practically impossible to adapt readily available resources and annotation tools.
In this contribution, we will present an innovative annotation tool, which enables:

  • Multiple levels of annotation
  • Corrections in the text during the annotation
  • Synchronized processing of original script and transliteration
  • Annotation of linguistic units smaller than the classical token (a string delimited by two white spaces).

The architecture of the tool considers each unit to be annotated as an object containing heterogeneous information like the graphical representation in various forms (original form, transliteration), morphological annotation, and links to authority lists or external resources. Handling of larger corpora is ensured through nested indexing.
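The object model described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the class names (Unit, Token, Corpus), the field names and the two-level index layout are assumptions made purely for illustration, and "nested indexing" is read here as an index from transliterated form to document to position. The example token ወወልድ (wa-wald, "and [the] son") is split into two annotatable units, showing annotation below the level of the whitespace-delimited token while script and transliteration are kept together.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One annotatable unit; may be smaller than a whitespace token."""
    script: str                                  # original Ethiopic script
    translit: str                                # transliteration
    morph: dict = field(default_factory=dict)    # morphological annotation
    links: list = field(default_factory=list)    # authority lists, external resources


@dataclass
class Token:
    """A whitespace-delimited token made of one or more units."""
    units: list

    @property
    def script(self) -> str:
        return "".join(u.script for u in self.units)

    @property
    def translit(self) -> str:
        return "-".join(u.translit for u in self.units)


class Corpus:
    """Documents plus a nested index: translit -> doc_id -> positions."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id: str, tokens: list) -> None:
        self.docs[doc_id] = tokens
        for t_pos, tok in enumerate(tokens):
            for u_pos, unit in enumerate(tok.units):
                # Each hit is (token position, unit position inside the token).
                self.index[unit.translit][doc_id].append((t_pos, u_pos))

    def find(self, translit: str) -> dict:
        hits = self.index.get(translit, {})
        return {doc_id: list(pos) for doc_id, pos in hits.items()}


# Hypothetical usage: one token carrying a proclitic conjunction and a noun.
token = Token([
    Unit("ወ", "wa", {"pos": "conjunction"}),
    Unit("ወልድ", "wald", {"pos": "noun"}),
])
corpus = Corpus()
corpus.add("doc1", [token])
hits = corpus.find("wald")
```

Keeping both graphical forms on the same object means that a correction to one unit can be applied to script and transliteration in a single place, and the nested index allows a lookup to be narrowed to one document without scanning the whole corpus.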
The contribution will be accompanied by a demonstration of the tool.
The tool is being developed within the ERC project “TraCES – From Translation to Creation: Changes in Ethiopic Style and Lexicon from Late Antiquity to the Middle Ages”, conducted at the University of Hamburg, Hiob Ludolf Centre for Ethiopian Studies, at the Asien-Afrika-Institut.