Rule-based Annotation Tools for Modern Standard Arabic

Rule-based Annotation Tools for Modern Standard Arabic, a postdoctoral fellowship funded by the IRCSET EMPOWER Initiative, is a two-year project (December 2009 - December 2011) with one academic partner (Dublin City University).

In this research, Dr. Mohammed Attia proposes to develop a suite of annotation tools for unrestricted Modern Standard Arabic (MSA) text using the Finite State Morphology (FSM) and Constraint Grammar (CG) formalisms. The suite covers morphological analysis, lemmatization, tokenization, part-of-speech (POS) tagging and partial parsing. To develop these tools, a representative corpus of MSA texts will be created, and a gold standard will be manually annotated for development and evaluation purposes. The work can support deep parsing of free text, for example with the ATB-based LFG grammars, which need a great deal of morphological information to generate the required f-structure annotations. Deep parsing is required for meaning-sensitive applications that analyse search queries, index documents and generate semantic representations.
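As a rough illustration of how a rule-based pipeline of this kind fits together, the sketch below chains a toy tokenizer, a lexicon lookup standing in for a finite-state morphological analyser, and a single Constraint Grammar style rule that removes verb readings after an unambiguous preposition. It is not the project's implementation: the transliterated lexicon, the `Reading` class and all function names are hypothetical, and whitespace splitting is only a placeholder for real Arabic tokenization, which also has to separate clitics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One candidate analysis of a token (hypothetical structure)."""
    lemma: str
    pos: str


# Hypothetical toy lexicon in a rough transliteration. A real system would
# obtain these readings from a finite-state morphological analyser.
LEXICON = {
    "fy": [Reading("fy", "PREP")],                                  # "in"
    "ktb": [Reading("kataba", "VERB"), Reading("kitAb", "NOUN")],  # "wrote" / "books"
}


def analyse(tokens):
    """Return every candidate reading for each token (no disambiguation)."""
    return [list(LEXICON.get(tok, [Reading(tok, "UNK")])) for tok in tokens]


def remove_verb_after_prep(cohorts):
    """CG-style rule: drop VERB readings when the previous token is
    unambiguously a preposition. The last remaining reading is never removed."""
    result = []
    for i, readings in enumerate(cohorts):
        prev = cohorts[i - 1] if i > 0 else []
        prev_is_prep = len(prev) == 1 and prev[0].pos == "PREP"
        if prev_is_prep:
            kept = [r for r in readings if r.pos != "VERB"]
            readings = kept or readings
        result.append(readings)
    return result


def annotate(text):
    """Tokenize on whitespace (an assumption), analyse, then disambiguate."""
    tokens = text.split()
    return list(zip(tokens, remove_verb_after_prep(analyse(tokens))))


if __name__ == "__main__":
    for token, readings in annotate("fy ktb"):
        print(token, [(r.lemma, r.pos) for r in readings])
```

In isolation the token `ktb` keeps both its verb and noun readings; after `fy` only the noun reading survives. A gold standard of the kind described above would be used to check that such rules remove the wrong readings and keep the right ones.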

Please contact

for further information on this project.
