Panacea: Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies

PANACEA (Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies) is a three-year EU FP7-funded STREP running from January 2010 to December 2012. The consortium comprises five academic partners, UPF (Barcelona, coordinator), CNR-ILC (Pisa), ILSP (Athens), Cambridge University and DCU, and two industrial partners, Linguatec (Germany) and ELDA (Paris).

A strategic challenge for Europe in today's globalised economy is to overcome language barriers through technological means. In particular, Machine Translation (MT) systems are expected to have a significant impact on the management of multilingualism in Europe, making it possible to translate the huge quantity of written and spoken data produced to meet the needs of hundreds of millions of citizens.

PANACEA addresses the most critical aspect for MT: the so-called language-resource bottleneck. Although MT technologies may consist of language-independent engines, their real-life implementation depends on the availability of language-dependent knowledge, i.e., they require Language Resources. To supply MT for every pair of European languages, every domain and every text genre, appropriate language resources covering all these aspects must be found, processed and supplied to MT developers, in the format and with the information their systems demand. At present, this is mostly done by hand.

PANACEA aims to build a factory of Language Resources that progressively automates the stages involved in the acquisition, production, updating and maintenance of the language resources that MT systems require, within the time frame in which they are needed. This automation will significantly cut cost, time and human effort. These reductions in cost and time, which the project will assess, are the only way to guarantee a sustainable supply of the Language Resources that Machine Translation and other Language Technologies will demand in a multilingual Europe. Evaluation in an industrial scenario will serve as the proof of concept for the benefits offered by PANACEA and for its potential impact.

To address these issues, PANACEA is organised around four main pillars: 1) the creation of a language resource production platform, designed as a dedicated workflow manager that composes technologies, tools and systems from combinations of different web services; 2) the automatic production of massive amounts of Language Resources (LRs) for MT and other Language Technologies, using this platform and the tools registered in it; 3) the evaluation of the platform and the LR production chain in both R&D and industrial settings; and 4) the validation of the resources produced within the project for the selected languages and scenarios.

PANACEA will incorporate different technology components that make it possible to automate the whole process of producing LRs step by step. The success of the project will be measured in two ways: (i) by comparing the time and cost of producing resources with PANACEA against a current industrial scenario; and (ii) by a quantitative evaluation that compares automatically acquired resources with manually prepared gold-standard Language Resources, in order to assess the usability of the resources produced.
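As an illustration of what the quantitative evaluation in (ii) could look like, the sketch below scores a set of automatically acquired entries against a gold-standard set using precision, recall and F1. This is only one common way of making such a comparison, not the project's stated metric; the function name `evaluate_against_gold` and the toy lexicon entries are hypothetical and chosen purely for illustration.

```python
def evaluate_against_gold(acquired, gold):
    """Return (precision, recall, f1) of acquired entries against gold entries.

    Assumption: a resource can be represented as a set of hashable entries,
    and an acquired entry counts as correct only if it matches a gold entry exactly.
    """
    acquired, gold = set(acquired), set(gold)
    true_positives = len(acquired & gold)
    precision = true_positives / len(acquired) if acquired else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy lexicons: (lemma, part-of-speech) pairs.
gold = {("casa", "N"), ("comer", "V"), ("rojo", "ADJ"), ("rapido", "ADV")}
acquired = {("casa", "N"), ("comer", "V"), ("rojo", "N")}

precision, recall, f1 = evaluate_against_gold(acquired, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact-match scoring is the simplest choice; a real evaluation of lexica or corpora would probably need a looser matching criterion per resource type.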

Please contact Prof. Andy Way for further information on this project.
