Substantially increasing localisation automation, which is needed to cope with the rapidly growing volume of material to be localised (in both EL and PL), can only be achieved through advances in the quality of machine translation (MT) and the concerted deployment of MT in the localisation workflow of the Next Generation Localisation Factory. For PL, MT is the only option: all material must be translated on the fly, without delay, which, unlike in EL, excludes human intervention in the translation process. Speech interfaces are crucial for PL: they support mobile devices, extract important personalisation-relevant information from spoken input and generate personalised output. Because MT is central to both EL and PL, the ILT basic research track is firmly centred on MT research, with Speech Technology (ST) and Text Analytics (TA) tightly integrated into or directly serving MT.
Machine Translation (MT). A huge demand for MT already exists: web service providers process millions of requests for automatic translation every day. Until recently, the service offered by Google was powered by BabelFish, a version of the successful Systran system. However, this older rule-based, hand-crafted technology is being replaced by a new generation of data-driven, machine-learning-based MT (already for Arabic-English and Chinese-English). Most MT research today is corpus-based, and such systems are gradually making their way to market, e.g. LanguageWeaver's statistical MT (SMT) system. The demand for, and potential impact of, high-quality MT is enormous, and translation quality is crucial in automated localisation workflows. MT quality will be further improved through fundamental advances in three areas: combining the example-based MT (EBMT) and SMT paradigms, introducing syntactic information into EBMT and SMT to better capture global reordering, and fine-tuning machine-learning-based systems to text type and genre.
Speech Technologies (ST). Flexible, non-keyboard-dependent, on-the-move voice access and response is a core enabling technology for intelligent access to digital content. Speech interfaces to mobile devices are essential in eyes-busy, hands-busy scenarios. In the multilingual application scenario addressed by the Next Generation Localisation Factory, a tight integration of ST and MT is imperative to achieve optimal results. Speech carries information on multiple levels: gender and age are communicated by voice characteristics; prosody and voice quality carry crucial grammatical information; emotional state or mood is communicated by prosody and tone of voice; and sound qualities and systematic patterns distinguish native from non-native users. This information is available only to a very limited extent in a purely textual representation, but it is crucial for PL.
Text Analytics (TA). The Next Generation Localisation Factory defines two core tasks for Text Analytics: (i) automatic annotation of localisation data with metadata and (ii) text classification. Reliable automatic multilingual text classification is required to optimally tune suites of novel MT and ST systems to text type and genre in EL and PL workflows. Automatic labelling is required for two purposes: to annotate multilingual input with standardised metadata so that localisation workflows can be automated, and to annotate multilingual corpora with dependency information, both to induce novel probabilistic transfer-based MT systems and to provide syntactic information for syntax-boosted SMT and EBMT systems for EL and PL.
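As an illustration of task (ii), the following minimal sketch shows how a text classifier could route incoming material to an MT system tuned to its text type. It is not the method proposed here: the multinomial Naive Bayes classifier is a stand-in technique, and the training sentences, the genre labels and the MT system names (`mt-manual-tuned`, `mt-legal-tuned`) are hypothetical values invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Collect per-genre word and document counts from (text, genre) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, genre in examples:
        tokens = text.lower().split()
        word_counts[genre].update(tokens)
        doc_counts[genre] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the most probable genre under multinomial Naive Bayes
    with add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_genre, best_score = None, float("-inf")
    for genre in doc_counts:
        # Log prior of the genre.
        score = math.log(doc_counts[genre] / total_docs)
        total_words = sum(word_counts[genre].values())
        for token in text.lower().split():
            # Smoothed log likelihood of each token given the genre.
            count = word_counts[genre][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_genre, best_score = genre, score
    return best_genre


# Hypothetical toy training data; a real system would use large labelled corpora.
TRAINING = [
    ("click the install button to update the driver", "software_manual"),
    ("restart the device after the update completes", "software_manual"),
    ("the parties agree to the terms of this licence", "legal"),
    ("this agreement is governed by the applicable law", "legal"),
]

# Hypothetical identifiers of genre-tuned MT systems.
MT_SYSTEMS = {
    "software_manual": "mt-manual-tuned",
    "legal": "mt-legal-tuned",
}

MODEL = train(TRAINING)


def route(text):
    """Pick the MT system tuned to the predicted genre of the input text."""
    return MT_SYSTEMS[classify(MODEL, text)]


if __name__ == "__main__":
    print(route("install the update and restart"))
    print(route("the parties agree to this agreement"))
```

The classifier and the routing table are kept separate so that a stronger multilingual classifier could replace `classify` without changing how the MT systems are selected.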
The specific objectives of the Integrated Language Technologies basic research track are: