Research Overview
Three Global Challenges to Localisation
Localisation is the process of adapting digital content to culture, locale and linguistic environment. Localisation brings products and services to markets that are otherwise inaccessible. Because of this, localisation is a core multiplier and value-adding component of the global software, services, manufacturing and content distribution industry. Currently, there are three massive challenges facing localisation:
Volume:
The amount of content that needs to be localised into ever more languages is growing steadily and far outstrips current translation and localisation capacities. As a consequence, only a fraction of the content that needs to be localised is localised, and usually only into a limited set of languages. Many business opportunities are missed and, what is more, lack of localisation contributes to the digital divide, with essential (e.g. health and hygiene) information, products and services unavailable in languages which currently do not promise a return on investment (ROI) on localisation costs.
Access:
Traditionally, localisation assumes print or full screen- and keyboard-based access to content. More recently, however, new and evolving generations of small devices (smartphones and PDAs) support on-the-move and instant access to digital content. Novel interaction modalities such as speech-enabled access are not supported by current localisation technologies. Traditional localisation workflows assume predictable, stable, corporate content, and localisation is viewed as a well-managed, large-scale, off-line process. Today, however, much digital content is perishable, with frequent updates and rapidly increasing volumes of user-generated content (user fora, blogs etc.). Instant access to on-line content requires a new breed of fully automated on-line localisation technologies.
Personalisation:
Traditionally, localisation is coarse-grained according to generic notions of locales and linguistic environments. What is localised is information. Information is most valuable if adapted to personal requirements including task at hand, level of expertise, age-group and personal preferences and expectations. Traditional localisation needs to be overlaid and integrated with fine-grained personal information cutting across traditional notions of locale and linguistic environment: the person is the ultimate locale.
Conceptually, we represent the three challenges in terms of a localisation cube:

Current state-of-the-art localisation technologies instantiate large and well-managed localisation workflows, targeting the lower, front-right part of the localisation cube, with large parts of the cube remaining unaddressed.
Addressing the Challenges: The CNGL Research Strategy
The challenge is to develop next-generation localisation technologies and processes that allow us to address any point in the space defined by the localisation cube, at configurable speed and quality, realising the CNGL vision to enable people to interact with content, products and services in their own language, according to their own culture, and according to their own personal needs. In order to overcome the combined challenges of volume, access and personalisation, the CNGL research programme is structured as follows:

The programme intertwines four research tracks. To a first approximation, two of them, Integrated Language Technologies (ILT) and Digital Content Management (DCM), are basic research tracks, and the remaining two, Next Generation Localisation (LOC) and Systems Framework (SF), are more applied, integrating research tracks.
Integrated Language Technologies (ILT)
ILT focuses on Machine Translation (MT). It aims to improve on current MT technologies by integrating syntactic information into both statistical MT (SMT) and example-based MT, developing novel hybrid MT systems, automatic domain adaptation and novel MT evaluation methods, and investigating the impact of controlled language on MT. ILT features a Speech Technology component, closely intertwined with the MT research. It aims to develop Speech Technologies that are less language-dependent and more easily adapted to multilingual applications, as well as tightly coupled Speech-MT systems in which the Speech system can profitably use information provided by the MT system and vice versa. ILT also features a Text Analytics component focusing on automatic annotation of localisation-relevant metadata, text classification (e.g. to support domain tuning of MT) and dependency annotation (e.g. to support syntax-enhanced MT).
Digital Content Management (DCM)
DCM focuses on combining Adaptive Hypermedia (AH) with Information Retrieval (IR) technologies to support the CNGL personalisation agenda in a multilingual setting. To achieve its objectives, DCM concentrates on automatic acquisition of domain information and shallow subject ontologies from raw text, as manual construction is time-consuming, expensive and difficult to scale. As information queries are often the starting points of an interaction with digital content, DCM focuses on query expansion and optimisation in multilingual contexts. Content needs to be sliced and recomposed to deliver personalised information responses. DCM investigates novel methods based on insights from AH and IR for personalised multilingual information access and delivery.
Next Generation Localisation (LOC)
The technological advances from ILT and DCM need to be integrated into the workflows of the Next Generation Localisation Factory. To achieve optimal integration, LOC researches the whole life-cycle of digital content, including content development and design for internationalisation. Standards are a crucial factor in achieving reusable and modular components in localisation workflows, and they ensure that localisation-relevant information can be exploited optimally by those components. Sophisticated language and digital content management technologies need to be evaluated, integrated into workflows, and combined with existing localisation technologies (such as Translation Memories (TMs) and Terminology Management Systems) and with human pre- and post-processing, including crowdsourcing. Finally, LOC develops the blueprints for the Next Generation Localisation Factory, which will be able to respond flexibly to localisation requirements addressing different points in the localisation cube at configurable speed and quality.
Systems Framework (SF)
To date, the software engineering aspects of complex systems based on language and digital content management technologies are underexplored. The Next Generation Localisation Factory will be highly modular and adaptive, with workflows that can be reconfigured easily and on the fly. SF investigates rapid prototyping systems and designs supporting adaptive workflows, using web-based service architectures. User interfaces are a crucial component in such systems, and novel interfaces need to be developed (e.g. to optimally support post-editing of MT output). Finally, SF coordinates and implements the development of an evolving series of CNGL demonstrator systems.
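As a purely illustrative sketch of what a modular workflow with on-the-fly reconfiguration could look like, the Python fragment below models a workflow as an ordered list of interchangeable steps. It is not a CNGL design: the names `Workflow`, `tm_lookup`, `machine_translate` and `flag_for_post_editing`, the toy Translation Memory and the placeholder MT output are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A step takes a job dictionary and returns an updated copy of it.
Step = Callable[[dict], dict]


def tm_lookup(job: dict) -> dict:
    """Hypothetical Translation Memory step: reuse an exact match if one exists."""
    memory = {"Hello": "Bonjour"}  # toy TM, for illustration only
    hit = memory.get(job["source"])
    if hit is not None:
        return {**job, "target": hit, "origin": "tm"}
    return job


def machine_translate(job: dict) -> dict:
    """Hypothetical MT step: runs only if no earlier step produced a target."""
    if "target" in job:
        return job
    # Placeholder output; a real system would call an MT engine here.
    return {**job, "target": f"<mt:{job['source']}>", "origin": "mt"}


def flag_for_post_editing(job: dict) -> dict:
    """Hypothetical quality step: mark MT output for human post-editing."""
    return {**job, "needs_post_edit": job.get("origin") == "mt"}


@dataclass
class Workflow:
    """An ordered, mutable list of steps applied to each job in turn."""
    steps: List[Step] = field(default_factory=list)

    def run(self, source: str) -> dict:
        job = {"source": source}
        for step in self.steps:
            job = step(job)
        return job


# Two configurations trading speed against quality.
fast = Workflow([tm_lookup, machine_translate])
careful = Workflow([tm_lookup, machine_translate, flag_for_post_editing])
```

Because the steps are ordinary list entries, a running configuration can be changed without rebuilding it, for example `fast.steps.append(flag_for_post_editing)`. In a web-based service architecture each step would more plausibly be a remote service call than a local function; that substitution is left out here to keep the sketch self-contained.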


