Introduction and purpose: Language is the medium through which people interact with every aspect of their world, whether economics, health, the environment, or technology. Yet in both development programs and technology, language usually receives secondary consideration, if any at all. As a result, people who do not speak a major language are excluded from full participation in development programs and from technologies, such as ICTs, that could improve their economic and social circumstances. In Africa, for example, where only a small minority speaks English or French, few development programs have the resources for even the most basic language measures, such as translating health information into local languages. Language technology can be a fast, cost-effective way to close the knowledge and communication gaps that underlie many other aspects of the development agenda.
Design and methods: The most efficient way to address language development is through public tools and vocabularies that can be reused, revised, and repurposed across multiple domains. We discuss a universal multilingual dictionary designed to build a parallel vocabulary of core concepts across languages, with a special focus on languages that have few existing resources. The lexicons are built in close cooperation with local partners, and particular attention is paid to a data structure that can support downstream technologies. In addition, a participatory process develops domain-specific terminologies so that complex concepts can be communicated clearly and consistently. The data is made freely available to the public, and considerable effort goes into access systems built on least-cost technologies, which have the widest reach at the bottom of the pyramid.
Results: When successful, a focus on core language development can improve the outcomes of many other projects. In health, for example, translation is often too expensive and too difficult because basic resources such as dictionaries do not exist and technical terms lack adequate local-language equivalents. For the one-time cost and effort of building the lexicons and terminologies, together with the free tools being created to access those vocabularies, the infrastructure opens the way to cheap and rapid translation of health material. Similarly, students can use the lexicons to reach knowledge that was previously blocked by linguistic barriers, reducing future language-based inequalities. Because data is collected through participatory methods, the vocabularies keep growing in response to the expressed needs of particular linguistic and development communities. And because the resource remains permanently public, development programs can make sustainable use of the data in multiple languages, for multiple purposes, with no further investment.
Conclusion: Language is a hidden factor in the development equation; language technology by itself does not cure a disease or put food on a table. However, whether the task is communicating agricultural techniques, delivering government services, or any of the many other activities that fall under the rubric of development, attention to language technologies for underserved language populations can be the difference between working together and talking past each other: between failing to communicate and succeeding in articulating a path toward common goals.
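The "parallel vocabulary of core concepts" described under Design and methods can be illustrated with a minimal sketch. The abstract does not specify the dictionary's actual data model, so the class `ConceptLexicon`, its methods, and the concept IDs below are all hypothetical. The sketch assumes that terms in each language attach to a shared, language-independent concept rather than directly to each other, so a single entry serves every language pair and the distinct senses of an ambiguous word (English "bank") stay separate.

```python
from collections import defaultdict


class ConceptLexicon:
    """Hypothetical concept-aligned lexicon: every term in every language
    points to a shared, language-independent concept ID."""

    def __init__(self) -> None:
        # concept_id -> language code -> terms expressing that concept
        self.concepts = defaultdict(lambda: defaultdict(list))
        # (language code, term) -> set of concept IDs the term can express
        self.index = defaultdict(set)

    def add_term(self, concept_id: str, lang: str, term: str) -> None:
        """Attach a term in a given language to a concept (no duplicates)."""
        if term not in self.concepts[concept_id][lang]:
            self.concepts[concept_id][lang].append(term)
        self.index[(lang, term)].add(concept_id)

    def translate(self, term: str, source: str, target: str) -> dict:
        """Return target-language terms grouped by the concept they share
        with the source term, so different senses are kept apart."""
        result = {}
        # .get() avoids creating empty index entries for unknown terms
        for concept_id in sorted(self.index.get((source, term), set())):
            targets = self.concepts[concept_id].get(target, [])
            if targets:  # skip concepts not yet covered in the target language
                result[concept_id] = list(targets)
        return result


# Usage with hypothetical concept IDs.
lex = ConceptLexicon()
lex.add_term("bank.finance", "en", "bank")
lex.add_term("bank.finance", "fr", "banque")
lex.add_term("bank.river", "en", "bank")
lex.add_term("bank.river", "fr", "rive")
lex.add_term("water", "en", "water")
lex.add_term("water", "sw", "maji")

print(lex.translate("bank", "en", "fr"))   # {'bank.finance': ['banque'], 'bank.river': ['rive']}
print(lex.translate("water", "en", "sw"))  # {'water': ['maji']}
print(lex.translate("water", "en", "fr"))  # {} (no French term recorded yet)
```

Routing lookups through concept IDs means a new language only needs its terms linked to existing concepts once, rather than a separate bilingual dictionary for every language pair. An empty result, as for French "water" above, marks a gap in coverage.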
Guillaume André Fradji Martres
Sylvain Jean-François Harquel, Camille Bonnet