We propose a novel, fully automated approach to inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the is-a relation in the target language. Character-level classifiers are then trained on the constructed datasets and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments on six languages, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches. As a consequence of our work, we release what is presumably the largest and most accurate multilingual taxonomic resource, spanning over 280 languages.
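
To make the first step more concrete, here is a minimal Python sketch of how interlanguage links could be used to project English is-a edges onto a target language and label candidate edges for training. It is an illustration under stated assumptions, not the authors' implementation: the function names (`project_edges`, `build_training_set`), the toy data, and in particular the way negative examples are chosen are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (child, parent), read as "child is-a parent"


def project_edges(english_edges: List[Edge],
                  interlanguage_links: Dict[str, str]) -> Set[Edge]:
    """Map English is-a edges to the target language.

    An edge is kept only if both endpoints have an interlanguage link.
    """
    projected: Set[Edge] = set()
    for child, parent in english_edges:
        target_child = interlanguage_links.get(child)
        target_parent = interlanguage_links.get(parent)
        if target_child is not None and target_parent is not None:
            projected.add((target_child, target_parent))
    return projected


def build_training_set(english_edges: List[Edge],
                       candidate_edges: List[Edge],
                       interlanguage_links: Dict[str, str]
                       ) -> List[Tuple[str, str, bool]]:
    """Label target-language candidate edges as is-a (True) or not (False).

    Assumption: a candidate is a negative example when both endpoints are
    linked to English pages but the projected pair is not an English is-a
    edge. Candidates with an unlinked endpoint are skipped as unlabeled.
    """
    positives = project_edges(english_edges, interlanguage_links)
    linked_targets = set(interlanguage_links.values())
    labeled = []
    for child, parent in candidate_edges:
        if child in linked_targets and parent in linked_targets:
            labeled.append((child, parent, (child, parent) in positives))
    return labeled


# Toy example (hypothetical data): English -> French
english_edges = [("Dogs", "Mammals")]
links = {"Dogs": "Chiens", "Mammals": "Mammifères", "France": "France"}
candidates = [("Chiens", "Mammifères"), ("Chiens", "France"), ("Chiens", "Animaux")]

training_set = build_training_set(english_edges, candidates, links)
```

In this sketch the labeled pairs would then serve as training data for a character-level classifier over the target-language titles; the candidate ("Chiens", "Animaux") is left unlabeled because "Animaux" has no link in the toy data.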