Publication

Crosslingual Topic Modeling with WikiPDA

Robert West, Tiziano Piccardi
2021
Conference paper
Abstract

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links, articles are inherently language-independent. WikiPDA works in two steps, by first densifying bags of links using matrix completion and then training a standard monolingual topic model. A human evaluation shows that WikiPDA produces more coherent topics than monolingual text-based latent Dirichlet allocation (LDA), thus offering crosslinguality at no cost. We demonstrate WikiPDA's utility in two applications: a study of topical biases in 28 Wikipedia language editions, and crosslingual supervised document classification. Finally, we highlight WikiPDA's capacity for zero-shot language transfer, where a model is reused for new languages without any fine-tuning. Researchers can benefit from WikiPDA as a practical tool for studying Wikipedia's content across its 299 language editions in interpretable ways, via an easy-to-use library publicly available at https://github.com/epfl-dlab/WikiPDA.

About this result
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Related concepts (17)
Wikipedia
Wikipedia is a free-content online encyclopedia written and maintained by a community of volunteers, collectively known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history, and has consistently been one of the 10 most popular websites. Created by Jimmy Wales and Larry Sanger on January 15, 2001, it is hosted by the Wikimedia Foundation, an American nonprofit organization.
Knowledge base
A knowledge base (KB) is a set of sentences, each sentence given in a knowledge representation language, with interfaces to tell new sentences and to ask questions about what is known, where either of these interfaces might use inference. It is a technology used to store complex structured data used by a computer system. The initial use of the term was in connection with expert systems, which were the first knowledge-based systems. The original use of the term knowledge base was to describe one of the two sub-systems of an expert system.
Knowledge graph
In knowledge representation and reasoning, knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities - objects, events, situations or abstract concepts - while also encoding the semantics underlying the used terminology. Since the development of the Semantic Web, knowledge graphs are often associated with linked open data projects, focusing on the connections between concepts and entities.
Show more
Related publications (22)

The Facets of Intangible Heritage in Southern Chinese Martial Arts: Applying a Knowledge-Driven Cultural Contact Detection Approach

Yumeng Hou

Investigating the intangible nature of a cultural domain can take multiple forms, addressing for example the aesthetic, epistemic and social dimensions of its phenomenology. The context of Southern Chinese martial arts is of particular significance as it c ...
2023

Building a Knowledge Graph of Chinese Kung Fu Masters From Heterogeneous Bilingual Data

Yumeng Hou, Lin Yuan

Various endeavours into semantic web technologies and ontology engineering have been made within the organisation of cultural data, facilitating public access to digital assets. Although models for conceptualising objects have reached a certain level of ma ...
2023

An Ontology-based Engineering system to support aircraft manufacturing system design

Jinzhi Lu, Xiaochen Zheng

During the conceptual design phase of an aircraft manufacturing system, different industrial scenarios need to be evaluated against performance indicators in a collaborative engineering process. Domain experts' knowledge and the motivations for decision-ma ...
ELSEVIER SCI LTD2023
Show more

Graph Chatbot

Chat with Graph Search

Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.

DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.