Thematic Annotation: extracting concepts out of documents
Graph Chatbot
Chat with Graph Search
Ask any question about EPFL courses, lectures, exercises, research, news, etc. or try the example questions below.
DISCLAIMER: The Graph Chatbot is not programmed to provide explicit or categorical answers to your questions. Rather, it transforms your questions into API requests that are distributed across the various IT services officially administered by EPFL. Its purpose is solely to collect and recommend relevant references to content that you can explore to help you answer your questions.
The Idiap NLP Group has participated in both DiscoMT 2015 sub-tasks: pronoun-focused translation and pronoun prediction. The system for the first sub-task combines two knowledge sources: gram matical constraints from the hypothesized coreference links, and ...
The bag-of-words (BOW) model is the common approach for classifying documents, where words are used as feature for training a classifier. This generally involves a huge number of features. Some techniques, such as Latent Semantic Analysis (LSA) or Latent D ...
This paper addresses the problem of keyword extraction from conversations, with the goal of using these keywords to retrieve, for each short conversation fragment, a small number of potentially relevant documents, which can be recommended to participants. ...
Shared controlled vocabularies are a prerequisite for collaborative annotation and semantic interchange. The creation and maintenance of such vocabularies is, however, time-consuming and expensive. The diversity of research questions in the humanities make ...
A concept map is a node-link diagram showing the semantic relationships among concepts. The technique for constructing concept maps is called "concept mapping". A concept map consists of nodes, arrows as linking lines, and linking phrases that describe the ...
In this paper, we present an approach for topic-level video snippet-based extractive summarization, which relies on con tent-based recommendation techniques. We identify topic-level snippets using transcripts of all videos in the dataset and indexed these ...
My research focusses on the automatic extraction of canonical references from publications in Classics. Such references are the standard way of citing classical texts and are found in great numbers throughout monographs, journal articles and commentaries. ...
The Web became the central medium for valuable sources of information extraction applications. However, such user-generated resources are often plagued by inaccuracies and misinformation due to the inherent openness and uncertainty of the Web. In this work ...
Blogs are one of the most prominent means of communication on the web. Their content, interconnections and influence constitute a unique socio-technical artefact of our times which needs to be preserved. The BlogForever project has established best practic ...
This paper introduces a new dataset and compares several methods for the recommendation of non-fiction audio visual material, namely lectures from the TED website. The TED dataset contains 1,149 talks and 69,023 profiles of users, who have made more than 1 ...