Concept search

A concept search (or conceptual search) is an automated information retrieval method used to search electronically stored unstructured text (for example, digital archives, email, and scientific literature) for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.

Concept search techniques were developed because of limitations imposed by classical Boolean keyword search technologies when dealing with large, unstructured digital collections of text. Keyword searches often return results that include many non-relevant items (false positives) or that exclude too many relevant items (false negatives) because of the effects of synonymy and polysemy. Synonymy means that two or more words in the same language have the same meaning, and polysemy means that many individual words have more than one meaning.

Polysemy is a major obstacle for all computer systems that attempt to deal with human language. In English, the most frequently used terms have several common meanings. For example, the word fire can mean: a combustion activity; to terminate employment; to launch; or to excite (as in fire up). For the 200 most-polysemous terms in English, the typical verb has more than twelve common meanings, or senses. The typical noun from this set has more than eight common senses. For the 2000 most-polysemous terms in English, the typical verb has more than eight common senses and the typical noun has more than five.

In addition to the problems of polysemy and synonymy, keyword searches can miss inadvertently misspelled words as well as variations on the stems (or roots) of words (for example, strike vs. striking). Keyword searches are also susceptible to optical character recognition (OCR) errors: the scanning process can introduce random errors into the text of documents, which is often referred to as noisy text.
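The keyword-search failure modes described here can be illustrated with a minimal sketch. The toy documents, the `tokenize` helper, and the `keyword_search` function below are all hypothetical and invented for illustration; they stand in for a generic exact-match keyword search, not for any particular retrieval system.

```python
import re

# Hypothetical toy collection; the documents are invented for illustration.
DOCS = {
    "d1": "The factory fire destroyed the warehouse.",
    "d2": "The manager decided to fire two employees.",
    "d3": "A blaze broke out at the plant overnight.",
    "d4": "Workers are striking over pay.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_search(term, docs):
    """Return ids of documents containing the exact term (no stemming, no synonyms)."""
    term = term.lower()
    return sorted(doc_id for doc_id, text in docs.items() if term in tokenize(text))

fire_hits = keyword_search("fire", DOCS)
strike_hits = keyword_search("strike", DOCS)

print(fire_hits)    # ['d1', 'd2']
print(strike_hits)  # []
```

For a user interested in combustion, the query fire returns d2 as a false positive (polysemy) and misses d3, which uses the synonym blaze (a false negative). The query strike misses d4 because striking is a different surface form of the same stem.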

Efficient and Effective Multi-Modal Queries Through Heterogeneous Network Embedding

Karl Aberer, Quoc Viet Hung Nguyen, Thanh Trung Huynh, Thành Tâm Nguyên, Chi Thang Duong

The heterogeneity of today's Web sources requires information retrieval (IR) systems to handle multi-modal queries. Such queries define a user's information needs by different data modalities, such as keywords, hashtags, user profiles, and other media. Rec ...

IEEE Computer Society, 2022
