Enterprise search | EPFL Graph Search

Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience. The term "enterprise search" also describes the software used to search for information within an enterprise (though the search function and its results may still be public). Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.

Enterprise search systems index data and documents from a variety of sources, such as intranets, document management systems, e-mail, and databases. Many enterprise search systems integrate structured and unstructured data in their collections. Enterprise search systems also use access controls to enforce a security policy on their users. Enterprise search can be seen as a type of vertical search of an enterprise.

In an enterprise search system, content goes through various phases from source repository to search results.

Content awareness (or "content collection") usually follows either a push or a pull model. In the push model, a source system is integrated with the search engine so that it connects to the engine and pushes new content directly to its APIs. This model is used when real-time indexing is important. In the pull model, the software gathers content from sources using a connector such as a web crawler or a database connector. The connector typically polls the source at certain intervals to look for new, updated or deleted content.

Content from different sources may have many different formats or document types, such as XML, HTML, Office document formats or plain text. The content processing phase converts the incoming documents to plain text using document filters. It is also often necessary to normalize content in various ways to improve recall or precision. These normalizations may include stemming, lemmatization, synonym expansion, entity extraction, and part-of-speech tagging.
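The phases described above can be illustrated with a small toy pipeline. The following is only a minimal sketch under stated assumptions, not how any particular enterprise search product works: the names `Document`, `Index`, `poll`, `normalize` and `SYNONYMS` are hypothetical, the "source" is an in-memory dictionary standing in for a real repository, and the suffix stripping is a crude stand-in for a real stemmer. It shows one pull-model polling pass that detects new, updated and deleted content, a normalization step (lowercasing, synonym mapping, naive stemming), and a group-based access check applied at query time.

```python
import re
from dataclasses import dataclass

# Hypothetical synonym table used for illustration only.
SYNONYMS = {"e-mail": "email", "mail": "email"}


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to see this document


def normalize(text):
    """Lowercase, tokenize, map synonyms, and strip a few suffixes (toy stemmer)."""
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    result = []
    for tok in tokens:
        tok = SYNONYMS.get(tok, tok)
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        result.append(tok)
    return result


class Index:
    """Tiny inverted index: normalized term -> set of document ids."""

    def __init__(self):
        self.postings = {}
        self.docs = {}

    def add(self, doc):
        self.remove(doc.doc_id)  # drop stale postings if the document changed
        self.docs[doc.doc_id] = doc
        for term in normalize(doc.text):
            self.postings.setdefault(term, set()).add(doc.doc_id)

    def remove(self, doc_id):
        if self.docs.pop(doc_id, None) is not None:
            for ids in self.postings.values():
                ids.discard(doc_id)

    def search(self, query, user_groups):
        """Return ids of documents matching all query terms that the user may see."""
        terms = normalize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(d for d in hits if self.docs[d].allowed_groups & user_groups)


def poll(source, seen_versions, index):
    """One pull-model pass over `source`, a dict of doc_id -> (version, Document)."""
    for doc_id, (version, doc) in source.items():
        if seen_versions.get(doc_id) != version:  # new or updated content
            index.add(doc)
            seen_versions[doc_id] = version
    for doc_id in set(seen_versions) - set(source):  # deleted content
        index.remove(doc_id)
        del seen_versions[doc_id]
```

In this sketch a connector would call `poll` at a fixed interval, whereas a push-model source would call `index.add` directly whenever its content changes. Filtering by `allowed_groups` at query time is one simple way to enforce access controls; it is an illustrative choice rather than the only possible design.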

ECCE: Entity-centric Corpus Exploration Using Contextual Implicit Networks

Maud Ehrmann, Matteo Romanello, Andreas Oliver Spitz

In the Digital Age, the analysis and exploration of unstructured document collections is of central importance to members of investigative professions, whether they might be scholars, journalists, paralegals, or analysts. In many of their domains, entities ...

ACM Digital Press, 2022

Contextualized ranking of entity types based on knowledge graphs

Karl Aberer, Philippe Cudré-Mauroux, Michele Catasta, Roman Prokofyev

A large fraction of online queries targets entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet ...

Elsevier Science Bv, 2016

Wikipedia Chemical Structure Explorer: substructure and similarity searching of molecules from Wikipedia

Luc Patiny, Michaël Giuseppe Zasso

Background: Wikipedia, the world's largest and most popular encyclopedia is an indispensable source of chemistry information. It contains among others also entries for over 15,000 chemicals including metabolites, drugs, agrochemicals and industrial chemica ...

Biomed Central Ltd, 2015