Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
The term "enterprise search" describes software that searches for information within an enterprise (though the search function and its results may still be public). Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer.
Enterprise search systems index data and documents from a variety of sources, such as intranets, document management systems, e-mail, and databases. Many enterprise search systems integrate structured and unstructured data in their collections. Enterprise search systems also use access controls to enforce a security policy on their users.
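As a rough illustration of how access controls can be applied at query time, the sketch below filters out documents the user is not allowed to read before matching the query. The `Document` class, the `allowed_groups` field, and the group names are hypothetical and chosen only for this example; real systems differ in how they model permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups permitted to read this document


def search(documents, query, user_groups):
    """Return IDs of documents that match every query term and that the user may read."""
    terms = query.lower().split()
    user_groups = set(user_groups)
    hits = []
    for doc in documents:
        if not (doc.allowed_groups & user_groups):
            continue  # enforce the access policy before matching
        text = doc.text.lower()
        if all(term in text for term in terms):
            hits.append(doc.doc_id)
    return hits


docs = [
    Document("hr-1", "Salary review policy", frozenset({"hr"})),
    Document("wiki-1", "Travel policy for all staff", frozenset({"hr", "staff"})),
]
print(search(docs, "policy", {"staff"}))
```

Filtering before matching, as here, is only one possible design; a system could equally apply the check to the ranked results afterwards.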
Enterprise search can be seen as a type of vertical search of an enterprise.
In an enterprise search system, content goes through various phases from source repository to search results:
Content awareness (or "content collection") usually follows either a push or a pull model. In the push model, a source system is integrated with the search engine in such a way that the source system connects to the search engine and pushes new content directly to its APIs. This model is used when real-time indexing is important. In the pull model, the software gathers content from sources using a connector such as a web crawler or a database connector. The connector typically polls the source at regular intervals to look for new, updated, or deleted content.
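The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any particular product: `SearchIndex`, `PullConnector`, `push_document`, and the `doc_id -> (version, text)` snapshot format are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchIndex:
    """Toy in-memory index; stands in for a search engine's indexing API."""
    docs: dict = field(default_factory=dict)

    def upsert(self, doc_id, text):
        self.docs[doc_id] = text

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class PullConnector:
    """Pull model: polls a source and applies new, updated and deleted content."""

    def __init__(self, index):
        self.index = index
        self.seen = {}  # doc_id -> version observed at the last poll

    def poll(self, source):
        # `source` maps doc_id -> (version, text); a hypothetical snapshot format.
        changes = {"added": [], "updated": [], "deleted": []}
        for doc_id, (version, text) in source.items():
            if doc_id not in self.seen:
                changes["added"].append(doc_id)
            elif self.seen[doc_id] != version:
                changes["updated"].append(doc_id)
            else:
                continue  # unchanged since the last poll
            self.index.upsert(doc_id, text)
            self.seen[doc_id] = version
        for doc_id in list(self.seen):
            if doc_id not in source:
                self.index.delete(doc_id)
                del self.seen[doc_id]
                changes["deleted"].append(doc_id)
        return changes


def push_document(index, doc_id, text):
    """Push model: the source system calls the index itself when content changes."""
    index.upsert(doc_id, text)
```

In a deployment, `poll` would be run on a schedule, whereas `push_document` would be called by the source system at the moment a document changes, which is why the push model suits real-time indexing.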
Content from different sources may have many different formats or document types, such as XML, HTML, Office document formats, or plain text. The content processing phase converts the incoming documents to plain text using document filters. It is also often necessary to normalize content in various ways to improve recall or precision. These normalization steps may include stemming, lemmatization, synonym expansion, entity extraction, and part-of-speech tagging.
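A small Python sketch may help make this phase concrete. It assumes HTML input only, uses the standard library's `html.parser` as a stand-in for a document filter, and replaces real stemming with a deliberately crude suffix stripper; `naive_stem`, `normalize`, and the `SYNONYMS` table are hypothetical names for this example and are not taken from any specific search product.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (a very small 'document filter')."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Convert an HTML fragment to whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def naive_stem(token):
    # Crude suffix stripping for illustration only; not the Porter stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


SYNONYMS = {"e-mail": ["email"], "doc": ["document"]}  # hypothetical synonym table


def normalize(text):
    """Lowercase, tokenize, stem, and expand synonyms."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    out = []
    for token in tokens:
        out.append(naive_stem(token))
        out.extend(SYNONYMS.get(token, []))
    return out


print(normalize(html_to_text("<p>Indexing <b>documents</b></p>")))
```

Stemming and synonym expansion of this kind mainly aim at recall, since more query terms can match a document; a production pipeline would normally rely on an established stemmer or lemmatizer rather than the suffix rule used here.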
Understanding the brain requires an integrated understanding of different scales of organisation of the brain. This Massive Open Online Course (MOOC) will take you through the latest data, models
Interfaces between peptides and metallic surfaces are the subject of great interest for possible use in technological and medicinal applications, mainly since organic systems present an extensive range of functionalities, are abundant, cheap, and exhibit l ...
Large codebases are routinely indexed by standard Information Retrieval systems, starting from the assumption that code written by humans shows similar statistical properties to written text [Hindle et al., 2012]. While those IR systems are still relativel ...
2019
Related concepts (5)
In the Digital Age, the analysis and exploration of unstructured document collections is of central importance to members of investigative professions, whether they might be scholars, journalists, paralegals, or analysts. In many of their domains, entities ...
Automatic document indexing is a field of computer science and of library and information science that uses software methods to organize a set of documents and to facilitate later searching for content within that collection. The variety of document types (text, media, audiovisual, Web) gives rise to very different approaches, particularly in terms of data representation.
Knowledge management is a multidisciplinary managerial approach that brings together all the initiatives, methods, and techniques for perceiving, identifying, analyzing, organizing, storing, and sharing the knowledge of an organization's members, whether that knowledge is created by the company itself (marketing, research and development) or acquired from outside (competitive intelligence), in order to achieve a set objective. We are flooded with information.
Information retrieval (IR) is the field that studies how to find information in a corpus. The corpus consists of documents from one or more databases, which are described by their content or by associated metadata. The databases may be relational or unstructured, such as those networked by hypertext links as in the World Wide Web, the Internet, and intranets. The content of the documents may be text, sound, images, or data.