Named-entity recognition: Named-entity recognition (NER), also known as (named) entity identification, entity chunking, and entity extraction, is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages. Most research on NER/NEE systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp.
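As a rough illustration of the task in the named-entity recognition entry, the sketch below locates and classifies entities in the example sentence. It is a toy, rule-based tagger and not how research systems usually work. The `GAZETTEER` table, the `tag_entities` function, and the label names are hypothetical and exist only for this example.

```python
import re

# Hypothetical gazetteer: a hand-written lookup table, not a real NER resource.
GAZETTEER = {
    "Jim": "PERSON",
    "Acme Corp.": "ORGANIZATION",
}

def tag_entities(text):
    """Return sorted (start, end, surface, label) spans found in text."""
    spans = []
    # Locate every gazetteer entry that appears verbatim in the text.
    for surface, label in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text):
            spans.append((match.start(), match.end(), surface, label))
    # Label plain numbers as quantities using a simple pattern (an assumption).
    for match in re.finditer(r"\b\d+(?:\.\d+)?\b", text):
        spans.append((match.start(), match.end(), match.group(), "QUANTITY"))
    return sorted(spans)

entities = tag_entities("Jim bought 300 shares of Acme Corp.")
for start, end, surface, label in entities:
    print(f"{surface!r} [{start}:{end}] -> {label}")
```

A lookup table cannot handle unseen names or ambiguous mentions, which is one reason practical systems are typically learned from annotated text.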
Entity linking: In natural language processing, entity linking, also referred to as named-entity linking (NEL), named-entity disambiguation (NED), named-entity recognition and disambiguation (NERD), or named-entity normalization (NEN), is the task of assigning a unique identity to entities (such as famous individuals, locations, or companies) mentioned in text. For example, given the sentence "Paris is the capital of France", the idea is to determine that "Paris" refers to the city of Paris and not to Paris Hilton or any other entity that could be referred to as "Paris".
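The "Paris" example from the entity linking entry can be sketched as a choice among candidate identities. This minimal version scores each candidate by how many of its context words appear in the sentence. The `CANDIDATES` table, its identifiers and context words, and the `link_entity` function are all made up for illustration; real linkers draw candidates from a knowledge base and use far richer scoring.

```python
# Hypothetical candidate table: identifiers and context words are invented.
CANDIDATES = {
    "Paris": {
        "Paris (city)": {"capital", "france", "city", "seine"},
        "Paris Hilton": {"hilton", "celebrity", "hotel", "heiress"},
    }
}

def link_entity(mention, sentence):
    """Pick the candidate whose context words overlap most with the sentence."""
    words = {w.strip('.,"').lower() for w in sentence.split()}
    candidates = CANDIDATES.get(mention, {})
    if not candidates:
        return None  # no known identity for this mention
    return max(candidates, key=lambda name: len(candidates[name] & words))

result = link_entity("Paris", "Paris is the capital of France")
print(result)
```

Word overlap is only a stand-in for the disambiguation step. It shows that the surrounding sentence, not the mention alone, decides which identity is assigned.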
Text mining: Text mining, text data mining (TDM), or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by identifying patterns and trends through means such as statistical pattern learning. According to Hotho et al.
Cultural heritage: Cultural heritage is the legacy of tangible and intangible assets of a group or society that is inherited from past generations. Not everything left by past generations is "heritage"; rather, heritage is a product of selection by society. Cultural heritage includes tangible culture (such as buildings, monuments, landscapes, archive materials, books, works of art, and artifacts), intangible culture (such as folklore, traditions, language, and knowledge), and natural heritage (including culturally significant landscapes and biodiversity).
Deep learning: Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be supervised, semi-supervised, or unsupervised.
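To make "multiple layers" in the deep learning entry concrete, here is a minimal sketch of a forward pass through a stack of fully connected layers. It assumes a ReLU activation between layers and uses hand-picked weights, not learned ones. The function names `dense`, `relu`, and `forward` are illustrative and not taken from any library.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in vector]

def dense(weights, bias, inputs):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def forward(layers, inputs):
    """Apply each (weights, bias) layer in turn, with ReLU after hidden layers."""
    activation = inputs
    for index, (weights, bias) in enumerate(layers):
        activation = dense(weights, bias, activation)
        if index < len(layers) - 1:  # no activation after the final layer
            activation = relu(activation)
    return activation

# Arbitrary example weights, chosen by hand for illustration.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer: 2 inputs -> 2 units
    ([[1.0, 2.0]], [0.1]),                    # output layer: 2 units -> 1 output
]
output = forward(layers, [2.0, 1.0])
print(output)
```

Adding more `(weights, bias)` pairs to `layers` makes the network deeper. Training, which is where supervised, semi-supervised, and unsupervised methods differ, is not shown.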
Historical document: Historical documents are original documents that contain important historical information about a person, place, or event and can thus serve as primary sources, an important ingredient of historical methodology. Significant historical documents can be deeds, laws, accounts of battles (often given by the victors or persons sharing their viewpoint), or the exploits of the powerful. Though these documents are of historical interest, they do not detail the daily lives of ordinary people, or the way society functioned.
Historical fiction: Historical fiction is a literary genre in which the plot takes place in a setting related to past events, but is fictional. Although the term is commonly used as a synonym for historical fiction literature, it can also be applied to other types of narrative, including theatre, opera, cinema, and television, as well as video games and graphic novels. An essential element of historical fiction is that it is set in the past and pays attention to the manners, social conditions, and other details of the depicted period.
Cultural heritage management: Cultural heritage management (CHM) is the vocation and practice of managing cultural heritage. It is a branch of cultural resources management (CRM), although it also draws on the practices of cultural conservation, restoration, museology, archaeology, history, and architecture. While the term cultural heritage is generally used in Europe, in the US the term cultural resources is in more general use, specifically referring to cultural heritage resources.
Historical criticism: Historical criticism, also known as the historical-critical method or higher criticism, is a branch of criticism that investigates the origins of ancient texts in order to understand "the world behind the text". While often discussed in terms of Jewish and Christian writings from ancient times, historical criticism has also been applied to other religious and secular writings from various parts of the world and periods of history.
Transformer (machine learning model): A transformer is a deep learning architecture that relies on the parallel multi-head attention mechanism. The modern transformer was proposed in the 2017 paper "Attention Is All You Need" by Ashish Vaswani et al. of the Google Brain team. It is notable for requiring less training time than previous recurrent neural architectures, such as long short-term memory (LSTM), and later variants have been widely adopted for training large language models on large (language) datasets, such as the Wikipedia corpus and Common Crawl, by virtue of the parallelized processing of the input sequence.
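The attention mechanism named in the transformer entry can be sketched for a single head as scaled dot-product attention: each query is compared with every key, the scores are normalized with a softmax, and the result weights the values. This is a plain-Python illustration under simplifying assumptions: one head, no learned projection matrices, and no masking. The function names and toy inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head; arguments are lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:  # each query is independent of the others
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Toy input: two positions with 2-dimensional vectors (values chosen arbitrarily).
x = [[1.0, 0.0], [0.0, 1.0]]
out = attention(x, x, x)
print(out)
```

Because no query depends on the result of another, all positions can be computed at once, which is the parallelism the entry credits for shorter training time. A multi-head layer would run several such heads side by side on different learned projections of the input.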