Patents have traditionally been used in the history of technology as an indication of the thinking process of inventors, of the challenges or "reverse salients" they faced, or of the social groups influencing the construction of technology. More recently, historians of science and technology have also read them to interpret the way people described technology and how the specific inscriptions of inventions mattered for the justification and operation of the patent system. The digitization of historical patents opens up unique opportunities to assess the feasibility of unsupervised machine learning and natural language processing methods for such explorations. In this project, we analyze over a million US historical patents from 1830 to 1930 using a variety of text-based methods, with two major aims: 1) categorizing patents into coherent technical categories, and 2) identifying discourses of safety, reflexivity, and environmental concern in technological innovation. We use both frequency-based and context-based methods, and find that bag-of-words methods such as TF-IDF and topic modeling do not perform well on semantic categorization due to the linguistic peculiarities of patent specifications. This suggests that a successful approach to categorizing patents would require contextual semantic representations such as Transformer-based methods (e.g. BERT), or static-embedding methods (e.g. Word2Vec, Doc2Vec), which have relatively low computational costs but are less expressive in some scenarios. We run early experiments using these methods and find that word embedding models are effective in learning semantics from the descriptions of the patents. In this poster, we describe our early results, as well as exploratory data analysis on this massive historical patents dataset.
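To make the frequency-based baseline concrete, here is a minimal pure-Python TF-IDF sketch. The three toy sentences are hypothetical stand-ins, not actual patent text; the point is that similarity under this representation is driven entirely by surface-term overlap, which is why formulaic patent language can defeat it.

```python
import math
from collections import Counter

# Hypothetical toy stand-ins for patent specifications (not real data).
docs = [
    "improved steam engine boiler with safety valve",
    "steam boiler safety valve for engine pressure relief",
    "chemical dye composition for textile colouring",
]

def tf_idf_vectors(corpus):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    n = len(corpus)
    tokenized = [doc.split() for doc in corpus]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tf_idf_vectors(docs)
# The two boiler texts share weighted terms; the dye text shares none,
# so its similarity to the first document is zero under this model.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

Because the representation sees only shared tokens, two patents describing the same mechanism in different boilerplate vocabulary score as unrelated; contextual or embedding-based representations are one way around this limitation.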
Jan Frederik Jonas Florian Mai