Publication

Evolution of Topics and Novelty in Science

Orion B Penner
2019
Conference paper

Abstract

Estimating the similarity between individual publications is an area of long-standing interest in the scientometrics community. Traditional methods have generally relied on references and other metadata, while text mining approaches based on title and abstract text have appeared more frequently in recent years. In principle, topic models have great potential in this domain. In practice, however, they are often difficult to employ successfully and, in particular, are notoriously inconsistent as the dimension of the latent space grows. That is, running the same model, with the same parameters, on the same data, but with a different random seed produces radically different similarity estimates as the number of topics increases. In this manuscript we develop a simple but novel methodology for evaluating the robustness of topic models. Employing that methodology, we find that the neural-network-based Doc2Vec approach seems capable of providing (statistically) robust estimates of document-document similarities, even for topic spaces far larger than is prudent for the most common topic model approach, Latent Dirichlet Allocation. As this is a work in progress, we do not venture deeply into the question of whether these estimates also reflect reality, but we do provide some preliminary evidence and future directions for those efforts.
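
The abstract does not spell out the evaluation methodology, so the following is only a minimal sketch of one way the seed-sensitivity question could be posed, not the author's method. It assumes that each model run yields one vector per document, and it scores the agreement between two runs as the Spearman rank correlation of their pairwise cosine similarities. All names (`seed_robustness`, `run_a`, `run_b`) and the toy vectors are hypothetical stand-ins for real LDA or Doc2Vec output.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pairwise_similarities(doc_vectors):
    """Cosine similarity for every unordered pair of documents."""
    pairs = combinations(range(len(doc_vectors)), 2)
    return [cosine(doc_vectors[i], doc_vectors[j]) for i, j in pairs]


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def seed_robustness(run_a, run_b):
    """Agreement between two runs' document-document similarity estimates.

    Assumption: run_a and run_b list the same documents in the same order.
    """
    return spearman(pairwise_similarities(run_a), pairwise_similarities(run_b))


# Toy stand-ins for document vectors produced by two runs of a model.
run_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.2, 1.0]]
# A uniformly rescaled copy: cosine similarities, and hence ranks, are unchanged.
run_b = [[2.0 * x for x in vec] for vec in run_a]

score = seed_robustness(run_a, run_b)
print(f"rank agreement between runs: {score:.3f}")
```

Under this assumed measure, a score near 1 would indicate that two seeds order document pairs almost identically, while a score near 0 would indicate the kind of inconsistency the abstract describes. Rank correlation is used here because it compares the ordering of document pairs rather than the raw similarity values, which need not share a scale across runs.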

Official source

https://infoscience.epfl.ch/record/276184?ln=fr

Robustness, replicability and scalability in topic modelling

Multi-scale sequential network for semantic text segmentation and localization

The organisation of science: topics, incentives and funding.