We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. PATSTAT is the de facto standard for academic research using patent data. Complete information on patents is essential for an accurate picture of technological activity across countries and over time, yet the database's coverage is far from complete. Our imputation method exploits detailed institutional knowledge of the international patent system, which we codify in a SQL algorithm. We provide two datasets, covering the imputation of missing country codes and of missing technology classifications. We also release the algorithm, which can easily be adapted to impute other information missing from PATSTAT. (C) 2020 The Authors. Published by Elsevier Inc.
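To make the idea of exploiting institutional knowledge more concrete, here is a minimal Python sketch of one plausible imputation step. It is not the authors' SQL algorithm. It assumes that filings in the same patent family share inventors and applicants, so a priority filing with a missing country code can borrow the most frequent non-missing code found among its family members. The record layout and every field name (`appln_id`, `family_id`, `is_priority`, `ctry_code`, `imputed`) are hypothetical simplifications, not the actual PATSTAT schema.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified records: one row per patent application.
# Field names are illustrative only, not the real PATSTAT schema.
applications = [
    {"appln_id": 1, "family_id": 100, "is_priority": True,  "ctry_code": None},
    {"appln_id": 2, "family_id": 100, "is_priority": False, "ctry_code": "DE"},
    {"appln_id": 3, "family_id": 100, "is_priority": False, "ctry_code": "DE"},
    {"appln_id": 4, "family_id": 200, "is_priority": True,  "ctry_code": "JP"},
    {"appln_id": 5, "family_id": 300, "is_priority": True,  "ctry_code": None},
]


def impute_country_codes(rows):
    """Fill missing country codes of priority filings from family members.

    Assumption (for illustration): applications in the same family share
    inventors/applicants, so the most frequent known code in the family is
    a plausible value. Returns new dicts; the input rows are not modified.
    """
    # Count known country codes per family.
    family_codes = defaultdict(Counter)
    for row in rows:
        if row["ctry_code"] is not None:
            family_codes[row["family_id"]][row["ctry_code"]] += 1

    result = []
    for row in rows:
        new_row = dict(row)
        new_row["imputed"] = False  # flag so imputed values stay traceable
        if row["is_priority"] and row["ctry_code"] is None:
            counts = family_codes.get(row["family_id"])
            if counts:
                # Ties resolve to the code seen first in the family.
                new_row["ctry_code"] = counts.most_common(1)[0][0]
                new_row["imputed"] = True
        result.append(new_row)
    return result


for row in impute_country_codes(applications):
    print(row["appln_id"], row["ctry_code"], row["imputed"])
```

In this toy data, application 1 receives "DE" from its family members, while application 5 stays missing because no filing in its family carries a country code. Flagging imputed values instead of overwriting silently keeps it possible to separate observed from imputed information in later analysis.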
Gaétan Jean A de Rassenfosse, Kyle William Higham
David Atienza Alonso, Giulio Masinelli, Adriana Arza Valdes, Fabio Isidoro Tiberio Dell'Agnola
Nathan Quentin Faivre, Inaki Asier Iturrate Gil, Michael Eric Anthony Pereira, Shuo Wang, Xiao Hu, Caroline Peters