Publication

Imputation of missing information in worldwide patent data

Abstract

We present a general method for imputing missing information in the Worldwide Patent Statistical Database (PATSTAT) and make the resulting datasets publicly available. The PATSTAT database is the de facto standard for academic research using patent data. Complete information on patents is essential to obtain an accurate picture of technological activities across countries and over time. However, the coverage of the database is far from complete. Our data imputation method exploits detailed institutional knowledge about the international patent system, and we codify it in a SQL algorithm. We provide two datasets related to the imputation of missing country codes and missing technology classification. We also release the algorithm that can be easily adapted to impute other pieces of information that are missing in PATSTAT. (C) 2020 The Authors. Published by Elsevier Inc.
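The abstract does not spell out the imputation rules, so the following is only a minimal sketch of the general idea of filling a missing field from related filings. It assumes a simplified, hypothetical table `applications` with columns `appln_id`, `family_id`, `country_code`, and `imputed`; these names and the rule itself (copy a country code only when all other filings in the same family agree on it) are illustrative assumptions, not the authors' released algorithm or the actual PATSTAT schema.

```python
import sqlite3


def build_demo():
    """Create an in-memory table with a few hypothetical filings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE applications ("
        " appln_id INTEGER PRIMARY KEY,"
        " family_id INTEGER,"
        " country_code TEXT,"
        " imputed INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO applications (appln_id, family_id, country_code)"
        " VALUES (?, ?, ?)",
        [
            (1, 100, "CH"), (2, 100, None),   # family 100: one known code
            (3, 200, None),                   # family 200: nothing to borrow
            (4, 300, "US"), (5, 300, "DE"),   # family 300: conflicting codes
            (6, 300, None),
        ],
    )
    return conn


def impute_country_codes(conn):
    """Fill a missing country code from the filing's own family.

    Assumed rule: impute only when the family contains exactly one
    distinct non-missing code; otherwise leave the value missing.
    """
    conn.execute(
        """
        UPDATE applications
        SET country_code = (
                SELECT MIN(donor.country_code)
                FROM applications AS donor
                WHERE donor.family_id = applications.family_id
                  AND donor.country_code IS NOT NULL
            ),
            imputed = 1
        WHERE country_code IS NULL
          AND (
                SELECT COUNT(DISTINCT donor.country_code)
                FROM applications AS donor
                WHERE donor.family_id = applications.family_id
              ) = 1
        """
    )
    conn.commit()
```

Flagging imputed rows in a separate column keeps the original and the filled-in values distinguishable, which matters when the completed data feed into later statistical analysis.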

Related concepts (33)
Patent
A patent is a type of intellectual property that gives its owner the legal right to exclude others from making, using, or selling an invention for a limited period of time in exchange for publishing an enabling disclosure of the invention. In most countries, patent rights fall under private law and the patent holder must sue someone infringing the patent in order to enforce their rights. The procedure for granting patents, requirements placed on the patentee, and the extent of the exclusive rights vary widely between countries according to national laws and international agreements.
Database
In computing, a database is an organized collection of data (also known as a data store) stored and accessed electronically through the use of a database management system. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spans formal techniques and practical considerations, including data modeling, efficient data representation and storage, query languages, security and privacy of sensitive data, and distributed computing issues, including supporting concurrent access and fault tolerance.
Missing data
In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Some items are more likely to generate a nonresponse than others: for example items about private subjects such as income.
Related publications (39)

Synthetic realistic noise-corrupted PPG database and noise generator for the evaluation of PPG denoising and delineation algorithms

David Atienza Alonso, Giulio Masinelli, Adriana Arza Valdes, Fabio Isidoro Tiberio Dell'Agnola

This database is meant to evaluate the performance of denoising and delineation algorithms for PPG signals affected by noise. The noise generator allows applying the algorithms under test to an artificially corrupted reference PPG signal and comparing its ...
2021

Decentralising the patent system

Gaétan Jean A de Rassenfosse, Kyle William Higham

Modern patent systems are slow, inefficient, expensive, and may result in outcomes that actively harm technological progress. This paper proposes a substantive re-think of these systems and lays a foundation upon which practical solutions can be built. Man ...
Elsevier Inc., 2021

The Confidence Database

Nathan Quentin Faivre, Inaki Asier Iturrate Gil, Michael Eric Anthony Pereira, Shuo Wang, Xiao Hu, Caroline Peters

Understanding how people rate their confidence is critical for the characterization of a wide range of perceptual, memory, motor and cognitive processes. To enable the continued exploration of these processes, we created a large database of confidence stud ...
Nature Publishing Group, 2020
Related MOOCs (12)
Geographical Information Systems 1
Organized in two parts, this course presents the theoretical and practical foundations of geographic information systems and requires no prior knowledge of computer science. By following this ...
Introduction to Geographic Information Systems (part 1)
Organized in two parts, this course presents the theoretical and practical foundations of geographic information systems and requires no prior knowledge of computer science. By following this ...