Open accessOpen access (OA) is a set of principles and a range of practices through which research outputs are distributed online, free of access charges or other barriers. Under some models of open access publishing, barriers to copying or reuse are also reduced or removed by applying an open license for copyright. The main focus of the open access movement is "peer reviewed research literature". Historically, this has centered mainly on print-based academic journals.
Data miningData mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information (with intelligent methods) from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
Data accessData access is a generic term referring to a process which has both an IT-specific meaning and other connotations involving access rights in a broader legal and/or political sense. In the former it typically refers to software and activities related to storing, retrieving, or acting on data housed in a database or other repository. Two fundamental types of data access exist: sequential access (as in magnetic tape, for example) random access (as in indexed media) Data access crucially involves authorization to access different data repositories.
Data wranglingData wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics. The goal of data wrangling is to assure quality and useful data. Data analysts typically spend the majority of their time in the process of data wrangling compared to the actual analysis of the data.
Research data archivingResearch data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences. The various academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between different disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archival of data.
ReproducibilityReproducibility, closely related to replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or in a statistical analysis of a data set should be achieved again with a high degree of reliability when the study is replicated. There are different kinds of replication but typically replication studies involve different researchers using the same methodology.
Data martA data mart is a structure/access pattern specific to data warehouse environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each department or business unit is considered the owner of its data mart including all the hardware, software and data.
Reproducibility ProjectThe Reproducibility Project: Psychology was a crowdsourced collaboration of 270 contributing authors to repeat 100 published experimental and correlational psychological studies. This project was led by the Center for Open Science and its co-founder, Brian Nosek, who started the project in November 2011. The results of this collaboration were published in August 2015. Reproducibility is the ability to produce the same findings, using the same methodologies as the original work, but on a different dataset (for instance, collected from a different set of participants).
File sharingFile sharing is the practice of distributing or providing access to digital media, such as computer programs, multimedia (audio, images and video), documents or electronic books. Common methods of storage, transmission and dispersion include removable media, centralized servers on computer networks, Internet-based hyperlinked documents, and the use of distributed peer-to-peer networking. File sharing technologies, such as BitTorrent, are integral to modern media piracy, as well as the sharing of scientific data and other free content.
Replication crisisThe replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method, such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.