Training, validation, and test data setsIn machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.
Molecular propertyMolecular properties include the chemical properties, physical properties, and structural properties of molecules, including drugs. Molecular properties typically do not include pharmacological or biological properties of a chemical compound.
Simple random sampleIn statistics, a simple random sample (or SRS) is a subset of individuals (a sample) chosen from a larger set (a population) in which a subset of individuals are chosen randomly, all with the same probability. It is a process of selecting a sample in a random way. In SRS, each subset of k individuals has the same probability of being chosen for the sample as any other subset of k individuals. A simple random sample is an unbiased sampling technique. Simple random sampling is a basic type of sampling and can be a component of other more complex sampling methods.
Artificial neural networkArtificial neural networks (ANNs, also shortened to neural networks (NNs) or neural nets) are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains. An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
Chemical propertyA chemical property is any of a material's properties that becomes evident during, or after, a chemical reaction; that is, any quality that can be established only by changing a substance's chemical identity. Simply speaking, chemical properties cannot be determined just by viewing or touching the substance; the substance's internal structure must be affected greatly for its chemical properties to be investigated. When a substance goes under a chemical reaction, the properties will change drastically, resulting in chemical change.
Molecular modelA molecular model is a physical model of an atomistic system that represents molecules and their processes. They play an important role in understanding chemistry and generating and testing hypotheses. The creation of mathematical models of molecular properties and behavior is referred to as molecular modeling, and their graphical depiction is referred to as molecular graphics. The term, "molecular model" refer to systems that contain one or more explicit atoms (although solvent atoms may be represented implicitly) and where nuclear structure is neglected.
Stratified samplingIn statistics, stratified sampling is a method of sampling from a population which can be partitioned into subpopulations. In statistical surveys, when subpopulations within an overall population vary, it could be advantageous to sample each subpopulation (stratum) independently. Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should define a partition of the population.
Multi-objective optimizationMulti-objective optimization or Pareto optimization (also known as multi-objective programming, vector optimization, multicriteria optimization, or multiattribute optimization) is an area of multiple-criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective is a type of vector optimization that has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives.
Molecular geometryMolecular geometry is the three-dimensional arrangement of the atoms that constitute a molecule. It includes the general shape of the molecule as well as bond lengths, bond angles, torsional angles and any other geometrical parameters that determine the position of each atom. Molecular geometry influences several properties of a substance including its reactivity, polarity, phase of matter, color, magnetism and biological activity. The angles between bonds that an atom forms depend only weakly on the rest of molecule, i.
Ensemble learningIn statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.