Training, validation, and test data setsIn machine learning, a common task is the study and construction of algorithms that can learn from and make predictions on data. Such algorithms function by making data-driven predictions or decisions, through building a mathematical model from input data. These input data used to build the model are usually divided into multiple data sets. In particular, three data sets are commonly used in different stages of the creation of the model: training, validation, and test sets.
Head and neck cancerHead and neck cancer develops from tissues in the lip and oral cavity (mouth), larynx (throat), salivary glands, nose, sinuses, or skin of the face. The most common types of head and neck cancer occur in the lips, mouth, and larynx. Symptoms predominantly include a sore that does not heal or a change in the voice. In those with advanced disease, there may be unusual bleeding, facial pain, numbness or swelling, and visible lumps on the outside of the neck or oral cavity.
Colorectal cancerColorectal cancer (CRC), also known as bowel cancer, colon cancer, or rectal cancer, is the development of cancer from the colon or rectum (parts of the large intestine). Signs and symptoms may include blood in the stool, a change in bowel movements, weight loss, and fatigue. Most colorectal cancers are due to old age and lifestyle factors, with only a small number of cases due to underlying genetic disorders. Risk factors include diet, obesity, smoking, and lack of physical activity.
Lung cancerLung cancer, also known as lung carcinoma, is a malignant tumor that begins in the lung. Lung cancer is caused by genetic damage to the DNA of cells in the airways, often caused by cigarette smoking or inhaling damaging chemicals. Damaged airway cells gain the ability to multiply unchecked, causing the growth of a tumor. Without treatment, tumors spread throughout the lung, damaging lung function. Eventually lung tumors metastasize, spreading to other parts of the body.
Cross-validation (statistics)Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation is a resampling method that uses different portions of the data to test and train a model on different iterations. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
GenomicsGenomics is an interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes. A genome is an organism's complete set of DNA, including all of its genes as well as its hierarchical, three-dimensional structural configuration. In contrast to genetics, which refers to the study of individual genes and their roles in inheritance, genomics aims at the collective characterization and quantification of all of an organism's genes, their interrelations and influence on the organism.
MicrobiomeA microbiome () is the community of microorganisms that can usually be found living together in any given habitat. It was defined more precisely in 1988 by Whipps et al. as "a characteristic microbial community occupying a reasonably well-defined habitat which has distinct physio-chemical properties. The term thus not only refers to the microorganisms involved but also encompasses their theatre of activity". In 2020, an international panel of experts published the outcome of their discussions on the definition of the microbiome.
Esophageal webEsophageal webs are thin membranes occurring anywhere along the esophagus. Its main symptoms are pain and difficulty in swallowing (dysphagia). Esophageal webs are thin membranes of normal esophageal tissue consisting of mucosa and submucosa that can partially protrude/obstruct the esophagus. They can be congenital or acquired. Congenital webs commonly appear in the middle and inferior third of the esophagus, and they are more likely to be circumferential with a central or eccentric orifice.
Regression validationIn statistics, regression validation is the process of deciding whether the numerical results quantifying hypothesized relationships between variables, obtained from regression analysis, are acceptable as descriptions of the data. The validation process can involve analyzing the goodness of fit of the regression, analyzing whether the regression residuals are random, and checking whether the model's predictive performance deteriorates substantially when applied to data that were not used in model estimation.
MetabolismMetabolism (məˈtæbəlɪzəm, from μεταβολή metabolē, "change") is the set of life-sustaining chemical reactions in organisms. The three main functions of metabolism are: the conversion of the energy in food to energy available to run cellular processes; the conversion of food to building blocks for proteins, lipids, nucleic acids, and some carbohydrates; and the elimination of metabolic wastes. These enzyme-catalyzed reactions allow organisms to grow and reproduce, maintain their structures, and respond to their environments.