Lung cancer
Lung cancer, also known as lung carcinoma, is a malignant tumor that begins in the lung. It is caused by genetic damage to the DNA of cells in the airways, often resulting from cigarette smoking or inhaling damaging chemicals. Damaged airway cells gain the ability to multiply unchecked, causing the growth of a tumor. Without treatment, tumors spread throughout the lung, damaging lung function. Eventually lung tumors metastasize, spreading to other parts of the body.
Active learning (machine learning)
Active learning is a special case of machine learning in which a learning algorithm can interactively query a user (or some other information source) to label new data points with the desired outputs. In the statistics literature, it is sometimes also called optimal experimental design. The information source is also called a teacher or an oracle. There are situations in which unlabeled data is abundant but manual labeling is expensive. In such a scenario, learning algorithms can actively query the user or teacher for labels.
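The query loop described above can be sketched with a toy example. This is a minimal illustration, not a reference implementation: it assumes a one-dimensional threshold classifier, a hypothetical `oracle` function standing in for the human teacher, and uncertainty sampling (querying the point closest to the current decision boundary) as the query strategy. All names and the true boundary of 0.6 are assumptions made for the example.

```python
import random

def oracle(x):
    """Hypothetical teacher: the true label is 1 when x >= 0.6 (assumed for this sketch)."""
    return int(x >= 0.6)

def fit_threshold(labeled):
    """Place the decision threshold midway between the largest 0 and the smallest 1."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    lo = max(zeros) if zeros else 0.0
    hi = min(ones) if ones else 1.0
    return (lo + hi) / 2

def active_learn(pool, n_queries):
    """Label only the points the learner is least certain about."""
    pool = sorted(pool)
    # Seed with the two extreme points so both classes are represented.
    labeled = [(pool[0], oracle(pool[0])), (pool[-1], oracle(pool[-1]))]
    unlabeled = pool[1:-1]
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the unlabeled point nearest the boundary.
        x = min(unlabeled, key=lambda u: abs(u - threshold))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
    return fit_threshold(labeled), len(labeled)

rng = random.Random(0)
pool = [rng.random() for _ in range(200)]  # abundant unlabeled data
threshold, n_labels = active_learn(pool, n_queries=10)
print(f"estimated threshold {threshold:.3f} using {n_labels} of {len(pool)} labels")
```

In this sketch the learner requests only 12 labels out of 200 points, which reflects the motivation given above: labels are expensive, so the algorithm chooses which ones to ask for.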
Cancer research
Cancer research is research into cancer to identify causes and develop strategies for prevention, diagnosis, treatment, and cure. Cancer research ranges from epidemiology and molecular bioscience to the performance of clinical trials that evaluate and compare applications of the various cancer treatments. These applications include surgery, radiation therapy, chemotherapy, hormone therapy, immunotherapy and combined treatment modalities such as chemo-radiotherapy.
Exome sequencing
Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome). It consists of two steps. The first step is to select only the subset of DNA that encodes proteins. These regions are known as exons; humans have about 180,000 exons, constituting about 1% of the human genome, or approximately 30 million base pairs. The second step is to sequence the exonic DNA using any high-throughput DNA sequencing technology.
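The figures quoted above can be cross-checked with simple arithmetic. The sketch below uses only the approximate numbers stated in the text (180,000 exons, 1% of the genome, 30 million base pairs); the variable names are illustrative, and the derived values are rough estimates rather than measured quantities.

```python
# Approximate figures taken from the text above.
EXON_COUNT = 180_000
EXOME_BP = 30_000_000
EXOME_PERCENT_OF_GENOME = 1

# If 30 million bp is about 1% of the genome, the whole genome is about 3 billion bp.
implied_genome_bp = EXOME_BP * 100 // EXOME_PERCENT_OF_GENOME

# Average exon length implied by the same figures.
mean_exon_bp = round(EXOME_BP / EXON_COUNT)

print(f"implied genome size: {implied_genome_bp:,} bp")
print(f"implied mean exon length: about {mean_exon_bp} bp")
```

Under these rounded inputs, the implied genome size is about 3 billion base pairs and the implied mean exon length is on the order of 170 base pairs.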
Shotgun sequencing
In genetics, shotgun sequencing is a method used for sequencing random DNA strands. It is named by analogy with the rapidly expanding, quasi-random shot grouping of a shotgun. The chain-termination method of DNA sequencing ("Sanger sequencing") can only be used for short DNA strands of 100 to 1000 base pairs. Due to this size limit, longer sequences are subdivided into smaller fragments that can be sequenced separately, and these sequences are assembled to give the overall sequence.
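The fragment-then-assemble idea can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions: the reads are error-free, the sequence has no repeats, and fragments overlap by at least three bases. The sequence, read length, and function names are made up for the example; production assemblers use more robust approaches such as overlap or de Bruijn graphs.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Illustrative 18-base sequence, too "long" to read in one piece in this toy setting.
genome = "ATGCGTACCTGAAGTCCA"
# Three overlapping 8-base reads, deliberately given out of order.
reads = [genome[start:start + 8] for start in (10, 0, 5)]
contigs = greedy_assemble(reads)
print(contigs)
```

Because the reads overlap and the toy sequence contains no repeats, the greedy merge recovers the original sequence as a single contig regardless of the order in which the reads are supplied.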
Hereditary cancer syndrome
A hereditary cancer syndrome (familial/family cancer syndrome, inherited cancer syndrome, cancer predisposition syndrome, cancer syndrome, etc.) is a genetic disorder in which inherited genetic mutations in one or more genes predispose the affected individuals to the development of cancer and may also cause early onset of these cancers. Hereditary cancer syndromes often show not only a high lifetime risk of developing cancer, but also the development of multiple independent primary tumors.
Treatment of cancer
Cancer can be treated by surgery, chemotherapy, radiation therapy, hormonal therapy, targeted therapy (including immunotherapy such as monoclonal antibody therapy) and synthetic lethality, most commonly as a series of separate treatments (e.g. chemotherapy before surgery). The choice of therapy depends upon the location and grade of the tumor and the stage of the disease, as well as the general state of the patient (performance status). Cancer genome sequencing helps determine exactly which cancer the patient has, and therefore which therapy is best suited to it.
Sanger sequencing
Sanger sequencing is a method of DNA sequencing that involves electrophoresis and is based on the random incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. After first being developed by Frederick Sanger and colleagues in 1977, it became the most widely used sequencing method for approximately 40 years. It was first commercialized by Applied Biosystems in 1986. More recently, higher-volume Sanger sequencing has been replaced by next-generation sequencing methods, especially for large-scale, automated genome analyses.
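The principle of reading a sequence from chain-terminated fragments can be sketched in a few lines. This is a simplified model, not a simulation of the laboratory protocol: it assumes the template is written 3' to 5' so that the new strand reads 5' to 3' as the base-wise complement, it assumes exactly one terminated fragment exists for every possible length, and electrophoresis is modeled as sorting by length. The function names and the example template are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template):
    """Return one terminated copy of the new strand for every possible length.

    Assumes the template is written 3'->5', so the synthesized strand is its
    base-wise complement read 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    return [new_strand[:n] for n in range(1, len(new_strand) + 1)]

def read_gel(fragments):
    """Sort fragments by length (as electrophoresis does) and read each terminal base."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)

fragments = chain_termination_fragments("TACGGCAT")
random.Random(1).shuffle(fragments)  # fragments are mixed together in the reaction
sequence = read_gel(fragments)
print(sequence)
```

Each fragment ends at the position where a chain-terminating nucleotide was incorporated, so ordering the fragments by size and reading the final base of each one spells out the synthesized strand.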
Automated machine learning
Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. AutoML potentially includes every stage, from starting with a raw dataset to building a machine learning model ready for deployment. AutoML was proposed as an artificial intelligence-based solution to the growing challenge of applying machine learning. The high degree of automation in AutoML aims to allow non-experts to make use of machine learning models and techniques without requiring them to become experts in machine learning.
Whole genome sequencing
Whole genome sequencing (WGS), also known as full genome sequencing, complete genome sequencing, or entire genome sequencing, is the process of determining the entirety, or nearly the entirety, of the DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast. Whole genome sequencing has largely been used as a research tool, but as of 2014 it was being introduced into clinics.