Text corpusIn linguistics and natural language processing, a corpus (: corpora) or text corpus is a dataset, consisting of natively digital and older, digitalized, language resources, either annotated or unannotated. Annotated, they have been used in corpus linguistics for statistical hypothesis testing, checking occurrences or validating linguistic rules within a specific language territory. In search technology, a corpus is the collection of documents which is being searched.
Visual StudioVisual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platforms such as Windows API, Windows Forms, Windows Presentation Foundation, Windows Store and Microsoft Silverlight. It can produce both native code and managed code. Visual Studio includes a code editor supporting IntelliSense (the code completion component) as well as code refactoring.
Corpus callosumThe corpus callosum (Latin for "tough body"), also callosal commissure, is a wide, thick nerve tract, consisting of a flat bundle of commissural fibers, beneath the cerebral cortex in the brain. The corpus callosum is only found in placental mammals. It spans part of the longitudinal fissure, connecting the left and right cerebral hemispheres, enabling communication between them. It is the largest white matter structure in the human brain, about in length and consisting of 200–300 million axonal projections.
Behavioural geneticsBehavioural genetics, also referred to as behaviour genetics, is a field of scientific research that uses genetic methods to investigate the nature and origins of individual differences in behaviour. While the name "behavioural genetics" connotes a focus on genetic influences, the field broadly investigates the extent to which genetic and environmental factors influence individual differences, and the development of research designs that can remove the confounding of genes and environment.
Pattern recognitionPattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Data lakeA data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).
Extract, transform, loadIn computing, extract, transform, load (ETL) is a three-phase process where data is extracted, transformed (cleaned, sanitized, scrubbed) and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on reoccurring schedules either as single jobs or aggregated into a batch of jobs.
Speech recognitionSpeech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
MicrophoneA microphone, colloquially called mic (maɪk), is a transducer that converts sound into an electrical signal. Microphones are used in many applications such as telephones, hearing aids, public address systems for concert halls and public events, motion picture production, live and recorded audio engineering, sound recording, two-way radios, megaphones, and radio and television broadcasting. They are also used in computers for recording voice, speech recognition, VoIP, and for other purposes such as ultrasonic sensors or knock sensors.
Raw image formatA camera raw image file contains unprocessed or minimally processed data from the of either a digital camera, a motion picture film scanner, or other . Raw files are so named because they are not yet processed, and contain large amounts of potentially redundant data. Normally, the image is processed by a raw converter, in a wide-gamut internal color space where precise adjustments can be made before to a viewable file format such as JPEG or PNG for storage, printing, or further manipulation.