Emotion recognitionEmotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables.
Speaker recognitionSpeaker recognition is the identification of a person from characteristics of voices. It is used to answer the question "Who is speaking?" The term voice recognition can refer to speaker recognition or speech recognition. Speaker verification (also called speaker authentication) contrasts with identification, and speaker recognition differs from speaker diarisation (recognizing when the same speaker is speaking).
Gender roleA gender role, also known as a sex role, is a social role encompassing a range of behaviors and attitudes that are generally considered acceptable, appropriate, or desirable for a person based on that person's sex. Gender roles are usually centered on conceptions of masculinity and femininity, although there are exceptions and variations. The specifics regarding these gendered expectations may vary among cultures, while other characteristics may be common throughout a range of cultures.
Pattern recognitionPattern recognition is the automated recognition of patterns and regularities in data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess (PR) capabilities but their primary function is to distinguish and create emergent pattern. PR has applications in statistical data analysis, signal processing, , information retrieval, bioinformatics, data compression, computer graphics and machine learning.
Sound recording and reproductionSound recording and reproduction is the electrical, mechanical, electronic, or digital inscription and re-creation of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Sound recording is the transcription of invisible vibrations in air onto a storage medium such as a phonograph disc. The process is reversed in sound reproduction, and the variations stored on the medium are transformed back into sound waves.
Audio engineerAn audio engineer (also known as a sound engineer or recording engineer) helps to produce a recording or a live performance, balancing and adjusting sound sources using equalization, dynamics processing and audio effects, mixing, reproduction, and reinforcement of sound. Audio engineers work on the "technical aspect of recording—the placing of microphones, pre-amp knobs, the setting of levels. The physical recording of any project is done by an engineer... the nuts and bolts.
Conditional random fieldConditional random fields (CRFs) are a class of statistical modeling methods often applied in pattern recognition and machine learning and used for structured prediction. Whereas a classifier predicts a label for a single sample without considering "neighbouring" samples, a CRF can take context into account. To do so, the predictions are modelled as a graphical model, which represents the presence of dependencies between the predictions. What kind of graph is used depends on the application.
LoudspeakerA loudspeaker (commonly referred to as a speaker or speaker driver) is an electroacoustic transducer that converts an electrical audio signal into a corresponding sound. A speaker system, also often simply referred to as a speaker or loudspeaker, comprises one or more such speaker drivers, an enclosure, and electrical connections possibly including a crossover network. The speaker driver can be viewed as a linear motor attached to a diaphragm which couples that motor's movement to motion of air, that is, sound.
Multitrack recordingMultitrack recording (MTR), also known as multitracking, is a method of sound recording developed in 1955 that allows for the separate recording of multiple sound sources or of sound sources recorded at different times to create a cohesive whole. Multitracking became possible in the mid-1950s when the idea of simultaneously recording different audio channels to separate discrete "tracks" on the same reel-to-reel tape was developed.
Speech recognitionSpeech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.