Lossless compressionLossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates (and therefore reduced media sizes).
Data compressionIn information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.
Lossy compressionIn information technology, lossy compression or irreversible compression is the class of data compression methods that uses inexact approximations and partial data discarding to represent the content. These techniques are used to reduce data size for storing, handling, and transmitting content. The different versions of the photo of the cat on this page show how higher degrees of approximation create coarser images as more details are removed. This is opposed to lossless data compression (reversible data compression) which does not degrade the data.
Huffman codingIn computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol (such as a character in a file).
Entropy (information theory)In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to : where denotes the sum over the variable's possible values. The choice of base for , the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base e gives "natural units" nat, and base 10 gives units of "dits", "bans", or "hartleys".
Image compressionImage compression is a type of data compression applied to s, to reduce their cost for storage or transmission. Algorithms may take advantage of visual perception and the statistical properties of image data to provide superior results compared with generic data compression methods which are used for other digital data. Image compression may be lossy or lossless. Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics.
Entropy codingIn information theory, an entropy coding (or entropy encoding) is any lossless data compression method that attempts to approach the lower bound declared by Shannon's source coding theorem, which states that any lossless data compression method must have expected code length greater or equal to the entropy of the source. More precisely, the source coding theorem states that for any source distribution, the expected code length satisfies , where is the number of symbols in a code word, is the coding function, is the number of symbols used to make output codes and is the probability of the source symbol.
Transparency (data compression)In data compression and psychoacoustics, transparency is the result of lossy data compression accurate enough that the compressed result is perceptually indistinguishable from the uncompressed input, i.e. perceptually lossless. A transparency threshold is a given value at which transparency is reached. It is commonly used to describe compressed data bitrates. For example, the transparency threshold for MP3 to linear PCM audio is said to be between 175 and 245 kbit/s, at 44.
Kullback–Leibler divergenceIn mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted , is a type of statistical distance: a measure of how one probability distribution P is different from a second, reference probability distribution Q. A simple interpretation of the KL divergence of P from Q is the expected excess surprise from using Q as a model when the actual distribution is P.
Apple Lossless Audio CodecThe Apple Lossless Audio Codec (ALAC), also known as Apple Lossless, or Apple Lossless Encoder (ALE), is an audio coding format, and its reference audio codec implementation, developed by Apple Inc. for lossless data compression of digital music. After initially keeping it proprietary from its inception in 2004, in late 2011 Apple made the codec available open source and royalty-free. Traditionally, Apple has referred to the codec as Apple Lossless, though more recently it has begun to use the abbreviated term ALAC when referring to the codec.