Kolmogorov complexity

Summary

In algorithmic information theory (a subfield of computer science and mathematics), the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program (in a predetermined programming language) that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963, and is a generalization of classical information theory.
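In the usual formalization (a sketch in standard notation, not taken from this page), one fixes a universal machine U and defines

    K_U(x) = min { |p| : U(p) = x },

the length |p| of a shortest program p on which U outputs x. The invariance theorem makes the choice of U, that is, of programming language, inessential: for any two universal machines U1 and U2 there is a constant c, depending only on the machines and not on x, such that |K_U1(x) - K_U2(x)| <= c.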
The notion of Kolmogorov complexity can be used to state and prove impossibility results akin to Cantor's diagonal argument, Gödel's incompleteness theorem, and Turing's halting problem. In particular, no program P that computes a lower bound for each text's Kolmogorov complexity can return a value essentially larger than P's own length.
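The reasoning behind this can be illustrated with a short self-referential sketch (Python; the helper lower_bound_K is a hypothetical name, assumed for contradiction to be a terminating program that returns sound lower bounds on K):

```python
from itertools import count, product

def lower_bound_K(s: str) -> int:
    """Hypothetical: a terminating program returning a sound lower bound on K(s).

    The argument below shows no such program can return values much larger
    than its own length, so this stub exists only for illustration.
    """
    raise NotImplementedError

def first_complex_string(n: int) -> str:
    """Return the first binary string whose certified complexity exceeds n."""
    for length in count(0):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if lower_bound_K(s) > n:
                return s

# Contradiction: this program together with the literal value of n is itself a
# description of the string it returns, and its length is only c + O(log n).
# For n larger than that, the returned string s would satisfy both K(s) > n
# (by soundness of lower_bound_K) and K(s) <= c + O(log n), which is impossible.
# Hence any sound lower-bound program is essentially bounded by its own length.
```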

Related concepts (26)

Information theory

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley in the 1920s, and Claude Shannon in the 1940s.

Algorithmic information theory

Algorithmic information theory (AIT) is a branch of theoretical computer science that concerns itself with the relationship between computation and information of computably generated objects (as opposed to stochastically generated ones).

Computer science

Computer science is the study of computation, information, and automation. Computer science spans theoretical disciplines (such as algorithms, theory of computation, and information theory) to applied disciplines (including the design and implementation of hardware and software).

Related courses (14)

This course introduces the basics of cryptography. We review several types of cryptographic primitives, when it is safe to use them and how to select the appropriate security parameters. We detail how they work and sketch how they can be implemented.

This course provides an overview of key advances in continuous optimization and statistical analysis for machine learning. We review recent learning formulations and models as well as their guarantees, describe scalable solution techniques and algorithms, and illustrate the trade-offs involved.

Discrete mathematics is a discipline with applications to almost all areas of study. It provides a set of indispensable tools to computer science in particular. This course reviews (familiar) topics as diverse as mathematical reasoning, combinatorics, discrete structures & algorithmic thinking.

Related publications (100)

Mikhail Kapralov, Amir Zandieh

Reconstructing continuous signals based on a small number of discrete samples is a fundamental problem across science and engineering. We are often interested in signals with "simple" Fourier structure, e.g., those involving frequencies within a bounded range, a small number of frequencies, or a few blocks of frequencies (i.e., bandlimited, sparse, and multiband signals, respectively). More broadly, any prior knowledge on a signal's Fourier power spectrum can constrain its complexity. Intuitively, signals with more highly constrained Fourier structure require fewer samples to reconstruct. We formalize this intuition by showing that, roughly, a continuous signal from a given class can be approximately reconstructed using a number of samples proportional to the statistical dimension of the allowed power spectrum of that class. We prove that, in nearly all settings, this natural measure tightly characterizes the sample complexity of signal reconstruction. Surprisingly, we also show that, up to log factors, a universal non-uniform sampling strategy can achieve this optimal complexity for any class of signals. We present an efficient and general algorithm for recovering a signal from the samples taken. For bandlimited and sparse signals, our method matches the state of the art, while providing the first computationally and sample-efficient solution to a broader range of problems, including multiband signal reconstruction and Gaussian process regression tasks in one dimension. Our work is based on a novel connection between randomized linear algebra and the problem of reconstructing signals with constrained Fourier structure. We extend tools based on statistical leverage score sampling and column-based matrix reconstruction to the approximation of continuous linear operators that arise in the signal reconstruction problem. We believe these extensions are of independent interest and serve as a foundation for tackling a broad range of continuous-time problems using randomized methods.

Shachi Shailesh Deshpande, Bernard Moret

Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful, but also most complex, models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases the computational costs are the same as in the general family-free case, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that differ in expressive power. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.

We consider the cooperative data exchange problem, in which nodes are fully connected with each other. Each node initially has only a subset of the K packets making up a file and wants to recover the whole file. Node i can make a broadcast transmission, which incurs cost w_i and is received by all other nodes. The goal is to minimize the total cost of the transmissions that all nodes have to send, which is also called the weighted cost. Following the same idea as our previous work, which provided a method based on d-Basis construction to solve the cooperative data exchange problem without weighted cost, we present a modified method to solve the cooperative data exchange problem with weighted cost. We present a polynomial-time deterministic algorithm to compute the minimum weighted cost and determine the rate vector and the packets that should be used to generate each transmission. By leveraging the connection to Maximum Distance Separable codes, the coefficients of the linear combinations of the optimal coding scheme can be efficiently generated. Our algorithm has significantly lower complexity than the state of the art. In particular, we prove that the minimum weighted cost function is a convex function of the total number of transmissions for integer rate cases.

2017
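To make the data exchange setup described above concrete (an illustrative sketch only, not the d-Basis algorithm from the paper; the packet holdings below are made up), a simple necessary condition already lower-bounds the number of broadcasts: the nodes outside any group S must supply every packet the group jointly misses.

```python
from itertools import combinations

def cut_set_lower_bound(holdings, K):
    """Simple lower bound on the total number of broadcasts needed.

    holdings[i] is the set of packet indices (0..K-1) node i starts with.
    For every non-empty proper subset S of nodes, the nodes outside S must
    send at least K - |union of holdings in S| broadcasts, since each
    broadcast conveys at most one packet's worth of new information to S.
    """
    n = len(holdings)
    bound = 0
    for r in range(1, n):
        for S in combinations(range(n), r):
            jointly_held = set().union(*(holdings[i] for i in S))
            bound = max(bound, K - len(jointly_held))
    return bound

# Toy instance: node 0 holds only packet 0, node 1 holds all three packets.
# Node 1 must broadcast the two packets node 0 is missing, matching the bound.
print(cut_set_lower_bound([{0}, {0, 1, 2}], K=3))  # -> 2
```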