Publication

Negentropic linguistic evolution: A comparison of seven languages

Abstract

The relationship between the entropy of language and its complexity has been the subject of much speculation: some see an increase in linguistic entropy as a sign of linguistic complexification, while others interpret an entropy drop as a marker of greater regularity. Some evolutionary explanations, like the learning bottleneck hypothesis, argue that communication systems with more regular structures tend to have an evolutionary advantage over more complex ones. Other structural effects of communication networks, like the globalization of exchanges or algorithmic mediation, have been hypothesized to have a regularizing effect on language. Longer-term studies are now possible thanks to the arrival of large-scale diachronic corpora, like newspaper archives or digitized libraries. However, simple analyses of such datasets are prone to misinterpretation because corpus size varies significantly over the years, which indirectly affects various measures of language change and linguistic complexity. In particular, it is important not to misinterpret the arrival of new words as an increase in complexity, since this variation, like the variation of corpus size, is intrinsic to such corpora. This paper is an attempt to conduct an unbiased diachronic study of linguistic complexity across seven different languages using the Google Books corpus. The paper uses a simple entropy measure on a closed, but nevertheless large, subset of words, called the kernel. The kernel contains only the words that are present without interruption for the whole length of the study, excluding all the words that appeared or disappeared during the period. We argue that this method is robust to variations of corpus size and makes it possible to study changes in complexity despite possible (and, in the case of Google Books, unknown) changes in the composition of the corpus. Indeed, the evolution observed in the seven languages shows rather different patterns that are not directly correlated with the evolution of the size of the respective corpora. The rest of the paper presents the methods followed, the results obtained, and the next steps we envision.
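To make the kernel construction concrete, here is a minimal Python sketch; it is an illustration under assumed data structures, not the paper's actual pipeline, and the function names and toy counts are hypothetical. It builds the kernel as the intersection of each year's vocabulary and computes the Shannon entropy of the word distribution restricted to that kernel:

```python
from collections import Counter
from math import log2

def kernel_vocabulary(yearly_counts):
    """Words attested in every year of the study period (the 'kernel').

    yearly_counts: dict mapping year -> Counter of word frequencies.
    """
    years = list(yearly_counts)
    kernel = set(yearly_counts[years[0]])
    for year in years[1:]:
        kernel &= set(yearly_counts[year])  # drop words absent in any year
    return kernel

def kernel_entropy(counts, kernel):
    """Shannon entropy (bits) of the word distribution restricted to the kernel."""
    total = sum(counts[w] for w in kernel)
    probs = (counts[w] / total for w in kernel)
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical toy data: word counts for three consecutive years.
yearly = {
    1900: Counter({"the": 50, "of": 30, "steam": 5}),
    1901: Counter({"the": 55, "of": 28, "steam": 4, "radio": 1}),
    1902: Counter({"the": 60, "of": 25, "steam": 3}),
}
kernel = kernel_vocabulary(yearly)  # {'the', 'of', 'steam'}; 'radio' is excluded
for year, counts in yearly.items():
    print(year, round(kernel_entropy(counts, kernel), 3))
```

Restricting the entropy to the intersection vocabulary is what makes the measure insensitive to words entering or leaving the corpus, which is the robustness property the abstract claims.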

Related concepts (38)
Entropy
Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the microscopic description of nature in statistical physics, and to the principles of information theory.
Entropy (information theory)
In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable $X$, which takes values in the alphabet $\mathcal{X}$ and is distributed according to $p : \mathcal{X} \to [0, 1]$, the entropy is $H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$, where $\sum$ denotes the sum over the variable's possible values. The choice of base for $\log$, the logarithm, varies for different applications. Base 2 gives the unit of bits (or "shannons"), while base $e$ gives "natural units" (nats), and base 10 gives units of "dits", "bans", or "hartleys".
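As a concrete illustration of this formula (an added sketch, not part of the original entry), the entropy of a fair coin comes out to 1 bit, about 0.693 nats, or about 0.301 hartleys, depending on the chosen base:

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum p(x) log p(x) of a discrete distribution."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
print(entropy(fair_coin, base=2))        # 1.0 bit (shannon)
print(entropy(fair_coin, base=math.e))   # ~0.693 nats
print(entropy(fair_coin, base=10))       # ~0.301 hartleys (bans, dits)
```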
Language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and written forms, and may also be conveyed through sign languages. The vast majority of human languages have developed writing systems that allow for the recording and preservation of the sounds or signs of language. Human language is characterized by its cultural and historical diversity, with significant variations observed between cultures and across time.
Related publications (57)

Tree-AMP: Compositional Inference with Tree Approximate Message Passing

Florent Gérard Krzakala, Lenka Zdeborová

We introduce Tree-AMP, standing for Tree Approximate Message Passing, a Python package for compositional inference in high-dimensional tree-structured models. The package provides a unifying framework to study several approximate message passing algorithms ...
2023

Type-Preserving Compilation of Class-Based Languages

Guillaume André Fradji Martres

The Dependent Object Type (DOT) calculus was designed to put Scala on a sound basis, but while DOT relies on structural subtyping, Scala is a fundamentally class-based language. This impedance mismatch means that a proof of DOT soundness by itself is ...
EPFL, 2023

Matchertext: Towards Verbatim Interlanguage Embedding

Bryan Alexander Ford

Embedding text in one language within text of another is commonplace for numerous purposes, but usually requires tedious and error-prone “escaping” transformations on the embedded string. We propose a simple cross-language syntactic discipline, matchertext ...
2022
Related MOOCs (12)
Thermodynamics
This course complements the MOOC « Thermodynamique : fondements », allowing you to put the fundamental concepts of thermodynamics into practice. To reach this goal, Professor J.-P
Thermodynamics
This course will give you an understanding of the fundamental concepts of thermodynamics from the perspectives of physics, chemistry, and engineering. It is split into two MOOCs. Part one: