Attention-based transformers have achieved tremendous success across a variety of disciplines, including natural language. To deepen our understanding of their sequence-modeling capabilities, there is growing interest in using Markov input processes to ...
In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, researchers have shown growing interest in using Markov input processes to study transformers. However, our current un ...
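As a minimal illustration (not taken from either abstract) of the kind of Markov input process such analyses feed to a transformer, here is a sketch that samples a binary first-order Markov chain with assumed switching probabilities `p` and `q`:

```python
import random

def sample_markov_chain(p, q, length, seed=0):
    """Sample a {0,1}-valued first-order Markov chain.

    Transition probabilities (an illustrative choice):
      P(next=1 | current=0) = p
      P(next=0 | current=1) = q
    """
    rng = random.Random(seed)
    state, chain = 0, []
    for _ in range(length):
        chain.append(state)
        # Flip the state with the probability attached to the current state.
        flip = p if state == 0 else q
        if rng.random() < flip:
            state = 1 - state
    return chain

chain = sample_markov_chain(p=0.2, q=0.3, length=10)
```

Sequences drawn this way have controllable memory, which is what makes them a convenient testbed for sequential models.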
Large language models (LLMs) have recently gained much popularity due to their surprising ability to generate human-like English sentences. LLMs are essentially predictors, estimating the probability of a sequence of words given the past. Therefore, it i ...
Inspired by Sibson’s alpha-mutual information, we introduce a new parametric class of universal predictors. This class interpolates between two well-known predictors: the mixture estimator, which includes the Laplace and Krichevsky-Trofimov predictors, and the ...
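For concreteness, the two mixture estimators named in the abstract are classic add-constant sequential predictors; the sketch below (an illustration, not code from the paper) shows them for a binary alphabet, where Laplace corresponds to the constant beta = 1 and Krichevsky-Trofimov (KT) to beta = 1/2:

```python
def add_constant_predictor(sequence, symbol, beta):
    """Probability that `symbol` (0 or 1) follows `sequence`.

    Adds a pseudo-count `beta` to each symbol's empirical count:
      P(symbol | sequence) = (count(symbol) + beta) / (n + 2 * beta)
    """
    count = sequence.count(symbol)
    return (count + beta) / (len(sequence) + 2 * beta)

past = [0, 1, 1]

# Laplace (beta = 1): P(1 | 0,1,1) = (2 + 1) / (3 + 2) = 0.6
laplace = add_constant_predictor(past, 1, beta=1.0)

# KT (beta = 1/2): P(1 | 0,1,1) = (2 + 0.5) / (3 + 1) = 0.625
kt = add_constant_predictor(past, 1, beta=0.5)
```

Both assign nonzero probability to symbols not yet seen, which is what makes them universal over memoryless sources.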
We revise the proof of low-rate upper bounds on the reliability function of discrete memoryless channels for ordinary and list-decoding schemes, in particular Berlekamp and Blinovsky's zero-rate bound, as well as Blahut's bound for low rates. The available ...
We derive an upper bound on the reliability function of mismatched decoding for zero-rate codes. The bound is based on a result by Komlos that shows the existence of a subcode with certain symmetry properties. The bound is shown to coincide with the expurg ...