Inspired by Sibson’s alpha-mutual information, we introduce a new parametric class of universal predictors. This class interpolates between two well-known predictors: the mixture estimator, which includes the Laplace and Krichevsky-Trofimov predictors, and the ...
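For orientation, the two endpoint estimators named here have standard add-constant forms (a textbook fact, not recovered from the truncated abstract): for a sequence $x^n$ over a finite alphabet $\mathcal{A}$ in which symbol $a$ has appeared $n_a$ times,
\[
P_{\mathrm{Laplace}}(a \mid x^n) = \frac{n_a + 1}{n + |\mathcal{A}|}, \qquad
P_{\mathrm{KT}}(a \mid x^n) = \frac{n_a + 1/2}{n + |\mathcal{A}|/2}.
\]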
We revise the proof of low-rate upper bounds on the reliability function of discrete memoryless channels for ordinary and list-decoding schemes, in particular the zero-rate bounds of Berlekamp and of Blinovsky, as well as Blahut's bound for low rates. The available ...
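For context, the zero-rate quantity at stake has the standard Bhattacharyya form (stated from memory, not quoted from the truncated abstract): for a DMC $W$ under ordinary decoding, the reliability function as the rate approaches zero is
\[
E(0^+) = \max_P \sum_{x, x'} P(x)\, P(x')\, d_B(x, x'), \qquad
d_B(x, x') = -\ln \sum_y \sqrt{W(y \mid x)\, W(y \mid x')},
\]
where the converse part is Berlekamp's zero-rate bound and the achievability part is the expurgated bound.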
We derive an upper bound on the reliability function of mismatched decoding for zero-rate codes. The bound is based on a result by Komlós that shows the existence of a subcode with certain symmetry properties. The bound is shown to coincide with the expurg ...
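As a reminder of the setting (standard definition, not taken from the truncated abstract): in mismatched decoding the receiver uses a fixed single-letter metric $q$ rather than the true channel law, declaring
\[
\hat{m} = \arg\max_{m} \sum_{i=1}^{n} q\big(x_i(m), y_i\big),
\]
where $x^n(m)$ is the codeword for message $m$ and $y^n$ is the channel output.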
In recent years, transformer-based models have revolutionized deep learning, particularly in sequence modeling. To better understand this phenomenon, researchers have taken a growing interest in using Markov input processes to study transformers. However, our current un ...
Attention-based transformers have achieved tremendous success across a variety of disciplines, including natural language processing. To deepen our understanding of their sequential modeling capabilities, interest has grown in using Markov input processes to ...
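To make the Markov-input setup concrete, here is a minimal sketch (illustrative only; the binary alphabet and the flip probabilities p and q are assumptions, not taken from these abstracts) of sampling first-order Markov sequences of the kind used as transformer training data:

import numpy as np

def sample_markov_binary(length, p=0.2, q=0.3, rng=None):
    # First-order binary Markov source (hypothetical parameters):
    #   p = P(X_t = 1 | X_{t-1} = 0), q = P(X_t = 0 | X_{t-1} = 1).
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(length, dtype=np.int64)
    # Start from the stationary distribution: pi(1) = p / (p + q).
    x[0] = int(rng.random() < p / (p + q))
    for t in range(1, length):
        flip = p if x[t - 1] == 0 else q  # probability of changing state
        x[t] = x[t - 1] ^ int(rng.random() < flip)
    return x

# Example: a batch of sequences for next-token prediction.
batch = np.stack([sample_markov_binary(256) for _ in range(32)])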
Large language models (LLMs) have recently gained much popularity due to their surprising ability to generate human-like English sentences. LLMs are essentially predictors, estimating the probability of a sequence of words given the past. Therefore, it i ...
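Concretely (a standard fact, not taken from the truncated abstract), such a predictor factorizes the probability of a word sequence $w_1, \dots, w_n$ autoregressively,
\[
P(w_1, \dots, w_n) = \prod_{t=1}^{n} P(w_t \mid w_1, \dots, w_{t-1}),
\]
so the model is trained to estimate each next-word conditional given the past.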