Concept

Binary-to-text encoding

A binary-to-text encoding is an encoding of binary data as plain text. More precisely, it represents binary data as a sequence of printable characters. Such encodings are necessary when a communication channel does not allow binary data (such as email or NNTP) or is not 8-bit clean. PGP documentation uses the term "ASCII armor" for binary-to-text encoding when referring to Base64.

The basic need for a binary-to-text encoding comes from having to send arbitrary binary data over preexisting communication protocols that were designed to carry only English-language human-readable text. Those protocols may be only 7-bit safe (and within that range may not pass certain ASCII control codes), may require line breaks at certain maximum intervals, and may not preserve whitespace. Thus, only the 94 printable ASCII characters are "safe" to use to convey data.

The ASCII text-encoding standard uses 7 bits per character. This makes it possible to encode 128 (i.e. 2^7) unique values (0–127), which represent the alphabetic, numeric, and punctuation characters commonly used in English, plus a selection of control characters that do not represent printable characters. For example, the capital letter A is 100 0001 in binary (0x41 hexadecimal, 101 octal), the numeral 2 is 011 0010 in binary (0x32 hexadecimal, 62 octal), the character } is 111 1101 in binary (0x7D hexadecimal, 175 octal), and the control character RETURN is 000 1101 in binary (0x0D hexadecimal, 15 octal).

In contrast, most computers store data in memory organized in eight-bit bytes. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit text and eight-bit binary data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.
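The points above can be illustrated with a short Python sketch. It prints the four ASCII examples in binary, hexadecimal, and octal, then models a channel that is not 8-bit clean by clearing the high bit of every byte. The helper name strip_high_bit and the test payload are assumptions made for illustration; real channels can damage data in other ways (line wrapping, whitespace changes) that this sketch does not model.

```python
import base64

# The four ASCII examples from the text: 7-bit binary, hexadecimal, octal.
for ch in ["A", "2", "}", "\r"]:
    code = ord(ch)
    print(repr(ch), format(code, "07b"), format(code, "#04x"), format(code, "o"))

# Arbitrary binary data: every possible byte value, including those above 127.
payload = bytes(range(256))


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical 7-bit channel: the eighth bit of each byte is lost."""
    return bytes(b & 0x7F for b in data)


# Sent raw, the upper half of the byte values is corrupted.
damaged = strip_high_bit(payload)

# Base64 output consists only of printable ASCII characters,
# so the same channel leaves it intact and the data can be recovered.
encoded = base64.b64encode(payload)
survived = base64.b64decode(strip_high_bit(encoded))

print(damaged == payload)   # raw transfer did not survive
print(survived == payload)  # Base64 transfer did
```

Base64 is used here only because it ships with the Python standard library; any encoding restricted to printable ASCII would pass through this simplified channel in the same way.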

Related courses (2)
CS-320: Computer language processing
We teach the fundamental aspects of analyzing and interpreting computer languages, including the techniques to build compilers. You will build a working compiler from an elegant functional language in
COM-102: Advanced information, computation, communication II
Text, sound, and images are examples of information sources stored in our computers and/or communicated over the Internet. How do we measure, compress, and protect the information they contain?
Related lectures (21)
Code Generation Lab
Covers generating code for a compiler, translating an Amy program to WebAssembly, including memory management and pattern matching compilation.
File Handling: Text and Bytes
Covers file handling, string operations, and character encodings in Python.
Text Encoding: Unicode and XML
Explores the evolution of text encoding through Unicode standards and XML, highlighting the challenges and advancements in multilingual word processing and text recognition technologies.
Related concepts (8)
Ascii85
Ascii85, also called Base85, is a form of binary-to-text encoding developed by Paul E. Rutter for the btoa utility. By using five ASCII characters to represent four bytes of binary data (making the encoded size 1/4 larger than the original, assuming eight bits per ASCII character), it is more efficient than uuencode or Base64, which use four characters to represent three bytes of data (a 1/3 increase, assuming eight bits per ASCII character). Its main modern uses are in Adobe's PostScript and Portable Document Format file formats, as well as in the patch encoding for binary files used by Git.
8-bit clean
8-bit clean is an attribute of computer systems, communication channels, and other devices and software that handle 8-bit character encodings correctly. Such encodings include the ISO/IEC 8859 series and the UTF-8 encoding of Unicode. Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g., ETX, as control characters. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the ASCII standard used only seven bits per character, avoiding an 8-bit representation in order to save on data transmission costs.
BinHex
BinHex, originally short for "binary-to-hexadecimal", is a binary-to-text encoding system that was used on the classic Mac OS for sending binary files through e-mail. Originally a hexadecimal encoding, subsequent versions of BinHex are more similar to uuencode, but combined both "forks" of the Mac file system together along with extended file information. BinHexed files take up more space than the original files, but will not be corrupted by non-"8-bit clean" software.
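The character-per-byte ratios mentioned in the concepts above (two hexadecimal digits per byte, four Base64 characters per three bytes, five Ascii85 characters per four bytes) can be checked with Python's standard base64 module. This is a minimal sketch: the sample data is an assumption chosen so that its length is divisible by both 3 and 4 and it contains no zero bytes, which avoids padding and the Ascii85 shortcut for all-zero groups.

```python
import base64

# 240 bytes, values 1..240: divisible by 3 and 4, and free of zero bytes.
data = bytes(range(1, 241))

hex_text = base64.b16encode(data)  # 2 characters per byte
b64_text = base64.b64encode(data)  # 4 characters per 3 bytes
a85_text = base64.a85encode(data)  # 5 characters per 4 bytes

for name, text in [("hex", hex_text), ("Base64", b64_text), ("Ascii85", a85_text)]:
    # Ratio of encoded length to original length.
    print(name, len(text), len(text) / len(data))

# Each encoding is lossless: decoding returns the original bytes.
roundtrip_ok = (
    base64.b16decode(hex_text) == data
    and base64.b64decode(b64_text) == data
    and base64.a85decode(a85_text) == data
)
```

On other inputs the exact lengths can differ slightly, since Base64 pads the final group and Ascii85 may shorten it.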