Mojibake

Mojibake (Japanese: 文字化け, IPA: [mod͡ʑibake], "character transformation") is the garbled text that results when text is decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.

The display may include the generic replacement character ("�") in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding. This happens either because the two encodings use different fixed lengths (as in Asian 16-bit encodings versus European 8-bit encodings) or because one of them is a variable-length encoding (notably UTF-8 and UTF-16).

Failed rendering of glyphs, caused by missing fonts or by missing glyphs within a font, is a different issue and should not be confused with mojibake. Symptoms of failed rendering include blocks showing the code point in hexadecimal, or the generic replacement character. Importantly, these replacements are valid and result from correct error handling by the software.

To reproduce the original text correctly, the correspondence between the encoded data and the notion of its encoding must be preserved (i.e. the source and target encoding standards must be the same). Because mojibake is a mismatch between the two, it can be produced either by manipulating the data itself or simply by relabelling it.

Mojibake is often seen with text data that has been tagged with the wrong encoding; the data may not be tagged at all, but simply moved between computers with different default encodings. A major source of trouble is communication protocols that rely on settings on each computer rather than sending or storing metadata together with the data. The differing default settings between computers are due partly to differing deployments of Unicode among operating system families and partly to the legacy encodings' specializations for different writing systems.
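The mechanism described above can be reproduced in a few lines. The following is a minimal sketch in Python, not a definitive recipe: the sample string "café" and the UTF-8/Latin-1 pairing are assumptions chosen for illustration and do not come from the text. It shows one symbol turning into two when variable-length data is read as an 8-bit encoding, the replacement character appearing for invalid input, and a repair that works only because the wrong decoding step happened to be reversible.

```python
# Illustrative sample text (an assumption, not from the article).
original = "café"

# Encode as UTF-8: "é" (U+00E9) becomes the two bytes 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# Decode with an unintended 8-bit encoding (Latin-1).
# Each byte is read as its own symbol, so one character becomes two.
garbled = utf8_bytes.decode("latin-1")

# Repair by undoing the wrong decoding step, then decoding as intended.
repaired = garbled.encode("latin-1").decode("utf-8")

# Opposite mismatch: Latin-1 bytes read as UTF-8.
# The lone byte 0xE9 is not valid UTF-8, so with errors="replace"
# the decoder substitutes U+FFFD, the generic replacement character.
latin1_bytes = original.encode("latin-1")
lossy = latin1_bytes.decode("utf-8", errors="replace")

print(garbled)   # mojibake: two unrelated symbols where "é" was
print(repaired)  # the original text again
print(lossy)     # ends in the replacement character
```

The repair succeeds here because Latin-1 maps every byte value to a code point, so the wrong decoding loses no information. Once a decoder has substituted U+FFFD, as in `lossy`, the original bytes are gone and cannot be recovered from the text alone.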

Related concepts
Big5
Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead. Big5 gets its name from the consortium of five companies in Taiwan that developed it. The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical. The original Big5 character set lacked many commonly used characters.
Extended ASCII
Extended ASCII is a repertoire of character encodings that include (most of) the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case.
Newline
A newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, and Unicode. This character, or sequence of characters, is used to signify the end of a line of text and the start of a new one. In the mid-1800s, long before the advent of teleprinters and teletype machines, Morse code operators or telegraphists invented and used Morse code prosigns to encode white space text formatting in formal written text messages.