Mojibake (文字化け; mod͡ʑibake, "character transformation") is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system.
This display may include the generic replacement character ("�") in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding. This is either because of differing constant length encoding (as in Asian 16-bit encodings vs European 8-bit encodings), or the use of variable length encodings (notably UTF-8 and UTF-16).
Failed rendering of glyphs due to either missing fonts or missing glyphs in a font is a different issue that is not to be confused with mojibake. Symptoms of this failed rendering include blocks with the code point displayed in hexadecimal or using the generic replacement character. Importantly, these replacements are valid and are the result of correct error handling by the software.
To correctly reproduce the original text that was encoded, the correspondence between the encoded data and the notion of its encoding must be preserved (i.e. the source and target encoding standards must be the same). As mojibake is the instance of non-compliance between these, it can be achieved by manipulating the data itself, or just relabelling it.
Mojibake is often seen with text data that have been tagged with a wrong encoding; it may not even be tagged at all, but moved between computers with different default encodings. A major source of trouble are communication protocols that rely on settings on each computer rather than sending or storing metadata together with the data.
The differing default settings between computers are in part due to differing deployments of Unicode among operating system families, and partly the legacy encodings' specializations for different writing systems of human languages.
This page is automatically generated and may contain information that is not correct, complete, up-to-date, or relevant to your search query. The same applies to every other page on this website. Please make sure to verify the information with EPFL's official sources.
Mettre en pratique les bases de la programmation vues au semestre précédent. Développer un logiciel structuré. Méthode de debug d'un logiciel. Introduction à la programmation scientifique. Introductio
This course introduces students to modern computational and mathematical techniques for solving problems in chemistry and chemical engineering. The use of introduced numerical methods will be demonstr
This course provides students with a working knowledge of macroeconomic models that explicitly incorporate financial markets. The goal is to develop a broad and analytical framework for analyzing the
Big-5 or Big5 is a Chinese character encoding method used in Taiwan, Hong Kong, and Macau for traditional Chinese characters. The People's Republic of China (PRC), which uses simplified Chinese characters, uses the GB 18030 character set instead. Big5 gets its name from the consortium of five companies in Taiwan that developed it. The original Big5 character set is sorted first by usage frequency, second by stroke count, lastly by Kangxi radical. The original Big5 character set lacked many commonly used characters.
Mojibake (文字化け; mod͡ʑibake, "character transformation") is the garbled text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system. This display may include the generic replacement character ("�") in places where the binary representation is considered invalid. A replacement can also involve multiple consecutive symbols, as viewed in one encoding, when the same binary code constitutes one symbol in the other encoding.
Extended ASCII is a repertoire of character encodings that include (most of) the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case.