Related articles:
Unicode
ASCII
Mojibake
Code page
Character encoding
Universal Character Set
Windows-1252
Cyrillic alphabet
Byte-order mark
Notepad
Character encodings in HTML
Plan 9 from Bell Labs
Ken Thompson
XML
Key terms:
bytes
encoding
unicode
string
bom
ascii
invalid
rfc
bits
errors
iso
alphabet
code points
iec
character set
surrogate
algorithms
browsers
api
may indicate
disadvantages
annex
unicode standard
parser
cyrillic
one byte
ascii characters
code page
bell labs
compatibility
universal character set
programming language
bytes per character
obsolete
implementations
parsing
nul
rather than
mapping of unicode character planes
decoding
two bytes
surrogate characters
simplistic
first byte
unicode code
string has been encoded
sorting
basic multilingual plane
character encoding
three bytes