TSV
Character Encoding - TSV
Character encoding is a set of rules for converting characters and symbols into byte sequences that computers can process. Many schemes exist, ranging from international standards such as ASCII and UTF-8, to Japanese-specific encodings such as Shift_JIS and EUC-JP, to country-specific code pages. Although UTF-8, which is based on Unicode, is now widely adopted as the global standard, understanding the other encoding schemes remains important for maintaining compatibility with legacy systems.
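As a minimal sketch of what "converting characters into byte sequences" means in practice, the Python snippet below encodes one Japanese character under three of the schemes mentioned above. The helper name `encode_hex` is an illustrative assumption, not part of this dataset; the codec names are the ones Python's standard library accepts.

```python
# Hypothetical helper for illustration: show the bytes an encoding produces.
def encode_hex(text, encoding):
    """Encode `text` with the named codec and return the bytes as a hex string."""
    return text.encode(encoding).hex()


sample = "\u3042"  # HIRAGANA LETTER A (U+3042)

# The same character maps to a different byte sequence in each encoding.
for name in ("utf-8", "shift_jis", "euc-jp"):
    print(name, encode_hex(sample, name))

# Bytes only mean something together with the encoding that produced them:
# decoding Shift_JIS bytes as UTF-8 fails instead of returning the character.
sjis_bytes = sample.encode("shift_jis")
try:
    sjis_bytes.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
print("Shift_JIS bytes decodable as UTF-8:", decoded_as_utf8)
```

This is why legacy data must be read with the encoding it was written in: the byte values alone do not identify the character.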
character code
Unicode
UTF-8
character set
internationalization
text processing
code	slug	name	description	category	ianaName	mibEnum
utf-8	utf-8	UTF-8	A variable-length character encoding that represents Unicode using 1 to 4 bytes.	Unicode	UTF-8	106
utf-16	utf-16	UTF-16	A character encoding that represents Unicode using one or two 16-bit code units.	Unicode	UTF-16	1015
utf-32	utf-32	UTF-32	A character encoding that represents Unicode in fixed-length 32 bits (4 bytes).	Unicode	UTF-32	1017
us-ascii	us-ascii	US-ASCII	A basic character encoding that defines 128 characters in 7 bits.	ASCII	US-ASCII	3
iso-8859-1	iso-8859-1	ISO-8859-1 (Latin-1)	An 8-bit character encoding for Western European languages.	ISO-8859	ISO-8859-1	4
iso-8859-2	iso-8859-2	ISO-8859-2 (Latin-2)	An 8-bit character encoding for Central European languages.	ISO-8859	ISO-8859-2	5
iso-8859-5	iso-8859-5	ISO-8859-5 (Cyrillic)	An 8-bit character encoding for Cyrillic script.	ISO-8859	ISO-8859-5	8
iso-8859-7	iso-8859-7	ISO-8859-7 (Greek)	An 8-bit character encoding for Modern Greek.	ISO-8859	ISO-8859-7	10
iso-8859-15	iso-8859-15	ISO-8859-15 (Latin-9)	A revised version of ISO-8859-1 that includes the Euro sign.	ISO-8859	ISO-8859-15	111
shift_jis	shift-jis	Shift_JIS	A Japanese character encoding widely used on Windows and Macintosh systems.	Japanese	Shift_JIS	17
euc-jp	euc-jp	EUC-JP	A Japanese character encoding used on Unix-like systems.	Japanese	EUC-JP	18
iso-2022-jp	iso-2022-jp	ISO-2022-JP	An encoding for Japanese email in 7-bit environments.	Japanese	ISO-2022-JP	39
gb2312	gb2312	GB2312	A basic character encoding for Simplified Chinese.	Chinese	GB2312	2025
gbk	gbk	GBK	A Chinese character encoding that extends GB2312.	Chinese	GBK	113
gb18030	gb18030	GB18030	China's current national standard, capable of representing all Unicode characters.	Chinese	GB18030	114
big5	big5	Big5	A Traditional Chinese character encoding used in Taiwan and Hong Kong.	Chinese	Big5	2026
euc-kr	euc-kr	EUC-KR	A Korean character encoding used on Unix-like systems.	Korean	EUC-KR	38
iso-2022-kr	iso-2022-kr	ISO-2022-KR	An encoding for Korean email in 7-bit environments.	Korean	ISO-2022-KR	37
koi8-r	koi8-r	KOI8-R	An 8-bit character encoding for Russian Cyrillic.	Cyrillic	KOI8-R	2084
koi8-u	koi8-u	KOI8-U	An 8-bit character encoding for Ukrainian Cyrillic.	Cyrillic	KOI8-U	2088
windows-1252	windows-1252	Windows-1252	An 8-bit encoding for Western European languages used on Microsoft Windows.	Windows Code Page	windows-1252	2252
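
Several descriptions in the table make checkable claims about byte widths and character coverage. The sketch below, assuming Python's built-in codecs, measures how many bytes the three Unicode encodings need for characters from different ranges, and checks which Western European encodings can represent the Euro sign. The function names `byte_length` and `can_encode` are hypothetical helpers for this example; the `-le` codec variants are used so that no byte order mark is counted.

```python
def byte_length(ch, encoding):
    """Return the number of bytes `encoding` uses for the character `ch`."""
    return len(ch.encode(encoding))


def can_encode(ch, encoding):
    """Return True if `encoding` has a representation for `ch`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+0041 (A), U+00E9 (e with acute), U+3042 (hiragana a), U+1F600 (emoji)
samples = ["\u0041", "\u00e9", "\u3042", "\U0001F600"]

for ch in samples:
    widths = {
        enc: byte_length(ch, enc)
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }
    print(f"U+{ord(ch):04X}", widths)

# The Euro sign (U+20AC) is absent from ISO-8859-1 but present in
# ISO-8859-15 and Windows-1252.
euro = "\u20ac"
for enc in ("iso-8859-1", "iso-8859-15", "windows-1252"):
    print(enc, can_encode(euro, enc))
```

Under these assumptions, UTF-8 grows from 1 to 4 bytes across the samples, UTF-16 uses 2 bytes until the character falls outside the Basic Multilingual Plane, and UTF-32 stays at 4 bytes throughout.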