Character Encoding

A character encoding is a set of rules for converting characters and symbols into byte sequences that computers can process. Many schemes exist, ranging from international standards such as ASCII and UTF-8, through Japanese-specific encodings such as Shift_JIS and EUC-JP, to country-specific code pages. Although UTF-8, which is based on Unicode, is now widely adopted as the global standard, understanding the other encoding schemes remains important for maintaining compatibility with legacy systems.
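
As a minimal sketch of what "converting characters into byte sequences" means in practice, the Python snippet below encodes the same Japanese text under several schemes and compares the results. The helper name `encode_all` and the sample string are illustrative assumptions, not part of any standard; the codec names are the ones Python's standard library accepts.

```python
# Encode the same text under several schemes and compare the resulting bytes.
text = "日本語"

def encode_all(s, encodings):
    """Return {encoding name: bytes or None} for each requested encoding.

    None marks an encoding that cannot represent at least one character of s.
    (encode_all is a hypothetical helper written for this example.)
    """
    result = {}
    for name in encodings:
        try:
            result[name] = s.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result

encoded = encode_all(text, ["utf-8", "shift_jis", "euc-jp", "us-ascii"])

for name, data in encoded.items():
    if data is None:
        print(f"{name}: cannot represent {text!r}")
    else:
        print(f"{name}: {len(data)} bytes -> {data.hex()}")
```

The same three characters take 9 bytes in UTF-8 but 6 bytes in Shift_JIS or EUC-JP, and US-ASCII cannot represent them at all. This is why text must always be decoded with the encoding it was written in.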

Keywords: character code, Unicode, UTF-8, character set, internationalization, text processing

| code | slug | name | description | category | ianaName | mibEnum |
| --- | --- | --- | --- | --- | --- | --- |
| utf-8 | utf-8 | UTF-8 | A variable-length character encoding that represents Unicode using 1 to 4 bytes. | Unicode | UTF-8 | 106 |
| utf-16 | utf-16 | UTF-16 | A variable-length character encoding that represents Unicode using one or two 16-bit code units (2 or 4 bytes). | Unicode | UTF-16 | 1015 |
| utf-32 | utf-32 | UTF-32 | A character encoding that represents Unicode in fixed-length 32 bits (4 bytes). | Unicode | UTF-32 | 1017 |
| us-ascii | us-ascii | US-ASCII | A basic character encoding that defines 128 characters in 7 bits. | ASCII | US-ASCII | 3 |
| iso-8859-1 | iso-8859-1 | ISO-8859-1 (Latin-1) | An 8-bit character encoding for Western European languages. | ISO-8859 | ISO-8859-1 | 4 |
| iso-8859-2 | iso-8859-2 | ISO-8859-2 (Latin-2) | An 8-bit character encoding for Central European languages. | ISO-8859 | ISO-8859-2 | 5 |
| iso-8859-5 | iso-8859-5 | ISO-8859-5 (Cyrillic) | An 8-bit character encoding for Cyrillic script. | ISO-8859 | ISO-8859-5 | 8 |
| iso-8859-7 | iso-8859-7 | ISO-8859-7 (Greek) | An 8-bit character encoding for Modern Greek. | ISO-8859 | ISO-8859-7 | 10 |
| iso-8859-15 | iso-8859-15 | ISO-8859-15 (Latin-9) | A revised version of ISO-8859-1 that includes the Euro sign. | ISO-8859 | ISO-8859-15 | 111 |
| shift_jis | shift-jis | Shift_JIS | A Japanese character encoding widely used on Windows and Macintosh. | Japanese | Shift_JIS | 17 |
| euc-jp | euc-jp | EUC-JP | A Japanese character encoding used on Unix-like systems. | Japanese | EUC-JP | 18 |
| iso-2022-jp | iso-2022-jp | ISO-2022-JP | An encoding for Japanese email in 7-bit environments. | Japanese | ISO-2022-JP | 39 |
| gb2312 | gb2312 | GB2312 | A basic character encoding for Simplified Chinese. | Chinese | GB2312 | 2025 |
| gbk | gbk | GBK | A Chinese character encoding that extends GB2312. | Chinese | GBK | 113 |
| gb18030 | gb18030 | GB18030 | China's current national standard, capable of representing all Unicode code points. | Chinese | GB18030 | 114 |
| big5 | big5 | Big5 | A Traditional Chinese character encoding used in Taiwan and Hong Kong. | Chinese | Big5 | 2026 |
| euc-kr | euc-kr | EUC-KR | A Korean character encoding used on Unix-like systems. | Korean | EUC-KR | 38 |
| iso-2022-kr | iso-2022-kr | ISO-2022-KR | An encoding for Korean email in 7-bit environments. | Korean | ISO-2022-KR | 37 |
| koi8-r | koi8-r | KOI8-R | An 8-bit character encoding for Russian Cyrillic. | Cyrillic | KOI8-R | 2084 |
| koi8-u | koi8-u | KOI8-U | An 8-bit character encoding for Ukrainian Cyrillic. | Cyrillic | KOI8-U | 2088 |
| windows-1252 | windows-1252 | Windows-1252 | An 8-bit encoding for Western European languages used on Microsoft Windows. | Windows Code Page | windows-1252 | 2252 |
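
Some of the descriptions in the table can be checked directly. The following sketch, which assumes Python's built-in codecs and uses hypothetical names (`samples`, `sizes`, `can_encode`), measures how many bytes the three Unicode encodings need per character and tests which of the Western European encodings can represent the Euro sign.

```python
# Characters whose UTF-8 forms need 1, 2, 3 and 4 bytes respectively.
samples = ["A", "\u00e9", "\u65e5", "\U0001F600"]

sizes = {}
for ch in samples:
    # The "-be" variants fix the byte order, so no byte order mark is prepended.
    sizes[ch] = (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )
    print(f"U+{ord(ch):04X}: utf-8/utf-16/utf-32 bytes = {sizes[ch]}")

def can_encode(ch, encoding):
    """Return True if the encoding has a byte sequence for ch."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

euro = "\u20ac"
for enc in ["iso-8859-1", "iso-8859-15", "windows-1252"]:
    print(f"{enc}: Euro sign supported = {can_encode(euro, enc)}")
```

UTF-32 always uses 4 bytes, while UTF-16 uses 2 bytes for most characters and 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane, such as U+1F600. The Euro sign is missing from ISO-8859-1 but present in both ISO-8859-15 and Windows-1252.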