
Character Encoding - HTML

Character encoding is a set of rules for converting characters and symbols into the byte sequences that computers store and process. Many schemes exist: international standards such as ASCII and UTF-8, Japanese-specific encodings such as Shift_JIS and EUC-JP, and region-specific code pages. UTF-8, an encoding of Unicode, is now widely adopted as the global standard. The other schemes are still worth understanding, because legacy systems and data continue to use them.
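As a rough illustration of that definition, the Python sketch below assumes Python 3.8 or later and its built-in codecs. `encode_hex` is a hypothetical helper, not part of any library. The sketch encodes the same two-character string, 日本 ("Japan"), with three of the encodings mentioned above. Each scheme produces a different byte sequence, and US-ASCII cannot represent the text at all. Decoding UTF-8 bytes as Windows-1252 yields the garbled text commonly called mojibake.

```python
# Minimal sketch: one string, several encodings, different byte sequences.
# Python's codec registry accepts the lowercase names used in the table on this page.

def encode_hex(text: str, codec: str) -> str:
    """Hypothetical helper: encode `text` with `codec` and return space-separated hex."""
    return text.encode(codec).hex(" ")

text = "\u65e5\u672c"  # "日本" (Japan)

for codec in ("utf-8", "shift_jis", "euc-jp"):
    print(f"{codec:>9}: {encode_hex(text, codec)}")

# US-ASCII defines only 128 characters, so this text cannot be encoded.
try:
    text.encode("us-ascii")
except UnicodeEncodeError as err:
    print("us-ascii: failed,", err.reason)

# Decoding UTF-8 bytes with the wrong encoding (Windows-1252) garbles the text.
mojibake = text.encode("utf-8").decode("windows-1252")
print("mojibake:", mojibake)
```

The same two characters take 6 bytes in UTF-8 but only 4 in Shift_JIS or EUC-JP. This is one reason legacy regional encodings persisted for so long.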

<table>
<thead><tr><th>Code</th><th>Slug</th><th>Name</th><th>Description</th><th>Category</th><th>IANA name</th><th>MIBenum</th></tr></thead>
<tbody><tr><td>utf-8</td><td>utf-8</td><td>UTF-8</td><td>A variable-length character encoding that represents Unicode using 1 to 4 bytes.</td><td>Unicode</td><td>UTF-8</td><td>106</td></tr>
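<tr><td>utf-16</td><td>utf-16</td><td>UTF-16</td><td>A variable-length character encoding that represents Unicode using one or two 16-bit code units (two, a surrogate pair, for characters outside the Basic Multilingual Plane).</td><td>Unicode</td><td>UTF-16</td><td>1015</td></tr>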
<tr><td>utf-32</td><td>utf-32</td><td>UTF-32</td><td>A character encoding that represents Unicode in fixed-length 32 bits (4 bytes).</td><td>Unicode</td><td>UTF-32</td><td>1017</td></tr>
<tr><td>us-ascii</td><td>us-ascii</td><td>US-ASCII</td><td>A basic character encoding that defines 128 characters in 7 bits.</td><td>ASCII</td><td>US-ASCII</td><td>3</td></tr>
<tr><td>iso-8859-1</td><td>iso-8859-1</td><td>ISO-8859-1 (Latin-1)</td><td>An 8-bit character encoding for Western European languages.</td><td>ISO-8859</td><td>ISO-8859-1</td><td>4</td></tr>
<tr><td>iso-8859-2</td><td>iso-8859-2</td><td>ISO-8859-2 (Latin-2)</td><td>An 8-bit character encoding for Central European languages.</td><td>ISO-8859</td><td>ISO-8859-2</td><td>5</td></tr>
<tr><td>iso-8859-5</td><td>iso-8859-5</td><td>ISO-8859-5 (Cyrillic)</td><td>An 8-bit character encoding for Cyrillic script.</td><td>ISO-8859</td><td>ISO-8859-5</td><td>8</td></tr>
<tr><td>iso-8859-7</td><td>iso-8859-7</td><td>ISO-8859-7 (Greek)</td><td>An 8-bit character encoding for Modern Greek.</td><td>ISO-8859</td><td>ISO-8859-7</td><td>10</td></tr>
<tr><td>iso-8859-15</td><td>iso-8859-15</td><td>ISO-8859-15 (Latin-9)</td><td>A revised version of ISO-8859-1 that includes the Euro sign.</td><td>ISO-8859</td><td>ISO-8859-15</td><td>111</td></tr>
<tr><td>shift_jis</td><td>shift-jis</td><td>Shift_JIS</td><td>A Japanese character encoding widely used on Windows and classic Mac OS, typically in vendor-extended variants.</td><td>Japanese</td><td>Shift_JIS</td><td>17</td></tr>
<tr><td>euc-jp</td><td>euc-jp</td><td>EUC-JP</td><td>A Japanese character encoding used on Unix-like systems.</td><td>Japanese</td><td>EUC-JP</td><td>18</td></tr>
<tr><td>iso-2022-jp</td><td>iso-2022-jp</td><td>ISO-2022-JP</td><td>An encoding for Japanese email in 7-bit environments.</td><td>Japanese</td><td>ISO-2022-JP</td><td>39</td></tr>
<tr><td>gb2312</td><td>gb2312</td><td>GB2312</td><td>A basic character encoding for Simplified Chinese.</td><td>Chinese</td><td>GB2312</td><td>2025</td></tr>
<tr><td>gbk</td><td>gbk</td><td>GBK</td><td>A Chinese character encoding that extends GB2312.</td><td>Chinese</td><td>GBK</td><td>113</td></tr>
<tr><td>gb18030</td><td>gb18030</td><td>GB18030</td><td>China's current national standard, capable of representing all Unicode characters.</td><td>Chinese</td><td>GB18030</td><td>114</td></tr>
<tr><td>big5</td><td>big5</td><td>Big5</td><td>A Traditional Chinese character encoding used in Taiwan and Hong Kong.</td><td>Chinese</td><td>Big5</td><td>2026</td></tr>
<tr><td>euc-kr</td><td>euc-kr</td><td>EUC-KR</td><td>A Korean character encoding used on Unix-like systems.</td><td>Korean</td><td>EUC-KR</td><td>38</td></tr>
<tr><td>iso-2022-kr</td><td>iso-2022-kr</td><td>ISO-2022-KR</td><td>An encoding for Korean email in 7-bit environments.</td><td>Korean</td><td>ISO-2022-KR</td><td>37</td></tr>
<tr><td>koi8-r</td><td>koi8-r</td><td>KOI8-R</td><td>An 8-bit character encoding for Russian Cyrillic.</td><td>Cyrillic</td><td>KOI8-R</td><td>2084</td></tr>
<tr><td>koi8-u</td><td>koi8-u</td><td>KOI8-U</td><td>An 8-bit character encoding for Ukrainian Cyrillic.</td><td>Cyrillic</td><td>KOI8-U</td><td>2088</td></tr>
<tr><td>windows-1252</td><td>windows-1252</td><td>Windows-1252</td><td>An 8-bit encoding for Western European languages used on Microsoft Windows.</td><td>Windows Code Page</td><td>windows-1252</td><td>2252</td></tr></tbody>
</table>
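The three Unicode rows differ mainly in how many bytes each character costs. The Python sketch below makes that concrete. It assumes Python 3 and uses a hypothetical helper, `byte_lengths`. It uses the little-endian codec variants `utf-16-le` and `utf-32-le` because Python's plain `utf-16` and `utf-32` codecs prepend a byte order mark (BOM) that would inflate the counts.

```python
# Minimal sketch: byte cost of sample characters in the three Unicode encodings above.
# The "-le" (little-endian) codec variants are used because Python's plain
# "utf-16" and "utf-32" codecs prepend a 2- or 4-byte byte order mark (BOM).

UNICODE_CODECS = ("utf-8", "utf-16-le", "utf-32-le")

def byte_lengths(ch: str) -> dict:
    """Hypothetical helper: bytes `ch` occupies in each Unicode encoding."""
    return {codec: len(ch.encode(codec)) for codec in UNICODE_CODECS}

samples = {
    "A": "U+0041 (ASCII range)",
    "\u00e9": "U+00E9 (Latin-1 Supplement)",
    "\u3042": "U+3042 (Hiragana)",
    "\U0001F600": "U+1F600 (outside the Basic Multilingual Plane)",
}

for ch, label in samples.items():
    print(label, byte_lengths(ch))
```

Run as-is, this should report 1/2/4 bytes for U+0041, 2/2/4 for U+00E9, 3/2/4 for U+3042, and 4/4/4 for U+1F600. UTF-8 grows with the code point, UTF-16 needs a surrogate pair beyond the Basic Multilingual Plane, and UTF-32 stays fixed at 4 bytes.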