
Character Encoding Standards - HTML

Character encoding standards specify how characters are represented as bytes in a computer. Common schemes include ASCII, UTF-8, UTF-16, Shift_JIS, and EUC-JP. They differ in three ways: the character sets they cover, how they lay characters out in bytes, and whether they stay compatible with ASCII. UTF-8 is now the dominant encoding worldwide and is the encoding the HTML standard asks authors to use. Legacy encodings such as Shift_JIS and EUC-JP are still found in Japanese computing environments.
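The differences in byte structure show up as soon as the same text is encoded several ways. The minimal Python sketch below encodes one short sample string with Python's standard-library codecs and prints the size of each result. The sample string and the dictionary labels are illustrative choices; `utf-16-le` and `utf-32-le` are used so that no byte-order mark is counted.

```python
# Compare how many bytes the same text needs in several encodings.
# Codec names are Python standard-library codec names.
SAMPLE = "Aあ漢"  # one ASCII letter, one hiragana, one kanji

CODECS = {
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",   # little-endian, no byte-order mark
    "UTF-32LE": "utf-32-le",   # little-endian, no byte-order mark
    "Shift_JIS": "shift_jis",
    "EUC-JP": "euc_jp",
    "ISO-2022-JP": "iso2022_jp",
}


def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded length in bytes of `text` for each codec in CODECS."""
    return {label: len(text.encode(codec)) for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, size in encoded_sizes(SAMPLE).items():
        print(f"{label:12} {size} bytes")
    # ASCII has no code for the Japanese characters, so encoding fails.
    try:
        SAMPLE.encode("ascii")
    except UnicodeEncodeError as exc:
        print("ascii:", exc.reason)
```

ISO-2022-JP produces the longest output here because it adds escape sequences to switch into and out of the JIS X 0208 character set.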

character encoding Unicode UTF-8 ASCII Shift_JIS EUC-JP charset internationalization
<table>
<thead><tr><th>Code</th><th>Slug</th><th>Name</th><th>Description</th><th>ASCII-compatible</th><th>Byte structure</th><th>Japanese support</th><th>Max. characters</th><th>Usage</th><th>Year introduced</th></tr></thead>
<tbody><tr><td>ASCII</td><td>ascii</td><td>ASCII</td><td>American Standard Code for Information Interchange. Represents letters, digits, symbols, and control characters in 7 bits.</td><td>true</td><td>Fixed-length (1 byte, 7 bits used)</td><td>false</td><td>128</td><td>legacy</td><td>1963</td></tr>
<tr><td>UTF-8</td><td>utf-8</td><td>UTF-8</td><td>Variable-length Unicode encoding. ASCII-compatible and able to represent every Unicode character.</td><td>true</td><td>Variable-length (1–4 bytes)</td><td>true</td><td>1114112</td><td>standard</td><td>1993</td></tr>
<tr><td>UTF-16</td><td>utf-16</td><td>UTF-16</td><td>Unicode encoding built on 16-bit code units; characters outside the Basic Multilingual Plane take a surrogate pair. Widely used in Windows and Java.</td><td>false</td><td>Variable-length (2 or 4 bytes)</td><td>true</td><td>1114112</td><td>system</td><td>1996</td></tr>
<tr><td>UTF-32</td><td>utf-32</td><td>UTF-32</td><td>Fixed-length 32-bit Unicode encoding. Used mainly for internal processing.</td><td>false</td><td>Fixed-length (4 bytes)</td><td>true</td><td>1114112</td><td>internal</td><td>1996</td></tr>
<tr><td>Shift_JIS</td><td>shift-jis</td><td>Shift_JIS</td><td>Legacy Japanese encoding. Its Microsoft variant (code page 932) is the traditional default on Japanese Windows.</td><td>false</td><td>Variable-length (1–2 bytes)</td><td>true</td><td>10000</td><td>legacy</td><td>1978</td></tr>
<tr><td>EUC-JP</td><td>euc-jp</td><td>EUC-JP</td><td>Japanese encoding traditionally used on Unix and Linux. ASCII-compatible.</td><td>true</td><td>Variable-length (1–3 bytes)</td><td>true</td><td>11000</td><td>legacy</td><td>1988</td></tr>
<tr><td>ISO-2022-JP</td><td>iso-2022-jp</td><td>ISO-2022-JP</td><td>7-bit Japanese encoding that switches character sets with escape sequences. Traditionally used in email.</td><td>true</td><td>7-bit, variable-length (uses escape sequences)</td><td>true</td><td>10000</td><td>legacy</td><td>1983</td></tr>
<tr><td>GB2312</td><td>gb2312</td><td>GB2312</td><td>National standard character set for Simplified Chinese, usually encoded as EUC-CN. In that form every byte from 0x00 to 0x7F is ASCII.</td><td>true</td><td>Variable-length (1–2 bytes)</td><td>false</td><td>7445</td><td>legacy</td><td>1980</td></tr>
<tr><td>Big5</td><td>big5</td><td>Big5</td><td>Traditional Chinese encoding used in Taiwan and Hong Kong.</td><td>false</td><td>Variable-length (1–2 bytes)</td><td>false</td><td>13000</td><td>legacy</td><td>1984</td></tr>
<tr><td>Windows-1252</td><td>windows-1252</td><td>Windows-1252</td><td>Single-byte Western European encoding used in Windows.</td><td>true</td><td>Fixed-length (1 byte)</td><td>false</td><td>256</td><td>legacy</td><td>1992</td></tr></tbody>
</table>
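A hand-written encoder makes the byte-structure figures for UTF-8 and UTF-16 concrete. The Python sketch below is illustrative only; in practice `str.encode("utf-8")` does this work. The names `utf8_encode_code_point` and `utf16_code_units` are hypothetical helpers. The sketch shows why UTF-8 needs 1 to 4 bytes and UTF-16 needs 2 or 4. It also shows where 1114112 comes from: code points run from U+0000 to U+10FFFF.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:
        # 0xxxxxxx: the ASCII range maps to the same single byte.
        return bytes([cp])
    if cp < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def utf16_code_units(cp: int) -> list[int]:
    """Return the 16-bit UTF-16 code units for one valid, non-surrogate code point."""
    if cp < 0x10000:
        # Basic Multilingual Plane: one code unit (2 bytes).
        return [cp]
    # Supplementary planes: subtract 0x10000, leaving a 20-bit value that is
    # split across a high and a low surrogate (4 bytes in total).
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


if __name__ == "__main__":
    # U+10FFFF is the last code point, so there are 0x10FFFF + 1 = 1114112 in all.
    for ch in "Aé漢😀":
        cp = ord(ch)
        encoded = utf8_encode_code_point(cp)
        assert encoded == ch.encode("utf-8")  # cross-check against Python's codec
        units = " ".join(f"{u:04X}" for u in utf16_code_units(cp))
        print(f"U+{cp:04X}  UTF-8: {encoded.hex(' ')} ({len(encoded)} bytes)  UTF-16: {units}")
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or higher. As a result, a byte below 0x80 in UTF-8 text always stands for an ASCII character, and this property is what the table's ASCII-compatibility column records for UTF-8.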