CSV

Character Encoding - CSV

A character encoding is a set of rules for converting characters and symbols into byte sequences that computers can process. Many schemes exist, ranging from international standards such as ASCII and UTF-8 to Japanese-specific encodings such as Shift_JIS and EUC-JP and country-specific code pages. Although UTF-8, which is based on Unicode, is now widely adopted as the global standard, understanding the other schemes remains important for maintaining compatibility with legacy systems.
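
To make the conversion from characters to bytes concrete, here is a minimal Python sketch that encodes the same text under several encodings and compares the results. The sample string and the helper names `encode_hex` and `utf8_width` are illustrative assumptions, and the codec names (`shift_jis`, `euc_jp`) are the spellings used by Python's standard library rather than anything defined by this dataset.

```python
# Minimal sketch: the same text becomes different bytes under different encodings.
SAMPLE = "日本語"  # arbitrary sample text chosen for illustration


def encode_hex(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ")


def utf8_width(char: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


if __name__ == "__main__":
    # One line per encoding: codec name followed by the encoded bytes in hex.
    for codec in ("utf-8", "shift_jis", "euc_jp"):
        print(f"{codec:<10}{encode_hex(SAMPLE, codec)}")

    # UTF-8 is variable-length: 1 to 4 bytes per character.
    for char in ("A", "é", "日", "😀"):
        print(repr(char), utf8_width(char))

    # Decoding legacy bytes with the wrong encoding does not recover the
    # original text; with errors="replace" it yields U+FFFD placeholders.
    legacy = SAMPLE.encode("shift_jis")
    print(legacy.decode("shift_jis"))
    print(legacy.decode("utf-8", errors="replace"))
```

The last two lines illustrate why knowing the source encoding matters when working with legacy systems: the bytes are only meaningful when decoded with the scheme that produced them.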

Keywords: character code, Unicode, UTF-8, character set, internationalization, text processing
code,slug,name,description,category,ianaName,mibEnum
utf-8,utf-8,UTF-8,A variable-length character encoding that represents Unicode using 1 to 4 bytes.,Unicode,UTF-8,106
utf-16,utf-16,UTF-16,A variable-length character encoding that represents Unicode using one or two 16-bit code units.,Unicode,UTF-16,1015
utf-32,utf-32,UTF-32,A character encoding that represents Unicode in fixed-length 32 bits (4 bytes).,Unicode,UTF-32,1017
us-ascii,us-ascii,US-ASCII,A basic character encoding that defines 128 characters in 7 bits.,ASCII,US-ASCII,3
iso-8859-1,iso-8859-1,ISO-8859-1 (Latin-1),An 8-bit character encoding for Western European languages.,ISO-8859,ISO-8859-1,4
iso-8859-2,iso-8859-2,ISO-8859-2 (Latin-2),An 8-bit character encoding for Central European languages.,ISO-8859,ISO-8859-2,5
iso-8859-5,iso-8859-5,ISO-8859-5 (Cyrillic),An 8-bit character encoding for Cyrillic script.,ISO-8859,ISO-8859-5,8
iso-8859-7,iso-8859-7,ISO-8859-7 (Greek),An 8-bit character encoding for Modern Greek.,ISO-8859,ISO-8859-7,10
iso-8859-15,iso-8859-15,ISO-8859-15 (Latin-9),A revised version of ISO-8859-1 that includes the Euro sign.,ISO-8859,ISO-8859-15,111
shift_jis,shift-jis,Shift_JIS,A Japanese character encoding traditionally used on Windows and Macintosh.,Japanese,Shift_JIS,17
euc-jp,euc-jp,EUC-JP,A Japanese character encoding used on Unix-like systems.,Japanese,EUC-JP,18
iso-2022-jp,iso-2022-jp,ISO-2022-JP,An encoding for Japanese email in 7-bit environments.,Japanese,ISO-2022-JP,39
gb2312,gb2312,GB2312,A basic character encoding for Simplified Chinese.,Chinese,GB2312,2025
gbk,gbk,GBK,A Chinese character encoding that extends GB2312.,Chinese,GBK,113
gb18030,gb18030,GB18030,"China's current national standard, capable of representing all Unicode characters.",Chinese,GB18030,114
big5,big5,Big5,A Traditional Chinese character encoding used in Taiwan and Hong Kong.,Chinese,Big5,2026
euc-kr,euc-kr,EUC-KR,A Korean character encoding used on Unix-like systems.,Korean,EUC-KR,38
iso-2022-kr,iso-2022-kr,ISO-2022-KR,An encoding for Korean email in 7-bit environments.,Korean,ISO-2022-KR,37
koi8-r,koi8-r,KOI8-R,An 8-bit character encoding for Russian Cyrillic.,Cyrillic,KOI8-R,2084
koi8-u,koi8-u,KOI8-U,An 8-bit character encoding for Ukrainian Cyrillic.,Cyrillic,KOI8-U,2088
windows-1252,windows-1252,Windows-1252,An 8-bit encoding for Western European languages used on Microsoft Windows.,Windows Code Page,windows-1252,2252
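
As a rough sketch of how this table might be consumed, the Python snippet below parses a three-row excerpt with the standard `csv` module and checks whether each `ianaName` resolves to a codec in the local Python installation. The helper names `load_rows` and `python_codec_name` are hypothetical, and whether a given IANA name is recognized depends on the Python build rather than on this dataset.

```python
import codecs
import csv
import io
from typing import Dict, List, Optional

# Excerpt of the table above; the GB18030 description is quoted because it
# contains a comma, which csv.DictReader handles automatically.
EXCERPT = """code,slug,name,description,category,ianaName,mibEnum
utf-8,utf-8,UTF-8,A variable-length character encoding that represents Unicode using 1 to 4 bytes.,Unicode,UTF-8,106
gb18030,gb18030,GB18030,"China's current national standard, capable of representing all Unicode characters.",Chinese,GB18030,114
koi8-r,koi8-r,KOI8-R,An 8-bit character encoding for Russian Cyrillic.,Cyrillic,KOI8-R,2084
"""


def load_rows(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def python_codec_name(iana_name: str) -> Optional[str]:
    """Return Python's canonical codec name for an IANA name, or None if unknown."""
    try:
        return codecs.lookup(iana_name).name
    except LookupError:
        return None


if __name__ == "__main__":
    for row in load_rows(EXCERPT):
        mib = int(row["mibEnum"])  # mibEnum is stored as text in the CSV
        codec = python_codec_name(row["ianaName"])
        print(f"{row['ianaName']:<10} MIBenum={mib:<5} python codec={codec}")
```

Reading the full file instead of the excerpt would only require passing its contents to `load_rows`; opening it with `encoding="utf-8"` is an assumption about how the file is stored.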