Character Encoding - TSV

A character encoding is a set of rules for converting characters and symbols into byte sequences that computers can process. Many schemes exist, ranging from international standards such as ASCII and UTF-8, through Japanese-specific encodings such as Shift_JIS and EUC-JP, to country-specific code pages. Although UTF-8, an encoding of Unicode, is now the de facto global standard, understanding the other schemes remains important for maintaining compatibility with legacy systems.
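
As a rough illustration of the point above, the following Python sketch encodes one short Japanese string under several of the encodings listed in this dataset and prints the resulting bytes. The helper name `encode_all` and the sample string are assumptions made for this example, not part of the dataset.

```python
def encode_all(text, encodings):
    """Return {encoding name: bytes or None} for the given text.

    None marks an encoding that cannot represent at least one character.
    """
    result = {}
    for name in encodings:
        try:
            result[name] = text.encode(name)
        except UnicodeEncodeError:
            result[name] = None
    return result


sample = "日本"  # two kanji; outside the US-ASCII repertoire
encoded = encode_all(sample, ["utf-8", "shift_jis", "euc-jp", "us-ascii"])

for name, data in encoded.items():
    # Show the raw bytes as hex so the differences between encodings are visible.
    print(name, data.hex() if data is not None else "not representable")
```

The same two characters take six bytes in UTF-8 and four bytes in Shift_JIS or EUC-JP, and cannot be expressed in US-ASCII at all, which is why a byte sequence is only meaningful together with the name of its encoding.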

Keywords: character code, Unicode, UTF-8, character set, internationalization, text processing
code	slug	name	description	category	ianaName	mibEnum
utf-8	utf-8	UTF-8	A variable-length character encoding that represents Unicode using 1 to 4 bytes.	Unicode	UTF-8	106
utf-16	utf-16	UTF-16	A variable-length character encoding that represents Unicode using one or two 16-bit code units.	Unicode	UTF-16	1015
utf-32	utf-32	UTF-32	A character encoding that represents Unicode in fixed-length 32 bits (4 bytes).	Unicode	UTF-32	1017
us-ascii	us-ascii	US-ASCII	A basic character encoding that defines 128 characters in 7 bits.	ASCII	US-ASCII	3
iso-8859-1	iso-8859-1	ISO-8859-1 (Latin-1)	An 8-bit character encoding for Western European languages.	ISO-8859	ISO-8859-1	4
iso-8859-2	iso-8859-2	ISO-8859-2 (Latin-2)	An 8-bit character encoding for Central European languages.	ISO-8859	ISO-8859-2	5
iso-8859-5	iso-8859-5	ISO-8859-5 (Cyrillic)	An 8-bit character encoding for Cyrillic script.	ISO-8859	ISO-8859-5	8
iso-8859-7	iso-8859-7	ISO-8859-7 (Greek)	An 8-bit character encoding for Modern Greek.	ISO-8859	ISO-8859-7	10
iso-8859-15	iso-8859-15	ISO-8859-15 (Latin-9)	A revised version of ISO-8859-1 that includes the Euro sign.	ISO-8859	ISO-8859-15	111
shift_jis	shift-jis	Shift_JIS	A Japanese character encoding widely used on Windows and Macintosh.	Japanese	Shift_JIS	17
euc-jp	euc-jp	EUC-JP	A Japanese character encoding used on Unix-like systems.	Japanese	EUC-JP	18
iso-2022-jp	iso-2022-jp	ISO-2022-JP	An encoding for Japanese email in 7-bit environments.	Japanese	ISO-2022-JP	39
gb2312	gb2312	GB2312	A basic character encoding for Simplified Chinese.	Chinese	GB2312	2025
gbk	gbk	GBK	A Chinese character encoding that extends GB2312.	Chinese	GBK	113
gb18030	gb18030	GB18030	China's current national standard, capable of representing all Unicode characters.	Chinese	GB18030	114
big5	big5	Big5	A Traditional Chinese character encoding used in Taiwan and Hong Kong.	Chinese	Big5	2026
euc-kr	euc-kr	EUC-KR	A Korean character encoding used on Unix-like systems.	Korean	EUC-KR	38
iso-2022-kr	iso-2022-kr	ISO-2022-KR	An encoding for Korean email in 7-bit environments.	Korean	ISO-2022-KR	37
koi8-r	koi8-r	KOI8-R	An 8-bit character encoding for Russian Cyrillic.	Cyrillic	KOI8-R	2084
koi8-u	koi8-u	KOI8-U	An 8-bit character encoding for Ukrainian Cyrillic.	Cyrillic	KOI8-U	2088
windows-1252	windows-1252	Windows-1252	An 8-bit encoding for Western European languages used on Microsoft Windows.	Windows Code Page	windows-1252	2252
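
The table can be loaded with any TSV reader. The sketch below, offered as one possible approach rather than a prescribed one, parses a three-row excerpt with Python's standard `csv` module, indexes the rows by `mibEnum`, and asks Python's `codecs` registry whether it recognizes each `ianaName`. The names `TSV_EXCERPT`, `load_encodings`, and `python_codec_name` are hypothetical helpers introduced for this example.

```python
import codecs
import csv
import io

# Three rows copied from the table above; the full file has the same columns.
TSV_EXCERPT = (
    "code\tslug\tname\tdescription\tcategory\tianaName\tmibEnum\n"
    "utf-8\tutf-8\tUTF-8\tA variable-length character encoding that represents "
    "Unicode using 1 to 4 bytes.\tUnicode\tUTF-8\t106\n"
    "euc-jp\teuc-jp\tEUC-JP\tA Japanese character encoding used on Unix-like "
    "systems.\tJapanese\tEUC-JP\t18\n"
    "windows-1252\twindows-1252\tWindows-1252\tAn 8-bit encoding for Western "
    "European languages used on Microsoft Windows.\tWindows Code Page\t"
    "windows-1252\t2252\n"
)


def load_encodings(tsv_text):
    """Parse the TSV text and return {mibEnum (int): row dict}."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {int(row["mibEnum"]): row for row in reader}


def python_codec_name(row):
    """Return Python's canonical codec name for the row's ianaName, or None."""
    try:
        return codecs.lookup(row["ianaName"]).name
    except LookupError:
        return None


by_mib = load_encodings(TSV_EXCERPT)
for mib, row in sorted(by_mib.items()):
    print(mib, row["name"], python_codec_name(row))
```

Keying the rows on `mibEnum` is a design choice for this sketch; the `code` or `slug` column would serve equally well as a lookup key, since each is unique within the table.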