TSV

Character Encoding Standards - TSV

Character encoding standards specify how characters are represented as digital data in computers. Many schemes exist, including ASCII, UTF-8, UTF-16, Shift_JIS, and EUC-JP, and they differ in character set, byte structure, and compatibility. UTF-8 is now the dominant encoding worldwide, but legacy encodings such as Shift_JIS and EUC-JP remain in use in Japanese computing environments.
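To make the differences in byte structure concrete, the following minimal sketch encodes a few sample characters with Python's built-in codecs and records how many bytes each one needs. The sample characters, the codec selection, and the helper name `encoded_length` are assumptions chosen for illustration. The little-endian variants `utf-16-le` and `utf-32-le` are used so that no byte order mark is added to the count.

```python
# Sample characters: a Latin letter, a hiragana, and an emoji outside the BMP.
samples = ["A", "あ", "😀"]
# Python codec names; the -le variants avoid a byte order mark in the output.
codecs = ["ascii", "utf-8", "utf-16-le", "utf-32-le", "shift_jis", "euc_jp"]


def encoded_length(ch, codec):
    """Return the number of bytes needed to encode ch, or None if the
    codec cannot represent the character at all."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


# Map each character to its byte length under every codec.
table = {ch: {codec: encoded_length(ch, codec) for codec in codecs} for ch in samples}

for ch, row in table.items():
    print(ch, row)
```

In this sketch "A" takes 1 byte in every encoding except UTF-16 (2 bytes) and UTF-32 (4 bytes), "あ" takes 3 bytes in UTF-8 but 2 in Shift_JIS and EUC-JP, and the emoji cannot be encoded at all in ASCII, Shift_JIS, or EUC-JP.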

character encoding Unicode UTF-8 ASCII Shift_JIS EUC-JP charset internationalization
code	slug	name	description	asciiCompatible	byteStructure	japaneseSupport	maxCharacters	usage	yearIntroduced
ASCII	ascii	ASCII	American Standard Code for Information Interchange. Represents alphanumeric characters and symbols in 7 bits.	true	Fixed-length (1 byte, 7 bits used)	false	128	legacy	1963
UTF-8	utf-8	UTF-8	Variable-length Unicode encoding. ASCII-compatible and able to represent every Unicode character.	true	Variable-length (1 to 4 bytes)	true	1114112	standard	1993
UTF-16	utf-16	UTF-16	Variable-length Unicode encoding built from 16-bit code units. Widely used in Windows and Java.	false	Variable-length (2 or 4 bytes)	true	1114112	system	1996
UTF-32	utf-32	UTF-32	Fixed-length 32-bit Unicode encoding. Used for internal processing.	false	Fixed-length (4 bytes)	true	1114112	internal	1996
Shift_JIS	shift-jis	Shift_JIS	Legacy Japanese encoding. Widely used in Windows systems.	false	Variable-length (1 to 2 bytes)	true	10000	legacy	1978
EUC-JP	euc-jp	EUC-JP	Japanese encoding used in Unix/Linux. ASCII-compatible.	true	Variable-length (1 to 3 bytes)	true	11000	legacy	1988
ISO-2022-JP	iso-2022-jp	ISO-2022-JP	7-bit escape sequence Japanese encoding. Used in email.	true	7-bit variable-length (uses escape sequences)	true	10000	legacy	1983
GB2312	gb2312	GB2312	National standard encoding for Simplified Chinese.	false	Variable-length (1 to 2 bytes)	false	7445	legacy	1980
Big5	big5	Big5	Traditional Chinese encoding used in Taiwan and Hong Kong.	false	Variable-length (1 to 2 bytes)	false	13000	legacy	1984
Windows-1252	windows-1252	Windows-1252	Western European encoding used in Windows.	true	Fixed-length (1 byte)	false	256	legacy	1992
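
The table does not say what criterion the asciiCompatible column uses. The sketch below shows one commonly cited difference that may explain why Shift_JIS is marked false and EUC-JP true: the second byte of a Shift_JIS double-byte character can fall in the ASCII range, while EUC-JP and UTF-8 use only bytes of 0x80 and above for non-ASCII characters. The helper name `has_ascii_range_bytes` and the sample character are assumptions for illustration, and this reading may not account for every row.

```python
def has_ascii_range_bytes(text, codec):
    """Return True if any byte of the encoded text lies in 0x00-0x7F."""
    return any(b < 0x80 for b in text.encode(codec))


kanji = "表"  # a kanji whose Shift_JIS trail byte is 0x5C, the backslash code

sjis_bytes = kanji.encode("shift_jis")
eucjp_bytes = kanji.encode("euc_jp")

print(sjis_bytes.hex(), has_ascii_range_bytes(kanji, "shift_jis"))
print(eucjp_bytes.hex(), has_ascii_range_bytes(kanji, "euc_jp"))
print(kanji.encode("utf-8").hex(), has_ascii_range_bytes(kanji, "utf-8"))
```

Software that scans a byte stream for ASCII characters such as the backslash can misread the 0x5C trail byte in the Shift_JIS case, which is why byte-oriented tools need to know the encoding before searching or splitting text.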