
World Languages Details - TSV

World Languages Details provides information on the world's major languages, compiled from linguistic references such as Ethnologue. For each language, it lists the language family, the writing system (script), the number of native (L1) speakers, the number of second-language (L2) speakers, and the total number of speakers. More than 7,000 languages are spoken today, but the dataset covers only major languages with 50 million or more speakers, 25 languages in all. Speaker counts are plain integers, and in every row totalSpeakers equals nativeSpeakers plus secondLanguageSpeakers. The code column is a sequential row identifier (1 to 25), and slug is a lowercase, hyphenated key. The data can be used in fields such as linguistic research, international business, education, and translation services.

Keywords: language, language family, writing system, speakers, linguistics, multilingual, international communication
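The following minimal Python sketch (3.9+, standard library only) reads rows with the column layout of the table below into typed records. It also checks that totalSpeakers equals nativeSpeakers plus secondLanguageSpeakers. The `Language` class, the `parse_languages` function, and the `SAMPLE_TSV` string are illustrative names and are not part of the dataset. The sample embeds three rows copied from the table. With a real file, you would pass its contents instead.

```python
import csv
import io
from dataclasses import dataclass

# Three rows copied from the table (hypothetical sample; normally read from the TSV file).
SAMPLE_TSV = (
    "code\tslug\tname\tdescription\tlanguageFamily\tnativeSpeakers"
    "\tsecondLanguageSpeakers\ttotalSpeakers\twritingSystem\n"
    "1\tenglish\tEnglish\tThe most widely spoken language in the world, functioning as an"
    " international lingua franca.\tIndo-European (Germanic)\t380000000\t1140000000"
    "\t1520000000\tLatin alphabet\n"
    "3\thindi\tHindi\tA major language of India belonging to the Indo-Aryan branch."
    "\tIndo-European (Indo-Aryan)\t350000000\t260000000\t610000000\tDevanagari script\n"
    "22\twu-chinese\tWu Chinese (Shanghainese)\tA Chinese variety used in the Yangtze River"
    " Delta region centered on Shanghai.\tSino-Tibetan (Sinitic)\t83000000\t0\t83000000"
    "\tChinese characters\n"
)


@dataclass(frozen=True)
class Language:
    code: int
    slug: str
    name: str
    description: str
    language_family: str
    native_speakers: int
    second_language_speakers: int
    total_speakers: int
    writing_system: str


def parse_languages(text: str) -> list[Language]:
    """Parse tab-separated text whose header matches the dataset's columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    languages = []
    for row in reader:
        lang = Language(
            code=int(row["code"]),
            slug=row["slug"],
            name=row["name"],
            description=row["description"],
            language_family=row["languageFamily"],
            native_speakers=int(row["nativeSpeakers"]),
            second_language_speakers=int(row["secondLanguageSpeakers"]),
            total_speakers=int(row["totalSpeakers"]),
            writing_system=row["writingSystem"],
        )
        # Every row in this table satisfies total = L1 + L2; reject rows that do not.
        if lang.total_speakers != lang.native_speakers + lang.second_language_speakers:
            raise ValueError(f"inconsistent speaker totals for {lang.slug}")
        languages.append(lang)
    return languages


for lang in parse_languages(SAMPLE_TSV):
    print(f"{lang.name}: {lang.total_speakers / 1e6:.0f}M total ({lang.writing_system})")
```

Converting the counts to `int` on load keeps later arithmetic exact. The consistency check catches rows where the three speaker columns have drifted apart after manual edits.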
code	slug	name	description	languageFamily	nativeSpeakers	secondLanguageSpeakers	totalSpeakers	writingSystem
1	english	English	The most widely spoken language in the world, functioning as an international lingua franca.	Indo-European (Germanic)	380000000	1140000000	1520000000	Latin alphabet
2	mandarin-chinese	Mandarin Chinese	The language with the most native speakers worldwide, primarily used in mainland China, Taiwan, and Singapore.	Sino-Tibetan (Sinitic)	940000000	240000000	1180000000	Chinese characters (Simplified/Traditional)
3	hindi	Hindi	A major language of India belonging to the Indo-Aryan branch.	Indo-European (Indo-Aryan)	350000000	260000000	610000000	Devanagari script
4	spanish	Spanish	A Romance language widely used in Spain and Latin America.	Indo-European (Romance)	490000000	70000000	560000000	Latin alphabet
5	french	French	A Romance language with a growing number of speakers, particularly in Africa.	Indo-European (Romance)	80000000	240000000	320000000	Latin alphabet
6	arabic	Arabic	An Afro-Asiatic language widely used in the Middle East and North Africa.	Afro-Asiatic (Semitic)	320000000	20000000	340000000	Arabic script
7	bengali	Bengali	An Indo-Aryan language used in Bangladesh and eastern India.	Indo-European (Indo-Aryan)	230000000	50000000	280000000	Bengali script
8	portuguese	Portuguese	A Romance language used across the Lusophone world, including Portugal and Brazil.	Indo-European (Romance)	230000000	40000000	270000000	Latin alphabet
9	russian	Russian	A Slavic language widely used in former Soviet countries.	Indo-European (Slavic)	150000000	110000000	260000000	Cyrillic script
10	urdu	Urdu	An Indo-Aryan language and the national language of Pakistan.	Indo-European (Indo-Aryan)	70000000	180000000	250000000	Perso-Arabic script
11	indonesian	Indonesian	An Austronesian language and the official language of Indonesia.	Austronesian	40000000	210000000	250000000	Latin alphabet
12	german	German	A Germanic language widely used in Central Europe.	Indo-European (Germanic)	80000000	55000000	135000000	Latin alphabet
13	japanese	Japanese	The main language of Japan, belonging to the small Japonic family, which has no demonstrated relationship to other language families.	Japonic	124000000	1000000	125000000	Kanji, Hiragana, and Katakana
14	nigerian-pidgin	Nigerian Pidgin	An English-based creole widely used in Nigeria.	English-based Creole	5000000	115000000	120000000	Latin alphabet
15	egyptian-arabic	Egyptian Arabic	A variety of Arabic widely used in Egypt.	Afro-Asiatic (Semitic)	100000000	20000000	120000000	Arabic script
16	marathi	Marathi	An Indo-Aryan language of India, primarily used in Maharashtra state.	Indo-European (Indo-Aryan)	83000000	16000000	99000000	Devanagari script
17	telugu	Telugu	A Dravidian language widely used in southern India.	Dravidian	83000000	13000000	96000000	Telugu script
18	turkish	Turkish	A Turkic language primarily used in Turkey and Cyprus.	Turkic	84000000	7000000	91000000	Latin alphabet
19	tamil	Tamil	A classical Dravidian language used in southern India and Sri Lanka.	Dravidian	75000000	11000000	86000000	Tamil script
20	cantonese	Cantonese (Yue Chinese)	A Chinese variety primarily used in Hong Kong, Macau, and Guangdong Province.	Sino-Tibetan (Sinitic)	85000000	1000000	86000000	Chinese characters
21	vietnamese	Vietnamese	An Austroasiatic language and the official language of Vietnam, written in a Latin-based alphabet.	Austroasiatic (Viet-Muong)	85000000	12000000	97000000	Latin alphabet (Quoc Ngu)
22	wu-chinese	Wu Chinese (Shanghainese)	A Chinese variety used in the Yangtze River Delta region centered on Shanghai.	Sino-Tibetan (Sinitic)	83000000	0	83000000	Chinese characters
23	tagalog	Tagalog (Filipino)	An Austronesian language whose standardized form, Filipino, is the national language and an official language of the Philippines.	Austronesian	30000000	53000000	83000000	Latin alphabet
24	korean	Korean	A language spoken on the Korean Peninsula with its own writing system called Hangul.	Koreanic (language isolate)	80000000	2000000	82000000	Hangul
25	farsi	Persian (Farsi)	An Indo-European language spoken in Iran.	Indo-European (Iranian)	55000000	24000000	79000000	Perso-Arabic script
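Each row carries both L1 and L2 counts, so simple derived measures can be computed directly. The sketch below is an illustration only and is not part of the dataset. It embeds the name, languageFamily, nativeSpeakers, and secondLanguageSpeakers columns of all 25 rows as Python tuples. It then lists the languages with more second-language than native speakers, ordered by L2 share. Finally, it counts rows per top-level family by dropping the parenthesized branch. The names `ROWS`, `top_level_family`, and `l2_share` are hypothetical.

```python
from collections import Counter

# (name, languageFamily, nativeSpeakers, secondLanguageSpeakers), copied from the table.
ROWS = [
    ("English", "Indo-European (Germanic)", 380_000_000, 1_140_000_000),
    ("Mandarin Chinese", "Sino-Tibetan (Sinitic)", 940_000_000, 240_000_000),
    ("Hindi", "Indo-European (Indo-Aryan)", 350_000_000, 260_000_000),
    ("Spanish", "Indo-European (Romance)", 490_000_000, 70_000_000),
    ("French", "Indo-European (Romance)", 80_000_000, 240_000_000),
    ("Arabic", "Afro-Asiatic (Semitic)", 320_000_000, 20_000_000),
    ("Bengali", "Indo-European (Indo-Aryan)", 230_000_000, 50_000_000),
    ("Portuguese", "Indo-European (Romance)", 230_000_000, 40_000_000),
    ("Russian", "Indo-European (Slavic)", 150_000_000, 110_000_000),
    ("Urdu", "Indo-European (Indo-Aryan)", 70_000_000, 180_000_000),
    ("Indonesian", "Austronesian", 40_000_000, 210_000_000),
    ("German", "Indo-European (Germanic)", 80_000_000, 55_000_000),
    ("Japanese", "Japonic", 124_000_000, 1_000_000),
    ("Nigerian Pidgin", "English-based Creole", 5_000_000, 115_000_000),
    ("Egyptian Arabic", "Afro-Asiatic (Semitic)", 100_000_000, 20_000_000),
    ("Marathi", "Indo-European (Indo-Aryan)", 83_000_000, 16_000_000),
    ("Telugu", "Dravidian", 83_000_000, 13_000_000),
    ("Turkish", "Turkic", 84_000_000, 7_000_000),
    ("Tamil", "Dravidian", 75_000_000, 11_000_000),
    ("Cantonese (Yue Chinese)", "Sino-Tibetan (Sinitic)", 85_000_000, 1_000_000),
    ("Vietnamese", "Austroasiatic (Viet-Muong)", 85_000_000, 12_000_000),
    ("Wu Chinese (Shanghainese)", "Sino-Tibetan (Sinitic)", 83_000_000, 0),
    ("Tagalog (Filipino)", "Austronesian", 30_000_000, 53_000_000),
    ("Korean", "Koreanic (language isolate)", 80_000_000, 2_000_000),
    ("Persian (Farsi)", "Indo-European (Iranian)", 55_000_000, 24_000_000),
]


def top_level_family(family: str) -> str:
    """Drop the parenthesized branch: 'Indo-European (Romance)' -> 'Indo-European'."""
    return family.split(" (", 1)[0]


def l2_share(native: int, second: int) -> float:
    """Fraction of all speakers who use the language as a second language."""
    return second / (native + second)


# Languages with more L2 than L1 speakers, highest L2 share first.
# sorted() is stable even with reverse=True, so exact ties keep table order.
l2_majority = sorted(
    (row for row in ROWS if row[3] > row[2]),
    key=lambda row: l2_share(row[2], row[3]),
    reverse=True,
)
for name, _family, native, second in l2_majority:
    print(f"{name}: {l2_share(native, second):.0%} L2")

# Number of listed languages per top-level family (counts rows, not speakers).
family_counts = Counter(top_level_family(family) for _name, family, _n, _s in ROWS)
for family, count in family_counts.most_common(3):
    print(f"{family}: {count}")
```

The sketch counts rows per family rather than summing speakers, and this is deliberate. A multilingual person is counted under every language they speak, so summing totalSpeakers across rows would overstate the number of people.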