Big Data Technologies - TSV

Big data technologies are a collection of tools designed to efficiently process data whose volume, variety, and velocity exceed what traditional database management systems can handle. Centered on distributed processing frameworks such as Hadoop, Spark, Kafka, and Flink, they form an ecosystem that covers data collection, storage, processing, analysis, and visualization. They underpin the modern data-driven society, enabling real-time analytics, machine learning, IoT data processing, and business intelligence.

Big Data, Distributed Processing, Hadoop, Spark, Kafka, Flink, Data Engineering, Stream Processing, Batch Processing
code	slug	name	description	category	initialRelease	latency	license	processingType
1	apache-hadoop	Apache Hadoop	An open-source framework for distributed storage and batch processing.	Distributed Storage & Batch Processing	2006	Minutes to Hours	Apache License 2.0	Batch Processing
2	apache-spark	Apache Spark	A high-speed data processing engine using in-memory computation.	General-Purpose Distributed Processing Engine	2014	Seconds	Apache License 2.0	Batch & Stream Processing (Micro-batch)
3	apache-kafka	Apache Kafka	A high-throughput distributed streaming platform.	Messaging & Streaming Platform	2011	Milliseconds	Apache License 2.0	Stream Processing (Messaging)
4	apache-flink	Apache Flink	A distributed processing engine enabling true stream processing.	Stream Processing Engine	2015	Milliseconds	Apache License 2.0	True Stream Processing
5	apache-hive	Apache Hive	Data warehouse software for running SQL-like queries on Hadoop.	Data Warehouse	2010	Minutes to Hours	Apache License 2.0	Batch Processing
6	apache-storm	Apache Storm	A distributed real-time computation system.	Stream Processing Engine	2011	Milliseconds	Apache License 2.0	Stream Processing
7	apache-hbase	Apache HBase	A distributed NoSQL database running on Hadoop.	NoSQL Database	2010	Milliseconds	Apache License 2.0	Real-time Read/Write
8	apache-presto-trino	Trino (formerly PrestoSQL)	A distributed SQL query engine for large-scale data.	Distributed SQL Query Engine	2012	Seconds to Minutes	Apache License 2.0	Interactive Query
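
The table above is tab-separated with a header row, so it can be read with a standard CSV parser set to a tab delimiter. The following is a minimal sketch in Python, assuming a three-row excerpt that keeps only some of the columns. The names `SAMPLE_TSV`, `load_rows`, and `names_with_latency` are hypothetical helpers introduced for illustration and are not part of the dataset.

```python
import csv
import io

# Excerpt of the table above (assumption: only five of the nine columns are
# kept here to keep the example short). Columns are tab-separated.
SAMPLE_TSV = (
    "code\tslug\tname\tlatency\tprocessingType\n"
    "1\tapache-hadoop\tApache Hadoop\tMinutes to Hours\tBatch Processing\n"
    "3\tapache-kafka\tApache Kafka\tMilliseconds\tStream Processing (Messaging)\n"
    "4\tapache-flink\tApache Flink\tMilliseconds\tTrue Stream Processing\n"
)


def load_rows(text):
    """Parse TSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def names_with_latency(rows, latency):
    """Return the names of rows whose latency column equals `latency`."""
    return [row["name"] for row in rows if row["latency"] == latency]


rows = load_rows(SAMPLE_TSV)
low_latency = names_with_latency(rows, "Milliseconds")
print(low_latency)
```

The same `load_rows` helper should work on the full table if the text is read from a file instead of the embedded string, since `csv.DictReader` takes its field names from whatever header row it finds.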