Big Data Technologies - TSV
Big data technologies are a collection of technologies designed to efficiently process data whose volume, variety, and velocity exceed what traditional database management systems can handle. Centered on distributed processing frameworks such as Hadoop, Spark, Kafka, and Flink, they form an ecosystem that covers data collection, storage, processing, analysis, and visualization. They serve as the foundation of a modern data-driven society, enabling real-time analytics, machine learning, IoT data processing, and business intelligence.
Big Data
Distributed Processing
Hadoop
Spark
Kafka
Flink
Data Engineering
Stream Processing
Batch Processing
code	slug	name	description	category	initialRelease	latency	license	processingType
1	apache-hadoop	Apache Hadoop	An open-source framework for distributed storage and batch processing.	Distributed Storage & Batch Processing	2006	Minutes to Hours	Apache License 2.0	Batch Processing
2	apache-spark	Apache Spark	A high-speed data processing engine using in-memory computation.	General-Purpose Distributed Processing Engine	2014	Seconds	Apache License 2.0	Batch & Stream Processing (Micro-batch)
3	apache-kafka	Apache Kafka	A high-throughput distributed streaming platform.	Messaging & Streaming Platform	2011	Milliseconds	Apache License 2.0	Stream Processing (Messaging)
4	apache-flink	Apache Flink	A distributed processing engine enabling true stream processing.	Stream Processing Engine	2015	Milliseconds	Apache License 2.0	True Stream Processing
5	apache-hive	Apache Hive	Data warehouse software for running SQL-like queries on Hadoop.	Data Warehouse	2010	Minutes to Hours	Apache License 2.0	Batch Processing
6	apache-storm	Apache Storm	A distributed real-time computation system.	Stream Processing Engine	2011	Milliseconds	Apache License 2.0	Stream Processing
7	apache-hbase	Apache HBase	A distributed NoSQL database running on Hadoop.	NoSQL Database	2010	Milliseconds	Apache License 2.0	Real-time Read/Write
8	apache-presto-trino	Trino (formerly PrestoSQL)	A distributed SQL query engine for large-scale data.	Distributed SQL Query Engine	2012	Seconds to Minutes	Apache License 2.0	Interactive Query