CSV
Big Data Technologies - CSV
Big data technologies are a collection of tools designed to process data whose volume, variety, and velocity exceed what traditional database management systems can handle. Built around distributed frameworks and platforms such as Hadoop, Spark, Kafka, and Flink, they form an ecosystem that covers data collection, storage, processing, analysis, and visualization. They underpin real-time analytics, machine learning, IoT data processing, and business intelligence.
Big Data
Distributed Processing
Hadoop
Spark
Kafka
Flink
Data Engineering
Stream Processing
Batch Processing
code,slug,name,description,category,initialRelease,latency,license,processingType
1,apache-hadoop,Apache Hadoop,An open-source framework for distributed storage and batch processing.,Distributed Storage & Batch Processing,2006,Minutes to Hours,Apache License 2.0,Batch Processing
2,apache-spark,Apache Spark,A high-speed data processing engine using in-memory computation.,General-Purpose Distributed Processing Engine,2014,Seconds,Apache License 2.0,Batch & Stream Processing (Micro-batch)
3,apache-kafka,Apache Kafka,A high-throughput distributed streaming platform.,Messaging & Streaming Platform,2011,Milliseconds,Apache License 2.0,Stream Processing (Messaging)
4,apache-flink,Apache Flink,A distributed processing engine enabling true stream processing.,Stream Processing Engine,2015,Milliseconds,Apache License 2.0,True Stream Processing
5,apache-hive,Apache Hive,Data warehouse software for running SQL-like queries on Hadoop.,Data Warehouse,2010,Minutes to Hours,Apache License 2.0,Batch Processing
6,apache-storm,Apache Storm,A distributed real-time computation system.,Stream Processing Engine,2011,Milliseconds,Apache License 2.0,Stream Processing
7,apache-hbase,Apache HBase,A distributed NoSQL database running on Hadoop.,NoSQL Database,2010,Milliseconds,Apache License 2.0,Real-time Read/Write
8,apache-presto-trino,Trino (formerly PrestoSQL),A distributed SQL query engine for large-scale data.,Distributed SQL Query Engine,2012,Seconds to Minutes,Apache License 2.0,Interactive Query
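
The table above can be read with any standard CSV parser. The following is a minimal sketch in Python, assuming the header row shown above and using only the standard library. It embeds the first four rows as an excerpt so that it runs on its own. The helper names `load_rows` and `names_by_latency` are hypothetical and chosen for illustration only.

```python
import csv
import io

# Excerpt of the dataset: the header row plus the first four data rows.
CSV_TEXT = """\
code,slug,name,description,category,initialRelease,latency,license,processingType
1,apache-hadoop,Apache Hadoop,An open-source framework for distributed storage and batch processing.,Distributed Storage & Batch Processing,2006,Minutes to Hours,Apache License 2.0,Batch Processing
2,apache-spark,Apache Spark,A high-speed data processing engine using in-memory computation.,General-Purpose Distributed Processing Engine,2014,Seconds,Apache License 2.0,Batch & Stream Processing (Micro-batch)
3,apache-kafka,Apache Kafka,A high-throughput distributed streaming platform.,Messaging & Streaming Platform,2011,Milliseconds,Apache License 2.0,Stream Processing (Messaging)
4,apache-flink,Apache Flink,A distributed processing engine enabling true stream processing.,Stream Processing Engine,2015,Milliseconds,Apache License 2.0,True Stream Processing
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def names_by_latency(rows, latency):
    """Return the names of rows whose latency column equals `latency`."""
    return [row["name"] for row in rows if row["latency"] == latency]


rows = load_rows(CSV_TEXT)
print(names_by_latency(rows, "Milliseconds"))  # ['Apache Kafka', 'Apache Flink']
```

All values, including `code` and `initialRelease`, arrive as strings with this approach, so convert them with `int()` before making numeric comparisons.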