Big Data Technologies

Big data technologies are a collection of tools designed to efficiently process data whose volume, variety, and velocity exceed what traditional database management systems can handle. Centered on distributed processing frameworks such as Hadoop, Spark, Kafka, and Flink, they form a comprehensive ecosystem for data collection, storage, processing, analysis, and visualization. They serve as a foundation for the modern data-driven society, enabling real-time analytics, machine learning, IoT data processing, and business intelligence.

Big Data, Distributed Processing, Hadoop, Spark, Kafka, Flink, Data Engineering, Stream Processing, Batch Processing

| code | slug | name | description | category | initialRelease | latency | license | processingType |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | apache-hadoop | Apache Hadoop | An open-source framework for distributed storage and batch processing. | Distributed Storage & Batch Processing | 2006 | Minutes to Hours | Apache License 2.0 | Batch Processing |
| 2 | apache-spark | Apache Spark | A high-speed data processing engine using in-memory computation. | General-Purpose Distributed Processing Engine | 2014 | Seconds | Apache License 2.0 | Batch & Stream Processing (Micro-batch) |
| 3 | apache-kafka | Apache Kafka | A high-throughput distributed streaming platform. | Messaging & Streaming Platform | 2011 | Milliseconds | Apache License 2.0 | Stream Processing (Messaging) |
| 4 | apache-flink | Apache Flink | A distributed processing engine enabling true stream processing. | Stream Processing Engine | 2015 | Milliseconds | Apache License 2.0 | True Stream Processing |
| 5 | apache-hive | Apache Hive | Data warehouse software for running SQL-like queries on Hadoop. | Data Warehouse | 2010 | Minutes to Hours | Apache License 2.0 | Batch Processing |
| 6 | apache-storm | Apache Storm | A distributed real-time computation system. | Stream Processing Engine | 2011 | Milliseconds | Apache License 2.0 | Stream Processing |
| 7 | apache-hbase | Apache HBase | A distributed NoSQL database running on Hadoop. | NoSQL Database | 2010 | Milliseconds | Apache License 2.0 | Real-time Read/Write |
| 8 | apache-presto-trino | Trino (formerly PrestoSQL) | A distributed SQL query engine for large-scale data. | Distributed SQL Query Engine | 2012 | Seconds to Minutes | Apache License 2.0 | Interactive Query |
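
The `processingType` column distinguishes batch, micro-batch, and true (per-event) stream processing. The sketch below illustrates that distinction in plain Python, using a running event count as the workload. It does not use the Hadoop, Spark, or Flink APIs; the function names (`batch_count`, `micro_batch_counts`, `streaming_counts`), the sample events, and the batch size are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def batch_count(events: List[str]) -> Counter:
    """Batch: the whole data set is available before processing starts."""
    return Counter(events)


def micro_batch_counts(events: Iterable[str], batch_size: int) -> Iterator[Counter]:
    """Micro-batch: buffer a few events, then emit one cumulative result per batch."""
    totals: Counter = Counter()
    buffer: List[str] = []
    for event in events:
        buffer.append(event)
        if len(buffer) == batch_size:
            totals.update(buffer)
            buffer.clear()
            yield Counter(totals)  # snapshot after each full batch
    if buffer:  # flush the final, partially filled batch
        totals.update(buffer)
        yield Counter(totals)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Per-event streaming: update state and emit a result for every event."""
    totals: Counter = Counter()
    for event in events:
        totals[event] += 1
        yield Counter(totals)


# Hypothetical sample input, standing in for a log or message stream.
events = ["click", "view", "click", "buy", "view"]

batch_result = batch_count(events)
micro_results = list(micro_batch_counts(events, batch_size=2))
stream_results = list(streaming_counts(events))

# Number of intermediate results each style produced for the same input.
print(len(micro_results), len(stream_results))
```

All three styles arrive at the same final counts; they differ in how often an up-to-date result becomes available. That is roughly why the table lists batch systems with latencies of minutes to hours, micro-batch with seconds, and per-event streaming with milliseconds. Real engines add distribution, fault tolerance, and windowing on top of this basic idea.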