YAML
Big Data Technologies - YAML
Big data technologies refer to a collection of technologies designed to efficiently process large volumes, diverse types, and high-velocity data that traditional database management systems cannot handle. Centered around distributed processing frameworks such as Hadoop, Spark, Kafka, and Flink, these technologies form a comprehensive ecosystem for data collection, storage, processing, analysis, and visualization. They serve as the foundation for modern data-driven society, enabling real-time analytics, machine learning, IoT data processing, and business intelligence.
Big Data
Distributed Processing
Hadoop
Spark
Kafka
Flink
Data Engineering
Stream Processing
Batch Processing
- code: "1"
slug: "apache-hadoop"
name: "Apache Hadoop"
description: "An open-source framework for distributed storage and batch processing."
category: "Distributed Storage & Batch Processing"
processingType: "Batch Processing"
latency: "Minutes to Hours"
initialRelease: "2006"
license: "Apache License 2.0"
- code: "2"
slug: "apache-spark"
name: "Apache Spark"
description: "A high-speed data processing engine using in-memory computation."
category: "General-Purpose Distributed Processing Engine"
processingType: "Batch & Stream Processing (Micro-batch)"
latency: "Seconds"
initialRelease: "2014"
license: "Apache License 2.0"
- code: "3"
slug: "apache-kafka"
name: "Apache Kafka"
description: "A high-throughput distributed streaming platform."
category: "Messaging & Streaming Platform"
processingType: "Stream Processing (Messaging)"
latency: "Milliseconds"
initialRelease: "2011"
license: "Apache License 2.0"
- code: "4"
slug: "apache-flink"
name: "Apache Flink"
description: "A distributed processing engine enabling true stream processing."
category: "Stream Processing Engine"
processingType: "True Stream Processing"
latency: "Milliseconds"
initialRelease: "2015"
license: "Apache License 2.0"
- code: "5"
slug: "apache-hive"
name: "Apache Hive"
description: "Data warehouse software for running SQL-like queries on Hadoop."
category: "Data Warehouse"
processingType: "Batch Processing"
latency: "Minutes to Hours"
initialRelease: "2010"
license: "Apache License 2.0"
- code: "6"
slug: "apache-storm"
name: "Apache Storm"
description: "A distributed real-time computation system."
category: "Stream Processing Engine"
processingType: "Stream Processing"
latency: "Milliseconds"
initialRelease: "2011"
license: "Apache License 2.0"
- code: "7"
slug: "apache-hbase"
name: "Apache HBase"
description: "A distributed NoSQL database running on Hadoop."
category: "NoSQL Database"
processingType: "Real-time Read/Write"
latency: "Milliseconds"
initialRelease: "2010"
license: "Apache License 2.0"
- code: "8"
slug: "apache-presto-trino"
name: "Apache Trino (formerly PrestoSQL)"
description: "A distributed SQL query engine for large-scale data."
category: "Distributed SQL Query Engine"
processingType: "Interactive Query"
latency: "Seconds to Minutes"
initialRelease: "2012"
license: "Apache License 2.0"