Big Data Technologies - HTML

Big data technologies are a collection of tools designed to process data whose volume, variety, and velocity exceed what traditional database management systems can handle. Built around distributed systems such as Hadoop, Spark, Kafka, and Flink, they form an ecosystem that covers data collection, storage, processing, analysis, and visualization. They serve as a foundation for today's data-driven society, enabling real-time analytics, machine learning, IoT data processing, and business intelligence.

Big Data, Distributed Processing, Hadoop, Spark, Kafka, Flink, Data Engineering, Stream Processing, Batch Processing

<table>
<thead><tr><th>code</th><th>slug</th><th>name</th><th>description</th><th>category</th><th>initialRelease</th><th>latency</th><th>license</th><th>processingType</th></tr></thead>
<tbody><tr><td>1</td><td>apache-hadoop</td><td>Apache Hadoop</td><td>An open-source framework for distributed storage and batch processing.</td><td>Distributed Storage &amp; Batch Processing</td><td>2006</td><td>Minutes to Hours</td><td>Apache License 2.0</td><td>Batch Processing</td></tr>
<tr><td>2</td><td>apache-spark</td><td>Apache Spark</td><td>A high-speed data processing engine using in-memory computation.</td><td>General-Purpose Distributed Processing Engine</td><td>2014</td><td>Seconds</td><td>Apache License 2.0</td><td>Batch &amp; Stream Processing (Micro-batch)</td></tr>
<tr><td>3</td><td>apache-kafka</td><td>Apache Kafka</td><td>A high-throughput distributed streaming platform.</td><td>Messaging &amp; Streaming Platform</td><td>2011</td><td>Milliseconds</td><td>Apache License 2.0</td><td>Stream Processing (Messaging)</td></tr>
<tr><td>4</td><td>apache-flink</td><td>Apache Flink</td><td>A distributed processing engine enabling true stream processing.</td><td>Stream Processing Engine</td><td>2015</td><td>Milliseconds</td><td>Apache License 2.0</td><td>True Stream Processing</td></tr>
<tr><td>5</td><td>apache-hive</td><td>Apache Hive</td><td>Data warehouse software for running SQL-like queries on Hadoop.</td><td>Data Warehouse</td><td>2010</td><td>Minutes to Hours</td><td>Apache License 2.0</td><td>Batch Processing</td></tr>
<tr><td>6</td><td>apache-storm</td><td>Apache Storm</td><td>A distributed real-time computation system.</td><td>Stream Processing Engine</td><td>2011</td><td>Milliseconds</td><td>Apache License 2.0</td><td>Stream Processing</td></tr>
<tr><td>7</td><td>apache-hbase</td><td>Apache HBase</td><td>A distributed NoSQL database running on Hadoop.</td><td>NoSQL Database</td><td>2010</td><td>Milliseconds</td><td>Apache License 2.0</td><td>Real-time Read/Write</td></tr>
<tr><td>8</td><td>apache-presto-trino</td><td>Trino (formerly PrestoSQL)</td><td>A distributed SQL query engine for large-scale data. Trino is released under the Apache License 2.0 but is not an Apache Software Foundation project.</td><td>Distributed SQL Query Engine</td><td>2012</td><td>Seconds to Minutes</td><td>Apache License 2.0</td><td>Interactive Query</td></tr></tbody>
</table>
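
The processingType column distinguishes micro-batch processing (listed for Spark) from true, record-at-a-time stream processing (listed for Flink), and the latency column reflects that difference. The following is a minimal sketch of the two models in plain Python, not the Spark or Flink API. The `Event` shape, the function names `per_event` and `micro_batch`, and the sample data are all hypothetical and chosen only for illustration; real engines add partitioning, state management, and fault tolerance that are omitted here.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (arrival time in seconds, payload).
Event = Tuple[float, str]


def per_event(events: Iterable[Event]) -> Iterator[Tuple[float, int]]:
    """Record-at-a-time model: emit an updated count as each event arrives."""
    count = 0
    for arrival, _payload in events:
        count += 1
        # The result is available at the event's own arrival time.
        yield arrival, count


def micro_batch(events: Iterable[Event], interval: float) -> Iterator[Tuple[float, int]]:
    """Micro-batch model: buffer events and emit one count per interval boundary.

    Assumes events are ordered by arrival time. Intervals that received
    no events produce no output.
    """
    count = 0      # events already included in an emitted result
    pending = 0    # events buffered in the current interval
    boundary = interval
    for arrival, _payload in events:
        # Close every interval that ended before this event arrived.
        while arrival >= boundary:
            if pending:
                count += pending
                yield boundary, count
                pending = 0
            boundary += interval
        pending += 1
    # Flush the last, partially filled interval at its boundary.
    if pending:
        yield boundary, count + pending


# Sample data (illustrative only).
events: List[Event] = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.7, "d")]

print(list(per_event(events)))         # [(0.1, 1), (0.4, 2), (1.2, 3), (2.7, 4)]
print(list(micro_batch(events, 1.0)))  # [(1.0, 2), (2.0, 3), (3.0, 4)]
```

In this sketch the event arriving at 0.1 s is reflected in the per-event output immediately, but in the micro-batch output only at the 1.0 s boundary. The batch interval therefore sets a floor on micro-batch latency, which matches the table's "Seconds" versus "Milliseconds" entries.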