ACID Support for Data Lake with Delta Lake, Hudi, Iceberg, Hive and Impala

2022-09-02 apache-hudi apache-orc data-engineering data-lake delta-lake hive parquet

This diagram summarizes the commonly used frameworks for building a data lake that supports ACID (Atomicity, Consistency, Isolation, Durability) transactions.

They use different implementation mechanisms, but all of them support schema evolution and integrate with the Hive metastore (metadata catalog) and with compute engines such as Apache Spark and Trino.

[Diagram: Data Lake with ACID Support]
Components: Batch or Streaming Source; Data Lake Tables; Data Analytics with Spark, Trino, Hive, etc.; Compactions and Other Supports
Frameworks and write modes:
Apache Hive / Impala with ORC: Merge on Read
Delta Lake: Merge on Read
Apache Hudi: Copy on Write or Merge on Read
Apache Iceberg: Copy on Write or Merge on Read
Storage formats: Parquet (Delta Lake, Iceberg, Hudi), ORC (Hive, Iceberg), or Avro (Iceberg)
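The diagram labels each framework as "Copy on Write" or "Merge on Read" and mentions compactions. To make the difference concrete, here is a minimal, purely illustrative Python sketch. It does not use any of the frameworks above: the class names `CopyOnWriteTable` and `MergeOnReadTable` are hypothetical, and a plain dict stands in for a data file. Real engines work on Parquet/ORC/Avro files with transaction logs and snapshots, which this toy model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class CopyOnWriteTable:
    """Toy model: every write rewrites the base data, so reads stay cheap."""
    base: dict = field(default_factory=dict)

    def upsert(self, key, value):
        # Expensive write path: build a full new copy of the base "file".
        new_base = dict(self.base)
        new_base[key] = value
        self.base = new_base

    def read(self):
        # Cheap read path: the base already reflects every write.
        return dict(self.base)


@dataclass
class MergeOnReadTable:
    """Toy model: writes append small deltas, reads merge them on the fly."""
    base: dict = field(default_factory=dict)
    deltas: list = field(default_factory=list)

    def upsert(self, key, value):
        # Cheap write path: append a delta record, leave the base untouched.
        self.deltas.append((key, value))

    def read(self):
        # More expensive read path: replay deltas over the base, in order.
        merged = dict(self.base)
        for key, value in self.deltas:
            merged[key] = value
        return merged

    def compact(self):
        # Compaction folds the deltas into a new base and clears them.
        self.base = self.read()
        self.deltas = []


cow = CopyOnWriteTable({"a": 1})
mor = MergeOnReadTable({"a": 1})
for table in (cow, mor):
    table.upsert("a", 2)
    table.upsert("b", 3)

# Both strategies expose the same logical data to readers.
print(cow.read() == mor.read())
```

In this sketch both tables return the same rows; they differ only in where the merge cost is paid. Copy on write pays it at write time, merge on read pays it at read time until `compact()` is run, which is roughly why merge-on-read tables rely on periodic compaction.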