ACID Support for Data Lake with Delta Lake, Hudi, Iceberg, Hive and Impala
This diagram summarizes the frameworks commonly used to build a data lake that supports ACID (Atomicity, Consistency, Isolation, Durability) transactions.
- Apache Hive/Impala with ORC-based transactional tables: storage format is ORC. Related article: Hive ACID Inserts, Updates and Deletes with ORC.
- Delta Lake: storage format is Parquet with transactional JSON log files. Related article: Delta Lake with PySpark Walkthrough.
- Apache Hudi: storage format is Parquet.
- Apache Iceberg: storage format is Parquet, ORC or Avro.
Their implementation mechanisms differ, but all of them support schema evolution and integrate with the Hive metastore and with compute engines such as Apache Spark and Trino.
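To make the idea of "data files plus a transactional JSON log" more concrete, here is a minimal sketch of the general pattern, loosely modelled on the log-based approach described for Delta Lake above. It is a toy illustration and not the actual protocol of any of these frameworks: the `ToyTable` class, the `_log` directory name and the `part-*.parquet` file names are all hypothetical, and no real Parquet files are written. Each commit is a numbered JSON file listing `add` and `remove` actions, and a writer that tries to publish a version that already exists is rejected, which is one simple way to get optimistic concurrency.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table: immutable data files tracked by an append-only JSON commit log."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")  # hypothetical directory name
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Commit files are named by zero-padded version number, e.g. 000...001.json
        names = [n for n in os.listdir(self.log_dir) if n.endswith(".json")]
        return sorted(int(n.split(".")[0]) for n in names)

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, read_version, actions):
        """Publish version read_version + 1, or fail if another writer already did."""
        new_version = read_version + 1
        path = os.path.join(self.log_dir, f"{new_version:020d}.json")
        # Mode "x" raises FileExistsError if the file exists, so only one writer
        # can claim a given version. Simplification: a real system would also
        # make the file contents appear atomically (e.g. put-if-absent or rename).
        with open(path, "x") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
        return new_version

    def live_files(self, version=None):
        """Replay the log up to `version` to list the data files in that snapshot."""
        live = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:020d}.json")) as f:
                for line in f:
                    action = json.loads(line)
                    if "add" in action:
                        live.add(action["add"]["path"])
                    elif "remove" in action:
                        live.discard(action["remove"]["path"])
        return live


table = ToyTable(tempfile.mkdtemp())

# Insert: a single commit that adds one data file.
v0 = table.commit(table.latest_version(), [{"add": {"path": "part-0.parquet"}}])

# Update: rewrite the file and swap old for new in one commit (all or nothing).
v1 = table.commit(v0, [
    {"remove": {"path": "part-0.parquet"}},
    {"add": {"path": "part-1.parquet"}},
])

# A stale writer that still thinks v0 is the latest version is rejected.
try:
    table.commit(v0, [{"add": {"path": "part-2.parquet"}}])
    conflict = False
except FileExistsError:
    conflict = True
```

Because readers only ever see files listed in committed log entries, replaying the log to an older version (`table.live_files(version=0)`) also gives a simple form of snapshot read. The real frameworks add much more on top of this, such as checkpoints, schema metadata and conflict resolution.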