Diagram
Raymond
10 months ago
ACID Support for Data Lake with Delta Lake, Hudi, Iceberg, Hive and Impala
This diagram summarizes the frameworks commonly used to build a data lake that supports ACID (Atomicity, Consistency, Isolation, Durability) transactions:
- Apache Hive/Impala with ORC-based transactional tables: the storage format is ORC (see Hive ACID Inserts, Updates and Deletes with ORC).
- Delta Lake: the storage format is Parquet, with a transaction log made of JSON commit files (see Delta Lake with PySpark Walkthrough).
- Apache Hudi: the storage format is Parquet (Merge-on-Read tables additionally keep row-based Avro log files).
- Apache Iceberg: data files can be stored as Parquet, ORC or Avro.
Although their implementation mechanisms differ, they all support schema evolution and can integrate with the Hive metastore and with compute engines such as Apache Spark and Trino.
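The Delta Lake approach (immutable Parquet data files plus a transaction log of JSON commit files) can be illustrated with a toy model.

This is a minimal sketch, not Delta Lake's real protocol or API. The class `ToyTransactionLog`, the `_log` directory, the zero-padded file names and the `{"add": ...}` / `{"remove": ...}` action format are all hypothetical simplifications. The real log lives in `_delta_log` and also holds Parquet checkpoints and richer actions.

The sketch shows three ideas:

- A commit becomes visible atomically when its version file appears.
- Two writers cannot both claim the same version, which is optimistic concurrency.
- Readers get an isolated snapshot by replaying the log up to a chosen version.

It assumes a local file system where `os.link` is atomic and fails if the target already exists. Object stores generally need extra coordination to get the same guarantee.

```python
import json
import os
import tempfile
from typing import Optional


class ToyTransactionLog:
    """Hypothetical, heavily simplified Delta-Lake-style transaction log.

    Every commit is a JSON-lines file named <version>.json (zero-padded) in a
    "_log" directory. Each line is one action: {"add": path} or {"remove": path}.
    """

    def __init__(self, table_dir: str) -> None:
        self.log_dir = os.path.join(table_dir, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _path(self, version: int) -> str:
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self) -> int:
        """Highest committed version, or -1 for an empty table."""
        versions = [
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        ]
        return max(versions, default=-1)

    def commit(self, read_version: int, actions: list) -> int:
        """Try to publish version read_version + 1 (optimistic concurrency).

        The commit is fully written to a temp file first, then published with
        os.link, which raises FileExistsError if another writer already claimed
        that version. Readers therefore never see a half-written commit.
        """
        new_version = read_version + 1
        fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=self.log_dir)
        try:
            with os.fdopen(fd, "w") as f:
                for action in actions:
                    f.write(json.dumps(action) + "\n")
            os.link(tmp_path, self._path(new_version))
        finally:
            os.remove(tmp_path)  # the temp name is never part of the log
        return new_version

    def snapshot(self, version: Optional[int] = None) -> set:
        """Replay commits 0..version to get the live data files at that version."""
        if version is None:
            version = self.latest_version()
        live = set()
        for v in range(version + 1):
            with open(self._path(v)) as f:
                for line in f:
                    action = json.loads(line)
                    if "add" in action:
                        live.add(action["add"])
                    elif "remove" in action:
                        live.discard(action["remove"])
        return live


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        log = ToyTransactionLog(table)
        v0 = log.commit(-1, [{"add": "part-0.parquet"}])
        # Replace part-0 with part-1 in one atomic commit (e.g. an UPDATE).
        log.commit(v0, [{"remove": "part-0.parquet"}, {"add": "part-1.parquet"}])
        print(log.snapshot())    # {'part-1.parquet'}
        print(log.snapshot(v0))  # {'part-0.parquet'}  (older snapshot)
        try:
            # A stale writer that also read v0 loses the race for version 1.
            log.commit(v0, [{"add": "part-2.parquet"}])
        except FileExistsError:
            print("conflict: re-read the log and retry")
```

In this toy, any concurrent commit to the same version is treated as a conflict. Real engines typically inspect what the winning commit changed and retry automatically when the two transactions do not actually overlap.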