ACID Support for Data Lake with Delta Lake, Hudi, Iceberg, Hive and Impala
This diagram summarizes the commonly used frameworks for building a data lake that supports ACID (atomicity, consistency, isolation, durability) transactions.
- Apache Hive/Impala with ORC-based transactional tables: storage format is ORC. Related article: Hive ACID Inserts, Updates and Deletes with ORC.
- Delta Lake: storage format is Parquet with transactional JSON log files. Related article: Delta Lake with PySpark Walkthrough.
- Apache Hudi: storage format is Parquet.
- Apache Iceberg: storage format is Parquet, ORC or Avro.
They use different implementation mechanisms, but all of them support schema evolution and integrate with the Hive metastore (metadata catalog) and with compute frameworks such as Apache Spark and Trino.
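To make the "data files plus transactional JSON log" idea more concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's actual protocol or API: the function names `commit` and `snapshot`, the file names, and the simplified `add`/`remove` action layout are assumptions made for illustration. Each commit is written as one new, numbered JSON file, and readers replay the log in order to work out which data files are currently live.

```python
import json
import os
import tempfile


def commit(log_dir, version, actions):
    """Write one commit as a new JSON-lines file (hypothetical helper).

    Opening with mode "x" fails if the file already exists, so two writers
    cannot both claim the same version number. That exclusive create is what
    makes each commit all-or-nothing in this toy model.
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def snapshot(log_dir):
    """Replay all commits in version order and return the set of live data files."""
    live = set()
    # Zero-padded names sort lexicographically in version order.
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    # Version 0: an insert adds one data file.
    commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
    # Version 1: an update rewrites the file, swapping old for new in one commit.
    commit(log_dir, 1, [
        {"remove": {"path": "part-0.parquet"}},
        {"add": {"path": "part-1.parquet"}},
    ])
    print(sorted(snapshot(log_dir)))
```

In this sketch an update never modifies a data file in place: it writes a new file and records the swap in a single commit, so a reader sees either the old file set or the new one, never a mix. The other frameworks in the list achieve similar guarantees through their own metadata layouts, which this example does not attempt to model.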