4 items tagged with "apache-orc"
This diagram summarizes the frameworks commonly used to build a data lake that supports ACID (Atomicity, Consistency, Isolation, Durability) transactions:
- Apache Hive/Impala with ORC-based transactional tables: the storage format is ORC. (Related article: Hive ACID Inserts, Updates and Deletes with ORC.)
- Delta Lake: the storage format is Parquet, plus a transaction log made of JSON files. (Related article: Delta Lake with PySpark Walkthrough.)
- Apache Hudi: the storage format is Parquet.
- Apache Iceberg: data can be stored as Parquet, ORC, or Avro.
The frameworks use different implementation mechanisms, but all of them support schema evolution and integrate with the Hive Metastore catalog and with compute engines such as Apache Spark and Trino.
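To make the idea of "data files plus a transactional log" more concrete, here is a toy Python model loosely inspired by Delta Lake's log. It is a sketch under simplifying assumptions, not the real Delta implementation. Each commit is a JSON-lines file named by a zero-padded version number, and each line is an "add" or "remove" action for a data file. Reading the table means replaying the log in version order. The functions `commit` and `snapshot` are hypothetical, and the Parquet file names are placeholders; no data files are actually written. Real Delta Lake also has checkpoint files, metadata/protocol actions, and storage-specific commit logic, all of which are omitted here.

```python
import json
import os
import tempfile

# Toy model loosely inspired by Delta Lake's transaction log (NOT the real protocol).
# Commit files are named <20-digit version>.json; each line is one action.


def commit(log_dir, version, actions):
    """Try to publish commit `version`; return False if another writer already did."""
    target = os.path.join(log_dir, f"{version:020d}.json")
    # Write the whole commit to a temp file first so readers never see a partial commit.
    fd, tmp = tempfile.mkstemp(dir=log_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")
    try:
        # os.link raises FileExistsError if this version already exists, which gives
        # "put-if-absent" semantics: at most one writer wins each version.
        os.link(tmp, target)
        return True
    except FileExistsError:
        return False
    finally:
        os.remove(tmp)  # the temp name is no longer needed either way


def snapshot(log_dir, as_of=None):
    """Replay commits in version order and return the set of live data files."""
    versions = sorted(
        int(name[: -len(".json")])
        for name in os.listdir(log_dir)
        if name.endswith(".json")
    )
    live = set()
    for v in versions:
        if as_of is not None and v > as_of:
            break  # "time travel": stop replaying at the requested version
        with open(os.path.join(log_dir, f"{v:020d}.json")) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


if __name__ == "__main__":
    log = tempfile.mkdtemp()
    commit(log, 0, [{"add": {"path": "part-000.parquet"}}])
    # An update rewrites data: remove the old file and add the new one in ONE commit.
    commit(log, 1, [{"remove": {"path": "part-000.parquet"}},
                    {"add": {"path": "part-001.parquet"}}])
    print(snapshot(log))           # {'part-001.parquet'}
    print(snapshot(log, as_of=0))  # {'part-000.parquet'}
    # A second writer racing for version 1 loses and would have to retry at version 2.
    print(commit(log, 1, [{"add": {"path": "part-002.parquet"}}]))  # False
```

The remove and add in version 1 become visible together or not at all, which is the atomicity part of ACID. The losing writer in the demo illustrates optimistic concurrency. The `os.link` trick only works on a local POSIX-style filesystem. Real table formats rely on whatever atomic primitive the underlying storage offers, and object stores may need extra coordination.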