In Hive, two types of tables can be created: internal and external. Internal tables are also called managed tables. Different features are available to each type; this article lists some of the common differences.

Internal table

By default, Hive creates internal tables. These tables' properties and data layout can only be changed via Hive commands. The data for managed tables is stored under the path defined by hive.metastore.warehouse.dir. You can also specify a different location via the LOCATION clause when creating a table.
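You can check the current warehouse directory from the Hive CLI or Beeline; running SET with just a property name prints its value (the default is typically /user/hive/warehouse):

SET hive.metastore.warehouse.dir;

The general syntax for creating tables is: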

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
 
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

For the complete syntax, please refer to the Hive DDL Language Manual.
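As a quick illustration of the LIKE form above, the following creates a new, empty table that copies the schema of an existing one (both table names here are hypothetical):

CREATE TABLE IF NOT EXISTS sales_backup
LIKE sales;

LIKE copies only the table definition; no data is copied.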

For example, the following statement creates a Hive internal table stored in Parquet format. The data is stored in the HDFS path /data/internaltable.

CREATE TABLE IF NOT EXISTS `internal-table`(
  `id` int,
  `name` string)
STORED AS PARQUET
LOCATION '/data/internaltable';
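To sanity-check the layout, you can insert a couple of rows and list the files under the table location. The values below are made up; INSERT ... VALUES requires Hive 0.14.0 or later, and the dfs command runs from the Hive CLI:

INSERT INTO TABLE `internal-table` VALUES (1, 'alice'), (2, 'bob');
dfs -ls /data/internaltable;

Because this is a managed table, DROP TABLE would remove these files along with the metadata.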

External table

To create an external table, specify the EXTERNAL keyword in the CREATE TABLE statement. An external table only describes the metadata/schema for data that lives in external files. The MSCK REPAIR TABLE statement can be used to refresh the table's partition metadata when the partition structure of an external table has changed on the file system.
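For example, if new partition directories are added directly on the file system under a partitioned external table (the table name below is hypothetical), the metastore can be brought back in sync like this:

-- new directories such as .../dt=2020-06-01 were created outside of Hive
MSCK REPAIR TABLE page_views;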

The following code snippet creates a Hive external table with data stored in /data/externaltable.

CREATE EXTERNAL TABLE IF NOT EXISTS `external-table`(
  `id` int,
  `name` string)
STORED AS PARQUET
LOCATION '/data/externaltable';

The external location can be on HDFS (hdfs://), Azure Storage (wasb://), Google Cloud Storage (gs://), AWS S3 (s3://), etc. The following cloud storage examples can be run in each cloud's managed Hadoop/Hive service (HDInsight, Dataproc and EMR respectively). When no file system protocol is specified, HDFS is used by default.

Azure Storage example

CREATE EXTERNAL TABLE IF NOT EXISTS `external-table`(
  `id` int,
  `name` string)
STORED AS PARQUET
LOCATION 'wasb:///data/externaltable';

Google Cloud Storage example

CREATE EXTERNAL TABLE IF NOT EXISTS `external-table`(
  `id` int,
  `name` string)
STORED AS PARQUET
LOCATION 'gs://myproject-warehouse/data/externaltable';

AWS S3 example

CREATE EXTERNAL TABLE IF NOT EXISTS `external-table`(
  `id` int,
  `name` string)
STORED AS PARQUET
LOCATION 's3://mybucket/data/externaltable/';

Identify internal and external tables

Internal and external tables can be identified via the following SQL statements:

DESCRIBE FORMATTED table_name;
SHOW CREATE TABLE table_name;

Each statement shows whether the table is internal or external: DESCRIBE FORMATTED prints the table type (MANAGED_TABLE or EXTERNAL_TABLE), while SHOW CREATE TABLE includes the EXTERNAL keyword in the generated DDL.
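For instance, for the external table created earlier, DESCRIBE FORMATTED prints a section similar to the following (exact layout varies by Hive version):

DESCRIBE FORMATTED `external-table`;
-- ...
-- Table Type:           EXTERNAL_TABLE
-- ...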

Differences between internal and external tables

The following table lists some of the key differences (the list is not exhaustive):

Feature                                                      | Internal Table | External Table
ARCHIVE, UNARCHIVE, TRUNCATE, MERGE, CONCATENATE statements  | Yes            | No
DROP deletes data from storage                               | Yes            | No
DROP deletes only metadata                                   | No             | Yes
ACID/transactional actions support                           | Yes            | No
Query result caching                                         | Yes            | No
RELY constraints (cost-based optimization)                   | No             | Yes
Temporary table                                              | Yes            | No
Suited when data is also used outside of Hive                | No             | Yes
CREATE TABLE AS SELECT                                       | Yes            | No
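As a quick illustration of the first row, TRUNCATE works on the managed table but is rejected for the external one (the exact error text depends on the Hive version):

TRUNCATE TABLE `internal-table`;   -- OK: deletes all rows and the underlying files
TRUNCATE TABLE `external-table`;   -- fails: truncating non-managed tables is not allowed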