Hive DDL Manual
Hive tables over Snappy Parquet
A script for creating partitioned, Snappy-compressed Hive tables stored as Parquet.
-- Dropping an external table removes only the metastore entry; the files stay in place.
drop table if exists mytable;
CREATE EXTERNAL TABLE `mytable`(
`mytimestamp` string,
`key` string,
`value` double)
PARTITIONED BY (
`year` int,
`month` int,
`day` int)
STORED AS PARQUET TBLPROPERTIES ("parquet.compression"="SNAPPY");
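-- Point the table at the directory that holds the partition subdirectories.
alter table mytable set location 'hdfs://[HDFS_HOST]:8020/user/victor/mytable';
-- Register the partition directories found under that location in the metastore.
msck repair table mytable;
-- Sanity check: read a few rows.
select * from mytable limit 100;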
Notes
- Data should be stored as Snappy-compressed Parquet files, in directories that follow the defined partitioning (year, month, day).
- See this example, where a DataFrame writes compressed data.
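
To make the two notes concrete, here is a minimal sketch, not the linked example itself. It assumes PySpark is the DataFrame library and that the DataFrame already carries integer year, month and day columns; the helper names partition_path and write_partitioned are hypothetical. The first function shows the column=value directory layout that msck repair table looks for, and the second shows one way a DataFrame could write Snappy-compressed Parquet into that layout.

```python
def partition_path(base, year, month, day):
    """Return the directory Hive expects for one (year, month, day) partition."""
    # Hive-style partition directories are named column=value, nested in
    # the order the columns appear in PARTITIONED BY.
    return f"{base.rstrip('/')}/year={year}/month={month}/day={day}"


def write_partitioned(df, base):
    """Append a Spark DataFrame under base as Snappy Parquet.

    Assumption: df is a pyspark.sql.DataFrame that already has integer
    year, month and day columns next to mytimestamp, key and value.
    """
    # partitionBy moves the three columns out of the data files and into
    # the directory names, matching the table definition.
    (df.write
       .mode("append")
       .partitionBy("year", "month", "day")
       .option("compression", "snappy")
       .parquet(base))


if __name__ == "__main__":
    # [HDFS_HOST] is the placeholder from the script above, left unfilled.
    base = "hdfs://[HDFS_HOST]:8020/user/victor/mytable"
    print(partition_path(base, 2016, 3, 7))
```

After a write like this adds new partition directories, running msck repair table mytable again should make them visible to queries.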