Let’s say what we are dealing with a CSV file, where there is a quoted field that contains commas.
my_file.csv
ID,VALUE 1,"[1,2,3]" 2,"[0,5,10]" 3,"[7,8,9]" 4,[6]
We can then create the external table as:
CREATE EXTERNAL TABLE my_table ( ID INTEGER, VALUE STRING ) ROW FORMAT SERDE 'ORG.APACHE.HADOOP.HIVE.SERDE2.OPENCSVSERDE' stored as textfile LOCATION "s3://my-bucket/my_files/" tblproperties ("skip.header.line.count"="1");
Note that you can specify some SerDe properties. For example:
CREATE TABLE my_table(a string, b string, ...) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = "\t", "quoteChar" = "'", "escapeChar" = "\\" ) STORED AS TEXTFILE;
where the default properties are:
DEFAULT_ESCAPE_CHARACTER \ DEFAULT_QUOTE_CHARACTER " DEFAULT_SEPARATOR ,