Predictive Hacks

Create External Tables in Hive from CSV with Quoted Field

Let’s say what we are dealing with a CSV file, where there is a quoted field that contains commas.

my_file.csv

ID,VALUE
1,"[1,2,3]"
2,"[0,5,10]"
3,"[7,8,9]"
4,[6]

We can then create the external table as:

CREATE EXTERNAL TABLE my_table (
ID INTEGER,
VALUE STRING
)
ROW FORMAT SERDE 'ORG.APACHE.HADOOP.HIVE.SERDE2.OPENCSVSERDE'
stored as textfile
LOCATION "s3://my-bucket/my_files/"
tblproperties ("skip.header.line.count"="1");

Note that you can specify some SerDe properties. For example:

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "'",
   "escapeChar"    = "\\"
)  
STORED AS TEXTFILE;

where the default properties are:

DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER  "
DEFAULT_SEPARATOR        ,

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Python

Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.

Python

Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s