Snowflake gives us the option to create external tables. In this post, we provide an example of how to create external tables in Snowflake over data located in S3. Assume we have the following CSV file, called Example.csv:
ID,Name,DOB
1,George,1980-03-04
2,Billy,1983-10-22
3,Marco,1990-01-01
4,Alex,1970-12-07
5,Maria,2000-01-01
6,Christina,1990-06-01
and it is placed in an S3 bucket. We have already created an S3 stage, so we are ready to create the external table. There are two options: one is to create the table directly, and the other is to specify its columns explicitly.
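For completeness, a stage over S3 can be created roughly as follows. This is a minimal sketch: the bucket URL and credentials are placeholders, and Snowflake also supports storage integrations as an alternative to inline credentials.

create or replace stage my_s3_test_snowflake_stage
  url = 's3://<your-bucket>/'
  credentials = (aws_key_id = '<aws_key_id>' aws_secret_key = '<aws_secret_key>');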
Option A
Note that the data are under the path demo_data and our external stage is called my_s3_test_snowflake_stage.
create or replace external table my_ext_table
with location = @my_s3_test_snowflake_stage/demo_data/
auto_refresh = true
file_format = (type = CSV, skip_header = 1);

select * from my_ext_table;
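When you run the select above, each row of my_ext_table comes back as a single VARIANT column called VALUE. Roughly, and purely as an illustration based on the sample CSV, the keys c1, c2, c3 hold the fields of each line as strings:

-- VALUE
-- {"c1": "1", "c2": "George", "c3": "1980-03-04"}
-- {"c1": "2", "c2": "Billy",  "c3": "1983-10-22"}
-- ...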
Above, you can see the output of the my_ext_table table, and it is most probably not what we expected, since the data come back as a single semi-structured column rather than as separate fields. However, we can convert them to tabular data as follows:
select value:c1::int as ID,
       value:c2::varchar as Name,
       value:c3::date as Date
from my_ext_table;
Option B
Alternatively, we can specify the column names when we create the table, as shown below:
create or replace external table my_ext_table (
    ID int as (value:c1::int),
    Name varchar as (value:c2::varchar),
    Date date as (value:c3::date)
)
with location = @my_s3_test_snowflake_stage/demo_data/
auto_refresh = true
file_format = (type = CSV, skip_header = 1);

select * from my_ext_table;
Things to Know
Refresh the Data
Unlike Hive, when the data change in the S3 bucket, the external table is not updated automatically. So, in order to update it, you need to run the following command:
alter external table my_ext_table refresh;
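As a quick sanity check after a refresh, you can list the files the external table has currently registered. This is a sketch that assumes the EXTERNAL_TABLE_FILES information schema table function is available in your account:

select *
from table(information_schema.external_table_files(table_name => 'my_ext_table'));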
However, if you want your external table to refresh automatically, you can do so by setting up SQS event notifications on the S3 bucket.
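A minimal sketch of how that typically works: since the table was created with auto_refresh = true, Snowflake exposes an SQS queue ARN in the notification_channel column of SHOW EXTERNAL TABLES, and you point an S3 event notification on the bucket at that queue (the bucket-side configuration happens in AWS and is not shown here).

show external tables like 'my_ext_table';
-- copy the notification_channel (SQS ARN) and use it as the target of the bucket's
-- s3:ObjectCreated:* / s3:ObjectRemoved:* event notifications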
Multiple Files
You can have multiple files in the S3 bucket, and Snowflake will skip the header of each file. For example, let's place the same file in the S3 bucket twice, with a different name each time.
create or replace external table my_ext_table (
    ID int as (value:c1::int),
    Name varchar as (value:c2::varchar),
    Date date as (value:c3::date)
)
with location = @my_s3_test_snowflake_stage/demo_data/
auto_refresh = true
file_format = (type = CSV, skip_header = 1);

select * from my_ext_table;
Get the Filename
You can also get the file name behind each row of the external table using the metadata$filename pseudocolumn.
select metadata$filename, name from my_ext_table;
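This pairs nicely with the multiple-files case above. As a quick sketch over the same table, you can count how many rows came from each file:

select metadata$filename as file_name,
       count(*) as rows_per_file
from my_ext_table
group by metadata$filename;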
More Snowflake Tutorials
How To Schedule Tasks In Snowflake
What you can do with Snowflake