Snowflake is a popular cloud-based data storage and analytics service. Personally, when I examine new technologies and products, I would like to have a summary of the most important features. This allows us to get a general idea of the product and to understand if it covers our needs. In this post, I will try to outline some important features of Snowflake.
Supports Different Cloud Platforms
A Snowflake account can be hosted on the top 3 cloud providers such as:
- AWS
- Azure
- GCP
Python Connector API
Another feature is that you can connect Python with Snowflake using the Python Connector API that is very useful for Data Scientists who tend to work with Python and Jupyter Notebooks.
Command Line Client (SnowSQL CLI)
With the SnowSQL you can connect Snowflake with the command line to execute SQL queries and perform all DDL and DML operations, including loading data into and unloading data out of database tables and can be run as an interactive shell or in batch mode through stdin or using the -f option.
Loading and Unloading Data
You can load almost any type of data (csv, Parguet, JSON, XML, Avro, ORC) from:
- Local File System
- AWS S3
- Google Cloud Storage
- Microsoft Azure
One feature that I found really cool is that error handling options of the copy command. For example, you can skip the lines with errors or you can specify how many lines you accept to skip and if it exceeds this threshold, then you can cancel the copy command. Also, when it loads data it takes into consideration the changes in the metadata, so if there is no change in the source file, it does not run the copy command.
Apart from loading data, you can unload data by sending it to other sources like AWS S3.
Different Types of Tables and Databases
There are 3 types of tables and databases such as:
- Temporary that are available as long as the session is active
- Transient
- Permanent
Transforming Data
Snowflake gives us the possibility to transform the data while we upload them. For instance, we can ignore some columns, are we can generate others.
Continuously Loading with Snowpipe
Snowflake has a feature called Snowpipe that allows us to load data from files automatically as soon as they are available.
Time Travel and Fail-Safe
With Snowflake, you can travel back in time, meaning that you can “undo” a command that you have run, like modifying a table, dropping a table or schema and so on. Fail-safe ensures historical data is protected in the event of a system failure or other catastrophic event, e.g. a hardware failure or security breach.
Cloning Tables, Schemas and Databases
We can easily clone tables, schemas and databases with just one line command. Moreover, we can swap tables like switching stage and production tables
Data Sharing
Snowflake enables us to share data with Snowflake and Non-Snowflake users.
Data Sampling
We can easily generate random samples of the tables. This is a very useful tool since it is common to work with sample data, especially when we are dealing with big tables of million/billion rows.
Scheduling Tasks
We can schedule tasks of a single task or trees of tasks. The tasks can be SQL commands or stored procedures. The tasks can be defined in minutes or using the CRON notation. We can get detailed logs, the history and it supports error handling.
Data Streams
You can define data streams where your target data will change if the source data have changed.
Materialized Views
With Snowflake, you can create materialized views.
Dynamic Data Masking
You can create masking policies to mask PII data for specific users and roles. This means that different users can see different data.
Access Control Management
Snowflake provides a powerful Access Control Management with roles.
The Takeaway
Snowflake is a really powerful tool. In this post, we mentioned WHAT you can do with it. Stay tuned in case you are interested to see some examples of HOW you do some cool stuff with Snowflake.