Predictive Hacks

What you can do with Snowflake

snowflake

Snowflake is a popular cloud-based data storage and analytics service. Personally, when I examine new technologies and products, I would like to have a summary of the most important features. This allows us to get a general idea of the product and to understand if it covers our needs. In this post, I will try to outline some important features of Snowflake.

Supports Different Cloud Platforms

A Snowflake account can be hosted on the top 3 cloud providers such as:

  • AWS
  • Azure
  • GCP

Python Connector API

Another feature is that you can connect Python with Snowflake using the Python Connector API that is very useful for Data Scientists who tend to work with Python and Jupyter Notebooks.

Command Line Client (SnowSQL CLI)

With the SnowSQL you can connect Snowflake with the command line to execute SQL queries and perform all DDL and DML operations, including loading data into and unloading data out of database tables and can be run as an interactive shell or in batch mode through stdin or using the -f option.

Loading and Unloading Data

You can load almost any type of data (csv, Parguet, JSON, XML, Avro, ORC) from:

  • Local File System
  • AWS S3
  • Google Cloud Storage
  • Microsoft Azure

One feature that I found really cool is that error handling options of the copy command. For example, you can skip the lines with errors or you can specify how many lines you accept to skip and if it exceeds this threshold, then you can cancel the copy command. Also, when it loads data it takes into consideration the changes in the metadata, so if there is no change in the source file, it does not run the copy command.

Apart from loading data, you can unload data by sending it to other sources like AWS S3.

Different Types of Tables and Databases

There are 3 types of tables and databases such as:

  • Temporary that are available as long as the session is active
  • Transient
  • Permanent

Transforming Data

Snowflake gives us the possibility to transform the data while we upload them. For instance, we can ignore some columns, are we can generate others.

Continuously Loading with Snowpipe

Snowflake has a feature called Snowpipe that allows us to load data from files automatically as soon as they are available.

Time Travel and Fail-Safe

With Snowflake, you can travel back in time, meaning that you can “undo” a command that you have run, like modifying a table, dropping a table or schema and so on. Fail-safe ensures historical data is protected in the event of a system failure or other catastrophic event, e.g. a hardware failure or security breach.

Cloning Tables, Schemas and Databases

We can easily clone tables, schemas and databases with just one line command. Moreover, we can swap tables like switching stage and production tables

Data Sharing

Snowflake enables us to share data with Snowflake and Non-Snowflake users.

Overview of provider accounts sharing data with consumer accounts

Data Sampling

We can easily generate random samples of the tables. This is a very useful tool since it is common to work with sample data, especially when we are dealing with big tables of million/billion rows.

Scheduling Tasks

We can schedule tasks of a single task or trees of tasks. The tasks can be SQL commands or stored procedures. The tasks can be defined in minutes or using the CRON notation. We can get detailed logs, the history and it supports error handling.

Tree of tasks

Data Streams

You can define data streams where your target data will change if the source data have changed.

Materialized Views

With Snowflake, you can create materialized views.

Dynamic Data Masking

You can create masking policies to mask PII data for specific users and roles. This means that different users can see different data.

Access Control Management

Snowflake provides a powerful Access Control Management with roles.

Access control relationships
Hierarchy of securable database objects

The Takeaway

Snowflake is a really powerful tool. In this post, we mentioned WHAT you can do with it. Stay tuned in case you are interested to see some examples of HOW you do some cool stuff with Snowflake.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

pip
Python

Python pip Tips for Data Scientists

Data Scientists use to work with Anaconda Environments and for installing packages they use to run the “conda” commands. However,