Predictive Hacks

Get Started with Python UDFs in Snowflake

snowflake

Finally, Snowflake supports UDF (user-define functions) in Python. Thank you Snowflake! Apart from Python, we can write UDFs in Java, Javascript and SQL. In this tutorial, we will provide basic examples of UDFs in Python.

Before we start, you should keep in mind that we can import a curated list of 3rd party packages from Anaconda. We can get a list of all available packages and their corresponding versions by running:

select * from information_schema.packages where language = 'python';

Get Started with Python UDFs in Snowflake 1

Also, keep in mind that for the time being, Snowflake supports Python 3.8 only and this should be specified within the UDF.

UDFs with In-Line Code

We can create a UDF function with an “in-line code” which means that we write directly the code or with a “code uploaded from a stage” where within the function we specify the location of an existing Python source code in a stage. In this tutorial, we will work with the “in-line code” UDFs.

We will build a function that applies a text cleansing by converting the text to lower case and by removing the HTM tags and markups, the digits, the punctuation, the extra spaces, and the leading/trailing whitespace.

create or replace function text_cleansing(s string)
returns string
language python
runtime_version = '3.8'
handler = 'cleansing'
as
$$
def cleansing(x):
    import re
    import string
    
    # lower case
    x = x.lower()
    
    # Remove HTML tags/markups
    x = re.sub('<.*?>', '', x)
    
    # remove digits
    x = re.sub('\d+', '', x)
    
    # remove punctuation
    x = x.translate(str.maketrans('', '', string.punctuation))
    
    # remove etra space and tabs
    x = re.sub('\s+', ' ', x)
    
    # remove leading/trailing whitespace
    x = x.strip()
    
    return x
$$;

Once we run the snippet code above, the UDF is ready to use. Make sure that you received a message that the function has been created successfully.

Get Started with Python UDFs in Snowflake 2

Notice that the handler function is within the double dollar signs ($$) and we start the function by specifying the name, the input and the input type, the type of the output, the programming language, the runtime and finally the handler that should be the same name with the function within the double dollar signs, in our case cleansing. Note that if we wanted to load packages from Anaconda, we should have added a line mentioning the required packages like:

create or replace function py_udf()
returns variant
language python
runtime_version = 3.8
packages = ('numpy','pandas')
handler = 'udf'
as $$
import numpy as np
import pandas as pd
import xgboost as xgb
def udf():
    return [np.__version__, pd.__version__]
$$;
 

Call the UDF

Now the UDF is ready to use as a common built-in function, meaning that you can use it in any SQL statement. Let’s try to use the function on a sample text:

“This is an example (of) how to remove numbers 123 and tags <li> and punctuations !:#*”

select TEXT_CLEANSING('This is an example (of) how to remove numbers 123 and tags <li> and punctuations !:#*') as cleaned_text

And we get:

this is an example of how to remove numbers and tags and punctuations
Get Started with Python UDFs in Snowflake 3

How to Get a List of the UDF

We can get a detailed list of UDFs by running the command:

show user functions;
 
Get Started with Python UDFs in Snowflake 4

How to Delete a UDF

You can delete a UDF by running the command:

drop function text_cleansing(string);
 
Get Started with Python UDFs in Snowflake 5

Note that you need to pass the function name and the data type.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Miscellaneous

Arrays in Bash

This tutorial is a brief introduction to arrays in bash. Data Scientists are familiar with arrays, in R, a simple