Predictive Hacks

Get Started with Hugging Face Auto Train

Hugging Face has launched Auto Train, a new way to automatically train, evaluate and deploy state-of-the-art machine learning models. It lets us train custom machine learning models by simply uploading our data. Under the hood, it automatically trains several different models and keeps the best ones. Finally, we can use our models directly from the Hugging Face Hub. Currently, it supports the following tasks:

  • Image Classification
  • Text Classification
  • Token Classification
  • Question Answering
  • Translation
  • Summarization
  • Text Regression
  • Tabular Data Classification
  • Tabular Data Regression

In this tutorial, we will work on a Text Classification example.

Text Classification with Hugging Face Auto Train

Let’s start building our text classification model using Hugging Face Auto Train. First, sign in to Hugging Face. Then, click on the “Create new project” button.

Then, give the project a name and choose a task. In our case, we will use a “Text” task, more specifically “Text Classification (Binary)”, and finally we click on the “Create Project” button.

Then, we can upload a .csv file with two columns, such as text and target.

For this example, I chose a dataset of hotel reviews. The file consists of two columns: the text, and the target, which takes two values, 0 (negative) or 1 (positive).

Note that in the free version the dataset must have fewer than 3,000 rows! Once we upload the data, we click on “Add to project”. Then we are ready to train the model by clicking on “Go to trainings”.
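Before uploading, it can help to check the file against the two constraints mentioned above: binary targets and the row limit. The following is a minimal sketch using only the Python standard library. The helper names (prepare_rows, to_csv) and the sample reviews are hypothetical, and it assumes that the limit counts data rows without the header.

```python
import csv
import io

MAX_FREE_ROWS = 3000  # row limit quoted in this post for the free version


def prepare_rows(rows, max_rows=MAX_FREE_ROWS):
    """Keep valid (text, target) pairs and cap the count below max_rows."""
    cleaned = []
    for text, target in rows:
        text = text.strip()
        # Binary classification: only 0 (negative) and 1 (positive) are valid.
        if text and target in (0, 1):
            cleaned.append((text, target))
    # "Fewer than max_rows" means at most max_rows - 1 data rows.
    return cleaned[: max_rows - 1]


def to_csv(rows):
    """Serialize rows to CSV text with a 'text,target' header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "target"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows, for illustration only
sample = [
    ("The hotel was amazing", 1),
    ("Dirty room and rude staff", 0),
    ("   ", 1),        # dropped: empty text
    ("Nice view", 2),  # dropped: target is not 0 or 1
]
csv_text = to_csv(prepare_rows(sample))
print(csv_text)
```

The resulting text can be written to a .csv file and uploaded as described above.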

The free version allows us to train up to 5 models.

The 5 models run in parallel, and you can see their accuracy.

If we click on a model, we can see other metrics such as Precision, Recall, AUC, F1 and Loss.

Or, if we go to the Metrics section, we can have a summary view of all models.
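To make these metrics more concrete, here is a small sketch of how Accuracy, Precision, Recall and F1 follow from predictions on a binary task. It uses the standard textbook definitions and made-up labels; whether Auto Train computes them in exactly this way is an assumption. AUC and Loss are left out because they need predicted probabilities rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration: 1 = positive review, 0 = negative review
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example there are 4 true positives, 1 false positive, 1 false negative and 2 true negatives, so accuracy is 6/8 and both precision and recall are 4/5.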

Make Predictions from the UI

When we are in the Metrics section, we can click on any Model ID. Let’s try the review “The hotel was amazing”.

We got label=1, which means positive, with a probability of 95%.

Make Predictions with Python

On the bottom left, you can see a section called “Usage”, which shows how to call the model with curl and Python. We just need to copy and paste the Python API code snippet.

In order to call the model from the Python API, we need an access token to pass as use_auth_token. We should go to settings/tokens and create a new token for auto_train.

Once we have created the access token, we can copy it and use it, as we will show below. Now, let’s move to Colab. We have to install the transformers library and then simply paste the code snippet that we copied above. For the use_auth_token argument, we will pass the access token that we generated earlier.
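Since the access token is a secret, one option is to keep it out of the notebook and read it from an environment variable instead of pasting it in as a string. The sketch below is only a suggestion: the helper get_hf_token is hypothetical, and the variable name HF_TOKEN is an assumption that you can change to whatever you use.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the access token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token."
        )
    return token


# Hypothetical usage with the snippet from the "Usage" section:
# model = AutoModelForSequenceClassification.from_pretrained(
#     "<your model id>", use_auth_token=get_hf_token()
# )
```

The value returned by get_hf_token() would then take the place of 'xxx' in the calls that follow.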

!pip install transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("gpipis/autotrain-auto_train_text_classification-1557955500", use_auth_token='xxx')

tokenizer = AutoTokenizer.from_pretrained("gpipis/autotrain-auto_train_text_classification-1557955500", use_auth_token='xxx')

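# Tokenize the input text and return PyTorch tensors
inputs = tokenizer("I love AutoTrain", return_tensors="pt")

# Run the model; evaluating `outputs` in a notebook cell displays the result
outputs = model(**inputs)
outputs

SequenceClassifierOutput(loss=None, logits=tensor([[-1.3991,  1.5701]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)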

The output returns the logits. If we want to get the probabilities of each class, we will need to use the softmax function as follows:

from torch import nn

# Convert the logits to class probabilities
pt_predictions = nn.functional.softmax(outputs.logits, dim=-1)
pt_predictions
tensor([[0.0488, 0.9512]], grad_fn=<SoftmaxBackward0>)
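To see what softmax does with these numbers, here is a small sketch in plain Python that reproduces the probabilities from the logits printed above. It is only an illustration of the formula, not how the library implements it.

```python
import math


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.3991, 1.5701]  # the logits returned by the model above
probs = softmax(logits)

# The predicted class is the index with the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 4) for p in probs], predicted)  # [0.0488, 0.9512] 1
```

The second value, about 0.95, is the probability of class 1 (positive).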

Make Predictions with the Pipeline

We can also make predictions using a pipeline, as follows:

from transformers import pipeline
my_pipeline = pipeline(task="text-classification", model=model, tokenizer=tokenizer)

my_score = my_pipeline('The hotel was amazing')
my_score
[{'label': '1', 'score': 0.9511635303497314}]

As we can see, we got the same result as in the UI: a label equal to 1 with a probability of 95%.
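The pipeline returns the raw label as a string, so for reporting we may want to turn it into something more readable. Below is a minimal sketch; the LABEL_NAMES mapping follows the 0 (negative) / 1 (positive) encoding of our dataset, and the describe helper is hypothetical.

```python
# Mapping based on the dataset encoding used in this post
LABEL_NAMES = {"0": "negative", "1": "positive"}


def describe(prediction):
    """Format one pipeline prediction as 'label (probability)'."""
    label = LABEL_NAMES.get(prediction["label"], prediction["label"])
    return f"{label} ({prediction['score']:.0%})"


# The pipeline output shown above
result = [{"label": "1", "score": 0.9511635303497314}]
print(describe(result[0]))  # positive (95%)
```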
