Predictive Hacks

Get Started with Hugging Face Auto Train

Hugging Face has launched the auto train, which is a new way to automatically train, evaluate and deploy state-of-the-art Machine Learning models. It enables us to train custom machine learning models by simply uploading the data. Under the hood, it runs automatically different models and keeps the best ones. Finally, we use our models directly from the Hugging Face Hub. Currently, it supports the following tasks:

  • Image Classification
  • Text Classification
  • Token Classification
  • Question Answering
  • Translation
  • Summarization
  • Text Regression
  • Tabular Data Classification
  • Tabular Data Regression

In this tutorial, we will work on a Text Classification example.

Text Classification with Hugging Face Auto Train

Let’s start building our text classification model using the Hugging Face Auto Train. You have to sign in to the Hugging Face. Then, you click on the “Create new project” button.

Then, you give the project name, and you choose a task. In our case, we will use a “Text” task and more particularly a “Text Classification (Binary)” and finally we click on the “Create Project

Then, we can upload the .csv file of two columns, such as text and target.

For this example, I chose a dataset from hotel reviews. The file consists of two columns, the text and the target that takes two values, 0 (negative) or 1 (positive).

Note, that for the free version, the dataset must be less than 3000 rows! Once we upload the data, we click on “Add to project“. Then we are ready to train the model, by clicking on the “Go to trainings“.

The free version allows us to train up to 5 models.

The 5 models run in parallel, and you can see their accuracy.

If we click on the model, we can see other metrics such as Precision, Recall, Auc, F1 and Loss.

Or, if we go to the Metrics section, we can have a summary view of all models.

Make Predictions from the UI

When we are in the Metrics section, we can click on any Model ID. Let’s try the review “The hotel was amazing“.

We got a label=1 which means positive with a probability of 95%.

Make Predictions with Python

On the bottom left, you can see a section called “Usage“, where it shows how to make curl and Python calls. We will need to just copy paste the Python API code snippet code.

In order to call the model from the Python API, we will need to create an use_auth_token. We should go to settings/tokens and create a new token for the auto_train.

Once we have created the access token, we can copy it and use it, as we will show below. Now, let’s move to Colab. We have to install the transformers library and then simply paste the code snippet that we copied above. For the use_aut_token, we will pass the Access Token that we generated earlier.

!pip install transformers

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("gpipis/autotrain-auto_train_text_classification-1557955500", use_auth_token='xxx')

tokenizer = AutoTokenizer.from_pretrained("gpipis/autotrain-auto_train_text_classification-1557955500", use_auth_token='xxx')

inputs = tokenizer("I love AutoTrain", return_tensors="pt")

outputs = model(**inputs)

outputs
  
SequenceClassifierOutput(loss=None, logits=tensor([[-1.3991,  1.5701]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

The output returns the logits. If we want to get the probabilities of each class, we will need to use the softmax function as follows:

from torch import nn

pt_predictions  = nn.functional.softmax(outputs.logits, dim=-1)
pt_predictions
 
tensor([[0.0488, 0.9512]], grad_fn=<SoftmaxBackward0>)

Make Predictions with the Pipeline

We can make predictions using the pipelines as follows.

from transformers import pipeline
my_pipeline = pipeline(task="text-classification", model=model, tokenizer=tokenizer)

my_score = my_pipeline('The hotel was amazing')
my_score
 
[{'label': '1', 'score': 0.9511635303497314}]

As we can see, we got the same results with the UI, meaning a label equal to 1 with a probability of 95%.

More about Transformers and Hugging Face?

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email

Leave a Comment

Subscribe To Our Newsletter

Get updates and learn from the best

More To Explore

Python

Image Captioning with HuggingFace

Image captioning with AI is a fascinating application of artificial intelligence (AI) that involves generating textual descriptions for images automatically.

Python

Intro to Chatbots with HuggingFace

In this tutorial, we will show you how to use the Transformers library from HuggingFace to build chatbot pipelines. Let’s