Predictive Hacks

How to Load CSV Files as a Hugging Face Dataset

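Hugging Face maintains the transformers library, along with the companion datasets library. If you have a look at the documentation, almost all the examples use a data type called DatasetDict, which maps split names such as train and test to Dataset objects. Let’s see how we can load CSV files as a Hugging Face DatasetDict.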

Assume that we have a train and a test dataset called train_spam.csv and test_spam.csv respectively.

# Install the libraries
!pip install pandas
!pip install datasets
!pip install transformers


import datasets
from datasets import load_dataset
import pandas as pd


# load the CSV files as Dataset

dataset = load_dataset('csv', data_files={'train': 'train_spam.csv', 'test': 'test_spam.csv'})

dataset

How to Convert a Pandas DataFrame to Hugging Face Dataset

Let’s see how we can convert a Pandas DataFrame to a Hugging Face Dataset. Then we will combine the train and test Datasets into a single DatasetDict.

import pandas as pd
import datasets
from datasets import Dataset, DatasetDict

df_train = pd.read_csv('train_spam.csv')
df_test = pd.read_csv('test_spam.csv')


train = Dataset.from_pandas(df_train)
test = Dataset.from_pandas(df_test)


dataset = DatasetDict()

dataset['train'] = train
dataset['test'] = test

dataset

