Predictive Hacks

How to Load CSV Files as a Hugging Face Dataset

Hugging Face maintains several popular libraries for working with transformer models. If you have a look at the documentation, many of the examples use a data type called DatasetDict, which comes from the datasets library. Let's see how we can load CSV files as a Hugging Face DatasetDict.

Assume that we have a train and a test dataset called train_spam.csv and test_spam.csv respectively.
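If you do not have such files at hand, the short sketch below writes two tiny placeholder files so that the snippets in this post can be run end to end. The column names text and label and the example rows are assumptions made for illustration only; the real spam files may have a different schema.

```python
import csv

# Hypothetical schema: the real columns of the spam files are not shown
# in this post, so "text" and "label" are placeholders for illustration.
FIELDNAMES = ["text", "label"]

train_rows = [
    {"text": "Win a free prize now", "label": 1},
    {"text": "Are we still meeting at noon?", "label": 0},
    {"text": "Claim your reward today", "label": 1},
]
test_rows = [
    {"text": "Lunch tomorrow?", "label": 0},
]


def write_sample_csv(path, rows):
    """Write the given rows to `path` as a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Note: this overwrites any existing files with these names.
write_sample_csv("train_spam.csv", train_rows)
write_sample_csv("test_spam.csv", test_rows)
```

The file names match the ones used in the rest of the post, so the loading code can be run against them unchanged.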

# Install the libraries
!pip install pandas
!pip install datasets
!pip install transformers


from datasets import load_dataset


# Load the CSV files as a DatasetDict with a 'train' and a 'test' split
dataset = load_dataset('csv', data_files={'train': 'train_spam.csv', 'test': 'test_spam.csv'})

# In a notebook, evaluating the variable displays each split with its column names and row count
dataset

How to Convert a Pandas DataFrame to a Hugging Face Dataset

Let's see how we can convert a pandas DataFrame to a Hugging Face Dataset. Then we will combine the train and test Datasets into a single DatasetDict.

import pandas as pd
from datasets import Dataset, DatasetDict

# Read the CSV files into pandas DataFrames
df_train = pd.read_csv('train_spam.csv')
df_test = pd.read_csv('test_spam.csv')

# Convert each DataFrame to a Dataset
train = Dataset.from_pandas(df_train)
test = Dataset.from_pandas(df_test)

# Combine the two Datasets into a DatasetDict, one entry per split
dataset = DatasetDict()
dataset['train'] = train
dataset['test'] = test

dataset

