Predictive Hacks

How to Load CSV files as Huggingface Dataset

Huggingface's transformers is a great library for working with transformer models. If you look at the documentation, almost all of the examples use a data type called DatasetDict, which maps split names such as train and test to Dataset objects. Let's see how we can load CSV files as a Huggingface Dataset.

Assume that we have a train and a test dataset stored in train_spam.csv and test_spam.csv, respectively.
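The post does not show what the two files contain. To follow along without the original data, here is a minimal sketch that writes two tiny toy files with the same names. It assumes a hypothetical layout of a text column and a 0/1 label column (1 = spam); the rows are made up purely for illustration, and running it overwrites any files with these names in the working directory.

```python
import pandas as pd

# Hypothetical toy data: the real columns of train_spam.csv / test_spam.csv are
# not shown in the post, so a `text` column and a 0/1 `label` column are assumed.
train_rows = {
    "text": [
        "Win a free prize now",
        "Are we still meeting tomorrow?",
        "Claim your reward, click here",
        "Lunch at noon?",
    ],
    "label": [1, 0, 1, 0],
}
test_rows = {
    "text": ["Free tickets, reply now", "See you at the office"],
    "label": [1, 0],
}

# index=False stops pandas from writing its row index as an extra unnamed column
pd.DataFrame(train_rows).to_csv("train_spam.csv", index=False)
pd.DataFrame(test_rows).to_csv("test_spam.csv", index=False)
```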

# Install the libraries
!pip install pandas
!pip install datasets
!pip install transformers


from datasets import load_dataset


# load the CSV files as a DatasetDict; each key of data_files becomes a split name

dataset = load_dataset('csv', data_files={'train': 'train_spam.csv', 'test': 'test_spam.csv'})

dataset

How to Convert a Pandas DataFrame to a Huggingface Dataset

Let's see how we can convert a Pandas DataFrame to a Huggingface Dataset. Then we will combine the train and test Datasets into a DatasetDict.

import pandas as pd
from datasets import Dataset, DatasetDict

# read the CSV files into pandas DataFrames
df_train = pd.read_csv('train_spam.csv')
df_test = pd.read_csv('test_spam.csv')

# convert each DataFrame to a Dataset
train = Dataset.from_pandas(df_train)
test = Dataset.from_pandas(df_test)

# collect both splits in a DatasetDict
dataset = DatasetDict()
dataset['train'] = train
dataset['test'] = test

dataset
