Predictive Hacks

How to Load CSV Files as a Hugging Face Dataset

Hugging Face provides popular libraries for working with transformer models. If you have a look at the documentation, almost all the examples use a data type called DatasetDict, which comes from the companion datasets library. Let’s see how we can load CSV files as a Hugging Face Dataset.

Assume that we have a train and a test dataset called train_spam.csv and test_spam.csv respectively.

# Install the libraries
!pip install pandas
!pip install datasets
!pip install transformers

import datasets
from datasets import load_dataset
import pandas as pd

# Load the CSV files; passing a dict of splits returns a DatasetDict
# with one Dataset per key ('train' and 'test')
dataset = load_dataset('csv', data_files={'train': 'train_spam.csv', 'test': 'test_spam.csv'})


How to Convert a Pandas DataFrame to Hugging Face Dataset

Let’s see how we can convert a Pandas DataFrame to a Hugging Face Dataset. Then we will combine the train and test Datasets into a single DatasetDict.

import pandas as pd
import datasets
from datasets import Dataset, DatasetDict

df_train = pd.read_csv('train_spam.csv')
df_test = pd.read_csv('test_spam.csv')

train = Dataset.from_pandas(df_train)
test = Dataset.from_pandas(df_test)

dataset = DatasetDict()

dataset['train'] = train
dataset['test'] = test



