How to Merge Different CountVectorizers in Scikit-Learn

Suppose we have two CountVectorizers with different settings, and we want to merge their output into a single table whose columns are the features of both vectorizers. For example:

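from sklearn.feature_extraction.text import CountVectorizer

# my_document is an iterable of strings, one string per document

# Unigrams that appear in at least 1 document
vecA = CountVectorizer(ngram_range=(1, 1), min_df=1)
vecA.fit(my_document)

# Bigrams that appear in at least 5 documents
vecB = CountVectorizer(ngram_range=(2, 2), min_df=5)
vecB.fit(my_document)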

We can merge the features as follows:

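from sklearn.pipeline import FeatureUnion

# Each entry is a (name, transformer) pair; the names must be unique
merged_features = FeatureUnion([('CountVectorizer', vecA), ('CountVect', vecB)])

# fit_transform fits both vectorizers on my_document and returns one sparse matrix
# with the columns of vecA followed by the columns of vecB
merged_matrix = merged_features.fit_transform(my_document)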

Or, alternatively:

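from scipy.sparse import hstack

# hstack works on the transformed sparse matrices, not on the vectorizer objects
combined_features = hstack([vecA.transform(my_document), vecB.transform(my_document)], format='csr')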
