Assume that we have two different Count Vectorizers and want to merge them into a single table whose columns are the features of both vectorizers. For example,
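from sklearn.feature_extraction.text import CountVectorizer

# Unigrams that appear in at least one document
vecA = CountVectorizer(ngram_range=(1, 1), min_df=1)
vecA.fit(my_document)

# Bigrams that appear in at least five documents
vecB = CountVectorizer(ngram_range=(2, 2), min_df=5)
vecB.fit(my_document)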
We can merge the features as follows:
from sklearn.pipeline import FeatureUnion

merged_features = FeatureUnion([('CountVectorizer', vecA), ('CountVect', vecB)])
merged_features.transform(my_document)
Alternatively, we can stack the two transformed matrices horizontally:
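from scipy.sparse import hstack

# hstack needs the transformed matrices, not the vectorizer objects
combined_features = hstack([vecA.transform(my_document), vecB.transform(my_document)], format='csr')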
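As a sanity check, the two routes can be compared on a small corpus. This is only a sketch under stated assumptions: `corpus`, `vec_uni`, and `vec_bi` are hypothetical names, and `min_df` is 1 for both vectorizers so that the three-document corpus keeps its bigrams:

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical toy corpus standing in for my_document.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# min_df=1 for both vectorizers so the small corpus keeps its bigrams.
vec_uni = CountVectorizer(ngram_range=(1, 1), min_df=1)
vec_bi = CountVectorizer(ngram_range=(2, 2), min_df=1)

# Route 1: stack the two transformed matrices by hand.
stacked = hstack(
    [vec_uni.fit_transform(corpus), vec_bi.fit_transform(corpus)],
    format="csr",
)

# Route 2: let FeatureUnion fit both vectorizers and concatenate their outputs.
union = FeatureUnion([("uni", vec_uni), ("bi", vec_bi)])
merged = union.fit_transform(corpus)

# Both routes should give the same columns in the same order.
same = (stacked.toarray() == merged.toarray()).all()
print(stacked.shape, merged.shape, same)
```

Both results are sparse matrices with one row per document. The unigram columns come first, followed by the bigram columns.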