Let’s get straight to how to save and load scikit-learn models. We will start with a random forest model.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=1000, n_features=4,
                               n_informative=2, n_redundant=0,
                               random_state=0, shuffle=False)

    clf = RandomForestClassifier(max_depth=2, random_state=0)
    clf.fit(X, y)
Our model is clf, and we want to save it to disk and then load it back.
We can save the model as a .pkl file using the pickle library.
    import pickle

    # Save the model under the cwd
    pkl_filename = "clf.pkl"
    with open(pkl_filename, 'wb') as file:
        pickle.dump(clf, file)

    # Load the saved model
    with open("clf.pkl", 'rb') as file:
        clf = pickle.load(file)

    # Now you can use the model
    print(clf.predict([[0, 0, 0, 0]]))
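To check that a pickled model really behaves like the original, one option is to compare predictions before and after the round trip. The sketch below is only an illustration: it rebuilds the same toy setup as above, serializes to bytes in memory with pickle.dumps and pickle.loads instead of writing a file, and the names restored and same_predictions are introduced here for the example.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same toy data and model as in the example above
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

# Serialize to bytes and back, without touching the disk
restored = pickle.loads(pickle.dumps(clf))

# A fitted model and its unpickled copy should predict identically
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

The same comparison works for a model loaded from a file. Keep in mind that unpickling can execute arbitrary code, so only load pickle files that come from a source you trust.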
We can also save the clf model using the joblib library.
    # sklearn.externals.joblib has been removed from recent scikit-learn
    # releases, so import the standalone joblib package directly
    import joblib

    # Save the model under the cwd
    joblib_filename = "clf.pkl"
    joblib.dump(clf, joblib_filename)

    # Load the saved model
    clf = joblib.load('clf.pkl')

    # Now you can use the model
    print(clf.predict([[0, 0, 0, 0]]))
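joblib can also compress the file while saving, through the compress argument of joblib.dump. The following is a minimal sketch, assuming the standalone joblib package is installed. The file name clf_compressed.joblib, the compression level 3, and the temporary directory are choices made for this illustration only.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same toy data and model as in the example above
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X, y)

# Write into a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clf_compressed.joblib")

    # compress takes a level from 0 (off) to 9 (smallest, slowest)
    joblib.dump(clf, path, compress=3)
    size_bytes = os.path.getsize(path)

    # Loading needs no extra argument; compression is detected automatically
    restored = joblib.load(path)

prediction = restored.predict([[0, 0, 0, 0]])
print(size_bytes, prediction)
```

Compression trades some save and load time for a smaller file, which tends to matter more for large ensembles than for a small model like this one.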
How to Save the Model and the Tokenizer in a Single File
In many NLP tasks we have a tokenizer in addition to the machine learning model, and it makes sense to save both in a single file. Let’s see how we can achieve that.
    import pickle

    # tokenizer is assumed to be an already fitted object with a
    # transform() method, and X_test the raw data to predict on

    # Save the Tokenizer and the Model in the same file
    with open('model_and_tokenizer.pkl', 'wb') as file:
        pickle.dump((tokenizer, clf), file)

    # Load the Tokenizer and the Model
    with open('model_and_tokenizer.pkl', 'rb') as file:
        tokenizer, clf = pickle.load(file)

    # Apply it to your data
    X_test_tokenized = tokenizer.transform(X_test)
    clf.predict(X_test_tokenized)
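The snippet above assumes that tokenizer and X_test already exist. For a version that runs end to end, here is a small sketch that uses scikit-learn's CountVectorizer in the role of the tokenizer. That choice, the toy sentences, the labels, and the temporary directory are assumptions made purely for illustration, not part of the original setup.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data, for illustration only
train_texts = ["good movie", "great film", "bad movie", "terrible film"]
train_labels = [1, 1, 0, 0]
X_test = ["good film", "bad film"]

# CountVectorizer stands in for the tokenizer here (an assumption)
tokenizer = CountVectorizer()
X_train_tokenized = tokenizer.fit_transform(train_texts)

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train_tokenized, train_labels)

# Predictions before saving, kept for comparison
expected = clf.predict(tokenizer.transform(X_test))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_and_tokenizer.pkl")

    # Save both objects as one tuple in a single file
    with open(path, "wb") as file:
        pickle.dump((tokenizer, clf), file)

    # Load them back in the same order they were saved
    with open(path, "rb") as file:
        loaded_tokenizer, loaded_clf = pickle.load(file)

predictions = loaded_clf.predict(loaded_tokenizer.transform(X_test))
print(predictions)
```

Saving the pair as a tuple keeps the fitted vocabulary and the model in sync, since a model is only valid with the exact tokenizer it was trained with.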