In this post, we will show you how to easily apply Stacked Ensemble Models in R using the H2O package. These models can handle both classification and regression problems. For this example, we will tackle a classification problem, using the Breast Cancer Wisconsin dataset which can be found here.
Description of the Stacked Ensemble Models
The steps below describe the individual tasks involved in training and testing a Super Learner ensemble. H2O automates most of the steps below so that you can quickly and easily build ensembles of H2O models.
- Set up the ensemble.
  - Specify a list of L base algorithms (each with a specific set of model parameters).
  - Specify a metalearning algorithm.
- Train the ensemble.
  - Train each of the L base algorithms on the training set.
  - Perform k-fold cross-validation on each of these learners and collect the cross-validated predicted values from each of the L algorithms.
  - The N cross-validated predicted values from each of the L algorithms can be combined to form a new N x L matrix. This matrix, along with the original response vector, is called the “level-one” data. (N = number of rows in the training set.)
  - Train the metalearning algorithm on the level-one data. The “ensemble model” consists of the L base learning models and the metalearning model, which can then be used to generate predictions on a test set.
- Predict on new data.
  - To generate ensemble predictions, first generate predictions from the base learners.
  - Feed those predictions into the metalearner to generate the ensemble prediction.
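To make the steps above concrete, here is a minimal base-R sketch of the same Super Learner procedure, written without H2O. It assumes simulated data, two logistic-regression base learners that differ only in their feature sets, and a logistic-regression metalearner (the example later in this post uses a Random Forest metalearner instead). The names `base_learners`, `level_one` and `metalearner` are illustrative only.

```r
# Minimal base-R sketch of a Super Learner (not H2O's implementation).
# Assumptions: simulated data, two logistic-regression base learners,
# and a logistic-regression metalearner swapped in for Random Forest.
set.seed(42)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- rbinom(n, 1, plogis(1.5 * sim$x1 - sim$x2))

# 1. Set up the ensemble: L base learners, each a function(train, newdata)
#    that fits on `train` and returns predicted probabilities for `newdata`
base_learners <- list(
  glm_x1 = function(train, newdata)
    predict(glm(y ~ x1, family = binomial, data = train), newdata, type = "response"),
  glm_x1x2 = function(train, newdata)
    predict(glm(y ~ x1 + x2, family = binomial, data = train), newdata, type = "response")
)

# 2. Train: k-fold cross-validation fills the N x L level-one matrix
#    with out-of-fold predictions from every base learner
k <- 5
folds <- sample(rep(seq_len(k), length.out = n))
level_one <- matrix(NA_real_, nrow = n, ncol = length(base_learners),
                    dimnames = list(NULL, names(base_learners)))
for (f in seq_len(k)) {
  holdout <- folds == f
  for (l in names(base_learners)) {
    level_one[holdout, l] <- base_learners[[l]](sim[!holdout, ], sim[holdout, ])
  }
}

# The metalearner is trained on the level-one data plus the original response
meta_df <- data.frame(level_one, y = sim$y)
metalearner <- glm(y ~ ., family = binomial, data = meta_df)

# 3. Predict on new data: each base learner is fit on the full training set,
#    its predictions form new level-one columns, and the metalearner combines them
new_data <- data.frame(x1 = c(-1, 0, 1), x2 = c(0, 0, 0))
new_level_one <- as.data.frame(lapply(base_learners, function(fit) fit(sim, new_data)))
ensemble_pred <- predict(metalearner, new_level_one, type = "response")
print(round(ensemble_pred, 3))
```

Here `level_one` is the N x L matrix described above (200 x 2 in this sketch). A plain `glm` metalearner keeps the sketch dependency-free; any learner that accepts the level-one columns as features could take its place.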
Example of the Stacked Ensemble Model
We will build a Stacked Ensemble Model by applying the following steps:
- Split the dataset into a training set (75%) and a test set (25%).
- Train 3 base models (Gradient Boosting, Random Forest, and Logistic Regression) using 5-fold cross-validation.
- Stack the 3 base models by training a Random Forest metalearner on top of them. Its input features are the cross-validated predicted values of the 3 base models.
- Compare the test-set AUC of each of the 3 base models with that of the stacked ensemble.
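The comparison in the last step relies on the ROC AUC. As a reminder of what that number measures, the sketch below computes it from scratch with the Mann-Whitney rank formula: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The helper `auc_rank` is hypothetical and is not how H2O computes the metric; it assumes labels coded as 0/1.

```r
# Hypothetical helper (not part of H2O): ROC AUC via the Mann-Whitney rank formula.
# Assumes `labels` is coded 0/1 and `scores` are predicted probabilities of class 1.
auc_rank <- function(scores, labels) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)  # tied scores receive their average rank
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)
auc_rank(scores, labels)  # 0.75: one of the four positive/negative pairs is misordered
```

An AUC of 1 means every positive case is scored above every negative case, while 0.5 is no better than random ordering.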
library(tidyverse)
library(h2o)

df <- read.csv("breast_cancer.csv", stringsAsFactors = TRUE)

# remove the id_number from the features
df <- df %>% select(-id_number)

# Split the data frame into Train and Test dataset
## 75% of the sample size
smp_size <- floor(0.75 * nrow(df))

## set the seed to make your partition reproducible
set.seed(5)
train_ind <- sample(seq_len(nrow(df)), size = smp_size)

train_df <- df[train_ind, ]
test_df <- df[-train_ind, ]

# initialize the h2o cluster
h2o.init()

# create the train and test h2o data frames
train_df_h2o <- as.h2o(train_df)
test_df_h2o <- as.h2o(test_df)

# Identify predictors and response
y <- "diagnosis"
x <- setdiff(names(train_df_h2o), y)

# Number of CV folds (to generate level-one data for stacking)
nfolds <- 5

# 1. Generate a 3-model ensemble (GBM + RF + Logistic)

# Train & Cross-validate a GBM
my_gbm <- h2o.gbm(x = x,
                  y = y,
                  training_frame = train_df_h2o,
                  nfolds = nfolds,
                  keep_cross_validation_predictions = TRUE,
                  seed = 5)

# Train & Cross-validate a RF
my_rf <- h2o.randomForest(x = x,
                          y = y,
                          training_frame = train_df_h2o,
                          nfolds = nfolds,
                          keep_cross_validation_predictions = TRUE,
                          seed = 5)

# Train & Cross-validate a LR
my_lr <- h2o.glm(x = x,
                 y = y,
                 training_frame = train_df_h2o,
                 family = "binomial",
                 nfolds = nfolds,
                 keep_cross_validation_predictions = TRUE,
                 seed = 5)

# Train a stacked random forest ensemble using the GBM, RF and LR above
ensemble <- h2o.stackedEnsemble(x = x,
                                y = y,
                                metalearner_algorithm = "drf",
                                training_frame = train_df_h2o,
                                base_models = list(my_gbm, my_rf, my_lr))

# Eval ensemble performance on a test set
perf <- h2o.performance(ensemble, newdata = test_df_h2o)

# Compare to base learner performance on the test set
perf_gbm_test <- h2o.performance(my_gbm, newdata = test_df_h2o)
perf_rf_test <- h2o.performance(my_rf, newdata = test_df_h2o)
perf_lr_test <- h2o.performance(my_lr, newdata = test_df_h2o)
baselearner_best_auc_test <- max(h2o.auc(perf_gbm_test),
                                 h2o.auc(perf_rf_test),
                                 h2o.auc(perf_lr_test))
ensemble_auc_test <- h2o.auc(perf)

# Test AUC of each base learner
print(sprintf("GBM Test AUC: %s", h2o.auc(perf_gbm_test)))
print(sprintf("RF Test AUC: %s", h2o.auc(perf_rf_test)))
print(sprintf("LR Test AUC: %s", h2o.auc(perf_lr_test)))

print(sprintf("Best Base-learner Test AUC: %s", baselearner_best_auc_test))
print(sprintf("Ensemble Test AUC: %s", ensemble_auc_test))
Running the above block of code, we get the following results:
As we can see, all the models performed really well, but the stacked one achieved the highest AUC score. Whenever you compare different models, it is also worth trying a Stacked Ensemble Model.