Definition

ensemble modeling

By

What is ensemble modeling?

Ensemble modeling is the process of running two or more related but different analytical models and then synthesizing the results into a single score or spread. This improves the accuracy of predictive analytics and data mining applications. Ensemble modeling techniques are commonly used in machine learning applications to improve the overall predictive performance.

Analytical and machine learning models process some inputs and identify patterns in those inputs. Then, based on those patterns, they produce some output (outcome), which is usually some kind of prediction.

In many applications, one model is not enough to produce accurate predictions (i.e., predictions with low generalization error). To improve the prediction process and deliver better predictive output, multiple models are combined and trained. This approach is known as ensemble modeling.

Combining multiple models is akin to seeking the wisdom of crowds in making predictions and in reducing predictive or generalization error. It's important to note that the ensemble model's predictive error only decreases when the base estimators are both diverse and independent.

The models being combined are known as base estimators, base learners or first-level models. Regardless of how many base estimators are used, the ensemble model acts and performs as a single model.

The need for ensemble modeling

In machine learning, predictive modeling and other types of data analytics, a single model based on one data sample or data set can have biases and thus provide output based on an analysis of too few features. It may also have high variability, meaning the model is too sensitive to the inputs for the learned features. Another issue is that it may have outright inaccuracies so it cannot properly fit the entire training data set. All these issues can affect the reliability of the model's analytical or predictive findings.

By combining different models or analyzing multiple samples, data scientists, data analysts and machine learning engineers can reduce the effects of those limitations. They can boost the overall accuracy of the output by reducing error and provide better information to business decision-makers. Ensemble modeling also boosts resilience against uncertainties in the data set and increases the probability of producing more robust and reliable forecasts.

Ensemble modeling has grown in popularity as more organizations have deployed the computing resources and advanced analytics software needed to run such models. Hadoop and other big data technologies have led businesses to store and analyze greater volumes of data, creating increased potential for running analytical models on different data samples, or in other words, ensemble models.

Diagram of ensemble modeling. — Ensemble modeling methods run multiple analytic models to produce more accurate predictions.

Example of ensemble modeling

A random forest model is a common example of ensemble modeling. This approach uses multiple decision trees -- an analytical model that predicts outcomes based on different variables and rules. It blends decision trees that may analyze different sample data, evaluate different factors or weight common variables differently. The results of the various decision trees are then either converted into a simple average or aggregated through further weighting.

The random forest algorithm is a popular algorithm in machine learning. It is one of the bagging techniques used to create ensemble models, in which different models use a subset of the training data set to minimize variance and overfitting.

Learn more about machine learning algorithms, how they work and their various types.

Ensemble modeling techniques

Many techniques are used to create ensemble models, especially in machine learning. These are the most popular techniques: stacking, bagging, blending and boosting.

Stacking

Stacking, also known as the stacked ensembles method or stacked generalization, is the process of combining the predictions from multiple base estimators to build a new model that produces better (i.e., more accurate) predictions.

How it works: Multiple base models (sometimes known as weak learners) are trained on the same training data set. Their predictions are then fed into a higher-level model to make the final prediction. The latter, also known as a meta-model or second-level model, combines the predictions of all the base models to generate the final prediction.

Bagging

Like stacking, bagging also combines the results of multiple models to produce a generalized result. What's important to note is that the various models are trained on slightly different subsets of the original data set.

How it works: Multiple sets of the original training data are created with replacements. All subsets are of the same size. These subsets are also known as bags and the method is known as bootstrap aggregating. Here, the bags are used to get a fair idea of the complete set and to train the models in parallel.

Blending

Blending is like stacking. One key difference is that the final model learns both training and validation (holdout) data.

How it works: In blending, both the validation set and predictions are used to build a model. The model is fitted on the training set while predictions are made on the validation set and test set. The features are extended to include the validation set and the model is used to make final predictions.

Boosting

Boosting is a sequential ensemble modeling technique in which the errors of one model are corrected by the subsequent model. The models whose errors are corrected are known as weak learners. Adaptive boosting is the most popular version of the boosting ensemble modeling technique in machine learning.

How it works: A subset of the original training data is created with all the data points having equal weight. This subset is used to create a base model that makes predictions on the entire data set. Based on the actual and predicted values, errors are calculated and the incorrectly predicted observations are given higher weights. A new model is created that tries to correct the errors and new predictions are made on the data set. This process continues, with each new model attempting to correct the previous model's errors, until a final model or strong learner is created. This model is the weighted mean of all the weak learners, and it improves the overall performance of the ensemble.

Understand the differences between advanced and predictive analytics techniques to maximize insights. Explore top predictive analytics use cases with enterprise examples. Learn about machine learning models using types and examples.

This was last updated in May 2024

Continue Reading About ensemble modeling

How to build a machine learning model

Types of learning in machine learning explained

Mixture-of-experts models explained: What you need to know

Infrastructure for machine learning, AI requirements, examples

How data quality shapes machine learning and AI outcomes

Dig Deeper on Data science and analytics

Data Management

Use RAG with LLMs to democratize data analytics
Pairing retrieval-augmented generation with an LLM helps improve prompts and outputs, democratizing data access and making ...
10 top vector database options for similarity searches
Vector databases excel in different areas of vector searches, including sophisticated text and visual options. Choose the ...
Generative AI shines spotlight on data governance and trust
Generative AI creates new opportunities for how organizations use data. Strong data governance is necessary to build trust in the...

Compare Datadog vs. New Relic for IT monitoring in 2024
Compare Datadog vs. New Relic capabilities including alerts, log management, incident management and more. Learn which tool is ...
AWS Control Tower aims to simplify multi-account management
Many organizations struggle to manage their vast collection of AWS accounts, but Control Tower can help. The service automates ...
Break down the Amazon EKS pricing model
There are several important variables within the Amazon EKS pricing model. Dig into the numbers to ensure you deploy the service ...

Content Management

Box acquires Alphamoon intelligent document processing tech
Paper and unstructured PDFs need more help to be ingested into and findable within enterprise knowledge repositories. Enter ...
Compare SharePoint 2019 vs. SharePoint Online
SharePoint 2019 and SharePoint Online have different customization capabilities, payment models and more. Organizations must ...
Quiz: Test your knowledge of information governance best practices
As strict privacy laws challenge organizations, information governance is the answer. This quiz can help business leaders test ...

Oracle sets lofty national EHR goal with Cerner acquisition
With its Cerner acquisition, Oracle sets its sights on creating a national, anonymized patient database -- a road filled with ...
With Cerner, Oracle Cloud Infrastructure gets a boost
Oracle plans to acquire Cerner in a deal valued at about $30B. The second-largest EHR vendor in the U.S. could inject new life ...
Supreme Court sides with Google in Oracle API copyright suit
The Supreme Court ruled 6-2 that Java APIs used in Android phones are not subject to American copyright law, ending a ...

SAP clean core approach changes game for SIs
As SAP pushes its clean core methodology for S/4HANA Cloud environments, the partners who customized legacy SAP systems will need...
Julia White, Scott Russell to leave SAP executive board
Two executive board members will depart SAP in a move that the company says is both to streamline the structure of the board and ...
SAP sustainability chief sees IT as net plus for environment
Sophia Mendelsohn talks about SAP's ambitions to both set an example of sustainability and be an enabler of it with products such...

Close