36 Sensitivity analysis of simulation models
36.1 Bifurcation plots
36.2 Use of machine learning for sensitivity analysis
To evaluate the importance of each parameter in a simulation model, a Random Forest (RF) can be used as a feature importance estimator. This involves training an RF model to predict the simulation output from the sampled input parameters, and then measuring how much each input parameter contributes to the predicted output.
36.2.1 Step-by-Step Guide to Using Random Forest for Parameter Importance
1. Generate Simulation Data
The first step is to generate simulation results by systematically sampling multiple parameters, using methods like Random Sampling, Latin Hypercube Sampling (LHS), or Sobol Sampling, and running the model once for each sampled parameter set.
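The three sampling schemes named above can be compared side by side in a short sketch. It assumes the `lhs` and `randtoolbox` packages are installed, uses base R's `runif()` for plain random sampling, and rescales the unit-hypercube samples to hypothetical parameter bounds (`lower`, `upper`), which you would replace with the real ranges of your model.

```r
library(lhs)          # randomLHS()
library(randtoolbox)  # sobol()

set.seed(123)
n <- 200  # number of parameter sets
k <- 3    # number of parameters

# Plain random sampling: independent uniform draws on [0, 1]^k
random_unit <- matrix(runif(n * k), nrow = n, ncol = k)

# Latin Hypercube Sampling: each parameter's range is split into n equal strata,
# and every stratum is sampled exactly once
lhs_unit <- randomLHS(n, k)

# Sobol sequence: a deterministic low-discrepancy sequence on [0, 1)^k
sobol_unit <- sobol(n, dim = k)

# Hypothetical parameter bounds (replace with your model's real ranges)
lower <- c(0.1, 10, -1)
upper <- c(0.9, 50,  1)

# Rescale unit-hypercube samples column by column: lower + u * (upper - lower)
rescale <- function(u, lower, upper) {
  sweep(sweep(u, 2, upper - lower, `*`), 2, lower, `+`)
}
lhs_params <- rescale(lhs_unit, lower, upper)
colnames(lhs_params) <- paste0("param", 1:k)
summary(lhs_params)
```

The same `rescale()` helper works for the random and Sobol matrices, so switching sampling schemes only changes one line.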
Example: Generating Simulation Data in R
```r
# Load required libraries
library(lhs)         # For Latin Hypercube Sampling
library(randtoolbox) # For Sobol Sampling

set.seed(123)
n <- 500 # Number of samples
k <- 5   # Number of parameters

# Generate Latin Hypercube sampled input parameters
param_samples <- randomLHS(n, k)
colnames(param_samples) <- paste0("param", 1:k)

# Assume a simple simulation model (e.g., sum of squared params)
simulation_results <- rowSums(param_samples^2)

# Convert to data frame
sim_data <- data.frame(param_samples, Output = simulation_results)
head(sim_data)
```
Note: Replace `simulation_results` with the actual simulation output.
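In practice the toy formula is replaced by a call to the real model, run once per sampled parameter set. The sketch below assumes the model can be wrapped in an R function that takes one named parameter vector and returns a single scalar summary of the output; `run_model()` is a hypothetical stand-in (here an arbitrary nonlinear formula in which `param5` has no effect) for your own simulation code.

```r
library(lhs)

set.seed(123)
n <- 500
k <- 5
param_samples <- randomLHS(n, k)
colnames(param_samples) <- paste0("param", 1:k)

# Hypothetical wrapper around the real simulation: takes one parameter set
# (a named numeric vector) and returns one scalar summary of the output.
run_model <- function(p) {
  # Placeholder formula; replace with a call to your simulation code
  p[["param1"]] * 3 + sin(2 * pi * p[["param2"]]) + p[["param3"]] * p[["param4"]]
}

# Run the model once per row (parameter set); apply() passes each row
# as a vector named after the column names
simulation_results <- apply(param_samples, 1, run_model)

sim_data <- data.frame(param_samples, Output = simulation_results)
head(sim_data)
```

If the model is stochastic, one option is to run several replicates per parameter set inside the wrapper and return their mean as the output.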
2. Train a Random Forest Model
Once the data is prepared, an RF model can be trained with the `randomForest()` function from the randomForest package in R.
Train the RF Model
```r
library(randomForest)

# Train Random Forest to predict simulation output
set.seed(123)
rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)

# Print model summary
print(rf_model)
```
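Explanation:
- `Output ~ .` means the RF model uses all other columns of `sim_data` (here, the five parameters) as predictors of the output.
- `importance = TRUE` makes `randomForest()` compute the permutation-based importance measure; without it, only the node-impurity measure is available.
- `ntree = 500` sets the number of trees in the forest.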
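Importance scores are only meaningful if the forest actually predicts the simulation output well, so it is worth checking the out-of-bag (OOB) fit before interpreting them. A minimal sketch, reusing the sum-of-squares toy model from step 1 (with base R's `runif()` in place of LHS so the block runs on its own): for regression forests, `randomForest` stores the OOB mean squared error and the pseudo R-squared after each added tree in `rf_model$mse` and `rf_model$rsq`, which also shows whether `ntree = 500` is enough for the error to level off.

```r
library(randomForest)

set.seed(123)
n <- 500
k <- 5
param_samples <- matrix(runif(n * k), nrow = n,
                        dimnames = list(NULL, paste0("param", 1:k)))
sim_data <- data.frame(param_samples, Output = rowSums(param_samples^2))

rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)

# OOB error and variance explained after the final tree
oob_mse <- tail(rf_model$mse, 1)
oob_rsq <- tail(rf_model$rsq, 1)
cat("OOB MSE:", round(oob_mse, 4), " OOB R^2:", round(oob_rsq, 3), "\n")

# OOB error as trees are added; a flat tail suggests ntree is large enough
plot(rf_model, main = "OOB error vs. number of trees")
```

A low OOB R-squared suggests the forest has not captured the model's behaviour, in which case more samples or a different surrogate may be needed before ranking parameters.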
3. Extract Parameter Importance
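After training, RF provides two types of feature importance. Because `Output` is continuous, `randomForest()` fits a regression forest, and `importance()` reports the regression versions of these measures, named `%IncMSE` and `IncNodePurity` (for classification forests, the corresponding measures are Mean Decrease in Accuracy and Mean Decrease in Gini):
1. %IncMSE (regression counterpart of Mean Decrease in Accuracy): measures how much the out-of-bag mean squared error increases when a parameter's values are randomly shuffled (by default, averaged over trees and divided by its standard error).
2. IncNodePurity (regression counterpart of Mean Decrease in Gini): measures how much splits on each parameter reduce node impurity in the decision trees, where impurity is the residual sum of squares, totalled over splits and averaged over trees.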
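The two measures can also be requested separately with the `type` argument of `importance()` (`type = 1` for the permutation measure, `type = 2` for node impurity), and `scale = FALSE` returns the raw, unscaled permutation score. The toy data below are an assumption made for illustration: unlike the sum-of-squares example, the parameters are given deliberately unequal effects, so the ranking has something to recover.

```r
library(randomForest)

set.seed(42)
n <- 500
# Hypothetical toy model with unequal parameter effects:
# param1 strong (linear), param2 moderate (quadratic), param3 weak,
# param4 and param5 have no effect on the output
sim_data <- data.frame(param1 = runif(n), param2 = runif(n), param3 = runif(n),
                       param4 = runif(n), param5 = runif(n))
sim_data$Output <- 4 * sim_data$param1 + 2 * sim_data$param2^2 + sim_data$param3

rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)

# Permutation importance (%IncMSE): scaled (default) and raw
inc_mse_scaled <- importance(rf_model, type = 1)
inc_mse_raw    <- importance(rf_model, type = 1, scale = FALSE)

# Node-impurity importance (IncNodePurity, based on residual sum of squares)
node_purity <- importance(rf_model, type = 2)

cbind(inc_mse_scaled, raw_IncMSE = inc_mse_raw[, 1], node_purity)
```

With this setup, `param1` should rank first under both measures and the inert parameters near the bottom; the exact values vary with the seed.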
Plot Feature Importance
```r
# Extract importance values (regression forest: %IncMSE and IncNodePurity)
importance_values <- importance(rf_model)

# Convert to a data frame
importance_df <- data.frame(Parameter = rownames(importance_values),
                            IncMSE = importance_values[, "%IncMSE"],
                            IncNodePurity = importance_values[, "IncNodePurity"])

# Print importance scores
print(importance_df)

# Plot feature importance
library(ggplot2)
ggplot(importance_df, aes(x = reorder(Parameter, -IncMSE), y = IncMSE)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Feature Importance (%IncMSE, permutation-based)",
       x = "Parameter", y = "Importance") +
  theme_minimal()
```
Interpretation:
- Higher %IncMSE values indicate more important parameters (a larger increase in out-of-bag prediction error when the parameter is shuffled).
- Higher IncNodePurity values mean stronger contributions to splitting decisions in the trees.
- In this toy model every parameter enters the output in the same way (as a squared term), so the scores should come out roughly equal; differences among them mainly reflect sampling and model noise.
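Two optional checks can be sketched on hypothetical toy data with unequal parameter effects: `varImpPlot()` from the randomForest package draws both importance measures side by side without ggplot2, and refitting the forest with several seeds shows how much the scores move from run to run. Parameters whose mean scores differ by less than this run-to-run spread should not be ranked against each other with much confidence. The choice of five seeds is arbitrary.

```r
library(randomForest)

set.seed(1)
n <- 500
sim_data <- data.frame(param1 = runif(n), param2 = runif(n), param3 = runif(n),
                       param4 = runif(n), param5 = runif(n))
# Hypothetical toy output: param1 > param2 > param3 in influence; param4, param5 inert
sim_data$Output <- 4 * sim_data$param1 + 2 * sim_data$param2^2 + sim_data$param3

# Built-in dot chart of both importance measures
rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)
varImpPlot(rf_model, main = "Parameter importance")

# Refit with several seeds and collect the permutation importance (%IncMSE);
# sapply() returns a parameters-by-seeds matrix
seeds <- 1:5
imp_runs <- sapply(seeds, function(s) {
  set.seed(s)
  fit <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)
  importance(fit, type = 1)[, 1]
})

# Mean and spread of each parameter's score across seeds
stability <- data.frame(Parameter = rownames(imp_runs),
                        mean_IncMSE = rowMeans(imp_runs),
                        sd_IncMSE = apply(imp_runs, 1, sd))
stability[order(-stability$mean_IncMSE), ]
```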
4. Interpret the Results
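After analysing feature importance:
- Key parameters can be identified for further refinement.
- Unimportant parameters can be fixed at nominal values or removed to simplify the model.
- Interactions between parameters can be explored further, since the importance scores alone do not show which parameters interact.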
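Two of these follow-up steps can be sketched with functions from the randomForest package, again on hypothetical toy data. `partialPlot()` shows the average predicted output as one parameter varies (a partial dependence plot), which helps when refining an important parameter; refitting without the least important parameter and comparing the OOB variance explained gives a rough check on whether it can be dropped or fixed. Both are screening heuristics under the toy assumptions, not definitive tests.

```r
library(randomForest)

set.seed(7)
n <- 500
sim_data <- data.frame(param1 = runif(n), param2 = runif(n), param3 = runif(n),
                       param4 = runif(n), param5 = runif(n))
# Hypothetical toy output: param2 acts nonlinearly; param4 and param5 have no effect
sim_data$Output <- 4 * sim_data$param1 + 2 * sim_data$param2^2 + sim_data$param3

rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)

# Partial dependence of the output on param2 (should rise roughly quadratically);
# partialPlot() also returns the plotted points invisibly as list(x, y)
pd <- partialPlot(rf_model, pred.data = sim_data, x.var = "param2",
                  main = "Partial dependence on param2")

# Identify the least important parameter by permutation importance
imp <- importance(rf_model, type = 1)[, 1]
least_important <- names(which.min(imp))

# Refit without it and compare OOB variance explained
reduced_data <- sim_data[, setdiff(names(sim_data), least_important)]
rf_reduced <- randomForest(Output ~ ., data = reduced_data, ntree = 500)

cat("Dropped:", least_important, "\n")
cat("OOB R^2 full:   ", round(tail(rf_model$rsq, 1), 3), "\n")
cat("OOB R^2 reduced:", round(tail(rf_reduced$rsq, 1), 3), "\n")
```

If the reduced forest explains about as much variance as the full one, the dropped parameter is a candidate for being fixed in the simulation.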
Summary

| Step | Action |
|------|--------|
| 1 | Generate parameter samples using LHS, Sobol, or Random Sampling |
| 2 | Run simulations to obtain output values |
| 3 | Train a Random Forest model using `randomForest()` |
| 4 | Extract feature importance using `importance()` |
| 5 | Interpret and visualize the results |
For an applied example, see Angourakis et al. (2022).