34 Sensitivity analysis of simulation models
34.1 Stochastic exploration for sensitivity analysis
To perform sensitivity analysis in a simulation model, we need to sample parameter values more efficiently than a regular grid allows, because the number of grid points grows exponentially with the number of parameters. Three common sampling methods are Random Sampling, Latin Hypercube Sampling (LHS), and Sobol Sampling. The sections below show how to obtain parameter samples with each method in R.
34.1.1 Random Sampling
Random sampling is the simplest method, where parameter values are drawn independently from a given probability distribution (e.g., uniform, normal). This method may not cover the parameter space as efficiently as structured sampling methods.
Implementation in R
set.seed(123) # For reproducibility
n <- 100 # Number of samples
param1 <- runif(n, min = 0, max = 1) # Uniform distribution
param2 <- rnorm(n, mean = 0, sd = 1) # Normal distribution
# Combine into a data frame
samples_random <- data.frame(param1, param2)
head(samples_random)
📌 Pros: Simple and easy to implement
📌 Cons: Potential clustering of samples, leading to inefficient coverage of the parameter space.
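The clustering can be made visible by counting how many equal-width bins receive no sample at all. The following is a minimal base R sketch, assuming a single uniform parameter and as many bins as samples; the names `bins` and `counts` are illustrative.

```r
set.seed(123)
n <- 100
x <- runif(n)

# Split [0, 1] into n equal-width bins and count the samples in each
bins <- cut(x, breaks = seq(0, 1, length.out = n + 1), include.lowest = TRUE)
counts <- table(bins)

sum(counts == 0)  # number of bins that received no sample
max(counts)       # largest number of samples sharing one bin
```

With independent uniform draws, the expected fraction of empty bins is (1 - 1/n)^n, or about 0.37 for large n, so roughly a third of the range is left unsampled at this resolution.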
34.1.2 Latin Hypercube Sampling (LHS)
Latin Hypercube Sampling ensures that each parameter is sampled evenly across its range. For n samples, it divides the range of each parameter into n equal-probability intervals and draws exactly one value from each interval (Chalom and Prado 2015).
Implementation in R
Using the lhs package:
library(lhs)
set.seed(123)
n <- 100 # Number of samples
k <- 2 # Number of parameters
# Generate LHS sample in [0,1] range
samples_lhs <- randomLHS(n, k)
# Transform to specific distributions
param1 <- qunif(samples_lhs[,1], min = 0, max = 1) # Uniform
param2 <- qnorm(samples_lhs[,2], mean = 0, sd = 1) # Normal
# Combine into a data frame
samples_lhs <- data.frame(param1, param2)
head(samples_lhs)
📌 Pros: More uniform coverage of the space than random sampling
📌 Cons: Each parameter is stratified separately, so even coverage of parameter combinations (interactions) is not guaranteed
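To see what "one sample per interval" means in practice, the stratification can be written out by hand. This is a simplified base R sketch of the idea, not the implementation used by the lhs package; `lhs_manual` and `strata` are illustrative names.

```r
set.seed(123)
n <- 10  # number of samples (and of strata per parameter)
k <- 2   # number of parameters

# For each parameter: shuffle the strata 1..n, then draw one point inside each
lhs_manual <- sapply(seq_len(k), function(j) (sample(n) - runif(n)) / n)

# Stratum index (1..n) of every sample, per parameter
strata <- floor(lhs_manual * n) + 1

# Each parameter hits every stratum exactly once
apply(strata, 2, function(s) all(sort(s) == seq_len(n)))

# Jointly, the n points occupy only n of the n^2 grid cells
nrow(unique(strata))
```

The last line illustrates the limitation: every margin is covered evenly, but most combinations of strata across the two parameters remain unsampled.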
34.1.3 Sobol Sampling
Sobol sampling draws points from a quasi-random, low-discrepancy sequence and is widely used in global sensitivity analysis. For a given number of points, it typically covers the parameter space more evenly than either random sampling or LHS (Renardy et al. 2021).
Implementation in R
Using the randtoolbox package:
library(randtoolbox)
set.seed(123)
n <- 100 # Number of samples
k <- 2 # Number of parameters
# Generate Sobol sequence
samples_sobol <- sobol(n, dim = k, scrambling = 3)
# Transform to specific distributions
param1 <- qunif(samples_sobol[,1], min = 0, max = 1) # Uniform
param2 <- qnorm(samples_sobol[,2], mean = 0, sd = 1) # Normal
# Combine into a data frame
samples_sobol <- data.frame(param1, param2)
head(samples_sobol)
📌 Pros: Highly efficient, low-discrepancy sequence for sensitivity analysis
📌 Cons: Requires specialized libraries, and might not be ideal for small sample sizes
34.1.4 Comparison of Methods
| Sampling Method | Uniform Coverage | Computational Efficiency | Best Use Case |
|---|---|---|---|
| Random | Poor | Fast | Basic sensitivity analysis |
| Latin Hypercube (LHS) | Good | Moderate | Optimized space-filling with no interaction control |
| Sobol | Excellent | Moderate | Global sensitivity analysis |
- Use random sampling when simplicity is preferred.
- Use LHS when uniform coverage of individual parameters is important.
- Use Sobol sampling for high-dimensional sensitivity analysis.
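Whichever method is chosen, samples generated on the unit interval still have to be mapped onto the actual parameter ranges of the model. The sketch below shows one way to do this with `qunif`; the helper `scale_to_ranges` and the parameter names and ranges are hypothetical, chosen only for illustration.

```r
# Hypothetical parameter ranges, for illustration only
ranges <- data.frame(
  name = c("growth_rate", "mortality"),
  min  = c(0.1, 0.01),
  max  = c(2.0, 0.20)
)

# Map an n x k matrix of values in [0, 1] onto the ranges, column by column
scale_to_ranges <- function(unit_samples, ranges) {
  stopifnot(ncol(unit_samples) == nrow(ranges))
  scaled <- sapply(seq_len(nrow(ranges)), function(j) {
    qunif(unit_samples[, j], min = ranges$min[j], max = ranges$max[j])
  })
  colnames(scaled) <- ranges$name
  as.data.frame(scaled)
}

set.seed(123)
unit_samples <- matrix(runif(200), ncol = 2)  # stand-in for LHS or Sobol output
params <- scale_to_ranges(unit_samples, ranges)
summary(params)
```

Keeping the ranges in a small table separates the sampling design from the model-specific bounds, so the same design can be reused if the bounds change.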
34.2 Use of machine learning for sensitivity analysis
To evaluate the importance of each parameter in a simulation model, Random Forest (RF) can be used as a feature importance estimator. This involves training an RF model on simulation results and then analyzing the impact of each input parameter on the output.
34.2.1 Step-by-Step Guide to Using Random Forest for Parameter Importance
1. Generate Simulation Data
The first step is to generate simulation results by systematically sampling multiple parameters using methods like Random Sampling, Latin Hypercube Sampling (LHS), or Sobol Sampling.
Example: Generating Simulation Data in R
# Load required libraries
library(lhs) # For Latin Hypercube Sampling
library(randtoolbox) # For Sobol Sampling
set.seed(123)
n <- 500 # Number of samples
k <- 5 # Number of parameters
# Generate Latin Hypercube sampled input parameters
param_samples <- randomLHS(n, k)
colnames(param_samples) <- paste0("param", 1:k)
# Assume a simple simulation model (e.g., sum of squared params)
simulation_results <- rowSums(param_samples^2)
# Convert to data frame
sim_data <- data.frame(param_samples, Output = simulation_results)
head(sim_data)
📌 Note: Replace simulation_results with the actual simulation output.
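Swapping in a real model usually means calling it once per sampled row. The following is a minimal sketch, assuming the simulation can be wrapped in an R function that takes one parameter vector and returns a single number; `run_simulation` is a hypothetical placeholder, and a plain uniform matrix stands in for the sampling design.

```r
# Hypothetical stand-in for a real simulation: takes a named parameter vector,
# returns a single numeric output
run_simulation <- function(params) {
  sum(params^2)
}

set.seed(123)
param_samples <- matrix(runif(50 * 5), ncol = 5)  # stand-in for an LHS design
colnames(param_samples) <- paste0("param", 1:5)

# Run the model once per row; vapply checks that each run returns one number
outputs <- vapply(
  seq_len(nrow(param_samples)),
  function(i) run_simulation(param_samples[i, ]),
  numeric(1)
)

sim_data <- data.frame(param_samples, Output = outputs)
head(sim_data)
```

If the simulation is stochastic, each parameter set would typically be run several times and the outputs summarised before this step.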
2. Train a Random Forest Model
Once the data is prepared, an RF model can be trained using randomForest in R.
Train the RF Model
library(randomForest)
# Train Random Forest to predict simulation output
set.seed(123)
rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)
# Print model summary
print(rf_model)
📌 Explanation:
- Output ~ . means the RF model uses all parameters to predict the output.
- importance = TRUE ensures that feature importance is computed.
- ntree = 500 sets the number of trees in the forest.
3. Extract Parameter Importance
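After training, RF provides two types of feature importance:
- Mean Decrease in Accuracy (MDA): measures how much prediction accuracy drops when the values of a parameter are randomly shuffled.
- Mean Decrease in Gini (MDG): measures how much each parameter contributes to reducing node impurity in the decision trees.
Because the output here is continuous, randomForest fits a regression forest and labels these two measures %IncMSE (permutation-based, the counterpart of MDA) and IncNodePurity (impurity-based, the counterpart of MDG).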
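The shuffling idea behind the permutation-based measure can be sketched by hand. This is a simplified illustration on a hold-out set, not the out-of-bag procedure that randomForest runs internally; the toy data, the `mse` helper, and the name `perm_increase` are assumptions made for the example.

```r
library(randomForest)

set.seed(123)
# Toy data: param1 matters most, param3 not at all (illustrative assumption)
toy <- data.frame(param1 = runif(300), param2 = runif(300), param3 = runif(300))
toy$Output <- 3 * toy$param1 + toy$param2^2

train <- toy[1:200, ]
test  <- toy[201:300, ]
rf <- randomForest(Output ~ ., data = train, ntree = 200)

# Mean squared prediction error of a model on a data set
mse <- function(model, data) mean((predict(model, data) - data$Output)^2)
baseline <- mse(rf, test)

# Shuffle one parameter at a time and record how much the error grows
perm_increase <- sapply(c("param1", "param2", "param3"), function(p) {
  shuffled <- test
  shuffled[[p]] <- sample(shuffled[[p]])
  mse(rf, shuffled) - baseline
})
perm_increase
```

Shuffling a parameter breaks its link with the output while keeping its distribution, so a large increase in error suggests the model relies on that parameter.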
Plot Feature Importance
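# Extract importance values
# (for a regression forest, column 1 is %IncMSE and column 2 is IncNodePurity)
importance_values <- importance(rf_model)
# Convert to a data frame
importance_df <- data.frame(Parameter = rownames(importance_values),
                            MDA = importance_values[, 1],  # permutation-based
                            MDG = importance_values[, 2])  # impurity-based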
# Print importance scores
print(importance_df)
# Plot feature importance
library(ggplot2)
ggplot(importance_df, aes(x = reorder(Parameter, -MDA), y = MDA)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Feature Importance (Mean Decrease in Accuracy)",
       x = "Parameter", y = "Importance") +
  theme_minimal()
📌 Interpretation:
- Higher MDA values indicate more important parameters (greater accuracy drop when shuffled).
- Higher MDG values mean stronger contributions to splitting decisions in trees.
4. Interpret the Results
After analysing feature importance:
- Key parameters can be identified for further refinement.
- Unimportant parameters can be removed to simplify the model.
- Interactions between parameters can be explored.
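One way to check whether a parameter can safely be dropped is to refit the forest without it and compare the out-of-bag error. The sketch below is illustrative only, using toy data in which `param3` is pure noise by construction; the names `rf_full`, `rf_reduced`, and `weakest` are assumptions for the example.

```r
library(randomForest)

set.seed(123)
toy <- data.frame(param1 = runif(300), param2 = runif(300), param3 = runif(300))
toy$Output <- 3 * toy$param1 + toy$param2^2  # param3 is pure noise by construction

rf_full <- randomForest(Output ~ ., data = toy, importance = TRUE, ntree = 300)

# Permutation-based importance (type = 1), one value per parameter
imp <- importance(rf_full, type = 1)[, 1]
weakest <- names(which.min(imp))

# Refit without the weakest parameter and compare out-of-bag error
reduced <- toy[, setdiff(names(toy), weakest)]
rf_reduced <- randomForest(Output ~ ., data = reduced, ntree = 300)

c(full = tail(rf_full$mse, 1), reduced = tail(rf_reduced$mse, 1))
weakest
```

If the error of the reduced model is no worse than that of the full model, the dropped parameter is a reasonable candidate for fixing at a default value in later experiments.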
Summary
| Step | Action |
|---|---|
| 1 | Generate parameter samples using LHS, Sobol, or Random Sampling |
| 2 | Run simulations to obtain output values |
| 3 | Train a Random Forest model using randomForest |
| 4 | Extract feature importance using importance() |
| 5 | Interpret and visualize the results |
Example: (Angourakis et al. 2022)