34  Sensitivity analysis of simulation models

34.1 Stochastic exploration for sensitivity analysis

To perform sensitivity analysis on a simulation model, we need to sample parameter values more efficiently than evaluating them on a regular grid, whose size grows exponentially with the number of parameters. There are three common sampling methods: Random Sampling, Latin Hypercube Sampling (LHS), and Sobol Sampling. Below is an explanation of how to obtain parameter samples using each method in R.

34.1.1 Random Sampling

Random sampling is the simplest method, where parameter values are drawn independently from a given probability distribution (e.g., uniform, normal). This method may not cover the parameter space as efficiently as structured sampling methods.

Implementation in R

set.seed(123)  # For reproducibility
n <- 100  # Number of samples
param1 <- runif(n, min = 0, max = 1)  # Uniform distribution
param2 <- rnorm(n, mean = 0, sd = 1)  # Normal distribution

# Combine into a data frame
samples_random <- data.frame(param1, param2)
head(samples_random)

📌 Pros: Simple and easy to implement
📌 Cons: Potential clustering of samples, leading to inefficient coverage of the parameter space.

34.1.2 Latin Hypercube Sampling (LHS)

Latin Hypercube Sampling ensures that each parameter is sampled more uniformly across its range. It divides the range of each parameter into n equal-probability intervals (one per sample) and draws exactly one value from each interval (Chalom and Prado 2015).

Implementation in R Using the lhs package:

library(lhs)
set.seed(123)
n <- 100  # Number of samples
k <- 2  # Number of parameters

# Generate LHS sample in [0,1] range
samples_lhs <- randomLHS(n, k)

# Transform to specific distributions
param1 <- qunif(samples_lhs[,1], min = 0, max = 1)  # Uniform
param2 <- qnorm(samples_lhs[,2], mean = 0, sd = 1)  # Normal

# Combine into a data frame
samples_lhs <- data.frame(param1, param2)
head(samples_lhs)

📌 Pros: More uniform coverage of the space than random sampling
📌 Cons: Does not account for interactions between parameters explicitly

34.1.3 Sobol Sampling

Sobol sampling is a quasi-random low-discrepancy sequence designed for global sensitivity analysis. It provides better uniformity across the space than both random and LHS methods (Renardy et al. 2021).

Implementation in R Using the randtoolbox package:

library(randtoolbox)
set.seed(123)
n <- 100  # Number of samples
k <- 2  # Number of parameters

# Generate Sobol sequence (scrambling = 3 applies both Owen and Faure-Tezuka scrambling)
samples_sobol <- sobol(n, dim = k, scrambling = 3)

# Transform to specific distributions
param1 <- qunif(samples_sobol[,1], min = 0, max = 1)  # Uniform
param2 <- qnorm(samples_sobol[,2], mean = 0, sd = 1)  # Normal

# Combine into a data frame
samples_sobol <- data.frame(param1, param2)
head(samples_sobol)

📌 Pros: Highly efficient, low-discrepancy sequence for sensitivity analysis
📌 Cons: Requires specialized libraries, and might not be ideal for small sample sizes (its uniformity properties are strongest when n is a power of two)

34.1.4 Comparison of Methods

Sampling Method         Uniform Coverage   Computational Efficiency   Best Use Case
Random                  Poor               Fast                        Basic sensitivity analysis
Latin Hypercube (LHS)   Good               Moderate                    Optimized space-filling with no interaction control
Sobol                   Excellent          Moderate                    Global sensitivity analysis
  • Use random sampling when simplicity is preferred.
  • Use LHS when uniform coverage of individual parameters is important.
  • Use Sobol sampling for high-dimensional sensitivity analysis.
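
The difference in coverage is easy to see by plotting samples from each scheme in the unit square: random sampling typically shows clusters and gaps, LHS spreads points evenly along each axis, and Sobol fills the square most evenly. A minimal sketch reusing the packages introduced above:

library(lhs)
library(randtoolbox)

set.seed(123)
n <- 100  # Points per scheme

# Draw n points in the unit square with each method
pts_random <- cbind(runif(n), runif(n))
pts_lhs    <- randomLHS(n, 2)
pts_sobol  <- sobol(n, dim = 2)

# Plot the three point sets side by side
op <- par(mfrow = c(1, 3))
plot(pts_random, main = "Random", xlab = "x1", ylab = "x2")
plot(pts_lhs,    main = "LHS",    xlab = "x1", ylab = "x2")
plot(pts_sobol,  main = "Sobol",  xlab = "x1", ylab = "x2")
par(op)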

34.2 Use of machine learning for sensitivity analysis

To evaluate the importance of each parameter in a simulation model, Random Forest (RF) can be used as a feature importance estimator. This involves training an RF model on simulation results and then analyzing the impact of each input parameter on the output.

34.2.1 Step-by-Step Guide to Using Random Forest for Parameter Importance

1. Generate Simulation Data

The first step is to generate simulation results by systematically sampling multiple parameters using methods like Random Sampling, Latin Hypercube Sampling (LHS), or Sobol Sampling.

Example: Generating Simulation Data in R

# Load required libraries
library(lhs)        # For Latin Hypercube Sampling
library(randtoolbox) # For Sobol Sampling

set.seed(123)
n <- 500  # Number of samples
k <- 5    # Number of parameters

# Generate Latin Hypercube sampled input parameters
param_samples <- randomLHS(n, k)
colnames(param_samples) <- paste0("param", 1:k)

# Assume a simple simulation model (e.g., sum of squared params)
simulation_results <- rowSums(param_samples^2)

# Convert to data frame
sim_data <- data.frame(param_samples, Output = simulation_results)

head(sim_data)

📌 Note: Replace simulation_results with the actual simulation output.
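
In practice, each sampled parameter set (one row of param_samples) is fed to the simulation model. A minimal sketch, assuming a hypothetical wrapper run_simulation() that takes one parameter vector and returns a single numeric output of interest:

# Hypothetical wrapper around the actual simulation model (placeholder body)
run_simulation <- function(params) {
  sum(params^2)  # replace with a call to the real model
}

# Run the simulation once per sampled parameter set (rows of param_samples)
simulation_results <- apply(param_samples, 1, run_simulation)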

2. Train a Random Forest Model

Once the data is prepared, an RF model can be trained using randomForest in R.

Train the RF Model

library(randomForest)

# Train Random Forest to predict simulation output
set.seed(123)
rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)

# Print model summary
print(rf_model)

📌 Explanation:

  • Output ~ . means the RF model uses all parameters to predict the output.
  • importance = TRUE ensures that feature importance is computed.
  • ntree = 500 sets the number of trees in the forest.
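
Before reading the importance scores, it is worth checking that ntree is large enough for the out-of-bag (OOB) error to stabilise; a minimal check:

# Plot OOB mean squared error against the number of trees;
# the curve should flatten well before reaching ntree
plot(rf_model)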

3. Extract Parameter Importance

After training, RF provides two types of feature importance. For classification models these are reported as Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG); for a regression model like the one above, importance() returns the analogous measures:

  1. %IncMSE – the percentage increase in mean squared error when a parameter is randomly shuffled (permuted).
  2. IncNodePurity – how much each variable contributes to reducing node impurity (measured by the residual sum of squares) across the trees.

Plot Feature Importance

# Extract importance values
importance_values <- importance(rf_model)

# Convert to a data frame (regression model: columns are %IncMSE and IncNodePurity)
importance_df <- data.frame(Parameter = rownames(importance_values),
                            IncMSE = importance_values[, "%IncMSE"],
                            IncNodePurity = importance_values[, "IncNodePurity"])

# Print importance scores
print(importance_df)

# Plot feature importance
library(ggplot2)
ggplot(importance_df, aes(x = reorder(Parameter, -IncMSE), y = IncMSE)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Feature Importance (%IncMSE)",
       x = "Parameter", y = "Importance (%IncMSE)") +
  theme_minimal()

📌 Interpretation:

  • Higher %IncMSE values indicate more important parameters (a larger error increase when the parameter is shuffled).
  • Higher IncNodePurity values mean stronger contributions to splitting decisions in the trees.
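
As a quick alternative to the ggplot2 figure, randomForest provides a built-in importance plot:

# Dotcharts of both importance measures for the trained model
varImpPlot(rf_model)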

4. Interpret the Results

After analyzing feature importance:

  • Key parameters can be identified for further refinement.
  • Unimportant parameters can be removed to simplify the model.
  • Interactions and marginal effects of parameters can be explored (see the sketch below).
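
One simple follow-up is a partial dependence plot, which shows how the predicted output responds to a single parameter, averaged over the others; a minimal sketch using randomForest::partialPlot():

# Partial dependence of the predicted output on param1
partialPlot(rf_model, pred.data = sim_data, x.var = "param1")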

Summary

Step   Action
1      Generate parameter samples using LHS, Sobol, or Random Sampling
2      Run simulations to obtain output values
3      Train a Random Forest model using randomForest
4      Extract feature importance using importance()
5      Interpret and visualize the results

For a complete worked example, see Angourakis et al. (2022).