library(ggplot2)
library(readr)
23 Analysis of simulation results
<- "assets/netlogo/experiments/Artificial Anasazi_experiments " experiments_path
<- c("historical households" = "blue",
color_mapping "simulation households" = "darkred")
23.1 Single run
<- "experiment single run" expname
Read output:
results_single <- readr::read_csv(paste0(experiments_path, expname, "-table.csv"), skip = 6)
Rows: 552 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): map-view
dbl (9): [run number], fertility, death-age, harvest-variance, fertility-end...
lgl (1): historic-view?
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
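As the message suggests, the parsing report can be silenced by adding `show_col_types = FALSE` to the same call, for example:
results_single <- readr::read_csv(paste0(experiments_path, expname, "-table.csv"),
                                  skip = 6, show_col_types = FALSE)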
Plot trajectories of metrics:
plot_name <- paste0(experiments_path, expname, "-trajectories.png")
png(plot_name, width = 840, height = 540)
ggplot(results_single) +
geom_line(aes(x = `[step]`, y = `historical-total-households`, color = "historical households"),
linewidth = 1.2) +
geom_line(aes(x = `[step]`, y = `total-households`, color = "simulation households"),
linewidth = 1.2) +
labs(x = "steps", y = "households") +
scale_color_manual(name = "", values = color_mapping) +
theme(legend.position = "right")
dev.off()
svg
2
knitr::include_graphics(plot_name)
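Beyond the visual comparison, the two columns plotted above can be condensed into a single goodness-of-fit number. A minimal sketch computing the root mean squared error between the simulated and historical trajectories (illustrative; not part of the original experiment output):
# RMSE between simulated and historical household counts (illustrative)
rmse <- sqrt(mean((results_single$`total-households` -
                   results_single$`historical-total-households`)^2))
rmse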
23.2 Multiple runs in single configuration
<- "experiment multiple runs" expname
Read output:
results_single <- readr::read_csv(paste0(experiments_path, expname, "-table.csv"), skip = 6)
Rows: 5520 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): map-view
dbl (9): [run number], fertility, death-age, harvest-variance, fertility-end...
lgl (1): historic-view?
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Plot trajectories of metrics:
plot_name <- paste0(experiments_path, expname, "-trajectories.png")
png(plot_name, width = 840, height = 540)
ggplot(results_single) +
geom_line(aes(x = `[step]`, y = `total-households`, color = `[run number]`, group = `[run number]`),
linewidth = 1.2) +
geom_line(aes(x = `[step]`, y = `historical-total-households`),
color = color_mapping["historical households"],
linewidth = 1.2, linetype = 2) +
labs(x = "steps", y = "households") +
theme(legend.position = "right")
dev.off()
svg
2
knitr::include_graphics(plot_name)
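With repeated runs of a single configuration, the spread across runs can also be summarised per step rather than plotted run by run. A sketch, assuming dplyr is available:
library(dplyr)

# Per-step mean and range of households across runs (illustrative)
run_summary <- results_single |>
  group_by(`[step]`) |>
  summarise(mean_households = mean(`total-households`),
            min_households  = min(`total-households`),
            max_households  = max(`total-households`),
            .groups = "drop")
head(run_summary)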
23.3 Parameter exploration - regular intervals
23.3.1 One parameter
<- "experiment harvest adjustment" expname
Read output:
results_harvest_adj <- readr::read_csv(paste0(experiments_path, expname, "-table.csv"), skip = 6)
Rows: 104880 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): map-view
dbl (9): [run number], fertility, death-age, harvest-variance, fertility-end...
lgl (1): historic-view?
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Plot trajectories of metrics:
plot_name <- paste0(experiments_path, expname, "-trajectories.png")
png(plot_name, width = 840, height = 540)
ggplot(results_harvest_adj) +
geom_line(aes(x = `[step]`, y = `total-households`, color = `harvest-adjustment`, group = `[run number]`),
linewidth = 1.2) +
geom_line(aes(x = `[step]`, y = `historical-total-households`), color = "black",
linewidth = 1.2, linetype = 2) +
labs(x = "steps", y = "households") +
theme(legend.position = "right")
dev.off()
svg
2
knitr::include_graphics(plot_name)
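To rank the explored values by fit rather than eyeballing the trajectories, the deviation from the historical series can be aggregated per parameter value. An illustrative sketch using dplyr:
library(dplyr)

# Mean absolute deviation from the historical trajectory per harvest-adjustment value
fit_by_value <- results_harvest_adj |>
  group_by(`harvest-adjustment`) |>
  summarise(mad = mean(abs(`total-households` - `historical-total-households`)),
            .groups = "drop")
fit_by_value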
23.3.2 Two parameters
<- "experiment harvest adjustment variance" expname
Read output:
results_harvest_adj <- readr::read_csv(paste0(experiments_path, expname, "-table.csv"), skip = 6)
Rows: 441600 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): map-view
dbl (9): [run number], fertility, death-age, harvest-variance, fertility-end...
lgl (1): historic-view?
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Plot trajectories of metrics:
plot_name <- paste0(experiments_path, expname, "-trajectories.png")
png(plot_name, width = 840, height = 540)
ggplot(results_harvest_adj) +
geom_line(aes(x = `[step]`, y = `total-households`, group = `[run number]`),
color = color_mapping["simulation households"],
linewidth = 1.2) +
geom_line(aes(x = `[step]`, y = `historical-total-households`),
color = color_mapping["historical households"],
linewidth = 1.2, linetype = 2) +
facet_grid(`harvest-adjustment` ~ `harvest-variance`) +
labs(x = "steps", y = "households") +
theme(legend.position = "right")
dev.off()
svg
2
knitr::include_graphics(plot_name)
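The same fit summary extends to the two-dimensional grid, aggregating over both parameters so that each facet of the plot above receives one score. An illustrative sketch:
library(dplyr)

# One fit score per harvest-adjustment / harvest-variance combination
fit_by_combo <- results_harvest_adj |>
  group_by(`harvest-adjustment`, `harvest-variance`) |>
  summarise(mad = mean(abs(`total-households` - `historical-total-households`)),
            .groups = "drop")
fit_by_combo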
23.4 Use of machine learning for sensitivity analysis
To evaluate the importance of each parameter in a simulation model, Random Forest (RF) can be used as a feature importance estimator. This involves training an RF model on simulation results and then analyzing the impact of each input parameter on the output.
23.4.1 Step-by-Step Guide to Using Random Forest for Parameter Importance
1. Generate Simulation Data
The first step is to generate simulation results by systematically sampling multiple parameters using methods such as Random Sampling, Latin Hypercube Sampling (LHS), or Sobol Sampling.
Example: Generating Simulation Data in R
# Load required libraries
library(lhs) # For Latin Hypercube Sampling
library(randtoolbox) # For Sobol Sampling
set.seed(123)
n <- 500 # Number of samples
k <- 5   # Number of parameters

# Generate Latin Hypercube sampled input parameters
param_samples <- randomLHS(n, k)
colnames(param_samples) <- paste0("param", 1:k)

# Assume a simple simulation model (e.g., sum of squared params)
simulation_results <- rowSums(param_samples^2)

# Convert to data frame
sim_data <- data.frame(param_samples, Output = simulation_results)
head(sim_data)
📌 Note: Replace simulation_results with the actual simulation output.
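Since randtoolbox is already loaded above for Sobol Sampling, the LHS step could also be swapped for a quasi-random Sobol design; a minimal sketch:
# Alternative: Sobol quasi-random design on the unit hypercube
param_samples_sobol <- sobol(n, dim = k)
colnames(param_samples_sobol) <- paste0("param", 1:k)
head(param_samples_sobol)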
2. Train a Random Forest Model
Once the data is prepared, an RF model can be trained using randomForest in R.
Train the RF Model
library(randomForest)
# Train Random Forest to predict simulation output
set.seed(123)
rf_model <- randomForest(Output ~ ., data = sim_data, importance = TRUE, ntree = 500)
# Print model summary
print(rf_model)
📌 Explanation:
- Output ~ . means the RF model uses all parameters to predict the output.
- importance = TRUE ensures that feature importance is computed.
- ntree = 500 sets the number of trees in the forest.
3. Extract Parameter Importance
After training, RF provides two types of feature importance:
1. Mean Decrease in Accuracy (MDA) – Measures how much accuracy drops when a parameter is randomly shuffled. For a regression forest such as this one, randomForest reports this permutation importance as %IncMSE.
2. Mean Decrease in Gini (MDG) – Measures how much each variable contributes to reducing node impurity in the decision trees. For regression, the corresponding column is IncNodePurity.
Plot Feature Importance
# Extract importance values
importance_values <- importance(rf_model)
# Convert to a data frame
importance_df <- data.frame(Parameter = rownames(importance_values),
                            MDA = importance_values[, 1],
                            MDG = importance_values[, 2])
# Print importance scores
print(importance_df)
# Plot feature importance
library(ggplot2)
ggplot(importance_df, aes(x = reorder(Parameter, -MDA), y = MDA)) +
geom_bar(stat = "identity", fill = "steelblue") +
labs(title = "Feature Importance (Mean Decrease in Accuracy)",
x = "Parameter", y = "Importance") +
theme_minimal()
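The randomForest package also ships a built-in plot of the same two importance measures, which serves as a quick cross-check on the ggplot version above:
# Built-in dotchart of both importance measures
varImpPlot(rf_model)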
📌 Interpretation:
- Higher MDA values indicate more important parameters (greater accuracy drop when shuffled).
- Higher MDG values mean stronger contributions to splitting decisions in trees.
4. Interpret the Results
After analyzing feature importance:
- Key parameters can be identified for further refinement.
- Unimportant parameters can be removed to simplify the model.
- Interactions between parameters can be explored.
Summary

| Step | Action |
|------|--------|
| 1 | Generate parameter samples using LHS, Sobol, or Random Sampling |
| 2 | Run simulations to obtain output values |
| 3 | Train a Random Forest model using randomForest |
| 4 | Extract feature importance using importance() |
| 5 | Interpret and visualize the results |
Example: (Angourakis et al. 2022)
23.5 Use of machine learning for parameter calibration
(TO-DO)
23.6 Use of machine learning for model selection
Example: (Carrignon, Brughmans, and Romanowska 2020)