SimulationStudy Class

A streamlined guide using the SimulationStudy manager.

Use the SimulationStudy manager to handle the entire lifecycle: storage, diagnostics, active refinement, and analysis.

Note

In this tutorial, we will intentionally feed the study “bad” data (with a large gap in the input space) to demonstrate how the active learning module automatically identifies and fixes issues.

1. Setup & Data Generation

We’ll define a simulated physics function with heteroscedastic noise and generate an initial design containing a deliberate gap in Length between 3 mm and 7 mm.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from digiqual import SimulationStudy


# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*Angle) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )

    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))

    return signal + noise


# --- Create Flawed Data (Gap between 3mm and 7mm) ---
df1 = pd.DataFrame(
    {
        "Length": np.random.uniform(0.1, 3.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

df2 = pd.DataFrame(
    {
        "Length": np.random.uniform(7.0, 10.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

# Combine and Initialise Study
df_initial = pd.concat([df1, df2], ignore_index=True)
df_initial["Signal"] = apply_physics(df_initial)

study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"], outcome_col="Signal"
)
study.add_data(df_initial)
Data updated. Total rows: 80
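
Because the noise scale grows with Roughness, the simulated signal is heteroscedastic. As a quick standalone check, the same formula (reproduced here under the hypothetical name toy_signal, with a seeded generator for reproducibility) scatters four times as much at Roughness = 1 as at Roughness = 0:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def toy_signal(df):
    """Same formula as apply_physics above, but with a seeded generator."""
    mean = (
        10.0
        + 2.0 * df["Length"]
        + 0.5 * df["Length"] ** 2
        - 0.1 * np.abs(df["Angle"])
    )
    scale = 0.5 + 1.5 * df["Roughness"]  # noise scale: 0.5 at R=0, 2.0 at R=1
    return mean + rng.normal(0.0, scale, size=len(df))

n = 20_000
smooth = pd.DataFrame(
    {"Length": np.full(n, 2.0), "Angle": np.zeros(n), "Roughness": np.zeros(n)}
)
rough = smooth.assign(Roughness=1.0)

print(f"std at Roughness=0: {toy_signal(smooth).std():.2f}")  # ≈ 0.50
print(f"std at Roughness=1: {toy_signal(rough).std():.2f}")   # ≈ 2.00
```

This is exactly the structure the variance model in the pod() pipeline has to capture later on.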

2. Diagnosis (Detecting the Issue)

We ask the SimulationStudy manager to diagnose the health of our experiment.

Warning

Because of our deliberate gap, we expect the diagnostic engine to flag the Input Coverage test as failed.

study.diagnose()
Running validation...
Validation passed. 80 valid rows ready.
Checking sample sufficiency...
   Test                   Variable   Metric                Value   Threshold  Pass
0  Input Coverage         Length     Max Gap Ratio         0.4359  < 0.20     False
1  Input Coverage         Angle      Max Gap Ratio         0.0845  < 0.20     True
2  Input Coverage         Roughness  Max Gap Ratio         0.0650  < 0.20     True
3  Model Fit (CV)         Signal     Mean R2 Score         0.9968  > 0.50     True
4  Bootstrap Convergence  Signal     Avg CV (Rel Std Dev)  0.0168  < 0.15     True
5  Bootstrap Convergence  Signal     Max CV (Rel Std Dev)  0.0314  < 0.30     True
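
The Max Gap Ratio flagged here is, in essence, the largest spacing between adjacent sorted sample values relative to the variable's full span. The library's exact definition may differ, but a minimal sketch of the idea shows why a hole between 3 and 7 fails the 0.20 threshold:

```python
import numpy as np

def max_gap_ratio(values):
    """Largest spacing between adjacent sorted samples, relative to the span."""
    x = np.sort(np.asarray(values, dtype=float))
    return np.diff(x).max() / (x[-1] - x[0])

# Two clusters of Length values with a hole between 3 and 7,
# mirroring the flawed design in Section 1.
lengths = np.concatenate([np.linspace(0.1, 3.0, 40), np.linspace(7.0, 10.0, 40)])
print(f"{max_gap_ratio(lengths):.4f}")  # 4.0 / 9.9 ≈ 0.40, well above 0.20
```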

3. Adaptive Refinement (The Fix)

Instead of manually guessing where to add new simulations, we use the refine() method:

  1. Detect: it automatically targets the region where the coverage gap or model uncertainty lies.
  2. Generate: it returns a dataframe of new input coordinates.
  3. Execute & Store: you run your external simulation on these points and append the results with add_data().

# Generate active learning samples (20 points to fill the gap)
new_samples = study.refine(n_points=20)

if not new_samples.empty:
    # Apply the exact same physics model to the new points
    new_samples["Signal"] = apply_physics(new_samples)

    # Add back to study
    study.add_data(new_samples)
Diagnostics flagged issues. Initiating Active Learning...
 -> Strategy: Exploration (Filling gaps in Length)
Data updated. Total rows: 100

Now we verify that the issue is resolved:

# Re-run diagnostics to confirm the green light
study.diagnose()
Running validation...
Validation passed. 100 valid rows ready.
Checking sample sufficiency...
   Test                   Variable   Metric                Value   Threshold  Pass
0  Input Coverage         Length     Max Gap Ratio         0.0811  < 0.20     True
1  Input Coverage         Angle      Max Gap Ratio         0.0608  < 0.20     True
2  Input Coverage         Roughness  Max Gap Ratio         0.0650  < 0.20     True
3  Model Fit (CV)         Signal     Mean R2 Score         0.9965  > 0.50     True
4  Bootstrap Convergence  Signal     Avg CV (Rel Std Dev)  0.0159  < 0.15     True
5  Bootstrap Convergence  Signal     Max CV (Rel Std Dev)  0.0332  < 0.30     True

4. Reliability Analysis

With a validated dataset, we can now run the complete probability of detection (PoD) pipeline.

Tip

The single pod() method automatically handles:

  • Monte-Carlo Integration (marginalising out the nuisance variables)
  • Model Selection (finding the best polynomial fit)
  • Distribution Inference (finding the best error distribution)
  • Bootstrapping (generating confidence bounds)
# 1. Run analysis (marginalising out the nuisance variables)
_ = study.pod(
    poi_col="Length", nuisance_col=["Angle", "Roughness"], threshold=18.0, n_jobs=-1
)
--- Starting Reliability Analysis (PoIs: ['Length']) ---
1. Selecting Mean Model (Cross-Validation)...
-> Selected Model: Kriging (Gaussian Process)
2. Fitting Variance Model (Kernel Smoothing)...
   -> Optimizing bandwidth via LOO-CV...
   -> Smoothing Bandwidth: 9.7815
3. Inferring Error Distribution (AIC)...
   -> Selected Distribution: logistic
4. Computing PoD Curve...
5. Running Bootstrap (1000 iterations on 11 cores)...
[Parallel(n_jobs=11)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=11)]: Done  19 tasks      | elapsed:    1.8s
[Parallel(n_jobs=11)]: Done 140 tasks      | elapsed:    3.4s
[Parallel(n_jobs=11)]: Done 343 tasks      | elapsed:    6.2s
[Parallel(n_jobs=11)]: Done 626 tasks      | elapsed:   10.1s
   -> a90/95 Reliability Index: 2.886
--- Analysis Complete ---
[Parallel(n_jobs=11)]: Done 1000 out of 1000 | elapsed:   15.4s finished
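
To see what the pipeline is computing, note that a PoD curve for a signal-response model is the probability that the noisy signal exceeds the detection threshold at each flaw size. A minimal hand-rolled sketch using the known toy physics at fixed nuisance values, with a Gaussian error model standing in for the study's selected Kriging mean and logistic error distribution (the mechanics are the same):

```python
import numpy as np
from scipy.stats import norm

THRESHOLD = 18.0  # same detection threshold passed to study.pod()

def pod(length, angle=0.0, roughness=0.25):
    """P(signal > THRESHOLD) under the toy physics from Section 1."""
    mean = 10.0 + 2.0 * length + 0.5 * length**2 - 0.1 * abs(angle)
    sd = 0.5 + 1.5 * roughness  # heteroscedastic noise scale
    return 1.0 - norm.cdf(THRESHOLD, loc=mean, scale=sd)

lengths = np.linspace(0.1, 10.0, 500)
curve = pod(lengths)  # monotone in length, so it can be inverted directly
a90 = lengths[np.searchsorted(curve, 0.90)]  # smallest length with PoD >= 0.90
print(f"a90 ≈ {a90:.2f}")
```

The a90/95 reported above (2.886) is the 95% bootstrap upper confidence bound on this a90 quantity.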
# 2. Visualise Results
study.visualise()
Figure 1: Reliability Analysis Results. (a) Best Fitting Model (Statistics); (b) Signal Response Model (Physics); (c) Probability of Detection Curve (Reliability).

Appendix: Full Script

import numpy as np
import pandas as pd

from digiqual import SimulationStudy
from digiqual.sampling import generate_lhs


# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*Angle) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )

    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))

    return signal + noise


# --- Create and analyse a complete dataset ---
vars_df = pd.DataFrame(
    [
        {"Name": "Length", "Min": 0.1, "Max": 10},
        {"Name": "Angle", "Min": -90, "Max": 90},
        {"Name": "Roughness", "Min": 0, "Max": 1},
    ]
)

df = generate_lhs(ranges=vars_df, n=50)

df["Signal"] = apply_physics(df)

study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"], outcome_col="Signal"
)
study.add_data(df)

study.diagnose()

new_samples = study.refine()
if not new_samples.empty:
    new_samples["Signal"] = apply_physics(new_samples)
    study.add_data(new_samples)

# Analyse multi-dimensional PoD (1 PoI, 2 nuisance variables)
_ = study.pod(
    poi_col="Length", nuisance_col=["Angle", "Roughness"], threshold=15, n_jobs=-1
)

study.visualise()