SimulationStudy Class

A streamlined guide using the SimulationStudy manager.

Use the SimulationStudy manager to handle the entire lifecycle of an experiment: data storage, diagnostics, and active refinement. This lets you detect and fix coverage issues in your design automatically.

In this example, we will intentionally feed the study “bad” data (with a large gap in the input space) to demonstrate how the active learning module identifies and fixes the problem.

1. Setup & “Flawed” Data Generation

First, we define our physics simulation and create a dataset with a deliberate gap (missing data between 3 mm and 7 mm).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from digiqual import SimulationStudy


# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*Angle) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )

    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))

    return signal + noise


# --- Create Flawed Data (Gap between 3mm and 7mm) ---
df1 = pd.DataFrame(
    {
        "Length": np.random.uniform(0.1, 3.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

df2 = pd.DataFrame(
    {
        "Length": np.random.uniform(7.0, 10.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

# Combine and Initialize Study
df_initial = pd.concat([df1, df2], ignore_index=True)
df_initial["Signal"] = apply_physics(df_initial)

study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"], outcome_col="Signal"
)
study.add_data(df_initial)
Data updated. Total rows: 80

2. Diagnosis (Detecting the Issue)

We now ask the study manager to diagnose the health of our experiment. We expect it to flag the Input Coverage test because of the gap we created.

study.diagnose()
Running validation...
Validation passed. 80 valid rows ready.
Checking sample sufficiency...
   Test                   Variable   Metric                  Value  Threshold   Pass
0  Input Coverage         Length     Max Gap Ratio          0.4187     < 0.20  False
1  Input Coverage         Angle      Max Gap Ratio          0.0653     < 0.20   True
2  Input Coverage         Roughness  Max Gap Ratio          0.0585     < 0.20   True
3  Model Fit (CV)         Signal     Mean R2 Score          0.9980     > 0.50   True
4  Bootstrap Convergence  Signal     Avg CV (Rel Std Dev)   0.0171     < 0.15   True
5  Bootstrap Convergence  Signal     Max CV (Rel Std Dev)   0.0366     < 0.30   True
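The Max Gap Ratio that failed above is easy to reproduce by hand: it is the largest spacing between consecutive sorted sample values, divided by the observed range. A minimal sketch of the idea (illustrative only; digiqual's exact definition, e.g. how it treats the domain bounds, may differ):

```python
import numpy as np


def max_gap_ratio(values):
    """Largest spacing between consecutive sorted samples,
    relative to the observed range of the data."""
    x = np.sort(np.asarray(values, dtype=float))
    return np.diff(x).max() / (x[-1] - x[0])


# A sample with a hole between 3 mm and 7 mm blows past a 0.20 threshold
rng = np.random.default_rng(0)
lengths = np.concatenate(
    [rng.uniform(0.1, 3.0, 40), rng.uniform(7.0, 10.0, 40)]
)
print(max_gap_ratio(lengths))  # roughly 0.4: the 4 mm hole dominates the range
```

This matches the value flagged for Length above (the exact number varies with the random draw), while well-covered inputs like Angle and Roughness score far below the 0.20 threshold.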

3. Adaptive Refinement (The Fix)

Instead of manually guessing where to add points, we call refine(). The study manager detects the gap automatically and generates targeted samples to fill it. In this example we evaluate the new samples with apply_physics() directly; in practice, you would feed the new-samples dataframe into your simulation engine to compute the outcome variable, then pass the results back to the study with add_data().

# Generate active learning samples (20 points to fill the gap)
new_samples = study.refine(n_points=20)

if not new_samples.empty:
    # Apply the exact same physics model to the new points
    new_samples["Signal"] = apply_physics(new_samples)

    # Add back to study
    study.add_data(new_samples)
Diagnostics flagged issues. Initiating Active Learning...
 -> Strategy: Exploration (Filling gaps in Length)
Data updated. Total rows: 100
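One way such exploration-driven gap filling can work (a sketch of the general idea, not digiqual's implementation): score a pool of uniform candidates by their distance to the nearest existing sample, and greedily keep the most isolated ones, which naturally land in the largest gap.

```python
import numpy as np


def fill_gaps_1d(existing, lo, hi, n_points, n_candidates=2000, seed=0):
    """Exploration heuristic: repeatedly keep the candidate farthest from
    any accepted sample (greedy maximin), so picks concentrate in gaps."""
    rng = np.random.default_rng(seed)
    pool = rng.uniform(lo, hi, n_candidates)
    pts = np.asarray(existing, dtype=float)
    chosen = []
    for _ in range(n_points):
        # Distance of every candidate to its nearest accepted point
        d = np.abs(pool[:, None] - pts[None, :]).min(axis=1)
        best = d.argmax()
        chosen.append(pool[best])
        pts = np.append(pts, pool[best])
    return np.array(chosen)


existing = np.concatenate([np.linspace(0.1, 3.0, 40), np.linspace(7.0, 10.0, 40)])
new = fill_gaps_1d(existing, 0.1, 10.0, 20)
print(new.min(), new.max())  # all 20 picks land inside the 3-7 hole
```

digiqual's real input space is multi-dimensional; this 1-D version only illustrates why the generated points concentrate between 3 mm and 7 mm.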

Now we verify that the issue is resolved:

# Re-run diagnostics to confirm the green light
study.diagnose()
Running validation...
Validation passed. 100 valid rows ready.
Checking sample sufficiency...
   Test                   Variable   Metric                  Value  Threshold   Pass
0  Input Coverage         Length     Max Gap Ratio          0.0554     < 0.20   True
1  Input Coverage         Angle      Max Gap Ratio          0.0573     < 0.20   True
2  Input Coverage         Roughness  Max Gap Ratio          0.0506     < 0.20   True
3  Model Fit (CV)         Signal     Mean R2 Score          0.9978     > 0.50   True
4  Bootstrap Convergence  Signal     Avg CV (Rel Std Dev)   0.0131     < 0.15   True
5  Bootstrap Convergence  Signal     Max CV (Rel Std Dev)   0.0289     < 0.30   True

4. Reliability Analysis

With a validated dataset, we can now run the complete PoD pipeline. The pod() method handles model fitting, distribution selection, and bootstrapping in a single call.

Because the SimulationStudy object manages the state, we can run the analysis and immediately generate standard diagnostics without needing to manually handle the output data.
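Before running it, it is worth seeing what a PoD curve actually computes: at each Length, the probability that the noisy signal exceeds the detection threshold. A back-of-envelope sketch using this tutorial's physics at Angle = 0 with an assumed constant noise standard deviation of 1.0 (the real pipeline fits both the mean and variance models from data):

```python
import math


def pod(length, threshold=18.0, sigma=1.0, angle=0.0):
    """P(signal > threshold) for the tutorial's physics with Gaussian noise."""
    mu = 10.0 + 2.0 * length + 0.5 * length**2 - 0.1 * abs(angle)
    z = (threshold - mu) / sigma
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


for L in (1.0, 2.0, 2.5, 3.0, 4.0):
    print(f"L = {L:.1f} mm  ->  PoD = {pod(L):.4f}")
# Detection flips from unlikely to near-certain between L = 2 and L = 3
```

The fitted a90/95 reported by the study (the length detected with 90% probability at 95% confidence) sits just above this transition region, as expected.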

# 1. Run Analysis (Threshold = 18 dB)
_ = study.pod(poi_col="Length", threshold=18.0)
--- Starting Reliability Analysis (PoI: Length) ---
1. Selecting Mean Model (Cross-Validation)...
-> Selected Model: Kriging (Gaussian Process)
2. Fitting Variance Model (Kernel Smoothing)...
   -> Smoothing Bandwidth: 0.9871
3. Inferring Error Distribution (AIC)...
   -> Selected Distribution: logistic
4. Computing PoD Curve...
5. Running Bootstrap (1000 iterations)...
   -> a90/95 Reliability Index: 2.842
--- Analysis Complete ---
# 2. Visualize Results
study.visualise()
Figure 1: Reliability Analysis Results. (a) Best Fitting Model (Statistics); (b) Signal Response Model (Physics); (c) Probability of Detection Curve (Reliability).

Appendix: Full Script

import numpy as np
import pandas as pd

from digiqual import SimulationStudy
from digiqual.sampling import generate_lhs


# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*Angle) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )

    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))

    return signal + noise


# --- Create and analyse a complete dataset ---
vars_df = pd.DataFrame(
    [
        {"Name": "Length", "Min": 0.1, "Max": 10},
        {"Name": "Angle", "Min": -90, "Max": 90},
        {"Name": "Roughness", "Min": 0, "Max": 1},
    ]
)

df = generate_lhs(ranges=vars_df, n=50)

df["Signal"] = apply_physics(df)

study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"], outcome_col="Signal"
)
study.add_data(df)

study.diagnose()

new_samples = study.refine()

if not new_samples.empty:
    new_samples["Signal"] = apply_physics(new_samples)
    study.add_data(new_samples)

_ = study.pod(poi_col="Length", threshold=15)

study.visualise()
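For reference, generate_lhs produces a Latin hypercube design: each input dimension is split into n equal strata, and each stratum receives exactly one sample. The same idea can be sketched with SciPy's sampler (an alternative illustration, not digiqual's implementation):

```python
import pandas as pd
from scipy.stats import qmc

ranges = pd.DataFrame(
    [
        {"Name": "Length", "Min": 0.1, "Max": 10},
        {"Name": "Angle", "Min": -90, "Max": 90},
        {"Name": "Roughness", "Min": 0, "Max": 1},
    ]
)

sampler = qmc.LatinHypercube(d=len(ranges), seed=42)
unit = sampler.random(n=50)  # 50 points in [0, 1)^3, one per stratum per dim
scaled = qmc.scale(unit, ranges["Min"].to_numpy(), ranges["Max"].to_numpy())
df = pd.DataFrame(scaled, columns=ranges["Name"])
print(df.describe().loc[["min", "max"]])
```

Compared with plain uniform sampling, this stratification guarantees even marginal coverage of each input, which is why an LHS design tends to pass the Input Coverage checks on the first diagnose().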