A streamlined guide using the SimulationStudy manager.
Use the SimulationStudy manager to handle the entire lifecycle: storage, diagnostics, active refinement, and analysis.
Note
In this tutorial, we will intentionally feed the study “bad” data (with a large gap in the input space) to demonstrate how the active learning module automatically identifies and fixes issues.
1. Setup & Data Generation
We’ll define a simulated physics function with heteroscedastic noise (scatter that grows with surface roughness) and generate an initial design of 80 points with a deliberate gap in Length between 3 mm and 7 mm.
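For reference, the model implemented by apply_physics can be written compactly as follows. The symbols L, θ, and R are shorthand introduced here for the Length, Angle, and Roughness columns; they are not names used by the library.

```latex
\[
\text{Signal} = 10 + 2L + 0.5L^{2} - 0.1\,\lvert\theta\rvert + \varepsilon,
\qquad
\varepsilon \sim \mathcal{N}\!\left(0,\ (0.5 + 1.5R)^{2}\right)
\]
```

With Roughness drawn from [0, 0.5], the noise standard deviation therefore ranges from 0.5 to 1.25.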
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from digiqual import SimulationStudy

# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*Angle) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )
    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))
    return signal + noise

# --- Create Flawed Data (Gap between 3mm and 7mm) ---
df1 = pd.DataFrame(
    {
        "Length": np.random.uniform(0.1, 3.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)
df2 = pd.DataFrame(
    {
        "Length": np.random.uniform(7.0, 10.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

# Combine and Initialise Study
df_initial = pd.concat([df1, df2], ignore_index=True)
df_initial["Signal"] = apply_physics(df_initial)
study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"], outcome_col="Signal"
)
study.add_data(df_initial)
Data updated. Total rows: 80
2. Diagnosis (Detecting the Issue)
We ask the SimulationStudy manager to diagnose the health of our experiment by calling study.diagnose().
Warning
Because of our deliberate gap, we expect the diagnostic engine to flag the Input Coverage test as failed.
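To build intuition for what an input coverage test might look at, here is a minimal sketch that measures the widest empty interval in a single input column. It is not digiqual's implementation: the function names largest_gap and coverage_ok and the 0.2 threshold are assumptions made purely for illustration.

```python
def largest_gap(values):
    """Return (gap_fraction, lo, hi) for the widest empty interval.

    gap_fraction is the width of the widest gap between neighbouring
    sorted values, divided by the full range of the data.
    """
    xs = sorted(float(v) for v in values)
    if len(xs) < 2:
        raise ValueError("need at least two values to measure a gap")
    span = xs[-1] - xs[0]
    if span == 0:
        return 0.0, xs[0], xs[0]
    # Compare neighbouring pairs and keep the widest one.
    width, lo, hi = max((b - a, a, b) for a, b in zip(xs, xs[1:]))
    return width / span, lo, hi


def coverage_ok(values, max_gap_fraction=0.2):
    """Hypothetical pass/fail rule: no gap wider than 20% of the range."""
    gap_fraction, _, _ = largest_gap(values)
    return gap_fraction <= max_gap_fraction


# A toy column with the same 3-to-7 hole as the tutorial data.
lengths = [0.1, 1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 10.0]
print(largest_gap(lengths))
print(coverage_ok(lengths))  # False
```

Both functions accept any iterable of numbers, so a pandas column such as df_initial["Length"] can be passed directly.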
3. Active Refinement (Fixing the Issue)
Instead of manually guessing where to add new simulations, we use the refine() method.
Detect: It automatically targets the exact region where the gap or uncertainty exists.
Generate: It outputs a dataframe of new input coordinates.
Execute & Store: You run your external simulation on these points and append the results back using add_data().
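The Detect and Generate steps above can be sketched in a few lines of plain Python. This is only an illustration of the idea, assuming a simple "sample inside the widest gap" strategy; the helper propose_rows is hypothetical and does not reflect how refine() chooses points internally.

```python
import random


def propose_rows(design, gap_col, n_points, seed=None):
    """Propose new input rows that fill the widest gap in gap_col.

    design maps column names to lists of existing input values.
    The gap column is sampled uniformly inside its widest empty
    interval; every other column is sampled uniformly across the
    range it already covers.
    """
    xs = sorted(float(v) for v in design[gap_col])
    if len(xs) < 2:
        raise ValueError("need at least two values to locate a gap")
    # Detect: find the widest interval between neighbouring values.
    _, lo, hi = max((b - a, a, b) for a, b in zip(xs, xs[1:]))

    # Generate: draw new coordinates for every input column.
    rng = random.Random(seed)
    rows = []
    for _ in range(n_points):
        row = {}
        for col, vals in design.items():
            if col == gap_col:
                row[col] = rng.uniform(lo, hi)
            else:
                row[col] = rng.uniform(min(vals), max(vals))
        rows.append(row)
    return rows


design = {
    "Length": [0.5, 1.5, 3.0, 7.0, 8.5, 10.0],
    "Angle": [-10.0, 0.0, 10.0, -5.0, 5.0, 2.0],
}
for row in propose_rows(design, "Length", n_points=3, seed=0):
    print(row)
```

The returned list of dictionaries converts directly to a dataframe with pd.DataFrame(rows), which matches the shape of data that add_data() receives in this tutorial.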
# Generate active learning samples (20 points to fill the gap)
new_samples = study.refine(n_points=20)

if not new_samples.empty:
    # Apply the exact same physics model to the new points
    new_samples['Signal'] = apply_physics(new_samples)
    # Add back to study
    study.add_data(new_samples)
Diagnostics flagged issues. Initiating Active Learning...
-> Strategy: Exploration (Filling gaps in Length)
Data updated. Total rows: 100
Now we verify that the issue is resolved:
# Re-run diagnostics to confirm the green light
study.diagnose()