A streamlined guide using the SimulationStudy manager.
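Use the SimulationStudy manager to handle the entire lifecycle of a simulation study: data storage, diagnostics, and active refinement. Because the manager can both detect weaknesses in your experimental design (such as under-sampled regions of the input space) and generate new samples that target them, you can repair a flawed design without choosing the new points by hand.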
In this example, we will intentionally feed the study “bad” data (with a large gap in the input space) to demonstrate how the active learning module identifies and fixes the problem.
1. Setup & “Flawed” Data Generation
First, we define a synthetic physics model and create a dataset with a deliberate gap: there are no samples with Length between 3 mm and 7 mm.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from digiqual import SimulationStudy

# --- Define the Physics ---
def apply_physics(df):
    """Simulates a signal with quadratic trends and heteroscedastic noise."""
    # 1. Base Signal: Quadratic trend (10 + 2*L + 0.5*L^2)
    # 2. Angle Penalty: Misalignment (-0.1*|Angle|) reduces signal
    signal = (
        10.0
        + (2.0 * df["Length"])
        + (0.5 * df["Length"] ** 2)
        - (0.1 * np.abs(df["Angle"]))
    )
    # 3. Heteroscedastic Noise: Higher roughness = More scatter
    noise_scale = 0.5 + (1.5 * df["Roughness"])
    noise = np.random.normal(loc=0, scale=noise_scale, size=len(df))
    return signal + noise

# --- Create Flawed Data (Gap between 3 mm and 7 mm) ---
df1 = pd.DataFrame(
    {
        "Length": np.random.uniform(0.1, 3.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)
df2 = pd.DataFrame(
    {
        "Length": np.random.uniform(7.0, 10.0, 40),
        "Angle": np.random.uniform(-10, 10, 40),
        "Roughness": np.random.uniform(0, 0.5, 40),
    }
)

# Combine and Initialize Study
df_initial = pd.concat([df1, df2], ignore_index=True)
df_initial["Signal"] = apply_physics(df_initial)

study = SimulationStudy(
    input_cols=["Length", "Angle", "Roughness"],
    outcome_col="Signal",
)
study.add_data(df_initial)
```
Data updated. Total rows: 80
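The noise term in apply_physics() is heteroscedastic: its standard deviation is 0.5 + 1.5 × Roughness, so it grows from 0.5 at Roughness = 0 to 1.25 at Roughness = 0.5. The short numpy-only sketch below (independent of digiqual) draws many noise samples at a few fixed roughness levels and checks that the empirical spread tracks that formula. The sample size and roughness levels are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same noise model as apply_physics(): sd = 0.5 + 1.5 * Roughness
roughness_levels = np.array([0.0, 0.25, 0.5])
expected_sd = 0.5 + 1.5 * roughness_levels

# Draw 20,000 noise samples per roughness level (one column per level);
# the scale array broadcasts across the last axis of the output shape.
noise = rng.normal(loc=0.0, scale=expected_sd, size=(20_000, roughness_levels.size))
empirical_sd = noise.std(axis=0)

for r, exp, emp in zip(roughness_levels, expected_sd, empirical_sd):
    print(f"Roughness={r:.2f}  expected sd={exp:.3f}  empirical sd={emp:.3f}")
```

With this many draws the empirical values land within about one percent of the formula, which is why rough surfaces produce visibly wider scatter in the Signal column.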
2. Diagnosis (Detecting the Issue)
We now ask the study manager to check the health of our experiment by calling study.diagnose(). We expect it to flag the Input Coverage test because of the gap we created.
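The guide does not show how the Input Coverage test works internally, and we have not checked digiqual's implementation. As a hypothetical illustration only, one simple coverage criterion is the widest empty interval along a single input, measured as a fraction of that input's design range. The helper names largest_gap() and coverage_ok(), the 0.1 to 10.0 mm design bounds, and the 20 % threshold are all assumptions made for this sketch.

```python
import numpy as np

def largest_gap(values, lower, upper):
    """Return (start, end, fraction) for the widest empty interval in [lower, upper].

    Hypothetical helper: the design bounds are included as end points so that
    empty stretches at either edge of the range also count as gaps.
    """
    pts = np.sort(np.concatenate(([lower], np.asarray(values, dtype=float), [upper])))
    widths = np.diff(pts)
    i = int(np.argmax(widths))
    return pts[i], pts[i + 1], widths[i] / (upper - lower)

def coverage_ok(values, lower, upper, max_gap_fraction=0.2):
    """Hypothetical pass/fail rule: no empty interval wider than max_gap_fraction."""
    return largest_gap(values, lower, upper)[2] <= max_gap_fraction

# Re-create flawed Length samples (0.1-3 mm and 7-10 mm only)
rng = np.random.default_rng(0)
length = np.concatenate([rng.uniform(0.1, 3.0, 40), rng.uniform(7.0, 10.0, 40)])

start, end, frac = largest_gap(length, 0.1, 10.0)
print(f"Widest gap: {start:.2f} to {end:.2f} mm ({frac:.0%} of the range)")
print("Coverage OK:", coverage_ok(length, 0.1, 10.0))
```

Because the missing 3 to 7 mm band alone spans about 40 % of the design range, any rule of this kind fails the flawed dataset, while an evenly spread design passes easily.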
3. Active Refinement (Fixing the Issue)
Instead of manually guessing where to add points, we use refine(). The study manager detects the gap and generates targeted samples to fill it. Here we simply call apply_physics() on the new points; in a real study, you would pass the new_samples DataFrame to your simulation engine to compute the outcome variable, then add the results to the SimulationStudy with the add_data() method.
```python
# Generate active learning samples (20 points to fill the gap)
new_samples = study.refine(n_points=20)

if not new_samples.empty:
    # Apply the exact same physics model to the new points
    new_samples["Signal"] = apply_physics(new_samples)

    # Add back to study
    study.add_data(new_samples)
```
Diagnostics flagged issues. Initiating Active Learning...
-> Strategy: Exploration (Filling gaps in Length)
Data updated. Total rows: 100
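The log above reports an exploration strategy aimed at the gap in Length. The guide does not describe digiqual's actual refine() algorithm, so the following is only a rough, hypothetical picture of what such a step can do: locate the widest gap in Length and place n_points inside it using stratified sampling. The gap is split into n_points equal bins with one Length value drawn per bin, while Angle and Roughness are drawn uniformly within assumed bounds. The function explore_gap() and its arguments are illustrative names, not part of digiqual.

```python
import numpy as np

def explore_gap(lengths, lower, upper, other_bounds, n_points, seed=None):
    """Hypothetical exploration step: place n_points inside the widest gap in Length.

    Length is stratified (one draw per equal-width sub-interval of the gap);
    every other input is drawn uniformly within its assumed bounds.
    """
    rng = np.random.default_rng(seed)
    pts = np.sort(np.concatenate(([lower], np.asarray(lengths, dtype=float), [upper])))
    i = int(np.argmax(np.diff(pts)))
    gap_lo, gap_hi = pts[i], pts[i + 1]

    # Stratified draws: split the gap into n_points equal bins, one sample per bin
    edges = np.linspace(gap_lo, gap_hi, n_points + 1)
    new = {"Length": rng.uniform(edges[:-1], edges[1:])}
    for name, (lo, hi) in other_bounds.items():
        new[name] = rng.uniform(lo, hi, n_points)
    return new

# Flawed Length samples, as in the setup step
rng = np.random.default_rng(1)
lengths = np.concatenate([rng.uniform(0.1, 3.0, 40), rng.uniform(7.0, 10.0, 40)])

new = explore_gap(
    lengths, 0.1, 10.0,
    other_bounds={"Angle": (-10, 10), "Roughness": (0, 0.5)},
    n_points=20, seed=2,
)
print(np.round(new["Length"], 2))
```

Stratifying along Length, rather than drawing all 20 points uniformly in the gap, guarantees that no part of the gap is left empty by chance. Wrapping the result in pd.DataFrame(new) would give a table with the same input columns as the study.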
Now we verify that the issue is resolved:
```python
# Re-run diagnostics to confirm the green light
study.diagnose()
```
4. PoD Analysis
With a validated dataset, we can now run the complete Probability of Detection (PoD) pipeline. The pod() method handles model fitting, distribution selection, and bootstrapping in a single call.
Because the SimulationStudy object manages the state, we can run the analysis and immediately generate standard diagnostics without handling the output data manually.
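The guide does not show what pod() does internally, and we have not checked digiqual's implementation. As a hedged sketch of the general technique named above (model fitting plus bootstrapping), here is a generic signal-response PoD calculation in numpy. Assumptions: Signal is modeled as a quadratic in Length; residuals are treated as Normal with one pooled standard deviation, so the roughness-dependent scatter is averaged and no distribution selection is performed; the detection threshold of 20 is hypothetical; and the "a90/95" value is read as the smallest Length whose one-sided 95 % bootstrap lower bound reaches 90 % PoD. The names pod_curve(), bootstrap_pod(), and a90_95() are illustrative.

```python
import math
import numpy as np

# Vectorised complementary error function (stdlib math.erfc applied elementwise)
_erfc = np.vectorize(math.erfc)

def pod_curve(length, signal, threshold, grid, degree=2):
    """PoD(a) = P(Signal > threshold | Length = a), assuming a polynomial mean
    in Length and Normal residuals with one constant standard deviation."""
    coeffs = np.polyfit(length, signal, degree)
    resid = signal - np.polyval(coeffs, length)
    sigma = resid.std(ddof=degree + 1)         # ddof = number of fitted coefficients
    z = (threshold - np.polyval(coeffs, grid)) / sigma
    return 0.5 * _erfc(z / math.sqrt(2.0))     # equals 1 - Phi(z)

def bootstrap_pod(length, signal, threshold, grid, n_boot=500, seed=None):
    """Point-estimate PoD curve plus a one-sided 95% lower bound from a row bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(length)
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        curves[b] = pod_curve(length[idx], signal[idx], threshold, grid)
    point = pod_curve(length, signal, threshold, grid)
    lower95 = np.percentile(curves, 5, axis=0)  # 5th percentile at each grid point
    return point, lower95

def a90_95(grid, lower95):
    """Smallest grid Length whose lower bound reaches 0.90 (None if it never does)."""
    hits = np.nonzero(lower95 >= 0.90)[0]
    return float(grid[hits[0]]) if hits.size else None

# Synthetic data from the same model as apply_physics(), without the gap
rng = np.random.default_rng(3)
n = 100
length = rng.uniform(0.1, 10.0, n)
angle = rng.uniform(-10, 10, n)
rough = rng.uniform(0, 0.5, n)
signal = (10 + 2 * length + 0.5 * length**2 - 0.1 * np.abs(angle)
          + rng.normal(0.0, 0.5 + 1.5 * rough))

grid = np.linspace(0.1, 10.0, 100)
point, lower = bootstrap_pod(length, signal, threshold=20.0, grid=grid, n_boot=200, seed=4)
print("Estimated a90/95 (mm):", a90_95(grid, lower))
```

Resampling whole rows keeps each Length paired with its Signal, so the spread of the bootstrap curves reflects uncertainty in both the fitted mean and the residual scatter. A production tool would typically also check whether the Normal assumption holds before trusting the lower bound.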