diagnostics.validate_simulation

validate_simulation(df, input_cols, outcome_col)

Validates simulation data by coercing the input and outcome columns to numeric types and removing rows that fail validation.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | pd.DataFrame | The raw dataframe containing input columns and the outcome column. | required |
| input_cols | List[str] | List of input variable names. | required |
| outcome_col | str | Name of the outcome variable. | required |

Returns

| Name | Type | Description |
| --- | --- | --- |
| | Tuple[pd.DataFrame, pd.DataFrame] | df_clean: The validated, numeric dataframe ready for analysis; df_removed: A dataframe containing the rows that were dropped. |
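The split between the two returned frames can be pictured with plain pandas. What follows is a minimal sketch, not the library's implementation: it assumes that "invalid" means only "cannot be coerced to a number", and the helper name `split_valid_rows` is hypothetical. The real function may apply further rules (for example on value ranges) that this page does not spell out.

```python
import pandas as pd


def split_valid_rows(df, input_cols, outcome_col):
    """Hypothetical helper: coerce the listed columns and split the rows."""
    cols = list(input_cols) + [outcome_col]
    # errors="coerce" turns values that cannot be parsed (e.g. text) into NaN
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    # A row counts as valid here only if every listed column parsed (assumption)
    valid = numeric.notna().all(axis=1)
    df_clean = numeric[valid]
    # Keep the original, uncoerced values so the offending entries stay visible
    df_removed = df.loc[~valid, cols]
    return df_clean, df_removed


raw = pd.DataFrame({
    "Length": [1.0, "BadValue", 5.0],
    "Signal": [0.5, 0.8, -1.2],
})
clean, removed = split_valid_rows(raw, ["Length"], "Signal")
```

Under these assumptions the row holding the text value ends up in `removed`, while the negative `Signal` value is kept, because the sketch checks only whether a value is numeric.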

Raises

| Name | Type | Description |
| --- | --- | --- |
| | ValidationError | If columns are missing, types are wrong, or too few valid rows remain. |
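Callers will usually want to catch this exception rather than let it propagate. The sketch below illustrates two of the listed failure conditions under stated assumptions: the `ValidationError` class here is a local stand-in (the real one lives in the library, and its base class is not documented on this page), `check_columns` is a hypothetical helper, and the `min_rows` threshold is invented for illustration.

```python
import pandas as pd


class ValidationError(ValueError):
    """Local stand-in for the library's exception (assumed base class)."""


def check_columns(df, input_cols, outcome_col, min_rows=2):
    """Hypothetical pre-check; min_rows is an assumed threshold."""
    required = list(input_cols) + [outcome_col]
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValidationError(f"Missing columns: {missing}")
    if len(df) < min_rows:
        raise ValidationError(f"Need at least {min_rows} rows, got {len(df)}")


df = pd.DataFrame({"Length": [1.0, 5.0], "Signal": [0.5, -1.2]})

try:
    # 'Width' is not in the dataframe, so the check fails
    check_columns(df, ["Length", "Width"], "Signal")
except ValidationError as exc:
    message = str(exc)
```

The same `try`/`except` pattern applies when calling `validate_simulation` itself, with `ValidationError` imported from wherever the library defines it.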

Examples

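import pandas as pd
# Import path assumed from the module name above; adjust to your package layout.
from diagnostics import validate_simulation

# Create dirty data (includes text and negative values)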
df = pd.DataFrame({
    'Length': [1.0, 'BadValue', 5.0],
    'Signal': [0.5, 0.8, -1.2]
})

# Validate
clean, removed = validate_simulation(df, ['Length'], 'Signal')
print(f"Clean rows: {len(clean)}")
print(f"Removed rows: {len(removed)}")