diagnostics.validate_simulation
validate_simulation(df, input_cols, outcome_col)
Validates simulation data, coercing to numeric and removing invalid rows.
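The coerce-and-remove step described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the library's implementation. The helper name coerce_and_split is hypothetical. It treats a row as invalid only when one of the requested columns cannot be parsed as a number (or is already missing). It does not filter negative values, because this page does not say whether validate_simulation does.

```python
import pandas as pd
from typing import List, Tuple


def coerce_and_split(
    df: pd.DataFrame, input_cols: List[str], outcome_col: str
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical sketch: coerce required columns to numeric, split off bad rows."""
    cols = list(input_cols) + [outcome_col]
    # Convert each required column; unparseable values (e.g. text) become NaN.
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    # A row is invalid if any required column is NaN after coercion.
    invalid = numeric.isna().any(axis=1)
    df_clean = numeric[~invalid]
    # Keep the raw, uncoerced values of dropped rows so they can be inspected.
    df_removed = df[invalid]
    return df_clean, df_removed


df = pd.DataFrame({"Length": [1.0, "BadValue", 5.0], "Signal": [0.5, 0.8, -1.2]})
clean, removed = coerce_and_split(df, ["Length"], "Signal")
print(clean)
print(removed)
```

Returning the removed rows with their original values, rather than the coerced NaNs, keeps the offending text visible for debugging.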
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | The raw dataframe containing the input columns and the outcome column. | required |
| input_cols | List[str] | List of input variable names. | required |
| outcome_col | str | Name of the outcome variable. | required |
Returns

| Type | Description |
|---|---|
| Tuple[pd.DataFrame, pd.DataFrame] | A tuple (df_clean, df_removed). df_clean: the validated, numeric dataframe ready for analysis. df_removed: a dataframe containing the rows that were dropped. |
Raises

| Type | Description |
|---|---|
| ValidationError | If columns are missing, types are wrong, or too few valid rows remain. |
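The three failure conditions listed under Raises could be checked along the lines of the sketch below. Everything here is an assumption: ValidationError is a local stand-in for the library's exception, "types are wrong" is interpreted as checks on the arguments themselves, and the min_rows threshold of 10 is a hypothetical default, since the actual minimum is not documented on this page.

```python
import pandas as pd
from typing import List


class ValidationError(ValueError):
    """Hypothetical stand-in for the library's ValidationError."""


def check_arguments(df: pd.DataFrame, input_cols: List[str], outcome_col: str) -> None:
    # "Types are wrong": assumed here to refer to the arguments themselves.
    if not isinstance(df, pd.DataFrame):
        raise ValidationError(f"df must be a pandas DataFrame, got {type(df).__name__}")
    if not isinstance(input_cols, (list, tuple)) or not all(
        isinstance(c, str) for c in input_cols
    ):
        raise ValidationError("input_cols must be a list of column names")
    if not isinstance(outcome_col, str):
        raise ValidationError("outcome_col must be a string")
    # "Columns are missing": every requested column must exist in df.
    missing = [c for c in [*input_cols, outcome_col] if c not in df.columns]
    if missing:
        raise ValidationError(f"missing columns: {missing}")


def check_row_count(df_clean: pd.DataFrame, min_rows: int = 10) -> None:
    # "Too few valid rows remain": min_rows is a hypothetical threshold.
    if len(df_clean) < min_rows:
        raise ValidationError(
            f"only {len(df_clean)} valid rows remain; need at least {min_rows}"
        )
```

Keeping the argument checks separate from the row-count check mirrors the order in which the conditions can be detected: the first before any coercion, the second only after invalid rows have been removed.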
Examples
```python
import pandas as pd

from diagnostics import validate_simulation

# Create dirty data (includes text and negative values)
df = pd.DataFrame({
    'Length': [1.0, 'BadValue', 5.0],
    'Signal': [0.5, 0.8, -1.2]
})

# Validate: returns the cleaned rows and the rows that were dropped
clean, removed = validate_simulation(df, ['Length'], 'Signal')
print(f"Clean rows: {len(clean)}")
print(f"Removed rows: {len(removed)}")
```