pod.fit_variance_model

fit_variance_model(X, y, mean_model, auto_bandwidth=True, bandwidth_ratio=0.1)

Calculates residuals and defines the smoothing bandwidth for variance estimation.

This function acts as the setup phase for modeling heteroscedasticity. It computes the raw residuals from the provided mean model and selects the smoothing bandwidth, either automatically via leave-one-out cross-validation or from a fixed, user-defined ratio. It also generates a linearly spaced evaluation grid over the X domain.
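The residual and grid steps described above can be sketched as follows. This is a minimal illustration, not the `pod` implementation: the helper `residuals_and_grid`, the grid size `n_grid=100`, and the `LineModel` stand-in (used in place of a fitted scikit-learn estimator so the sketch needs only NumPy) are all hypothetical. It assumes the mean model was fitted on X as a single-feature column, as in the example further down.

```python
import numpy as np


class LineModel:
    """Hypothetical stand-in for a fitted estimator: any object with .predict()."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, X2d):
        # Expects a 2D array of shape (n_samples, 1), like scikit-learn
        return self.slope * X2d[:, 0] + self.intercept


def residuals_and_grid(X, y, mean_model, n_grid=100):
    """Hypothetical helper: raw residuals plus a linearly spaced grid over X."""
    X = np.asarray(X, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Assumes the mean model was fitted on X as a single-feature column
    predictions = mean_model.predict(X.reshape(-1, 1))
    residuals = y - predictions  # raw residuals: observed minus predicted
    # Evaluation grid spanning the observed X domain (n_grid is an assumption)
    X_eval = np.linspace(X.min(), X.max(), n_grid)
    return residuals, X_eval


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 2.5 * X + rng.normal(0, 1, 50)
slope, intercept = np.polyfit(X, y, 1)  # ordinary least-squares line
residuals, X_eval = residuals_and_grid(X, y, LineModel(slope, intercept))
print(residuals.shape, X_eval.shape)
```

Because the stand-in mean model is an ordinary least-squares line with an intercept, its residuals sum to approximately zero; a nonlinear mean model such as a Gaussian process will not generally have that property.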

Parameters

Name	Type	Description	Default
X	np.ndarray	The 1D array of original input data (e.g., parameter of interest).	required
y	np.ndarray	The 1D array of original outcome data (e.g., signal response).	required
mean_model	Any	A fitted scikit-learn estimator (e.g., Pipeline or GaussianProcessRegressor) that implements a `.predict()` method.	required
auto_bandwidth	bool	If True, selects the bandwidth automatically using leave-one-out cross-validation (LOO-CV). If False, falls back to the fixed `bandwidth_ratio`. Defaults to True.	`True`
bandwidth_ratio	float	The kernel smoothing window size as a fraction of the data range (X.max() - X.min()). Only used if `auto_bandwidth` is False. Defaults to 0.1.	`0.1`
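The two bandwidth modes controlled by `auto_bandwidth` and `bandwidth_ratio` might look like the sketch below. It is a hedged illustration under stated assumptions, not the library's code: the helpers `select_bandwidth` and `loo_cv_error`, the Gaussian kernel, the choice to smooth squared residuals with a Nadaraya-Watson estimator, and the candidate search range (2% to 50% of the data range) are all hypothetical. The fixed branch simply converts the ratio into absolute units of X, so a ratio of 0.1 on data spanning 0 to 10 gives a bandwidth of 1.0.

```python
import numpy as np


def loo_cv_error(X, r2, h):
    """Leave-one-out mean squared error of a Gaussian-kernel smoother of r2."""
    d = (X[:, None] - X[None, :]) / h
    W = np.exp(-0.5 * d**2)  # Gaussian kernel weights
    np.fill_diagonal(W, 0.0)  # leave each point out of its own estimate
    denom = np.maximum(W.sum(axis=1), np.finfo(float).tiny)  # avoid 0/0
    pred = (W @ r2) / denom
    return float(np.mean((r2 - pred) ** 2))


def select_bandwidth(X, residuals, auto_bandwidth=True, bandwidth_ratio=0.1,
                     n_candidates=25):
    """Hypothetical bandwidth selector mirroring the two documented modes."""
    X = np.asarray(X, dtype=float).ravel()
    x_range = X.max() - X.min()
    if not auto_bandwidth:
        # Fixed mode: a fraction of the data range, in absolute units of X
        return float(bandwidth_ratio * x_range)
    r2 = np.asarray(residuals, dtype=float).ravel() ** 2  # squared residuals
    # Assumed search space: 2% to 50% of the data range
    candidates = np.linspace(0.02, 0.5, n_candidates) * x_range
    errors = [loo_cv_error(X, r2, h) for h in candidates]
    return float(candidates[int(np.argmin(errors))])


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
residuals = rng.normal(0, 0.5 + 0.2 * X)  # noise spread grows with X
print(select_bandwidth(X, residuals, auto_bandwidth=False))  # 1.0 (0.1 * range of 10)
print(select_bandwidth(X, residuals))  # LOO-CV choice among the candidates
```

Dropping each point from its own kernel average is what makes the score a leave-one-out error; without that step the smallest bandwidth would always win by reproducing the data exactly.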

Returns

Name	Type	Description
	Tuple[np.ndarray, float]	A tuple (residuals, bandwidth): residuals are the raw differences between `y` and the mean model predictions; bandwidth is the selected smoothing window size, in absolute units of X.

Examples

import numpy as np
from sklearn.linear_model import LinearRegression

from pod import fit_variance_model

# 1. Setup dummy data and a basic mean model
X = np.linspace(0, 10, 50)
y = 2.5 * X + np.random.normal(0, 1, 50)

model = LinearRegression()
model.fit(X.reshape(-1, 1), y)

# 2. Extract residuals and optimized bandwidth
residuals, bandwidth = fit_variance_model(
    X, y,
    mean_model=model,
    auto_bandwidth=True
)

print(f"Calculated Bandwidth: {bandwidth:.4f}")
print(f"Residuals shape: {residuals.shape}")
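Since this function is described as the setup phase for modeling heteroscedasticity, the sketch below shows one hypothetical way its outputs could be consumed: smoothing the squared residuals with a Gaussian kernel (a Nadaraya-Watson estimator) over a linearly spaced grid to estimate Var(y | X). The helper `smooth_variance`, the Gaussian kernel, the grid size, and the synthetic residuals whose spread grows with X are all assumptions for illustration, not part of `pod`.

```python
import numpy as np


def smooth_variance(X, residuals, bandwidth, X_eval):
    """Hypothetical next step: kernel estimate of Var(y | X) on X_eval."""
    X = np.asarray(X, dtype=float).ravel()
    r2 = np.asarray(residuals, dtype=float).ravel() ** 2  # squared residuals
    d = (np.asarray(X_eval, dtype=float)[:, None] - X[None, :]) / bandwidth
    W = np.exp(-0.5 * d**2)  # Gaussian kernel weights, one row per grid point
    return (W @ r2) / W.sum(axis=1)  # weighted average of squared residuals


rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)
residuals = rng.normal(0, 0.2 + 0.3 * X)  # synthetic: noise SD grows with X
X_eval = np.linspace(X.min(), X.max(), 50)  # linearly spaced evaluation grid
variance = smooth_variance(X, residuals, bandwidth=1.0, X_eval=X_eval)
print(variance[:3], variance[-3:])
```

Near the ends of the X range the kernel window is one-sided, so estimates there rest on fewer points and are noisier than in the interior.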