pod.fit_variance_model

fit_variance_model(X, y, mean_model, auto_bandwidth=True, bandwidth_ratio=0.1)

Calculates residuals and defines the smoothing bandwidth for variance estimation.

This function is the setup phase for modeling heteroscedasticity. It computes the raw residuals from the provided mean model and sets the smoothing bandwidth, either automatically via leave-one-out cross-validation or from a fixed, user-defined fraction of the X range. It also generates a linearly spaced evaluation grid over the X domain.
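
The residual and grid steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the helper name `sketch_residuals_and_grid`, the `n_eval` grid size, and the `LineFit` stand-in model are all hypothetical, and the sketch assumes the 1D X is reshaped to a column before calling `.predict()`.

```python
import numpy as np


class LineFit:
    """Hypothetical stand-in for a fitted estimator that exposes .predict()."""

    def __init__(self, X, y):
        # Ordinary least-squares line through (X, y)
        self.slope, self.intercept = np.polyfit(X, y, deg=1)

    def predict(self, X_2d):
        # Mirrors the scikit-learn convention of a 2D feature matrix
        return self.slope * np.asarray(X_2d)[:, 0] + self.intercept


def sketch_residuals_and_grid(X, y, mean_model, n_eval=100):
    """Hypothetical helper: raw residuals plus a linearly spaced grid over X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Raw residuals: observed outcome minus the mean model's prediction
    residuals = y - mean_model.predict(X.reshape(-1, 1))
    # Evaluation grid spanning the observed X domain (n_eval is an assumption)
    X_eval = np.linspace(X.min(), X.max(), n_eval)
    return residuals, X_eval


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 2.5 * X + rng.normal(0, 1, 50)

model = LineFit(X, y)
residuals, X_eval = sketch_residuals_and_grid(X, y, model)
print(residuals.shape, X_eval[0], X_eval[-1])
```

Any fitted scikit-learn estimator could take the place of `LineFit` here, since only `.predict()` is called.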

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | np.ndarray | The 1D array of original input data (e.g., parameter of interest). | required |
| y | np.ndarray | The 1D array of original outcome data (e.g., signal response). | required |
| mean_model | Any | A fitted scikit-learn estimator (e.g., Pipeline or GaussianProcessRegressor) that implements a .predict() method. | required |
| auto_bandwidth | bool | If True, selects the bandwidth automatically using leave-one-out cross-validation. If False, falls back to the fixed bandwidth_ratio. | True |
| bandwidth_ratio | float | The kernel smoothing window size as a fraction of the data range (X.max() - X.min()). Only used if auto_bandwidth is False. | 0.1 |
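
To make the two bandwidth modes concrete, here is a hedged sketch. The fixed mode follows directly from the parameter description (a fraction of the X range). The automatic mode is only an assumption about how leave-one-out cross-validation might be set up: it scores a Gaussian-kernel (Nadaraya-Watson) smoother of the squared residuals over a candidate grid of ratios. The kernel, the smoothing target, the candidate grid, and the function names `fixed_bandwidth`, `loo_cv_score`, and `select_bandwidth` are illustrative choices, not taken from the source.

```python
import numpy as np


def fixed_bandwidth(X, bandwidth_ratio=0.1):
    """Bandwidth in absolute units of X, as a fraction of the data range."""
    return bandwidth_ratio * (X.max() - X.min())


def loo_cv_score(X, target, bandwidth):
    """Mean squared leave-one-out error of a Gaussian-kernel smoother (assumed)."""
    diffs = (X[:, None] - X[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    np.fill_diagonal(weights, 0.0)  # exclude each point from its own prediction
    denom = weights.sum(axis=1)
    denom = np.where(denom > 0, denom, np.nan)  # guard against empty neighbourhoods
    pred = (weights @ target) / denom
    return np.nanmean((target - pred) ** 2)


def select_bandwidth(X, residuals, ratios=None):
    """Pick the candidate bandwidth with the lowest leave-one-out score."""
    if ratios is None:
        ratios = np.linspace(0.02, 0.5, 25)  # assumed candidate grid
    candidates = ratios * (X.max() - X.min())
    target = residuals**2  # squared residuals as a local variance proxy
    scores = [loo_cv_score(X, target, h) for h in candidates]
    return float(candidates[int(np.nanargmin(scores))])


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
residuals = rng.normal(0, 0.5 + 0.2 * X)  # noise that grows with X

h_fixed = fixed_bandwidth(X, bandwidth_ratio=0.1)
h_auto = select_bandwidth(X, residuals)
print(f"fixed: {h_fixed:.4f}, cross-validated: {h_auto:.4f}")
```

Both values are in the same units as X, which matches the documented meaning of the returned bandwidth.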

Returns

| Name | Type | Description |
| --- | --- | --- |
| residuals | np.ndarray | Raw differences between y and the mean model predictions. |
| bandwidth | float | The selected smoothing window size (in absolute units of X). |

The two values are returned together as a Tuple[np.ndarray, float].

Examples

import numpy as np
from sklearn.linear_model import LinearRegression

# 1. Setup dummy data and a basic mean model
X = np.linspace(0, 10, 50)
y = 2.5 * X + np.random.normal(0, 1, 50)

model = LinearRegression()
model.fit(X.reshape(-1, 1), y)

# 2. Extract residuals and optimized bandwidth
residuals, bandwidth = fit_variance_model(
    X, y,
    mean_model=model,
    auto_bandwidth=True
)

print(f"Calculated Bandwidth: {bandwidth:.4f}")
print(f"Number of Residuals: {len(residuals)}")