pod.fit_variance_model

fit_variance_model(
    X,
    y,
    mean_model,
    auto_bandwidth=True,
    bandwidth_ratio=0.1,
    n_eval_points=100,
)

Computes residuals, selects the smoothing bandwidth, and builds an evaluation grid for variance estimation.

This function is the setup phase for modeling heteroscedasticity. It computes the raw residuals of y against the provided mean model's predictions, then sets the smoothing bandwidth either by leave-one-out cross-validation (auto_bandwidth=True) or from a fixed fraction of the X range (bandwidth_ratio). It also generates a linearly spaced evaluation grid over the X domain.
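The fixed-ratio path described above can be sketched in a few lines of NumPy. This is a minimal, non-authoritative sketch, not the pod implementation: the helper setup_variance_fit and the LineMean stand-in are hypothetical, and the sketch assumes the mean model is called on X reshaped to a column vector (as scikit-learn estimators expect) and that the grid spans [X.min(), X.max()].

```python
import numpy as np


class LineMean:
    """Hypothetical stand-in for a fitted estimator: only .predict() is used."""

    def __init__(self, X, y):
        self.coef = np.polyfit(X, y, deg=1)  # least-squares line with intercept

    def predict(self, X2d):
        return np.polyval(self.coef, X2d[:, 0])


def setup_variance_fit(X, y, mean_model, bandwidth_ratio=0.1, n_eval_points=100):
    """Hypothetical sketch of the setup phase (fixed-ratio branch only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Raw residuals: observed outcome minus mean-model prediction.
    # Assumption: the estimator expects a 2D input of shape (n_samples, 1).
    residuals = y - mean_model.predict(X.reshape(-1, 1))
    # Fixed bandwidth: a fraction of the data range, in absolute units of X.
    bandwidth = bandwidth_ratio * (X.max() - X.min())
    # Linearly spaced evaluation grid over the observed X domain.
    X_eval = np.linspace(X.min(), X.max(), n_eval_points)
    return residuals, bandwidth, X_eval


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
y = 2.5 * X + rng.normal(0, 1, 50)
residuals, bandwidth, X_eval = setup_variance_fit(X, y, LineMean(X, y))
print(residuals.shape, bandwidth, X_eval.shape)
```

With X spanning [0, 10], the default ratio of 0.1 gives a bandwidth of 1.0 in units of X.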

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | 1D array of the original input data (e.g., the parameter of interest). | required |
| y | np.ndarray | 1D array of the original outcome data (e.g., the signal response). | required |
| mean_model | Any | A fitted scikit-learn estimator (e.g., Pipeline or GaussianProcessRegressor) that implements a .predict() method. | required |
| auto_bandwidth | bool | If True, selects the bandwidth by leave-one-out cross-validation. If False, uses the fixed bandwidth_ratio instead. | True |
| bandwidth_ratio | float | Kernel smoothing window size as a fraction of the data range (X.max() - X.min()). Used only when auto_bandwidth is False. | 0.1 |
| n_eval_points | int | Number of points in the evaluation grid (X_eval). | 100 |
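The parameters do not specify which criterion the leave-one-out search minimizes. One common choice, shown here purely as an assumption, is a Gaussian Nadaraya-Watson smoother of the squared residuals: each point's squared residual is predicted from all the other points, and the candidate bandwidth with the smallest mean squared prediction error is kept. The function name loo_cv_bandwidth and the candidate grid (1% to 50% of the X range) are hypothetical.

```python
import numpy as np


def loo_cv_bandwidth(X, residuals, candidates):
    """Hypothetical sketch: choose the bandwidth that minimizes the
    leave-one-out error of a Gaussian Nadaraya-Watson smoother fitted
    to the squared residuals."""
    X = np.asarray(X, dtype=float)
    target = np.asarray(residuals, dtype=float) ** 2
    diffs = X[:, None] - X[None, :]  # pairwise differences, shape (n, n)
    scores = []
    for h in candidates:
        W = np.exp(-0.5 * (diffs / h) ** 2)  # Gaussian kernel weights
        np.fill_diagonal(W, 0.0)             # leave each point out of its own fit
        denom = np.maximum(W.sum(axis=1), np.finfo(float).tiny)  # guard against 0/0
        pred = (W @ target) / denom          # LOO estimate of each squared residual
        scores.append(np.mean((target - pred) ** 2))
    return float(candidates[int(np.argmin(scores))])


rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
residuals = rng.normal(0, 0.2 + 0.1 * X)  # noise standard deviation grows with X
data_range = X.max() - X.min()

# Assumed candidate grid for auto_bandwidth=True: log-spaced fractions of the range.
candidates = np.geomspace(0.01, 0.5, 30) * data_range
auto_h = loo_cv_bandwidth(X, residuals, candidates)
print(auto_h)
```

Setting the diagonal of the weight matrix to zero is what makes this leave-one-out: no point contributes to its own prediction, so very small bandwidths cannot win simply by interpolating the data.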

Returns

Type: Tuple[np.ndarray, float, np.ndarray], unpacked as (residuals, bandwidth, X_eval).

| Name | Type | Description |
|---|---|---|
| residuals | np.ndarray | Raw differences between y and the mean model's predictions. |
| bandwidth | float | The selected smoothing window size, in absolute units of X. |
| X_eval | np.ndarray | A linearly spaced grid over the X domain for downstream plotting and evaluation. |
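As one illustration of how the three outputs fit together downstream, the sketch below evaluates a kernel-smoothed variance curve on X_eval from the squared residuals. The Gaussian Nadaraya-Watson smoother and the helper name kernel_variance are assumptions for illustration; the function documented here only returns the inputs to such a step.

```python
import numpy as np


def kernel_variance(X, residuals, bandwidth, X_eval):
    """Hypothetical downstream step: Gaussian Nadaraya-Watson estimate of
    Var(y | X) from squared residuals, evaluated on the X_eval grid."""
    X = np.asarray(X, dtype=float)
    sq = np.asarray(residuals, dtype=float) ** 2
    # Kernel weights between every grid point (rows) and every observation (columns).
    W = np.exp(-0.5 * ((np.asarray(X_eval)[:, None] - X[None, :]) / bandwidth) ** 2)
    return (W @ sq) / W.sum(axis=1)


rng = np.random.default_rng(1)
X = np.linspace(0, 10, 200)
residuals = rng.normal(0, 0.5 + 0.2 * X)  # standard deviation grows with X
bandwidth = 0.1 * (X.max() - X.min())
X_eval = np.linspace(X.min(), X.max(), 100)

var_hat = kernel_variance(X, residuals, bandwidth, X_eval)
std_hat = np.sqrt(var_hat)  # e.g. for a +/- 2 sigma band around the mean model
```

Taking the square root gives a standard-deviation curve on the same grid, which can be plotted as an uncertainty band around the mean model's predictions at X_eval.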

Examples

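import numpy as np
from sklearn.linear_model import LinearRegression

from pod import fit_variance_model

# 1. Set up dummy data and a basic mean model
X = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)  # seeded so the output is reproducible
y = 2.5 * X + rng.normal(0, 1, 50)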

model = LinearRegression()
model.fit(X.reshape(-1, 1), y)

# 2. Extract residuals and optimized bandwidth
residuals, bandwidth, X_eval = fit_variance_model(
    X, y,
    mean_model=model,
    auto_bandwidth=True
)

print(f"Calculated Bandwidth: {bandwidth:.4f}")
print(f"Evaluation Grid Size: {len(X_eval)}")