pod.fit_variance_model
fit_variance_model(
    X,
    y,
    mean_model,
    auto_bandwidth=True,
    bandwidth_ratio=0.1,
    n_eval_points=100,
)
Calculates residuals and defines the smoothing bandwidth for variance estimation.
This function is the setup phase for modeling heteroscedasticity. It computes the raw residuals (y minus the mean model's predictions), selects the smoothing bandwidth either automatically via Leave-One-Out Cross-Validation or from a fixed, user-defined fraction of the X range, and generates a linearly spaced evaluation grid over the X domain.
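The fixed-ratio path described here can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `setup_fixed_bandwidth` and `PolyMean` are hypothetical names, and passing X to `.predict()` as a column vector is an assumption based on the scikit-learn convention.

```python
import numpy as np


class PolyMean:
    """Hypothetical stand-in for a fitted mean model exposing .predict()."""

    def __init__(self, X, y, degree=1):
        self.coef = np.polyfit(X, y, degree)

    def predict(self, X):
        # Accept 1D or column-vector input, return 1D predictions
        return np.polyval(self.coef, np.ravel(X))


def setup_fixed_bandwidth(X, y, mean_model, bandwidth_ratio=0.1, n_eval_points=100):
    # Raw residuals: observed outcome minus the mean model's prediction
    residuals = y - mean_model.predict(X.reshape(-1, 1))
    # Bandwidth in absolute units of X: a fraction of the data range
    bandwidth = bandwidth_ratio * (X.max() - X.min())
    # Linearly spaced evaluation grid spanning the X domain
    X_eval = np.linspace(X.min(), X.max(), n_eval_points)
    return residuals, bandwidth, X_eval


rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 50)
y = 2.5 * X + rng.normal(0.0, 1.0, 50)

residuals, bandwidth, X_eval = setup_fixed_bandwidth(X, y, PolyMean(X, y))
print(f"bandwidth={bandwidth:.4f}, grid points={len(X_eval)}")
```

Because the bandwidth is expressed in the units of X, it scales with the data range and not with the number of samples.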
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| X | np.ndarray | The 1D array of original input data (e.g., parameter of interest). | required |
| y | np.ndarray | The 1D array of original outcome data (e.g., signal response). | required |
| mean_model | Any | A fitted scikit-learn estimator (e.g., Pipeline or GaussianProcessRegressor) that implements a .predict() method. | required |
| auto_bandwidth | bool | If True, dynamically calculates the optimal bandwidth using Leave-One-Out Cross-Validation. If False, falls back to the fixed bandwidth_ratio. Defaults to True. | True |
| bandwidth_ratio | float | The kernel smoothing window size as a fraction of the data range (X.max() - X.min()). Only used if auto_bandwidth is False. Defaults to 0.1. | 0.1 |
| n_eval_points | int | The number of points to generate for the evaluation grid (X_eval). Defaults to 100. | 100 |
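To make the auto_bandwidth option more concrete, here is one way Leave-One-Out Cross-Validation can select a bandwidth. This is a sketch under stated assumptions: the Gaussian kernel, the Nadaraya-Watson smoother, the squared-residual target, and the candidate grid are all illustrative choices, and `loo_cv_bandwidth` is a hypothetical helper. The source does not specify how the library performs the search.

```python
import numpy as np


def loo_cv_bandwidth(X, target, candidates):
    """Return the candidate bandwidth with the lowest leave-one-out
    squared error for a Gaussian-kernel (Nadaraya-Watson) smoother."""
    diffs = X[:, None] - X[None, :]
    best_h, best_score = None, np.inf
    for h in candidates:
        K = np.exp(-0.5 * (diffs / h) ** 2)
        np.fill_diagonal(K, 0.0)  # exclude each point from its own prediction
        denom = K.sum(axis=1)
        valid = denom > 0  # skip points with no neighbours in range
        if not valid.any():
            continue
        pred = (K @ target)[valid] / denom[valid]
        score = np.mean((target[valid] - pred) ** 2)
        if score < best_score:
            best_h, best_score = h, score
    return best_h


rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 80)
residuals = rng.normal(0.0, 0.5 + 0.2 * X)  # noise that grows with X

# Candidate bandwidths as fractions of the X range (assumed grid)
x_range = X.max() - X.min()
candidates = x_range * np.array([0.02, 0.05, 0.1, 0.2, 0.4])

h = loo_cv_bandwidth(X, residuals ** 2, candidates)
print(f"Selected bandwidth: {h:.3f}")
```

Zeroing the kernel matrix diagonal yields every leave-one-out prediction from a single matrix product, so the smoother does not have to be refit once per sample.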
Returns
| Name | Type | Description |
|---|---|---|
| | Tuple[np.ndarray, float, np.ndarray] | A tuple (residuals, bandwidth, X_eval). residuals: Raw differences between y and the mean model predictions. bandwidth: The selected smoothing window size (in absolute units of X). X_eval: A linearly spaced grid over the X domain for downstream plotting/evaluation. |
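As an illustration of how the three return values might be used downstream, the sketch below smooths the squared residuals onto the evaluation grid to obtain a local variance estimate. The Gaussian kernel and the `smooth_variance` helper are assumptions made for this example, not part of the documented API, and the inputs are synthetic stand-ins for the function's outputs.

```python
import numpy as np


def smooth_variance(X, residuals, bandwidth, X_eval):
    # Gaussian kernel weights between every grid point and every observation
    W = np.exp(-0.5 * ((X_eval[:, None] - X[None, :]) / bandwidth) ** 2)
    # Kernel-weighted average of squared residuals at each grid point
    return (W @ residuals ** 2) / W.sum(axis=1)


# Synthetic stand-ins for the values fit_variance_model would return
rng = np.random.default_rng(1)
X = np.linspace(0.0, 10.0, 200)
residuals = rng.normal(0.0, 0.5 + 0.2 * X)  # spread grows with X
bandwidth = 0.1 * (X.max() - X.min())
X_eval = np.linspace(X.min(), X.max(), 100)

var_hat = smooth_variance(X, residuals, bandwidth, X_eval)
sigma_hat = np.sqrt(var_hat)  # local standard deviation along the grid
print(f"sigma at start: {sigma_hat[0]:.2f}, at end: {sigma_hat[-1]:.2f}")
```

Working with squared residuals keeps the estimate non-negative, so taking the square root is always valid.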
Examples
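import numpy as np
from sklearn.linear_model import LinearRegression
# Assumed import path, inferred from the pod.fit_variance_model heading
from pod import fit_variance_model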
# 1. Setup dummy data and a basic mean model
X = np.linspace(0, 10, 50)
y = 2.5 * X + np.random.normal(0, 1, 50)
model = LinearRegression()
model.fit(X.reshape(-1, 1), y)
# 2. Extract residuals and optimized bandwidth
residuals, bandwidth, X_eval = fit_variance_model(
    X, y,
    mean_model=model,
    auto_bandwidth=True
)
print(f"Calculated Bandwidth: {bandwidth:.4f}")
print(f"Evaluation Grid Size: {len(X_eval)}")