Generalised Probability of Detection

In traditional non-destructive evaluation (NDE), the standard \(\hat{a}\)-versus-\(a\) approach relies on several restrictive assumptions:

1. Linearity of the signal response to defect size.
2. Homoscedasticity (constant variance) of noise across all defect sizes.
3. Normality (Gaussian shape) of the residuals.

Complex physics, such as crack roughness or angle variations, often breaks these assumptions. digiqual implements a generalised framework based on Malkiel et al. (2025) to relax them.

1. Automated Model Selection (Relaxing Linearity)

Instead of forcing a linear relationship \(\hat{a} = \beta_0 + \beta_1 a\), digiqual treats the expectation model as a selection problem.

The Mathematics

It evaluates a pool of candidate models \(f(x)\), including polynomials up to degree \(10\) and Gaussian Process (Kriging) models. For each candidate, it performs \(K\)-fold Cross-Validation (CV) to estimate the Mean Squared Error (MSE):

\[ MSE_{CV} = \frac{1}{n} \sum_{i=1}^n \left(y_i - \hat{f}^{-k(i)}(x_i)\right)^2 \]

where \(k(i)\) is the fold containing observation \(i\), and \(\hat{f}^{-k(i)}\) is the model trained with that fold held out. The candidate with the lowest cross-validated error is selected automatically, which balances bias against variance.
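The selection step can be sketched as follows. This is a minimal illustration using NumPy only, not digiqual's own code: the function name `kfold_cv_mse`, the synthetic data, and the restriction to polynomial candidates of degree 1 to 6 are assumptions made for the example (Kriging candidates are omitted).

```python
import numpy as np

def kfold_cv_mse(x, y, degree, n_folds=5, seed=0):
    """Cross-validated MSE of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # shuffle before splitting into folds
    folds = np.array_split(idx, n_folds)
    sq_err = 0.0
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        coeffs = np.polyfit(x[train], y[train], degree)   # fit without fold k
        pred = np.polyval(coeffs, x[held_out])            # predict fold k
        sq_err += np.sum((y[held_out] - pred) ** 2)
    return sq_err / len(x)

# Synthetic data (illustrative only): a quadratic response plus noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 80)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0.0, 0.1, x.size)

# Score every candidate and keep the one with the lowest CV error.
scores = {d: kfold_cv_mse(x, y, d) for d in range(1, 7)}
best_degree = min(scores, key=scores.get)
```

Because the same seed is used for every candidate, all degrees are scored on identical folds, so the comparison is not affected by differences in the split.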

2. Modeling Heteroscedasticity (Relaxing Constant Variance)

Signal scatter often increases with flaw size. digiqual drops the constant-variance assumption \(\sigma^2 = c\) and replaces it with a local estimate \(\hat{\sigma}^2(x)\).

The Mathematics

It implements a Nadaraya-Watson Gaussian kernel average smoother. The variance at any point \(x\) is calculated as a weighted average of the surrounding squared residuals \(e_i^2 = (y_i - \hat{y}_i)^2\):

\[ \hat{\sigma}^2(x) = \frac{\sum_{i=1}^n K_h(x - x_i) e_i^2}{\sum_{i=1}^n K_h(x - x_i)} \]

where \(K_h\) is the Gaussian kernel with bandwidth \(h\):

\[ K_h(u) = \frac{1}{\sqrt{2\pi h^2}} \exp\left(-\frac{u^2}{2h^2}\right) \]

The optimal bandwidth \(h\) is determined automatically via Leave-One-Out Cross-Validation (LOO-CV).
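A rough sketch of the smoother and its bandwidth search is shown below. The function names, the synthetic residuals, and the candidate bandwidth grid are assumptions for illustration. The LOO-CV criterion used here (squared error between each \(e_i^2\) and its leave-one-out prediction) is one common choice and may differ from the criterion digiqual uses.

```python
import numpy as np

def nw_variance(x_eval, x, sq_resid, h):
    """Nadaraya-Watson estimate of sigma^2 at x_eval (Gaussian kernel)."""
    u = (np.atleast_1d(x_eval)[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)   # the kernel's normalising constant cancels in the ratio
    return (w @ sq_resid) / w.sum(axis=1)

def loo_cv_score(x, sq_resid, h):
    """Leave-one-out squared prediction error of the smoother for bandwidth h."""
    u = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)  # exclude each point from its own estimate
    pred = (w @ sq_resid) / w.sum(axis=1)
    return np.mean((sq_resid - pred) ** 2)

# Synthetic residuals whose scatter grows with x (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
resid = rng.normal(0.0, 0.1 + 0.4 * x)
sq_resid = resid**2

# Grid search over candidate bandwidths (the grid itself is an assumption).
bandwidths = np.geomspace(0.02, 0.5, 15)
cv_scores = [loo_cv_score(x, sq_resid, h) for h in bandwidths]
best_h = bandwidths[int(np.argmin(cv_scores))]

# Local variance estimates near the small-flaw and large-flaw ends.
sigma2_hat = nw_variance(np.array([0.1, 0.9]), x, sq_resid, best_h)
```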

3. Inferring Error Distributions (Relaxing Normality)

Noise is rarely perfectly Gaussian. Rather than assuming a shape, digiqual infers the error distribution from the residuals.

The Mathematics

Residuals are first standardized into \(Z\)-scores using the local standard deviation:

\[ Z_i = \frac{y_i - \hat{f}(x_i)}{\hat{\sigma}(x_i)} \]

These \(Z\)-scores are then fitted against a suite of distributions: Normal, Gumbel, Logistic, Laplace, and Student's t. The best-fitting distribution is the one with the lowest Akaike Information Criterion (AIC):

\[ AIC = 2k - 2\ln(\hat{L}) \]

where \(k\) is the number of fitted parameters and \(\hat{L}\) is the maximised value of the likelihood function.
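The AIC comparison can be illustrated with two of the candidate families. The sketch below is an assumption-laden example, not digiqual's code: it covers only the Normal and Laplace distributions because both have closed-form maximum likelihood estimates, whereas the other families would need a numerical fit. The function names and synthetic data are hypothetical.

```python
import numpy as np

def aic_normal(z):
    """AIC of a Normal fit using the closed-form MLE (mean, variance)."""
    n = z.size
    sigma2 = np.mean((z - z.mean()) ** 2)
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return 2 * 2 - 2.0 * log_lik          # k = 2 parameters

def aic_laplace(z):
    """AIC of a Laplace fit using the closed-form MLE (median, mean abs. deviation)."""
    n = z.size
    b = np.mean(np.abs(z - np.median(z)))
    log_lik = -n * (np.log(2.0 * b) + 1.0)
    return 2 * 2 - 2.0 * log_lik          # k = 2 parameters

# Synthetic heavy-tailed residuals (illustrative only).
rng = np.random.default_rng(1)
z_scores = rng.laplace(loc=0.0, scale=1.0, size=2000)

aic_table = {"normal": aic_normal(z_scores), "laplace": aic_laplace(z_scores)}
best_family = min(aic_table, key=aic_table.get)   # lowest AIC wins
```

Both candidates here have the same number of parameters, so the comparison reduces to the log-likelihood. The \(2k\) penalty matters when families with different parameter counts, such as Student's t, enter the pool.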

4. Multi-Dimensional Evaluation (Marginalisation & Slicing)

When evaluating multidimensional datasets with additional variables (e.g., angle \(\theta\), roughness \(R\)), digiqual fits the surrogate model to the entire \(n\)-dimensional parameter space. To project this down to a 1D or 2D Probability of Detection curve, it uses two distinct techniques:

Active Marginalisation (Nuisance Parameters)

For parameters treated as random noise, the expected PoD for a Parameter of Interest (PoI) vector \(\mathbf{x}\) is calculated by integrating the conditional PoD over the probability density function \(p(\mathbf{z})\) of the nuisance variables \(\mathbf{z}\): \[PoD(\mathbf{x}) = \int_{\Omega_z} PoD(\mathbf{x}, \mathbf{z}) p(\mathbf{z}) d\mathbf{z}\]

digiqual approximates this integral numerically by Monte Carlo integration with Latin Hypercube Sampling, averaging the result over sampled values of the nuisance variables.
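As a minimal sketch of the marginalisation integral, the example below averages a conditional PoD over Latin Hypercube samples of a single nuisance variable. Everything specific here is an assumption for illustration: the logistic form of `conditional_pod`, the uniform density of \(z\) on \([-1, 1]\), and the hand-rolled one-dimensional sampler.

```python
import numpy as np

def latin_hypercube_1d(n, rng):
    """n stratified samples on [0, 1): one uniform draw inside each of n equal bins."""
    u = (np.arange(n) + rng.random(n)) / n
    # Shuffling is irrelevant in 1D but is what decouples columns in higher dimensions.
    return rng.permutation(u)

def conditional_pod(x, z):
    """Hypothetical PoD(x, z): logistic in flaw size x, shifted by nuisance z."""
    return 1.0 / (1.0 + np.exp(-(x - 2.0 - z) / 0.3))

def marginal_pod(x, n_samples=500, seed=0):
    """Monte Carlo estimate of the integral of PoD(x, z) p(z) dz."""
    rng = np.random.default_rng(seed)
    # Assumed nuisance density: z uniform on [-1, 1].
    z = -1.0 + 2.0 * latin_hypercube_1d(n_samples, rng)
    return float(np.mean(conditional_pod(x, z)))

pod_small = marginal_pod(0.5)
pod_mid = marginal_pod(2.0)
pod_large = marginal_pod(3.5)
```

For this symmetric toy setup the marginal PoD at \(x = 2\) equals 0.5 exactly, which gives a convenient check on the estimate.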

Parameter Slicing (Constant Parameters)

For parameters that are neither plotted nor integrated out, digiqual treats them as strict mathematical constants. The evaluation grid takes a “slice” through the \(n\)-dimensional response surface at these specific values (defaulting to the parameter’s median), allowing you to explore specific physical scenarios without retraining the underlying model.
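One way to picture the slice is as an evaluation grid in which only the PoI column varies. The helper below is a hypothetical sketch (the name `build_slice_grid` and its arguments are not part of digiqual): it steps the PoI evenly between its observed minimum and maximum and holds every other column at a user-supplied value, falling back to the column median.

```python
import numpy as np

def build_slice_grid(data, poi_col, n_points=50, fixed=None):
    """Evaluation grid: the PoI column varies, all other columns are held constant."""
    fixed = fixed or {}
    n_cols = data.shape[1]
    grid = np.empty((n_points, n_cols))
    for j in range(n_cols):
        if j == poi_col:
            # Step evenly across the observed range of the parameter of interest.
            grid[:, j] = np.linspace(data[:, j].min(), data[:, j].max(), n_points)
        else:
            # Hold the sliced parameter at the requested value, else its median.
            grid[:, j] = fixed.get(j, np.median(data[:, j]))
    return grid

# Illustrative dataset with columns: size, angle, roughness.
rng = np.random.default_rng(3)
data = rng.uniform([0.5, -45.0, 0.0], [5.0, 45.0, 1.0], size=(120, 3))

# Slice at angle = 30 degrees; roughness falls back to its median.
grid = build_slice_grid(data, poi_col=0, fixed={1: 30.0})
```

The fitted surrogate would then be evaluated on `grid`, so a different slice only requires a new grid rather than a new fit.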

5. Bootstrapping for Confidence Bounds

Because the fitted model may be non-linear and the errors non-normal, the traditional analytical confidence bounds no longer apply.

The Process

digiqual uses Bootstrap resampling:

1. Draw \(N\) samples with replacement from the original dataset.
2. Re-fit the mean model, variance model, and error distribution from scratch on the resampled data.
3. Calculate the PoD curve.
4. Repeat steps 1 to 3 a total of \(B\) times (typically \(B=1000\)).
5. Extract the \(2.5^{th}\) and \(97.5^{th}\) percentiles at every point \(a\) to form the 95% confidence interval \(PoD_{95}\).
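The procedure above can be sketched with a deliberately simplified pipeline. The example assumes a linear mean model, constant variance, and Normal errors in place of the full model selection, and it assumes the usual definition of PoD as the probability that the signal exceeds a decision threshold. The function names, the threshold value, and the reduced \(B = 200\) are illustrative choices, not digiqual's.

```python
import math
import numpy as np

# Standard Normal CDF applied element-wise (avoids a SciPy dependency).
_phi = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def fit_and_pod(a, y, a_grid, threshold):
    """Simplified stand-in for the full re-fit: PoD(a) = P(signal > threshold)."""
    slope, intercept = np.polyfit(a, y, 1)
    resid = y - (intercept + slope * a)
    sigma = resid.std(ddof=2)                      # two fitted mean parameters
    mean_grid = intercept + slope * a_grid
    return 1.0 - _phi((threshold - mean_grid) / sigma)

def bootstrap_pod(a, y, a_grid, threshold, n_boot=200, seed=0):
    """Percentile bootstrap bounds on the PoD curve."""
    rng = np.random.default_rng(seed)
    n = len(a)
    curves = np.empty((n_boot, a_grid.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # step 1: resample with replacement
        curves[b] = fit_and_pod(a[idx], y[idx], a_grid, threshold)  # steps 2 and 3
    lower, upper = np.percentile(curves, [2.5, 97.5], axis=0)       # step 5
    return lower, upper

# Synthetic signal-response data (illustrative only).
rng = np.random.default_rng(7)
a = rng.uniform(0.5, 5.0, 100)
y = 1.0 + 2.0 * a + rng.normal(0.0, 1.0, a.size)
a_grid = np.linspace(0.5, 5.0, 40)

lower, upper = bootstrap_pod(a, y, a_grid, threshold=6.0)
```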

Summary: The 3-Tier Parameter Taxonomy

To understand how digiqual handles high-dimensional data, it is helpful to see exactly how each parameter type is treated during the three phases of reliability assessment:

Parameter Type	1. Model Fit (Training Phase)	2. Evaluation Phase (Mean Curve)	3. PoD Generation (Bootstrapping)
Parameter(s) of Interest (PoI)	Treated as a standard geometric dimension to train the n-dimensional surrogate model.	Defines the evaluation grid. Steps evenly from its minimum to maximum value to form the plot axes.	Forms the primary axes for the final Probability of Detection confidence bounds.
Sliced Parameter (Constant)	Treated as a standard geometric dimension. The model learns its physical impact globally.	Held strictly constant at a user-defined value. Acts as a targeted, multi-dimensional “laser beam” slice.	Remains rigidly locked at the exact same constant value across all bootstrap resamples.
Nuisance Parameter (Marginalised)	Treated as a standard geometric dimension. The model learns its physical impact globally.	Randomly sampled via Monte Carlo integration across its physical bounds. Results are averaged out.	Re-sampled and averaged inside every single bootstrap iteration to capture real-world uncertainty.