Other calibration errors

Unnormalized calibration mean embedding (UCME)

As an alternative to formulating the calibration error as an integral probability metric, one can consider the unnormalized calibration mean embedding (UCME).

Let $\mathcal{P} \times \mathcal{Y}$ be the product space of predictions and targets. The UCME for a real-valued kernel $k \colon (\mathcal{P} \times \mathcal{Y}) \times (\mathcal{P} \times \mathcal{Y}) \to \mathbb{R}$ and $m$ test locations is defined[WLZ] as

\[\mathrm{UCME}_{k,m}^2 := m^{-1} \sum_{i=1}^m \Big(\mathbb{E}_{Y,P_X} k\big(T_i, (P_X, Y)\big) - \mathbb{E}_{Z_X,P_X} k\big(T_i, (P_X, Z_X)\big)\Big)^2,\]

where test locations $T_1, \ldots, T_m$ are i.i.d. random variables whose law is absolutely continuous with respect to the Lebesgue measure on $\mathcal{P} \times \mathcal{Y}$.

The plug-in estimator of $\mathrm{UCME}_{k,m}^2$ is available as UCME.

CalibrationErrors.UCME (Type)
UCME(k, testpredictions, testtargets)

Estimator of the unnormalized calibration mean embedding (UCME) with kernel k and sets of testpredictions and testtargets.

The kernel k on the product space of predictions and targets must be a Kernel from the Julia package KernelFunctions.jl that can be evaluated on tuples of predictions and targets.

The number of test predictions and test targets must be the same and at least one.
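A minimal usage sketch for a three-class classification problem might look as follows. It rests on several assumptions that are not stated above: that predictions can be passed as probability vectors and targets as integer class labels, that the constructed estimator is evaluated by calling it as `ucme(predictions, targets)`, and that a tensor product kernel built with `⊗` from KernelFunctions.jl is a suitable kernel on tuples. The helper `randprob` and all data are made up for illustration.

```julia
using CalibrationErrors
using KernelFunctions
using Random

Random.seed!(1234)

# Hypothetical helper: a random probability vector over `nclasses` classes.
randprob(nclasses) = (p = rand(nclasses); p ./ sum(p))

# Made-up data set: 100 predictions with integer class labels as targets.
predictions = [randprob(3) for _ in 1:100]
targets = rand(1:3, 100)

# Test locations: equally many test predictions and test targets.
testpredictions = [randprob(3) for _ in 1:5]
testtargets = rand(1:3, 5)

# Assumed kernel choice: Gaussian kernel on predictions, white kernel on targets.
kernel = SqExponentialKernel() ⊗ WhiteKernel()

# Construct the estimator as documented, then evaluate it (assumed call syntax).
ucme = UCME(kernel, testpredictions, testtargets)
estimate = ucme(predictions, targets)
```

Since the estimator is documented as non-negative, `estimate` should be a non-negative number; its exact value depends on the random data.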

Details

The estimator is biased and guaranteed to be non-negative. Its computational complexity is $O(mn)$, where $m$ is the number of test locations and $n$ is the total number of samples.

Let $(T_i)_{i=1,\ldots,m}$ be the set of test locations, i.e., test predictions and corresponding targets, and let $(P_{X_j}, Y_j)_{j=1,\ldots,n}$ be a data set of predictions and corresponding targets. The plug-in estimator of $\mathrm{UCME}_{k,m}^2$ is defined as

\[m^{-1} \sum_{i=1}^{m} {\bigg(n^{-1} \sum_{j=1}^n \Big(k\big(T_i, (P_{X_j}, Y_j)\big) - \mathbb{E}_{Z \sim P_{X_j}} k\big(T_i, (P_{X_j}, Z)\big)\Big)\bigg)}^2.\]
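To make the formula concrete, the following plain-Julia sketch evaluates the plug-in estimator for classification, where each prediction $P_{X_j}$ is a probability vector and the expectation over $Z \sim P_{X_j}$ reduces to a finite sum over the classes. It is not the package's implementation: the function `ucme_plugin`, the kernel `productkernel`, and the data are hypothetical names and values chosen for illustration.

```julia
# Sketch of the plug-in estimator, assuming predictions are probability
# vectors and targets are class indices. `k(t, s)` is any function on
# tuples (prediction, target); `ucme_plugin` is a hypothetical name.
function ucme_plugin(k, testlocations, predictions, targets)
    n = length(predictions)
    n == length(targets) ||
        throw(ArgumentError("number of predictions and targets must match"))
    total = 0.0
    for t in testlocations
        inner = 0.0
        for (p, y) in zip(predictions, targets)
            # E_{Z ~ p} k(t, (p, Z)) as a finite sum over the classes.
            expectation = sum(p[z] * k(t, (p, z)) for z in eachindex(p))
            inner += k(t, (p, y)) - expectation
        end
        # Squared sample mean for this test location.
        total += (inner / n)^2
    end
    # Average over the m test locations.
    return total / length(testlocations)
end

# Illustrative product kernel: Gaussian on predictions, indicator on targets.
productkernel((p, y), (q, z)) = exp(-sum(abs2, p .- q) / 2) * (y == z)

# Made-up data: three samples and two test locations.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
targets = [1, 2, 3]
testlocations = [([0.5, 0.25, 0.25], 1), ([0.2, 0.2, 0.6], 3)]

estimate = ucme_plugin(productkernel, testlocations, predictions, targets)
```

The two nested loops visit every pair of test location and sample once, and the result is an average of squares, so it cannot be negative. For a constant kernel the two terms inside the inner sum cancel and the estimate is zero up to rounding.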

References

Widmann, D., Lindsten, F., & Zachariah, D. (2021). Calibration tests beyond classification. To be presented at ICLR 2021.
