Bayesian Decision Theory Framework with Costs

The setup is as follows:

Random feature vectors $x$.
State space $Ω = \{ω_{1}, ω_{2}, \dots, ω_{c}\}$, with $c$ possible classes.
Decision space $A = \{α_{1}, α_{2}, \dots, α_{k}\}$, with $k$ possible decisions.
Loss function $λ(α_{i}, ω_{j})$, the cost of deciding $α_{i}$ when the true state is $ω_{j}$.

The goal is to minimize the expected loss. We define the conditional risk of taking decision $α_{i}$ for a single observation $x$ as:

$$R(α_{i} \mid x) = E\left[λ(α_{i}, ω_{j}) \mid x\right] = \sum_{j=1}^{c} λ(α_{i}, ω_{j})\, P(ω_{j} \mid x)$$
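As a concrete check of the sum, here is a minimal pure-Python sketch. It assumes the loss is stored as a $k \times c$ table `loss[i][j]` $= λ(α_{i}, ω_{j})$ and the posterior as a list `posterior[j]` $= P(ω_{j} \mid x)$; the function name `conditional_risk` and the example numbers are hypothetical.

```python
from typing import List, Sequence

def conditional_risk(loss: Sequence[Sequence[float]],
                     posterior: Sequence[float]) -> List[float]:
    # loss[i][j] = lambda(alpha_i, omega_j), a k x c table.
    # posterior[j] = P(omega_j | x) for the observed x.
    # Returns R(alpha_i | x) = sum_j lambda(alpha_i, omega_j) P(omega_j | x) for every i.
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, posterior)) for row in loss]

# Hypothetical two-class, two-action example.
loss = [[0.0, 5.0],   # alpha_1: wrong only if the truth is omega_2 (cost 5)
        [1.0, 0.0]]   # alpha_2: wrong only if the truth is omega_1 (cost 1)
posterior = [0.7, 0.3]
print(conditional_risk(loss, posterior))  # -> [1.5, 0.7]
```

Each row of the loss table is one decision, so the conditional risk is a matrix-vector product of the loss table with the posterior.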

Next, we compute the overall risk, which averages the conditional risk over all possible inputs $x$:

$$R(α) = \int R(α(x) \mid x)\, p(x)\, dx$$

Here $α(x)$ is a decision rule that maps each $x$ to one of the $k$ decisions. The optimal decision rule $α^{*}$ is the one that minimizes the overall risk:

$$α^{*} = \arg\min_{α(\cdot)} R(α) = \arg\min_{α(\cdot)} \int R(α(x) \mid x)\, p(x)\, dx$$
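To see that different rules really do have different overall risk, the integral can be approximated numerically. This is a minimal sketch under loudly stated assumptions: one-dimensional $x$, two Gaussian class-conditional densities, threshold rules, and a midpoint Riemann sum on a finite interval. The names `gauss_pdf` and `overall_risk` and all numbers are hypothetical. It uses the identity $R(α(x) \mid x)\, p(x) = \sum_{j} λ(α(x), ω_{j})\, p(x \mid ω_{j})\, P(ω_{j})$, so $p(x)$ never has to be divided out.

```python
import math
from typing import Callable, Sequence

def gauss_pdf(x: float, mu: float, sigma: float) -> float:
    # Univariate normal density N(mu, sigma^2) evaluated at x.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def overall_risk(rule: Callable[[float], int],
                 loss: Sequence[Sequence[float]],
                 priors: Sequence[float],
                 likelihoods: Sequence[Callable[[float], float]],
                 lo: float = -10.0, hi: float = 10.0, n: int = 20000) -> float:
    # Midpoint-rule approximation of R(alpha) = integral of R(alpha(x) | x) p(x) dx
    # over [lo, hi] (assumed to hold essentially all of the probability mass).
    dx = (hi - lo) / n
    total = 0.0
    for step in range(n):
        x = lo + (step + 0.5) * dx
        i = rule(x)  # 0-based index of the decision alpha_i chosen at this x
        total += sum(loss[i][j] * priors[j] * likelihoods[j](x)
                     for j in range(len(priors))) * dx
    return total

# Hypothetical setup: classes N(-1, 1) and N(1, 1), equal priors, zero-one loss.
zero_one = [[0.0, 1.0], [1.0, 0.0]]
priors = [0.5, 0.5]
likelihoods = [lambda x: gauss_pdf(x, -1.0, 1.0), lambda x: gauss_pdf(x, 1.0, 1.0)]

risk_at_0 = overall_risk(lambda x: 0 if x < 0 else 1, zero_one, priors, likelihoods)
risk_at_1 = overall_risk(lambda x: 0 if x < 1 else 1, zero_one, priors, likelihoods)
print(risk_at_0, risk_at_1)
```

Under zero-one loss $R(α)$ is the probability of error. The threshold at 0 gives about 0.1587 (that is, $Φ(-1)$), while moving the threshold to 1 raises it to about 0.261, so the second rule is worse.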

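Minimizing directly over all possible functions $α(\cdot)$ is hard, but because $p(x) \ge 0$, the integral is minimized by minimizing the integrand separately at each point $x$. This gives the Minimal Risk Decision rule (also called the Bayes decision rule):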

$$α^{*}(x) = \arg\min_{i = 1, \dots, k} R(α_{i} \mid x) \quad \text{for each } x$$
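The pointwise argument can be written out in one line. For any rule $α(\cdot)$ and every $x$, $R(α(x) \mid x) \ge \min_{i} R(α_{i} \mid x)$, and $p(x) \ge 0$, so

$$R(α) = \int R(α(x) \mid x)\, p(x)\, dx \;\ge\; \int \min_{i = 1, \dots, k} R(α_{i} \mid x)\, p(x)\, dx = R(α^{*})$$

The lower bound is attained by $α^{*}$, so no other rule has a smaller overall risk. The minimum value $R(α^{*})$ is called the Bayes risk.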

To compute $R(α_{i} \mid x)$, we use the same conditional-risk formula as before and expand the posterior $P(ω_{j} \mid x)$ with Bayes' rule:

$$R(α_{i} \mid x) = \sum_{j=1}^{c} λ(α_{i}, ω_{j})\, \frac{p(x \mid ω_{j})\, P(ω_{j})}{\sum_{l=1}^{c} p(x \mid ω_{l})\, P(ω_{l})}$$
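Putting the pieces together, the following minimal sketch goes from class-conditional likelihoods and priors at a single observed $x$ to the posterior, the conditional risks, and the minimum-risk decision. The function names (`posterior`, `conditional_risks`, `min_risk_decision`), the 0-based action indices, and the numbers are all hypothetical, chosen only to show how costs can override the most probable class.

```python
from typing import List, Sequence

def posterior(likelihoods: Sequence[float], priors: Sequence[float]) -> List[float]:
    # Bayes' rule: P(omega_j | x) = p(x | omega_j) P(omega_j) / sum_l p(x | omega_l) P(omega_l)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the denominator shared by every class
    return [value / evidence for value in joint]

def conditional_risks(loss: Sequence[Sequence[float]],
                      likelihoods: Sequence[float],
                      priors: Sequence[float]) -> List[float]:
    # R(alpha_i | x) = sum_j lambda(alpha_i, omega_j) P(omega_j | x), one entry per action.
    post = posterior(likelihoods, priors)
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]

def min_risk_decision(loss: Sequence[Sequence[float]],
                      likelihoods: Sequence[float],
                      priors: Sequence[float]) -> int:
    # alpha*(x) = argmin_i R(alpha_i | x); returns the 0-based action index.
    risks = conditional_risks(loss, likelihoods, priors)
    return min(range(len(risks)), key=risks.__getitem__)

# Hypothetical values at one observed x: p(x | omega_1) = 0.2, p(x | omega_2) = 0.6.
likelihoods = [0.2, 0.6]
priors = [0.8, 0.2]
zero_one = [[0.0, 1.0], [1.0, 0.0]]
asymmetric = [[0.0, 5.0],   # alpha_1 ("decide omega_1"): missing omega_2 costs 5
              [1.0, 0.0]]   # alpha_2 ("decide omega_2"): a false alarm costs 1

print(posterior(likelihoods, priors))
print(min_risk_decision(zero_one, likelihoods, priors))
print(min_risk_decision(asymmetric, likelihoods, priors))
```

Here the posterior is $(4/7, 3/7)$. Under zero-one loss the rule picks index 0 (the more probable class $ω_{1}$), but with the asymmetric costs it switches to index 1 ($α_{2}$), because missing $ω_{2}$ is five times as expensive. Since the denominator is the same for every decision, dropping it would rescale all the risks but leave the argmin unchanged.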

Messy Notes