A loss function, also called a cost function, is a mathematical mapping that assigns a real number to an event, prediction, or set of variables to represent error, penalty, or negative utility. In basic terms, it quantifies how far a model's output deviates from a desired value and provides the objective that optimization procedures seek to minimize. The formal mathematical setting is that of a function mapping inputs to real numbers.
Key properties and role
Loss functions are real-valued and usually nonnegative; they can be convex or nonconvex, and smooth or nondifferentiable. Convex, differentiable losses are convenient because every local minimum of a convex function is also a global minimum, so gradient-based methods can find minima reliably; nonconvex losses may have local minima that are not global. The choice of loss affects model behavior, robustness to outliers, and whether theoretical guarantees (such as consistency) hold.
Common types
- Mean squared error (MSE): average squared difference between predicted and true values; common in regression.
- Mean absolute error (MAE): average absolute difference between predicted and true values; more robust to outliers than MSE because errors are not squared.
- Cross-entropy / log loss: measures dissimilarity between predicted probabilities and true class labels; used in classification.
- Hinge loss: used with support vector machines to encourage large classification margins.
- Custom and robust losses: Huber loss, Tukey's biweight, and others trade off sensitivity to small errors against robustness to outliers.
Each of these examples maps predictions and targets to a single real number; all are discussed in many applied texts and tutorials on loss design.
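The losses listed above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function names, the clipping constant `eps`, and the Huber threshold `delta` are choices made here for the example, and the label conventions ({0, 1} for cross-entropy, {-1, +1} for hinge) are assumptions stated in the comments.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss for labels in {0, 1} and predicted probabilities of class 1.

    Probabilities are clipped to [eps, 1 - eps] to avoid log(0);
    eps is an illustrative choice.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

def hinge(y_true, scores):
    """Hinge loss for labels in {-1, +1} and real-valued scores."""
    return sum(max(0.0, 1.0 - t * s) for t, s in zip(y_true, scores)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for residuals up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

# One prediction is off by 10; the others are exact.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]
print(mse(y_true, y_pred))    # 25.0
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375
```

The single outlier dominates the squared loss far more than the absolute or Huber losses, which is the robustness difference noted in the list.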
Uses in practice
In machine learning, the empirical risk (average loss over training data) is minimized to fit models; regularization terms are often added to penalize complexity. Loss functions also appear in decision theory, econometrics, and statistical estimation, where they connect to utility and risk. Distinguishing training loss from evaluation metrics is important: a model can be trained with one loss but evaluated with another, as when a classifier is trained with cross-entropy and evaluated by accuracy.
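As a rough sketch of regularized empirical risk, the snippet below averages a per-example loss over a small training set and adds an L2 penalty on the weights. The linear model, the helper names, and the penalty weight `lam` are assumptions introduced for illustration; the text above does not prescribe a particular model or regularizer.

```python
def squared_error(y, pred):
    """Per-example squared loss."""
    return (y - pred) ** 2

def absolute_error(y, pred):
    """Per-example absolute loss."""
    return abs(y - pred)

def empirical_risk(weights, bias, xs, ys, loss):
    """Average loss of a linear model over the training pairs (xs, ys)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x)) + bias
        total += loss(y, pred)
    return total / len(xs)

def l2_penalty(weights):
    """Sum of squared weights; the bias is left unpenalized here."""
    return sum(w * w for w in weights)

def regularized_objective(weights, bias, xs, ys, loss, lam):
    """Empirical risk plus lam times the L2 penalty (lam is hypothetical)."""
    return empirical_risk(weights, bias, xs, ys, loss) + lam * l2_penalty(weights)

xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]

# Training objective: squared error plus an L2 penalty.
train_objective = regularized_objective([1.5], 0.0, xs, ys, squared_error, lam=0.1)

# Evaluation metric: the same model scored with a different loss, no penalty.
eval_metric = empirical_risk([1.5], 0.0, xs, ys, absolute_error)
```

Passing the loss as an argument keeps the training objective and the evaluation metric separate, so the same model can be fitted under one loss and reported under another.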
Choosing an appropriate loss involves trade-offs: convexity vs robustness, probabilistic interpretability vs margin-based criteria. Surrogate losses replace objectives that are hard to optimize directly, such as the discontinuous 0-1 classification loss, with convex losses such as hinge or log loss to permit efficient optimization. Popular solvers include variants of gradient descent and second-order methods; these exploit differentiability when available. Loss minimization thus connects to both optimization methods and statistical decision theory.
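The surrogate idea can be illustrated with a one-dimensional classifier: the 0-1 loss is piecewise constant in the parameters, so its gradient gives no direction to move in, whereas the logistic loss is smooth and convex and can be minimized by plain gradient descent. The sketch below is only an illustration; the toy data, the learning rate, the step count, and all function names are assumptions made for the example.

```python
import math

def zero_one_loss(w, b, xs, ys):
    """Fraction of points with non-positive margin; labels are in {-1, +1}."""
    errors = sum(1 for x, y in zip(xs, ys) if y * (w * x + b) <= 0)
    return errors / len(xs)

def logistic_loss(w, b, xs, ys):
    """Smooth convex surrogate: mean of log(1 + exp(-y * score))."""
    return sum(math.log1p(math.exp(-y * (w * x + b)))
               for x, y in zip(xs, ys)) / len(xs)

def logistic_grad(w, b, xs, ys):
    """Gradient of the mean logistic loss with respect to (w, b)."""
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        # d/ds log(1 + exp(-y * s)) = -y / (1 + exp(y * s))
        coeff = -y / (1.0 + math.exp(y * (w * x + b)))
        gw += coeff * x
        gb += coeff
    n = len(xs)
    return gw / n, gb / n

def gradient_descent(xs, ys, lr=0.5, steps=200):
    """Fixed-step gradient descent on the surrogate, starting from zero."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = logistic_grad(w, b, xs, ys)
        w -= lr * gw
        b -= lr * gb
    return w, b

# Toy, linearly separable data: negatives on the left, positives on the right.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [-1, -1, 1, 1]

w, b = gradient_descent(xs, ys)
print(zero_one_loss(0.0, 0.0, xs, ys), zero_one_loss(w, b, xs, ys))
print(logistic_loss(0.0, 0.0, xs, ys), logistic_loss(w, b, xs, ys))
```

Only the surrogate is differentiated during fitting; the 0-1 loss is computed afterwards as the quantity of actual interest. On separable data like this the logistic loss keeps decreasing as `w` grows, so the fixed step count acts as the stopping rule here.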
Historically, loss-based reasoning ties to least squares (Gauss, Legendre) and to likelihood-based inference; modern machine learning expanded the set of practical loss choices. For advanced topics such as proper scoring rules, calibration, and adversarial loss design, consult the specialized literature.