Linear model functions are linear combinations of arbitrary, generally non-linear basis functions. For such model functions, the minimization problem can also be solved analytically using an extremal approach without iterative approximation steps. First, some simple special cases and examples are shown.
Special case of a simple linear best-fit line
Derivation and method
A simple model function with two linear parameters is the first-order polynomial
$$f(x) = \alpha_0 + \alpha_1 x.$$
For $n$ given measured values $(x_1, y_1), \dots, (x_n, y_n)$, the coefficients $\alpha_0$ and $\alpha_1$ of the best-fit straight line are sought. The deviations $r_i$ between the sought straight line and the respective measured values,
$$r_i = \alpha_0 + \alpha_1 x_i - y_i, \qquad i = 1, \dots, n,$$
are called fitting errors or residuals. We are now looking for the coefficients $\alpha_0$ and $\alpha_1$ with the smallest sum of squared errors
$$\min_{\alpha_0, \alpha_1} \sum_{i=1}^{n} r_i^2.$$
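The great advantage of squaring the errors becomes apparent when the minimization is carried out mathematically: the sum is regarded as a function of the two variables $\alpha_0$ and $\alpha_1$ (the measured values enter as numerical constants), its partial derivatives with respect to $\alpha_0$ and $\alpha_1$ are formed, and finally the common zero of these derivatives is sought. This results in the linear system of equations
$$\begin{aligned} n\,\alpha_0 + \Big(\sum_{i=1}^{n} x_i\Big)\,\alpha_1 &= \sum_{i=1}^{n} y_i \\ \Big(\sum_{i=1}^{n} x_i\Big)\,\alpha_0 + \Big(\sum_{i=1}^{n} x_i^2\Big)\,\alpha_1 &= \sum_{i=1}^{n} x_i y_i \end{aligned}$$
with the solution
$$\alpha_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{SP_{xy}}{SQ_x} \qquad \text{and} \qquad \alpha_0 = \bar{y} - \alpha_1 \bar{x},$$
where $SP_{xy}$ is the sum of the products of the deviations of $x$ and $y$, and $SQ_x$ is the sum of the squared deviations of $x$. Here $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the arithmetic mean of the $x$ values, and $\bar{y}$ correspondingly that of the $y$ values. Using the displacement theorem, the solution for $\alpha_1$ can also be given in non-centered form:
$$\alpha_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}.$$
These results can also be derived with functions of a single real variable, i.e. without partial derivatives.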
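The closed-form solution can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `fit_line` and `slope_noncentered` and the sample points are made up for this example. The slope is computed as the sum of deviation products divided by the sum of squared deviations of $x$, the intercept follows from the two means, and the non-centered form is evaluated as a cross-check.

```python
def fit_line(xs, ys):
    """Return (alpha0, alpha1) of the least-squares line f(x) = alpha0 + alpha1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered form: sum of deviation products over sum of squared deviations
    sp_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sq_x = sum((x - x_mean) ** 2 for x in xs)
    alpha1 = sp_xy / sq_x
    alpha0 = y_mean - alpha1 * x_mean
    return alpha0, alpha1


def slope_noncentered(xs, ys):
    """Slope alpha1 in the non-centered form obtained via the displacement theorem."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_mean * y_mean
    denominator = sum(x * x for x in xs) - n * x_mean ** 2
    return numerator / denominator


# Made-up sample points lying exactly on f(x) = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
alpha0, alpha1 = fit_line(xs, ys)
print(alpha0, alpha1, slope_noncentered(xs, ys))
```

Both forms are algebraically equivalent; the centered form is usually less prone to cancellation when the means are large compared with the spread of the data.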
Example with a best-fit line
In this example, a best-fit line of the form $f(x) = \alpha_0 + \alpha_1 x$ is calculated to represent the relationship between two features of a data set. The data set consists of the length and width of ten warships (see warship data). An attempt will be made to relate the width to the length. The data are shown in the first three columns of the following table. The other columns contain intermediate results for the calculation of the best-fit line. The variable $x_i$ denotes the length of warship $i$ and $y_i$ its width. The straight line $f(x) = \alpha_0 + \alpha_1 x$ is sought for which, when the known values $x_i$ are substituted, the function values $f(x_i)$ are as close as possible to the known values $y_i$.
| Warship $i$ | Length $x_i$ (m) | Width $y_i$ (m) | $x_i^* = x_i - \bar{x}$ | $y_i^* = y_i - \bar{y}$ | $x_i^* \cdot y_i^*$ | $x_i^* \cdot x_i^*$ | $f(x_i)$ | $f(x_i) - y_i$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 208 | 21.6 | 40.2 | 3.19 | 128.24 | 1616.04 | 24.88 | 3.28 |
| 2 | 152 | 15.5 | −15.8 | −2.91 | 45.98 | 249.64 | 15.86 | 0.36 |
| 3 | 113 | 10.4 | −54.8 | −8.01 | 438.95 | 3003.04 | 9.57 | −0.83 |
| 4 | 227 | 31.0 | 59.2 | 12.59 | 745.33 | 3504.64 | 27.95 | −3.05 |
| 5 | 137 | 13.0 | −30.8 | −5.41 | 166.63 | 948.64 | 13.44 | 0.44 |
| 6 | 238 | 32.4 | 70.2 | 13.99 | 982.10 | 4928.04 | 29.72 | −2.68 |
| 7 | 178 | 19.0 | 10.2 | 0.59 | 6.02 | 104.04 | 20.05 | 1.05 |
| 8 | 104 | 10.4 | −63.8 | −8.01 | 511.04 | 4070.44 | 8.12 | −2.28 |
| 9 | 191 | 19.0 | 23.2 | 0.59 | 13.69 | 538.24 | 22.14 | 3.14 |
| 10 | 130 | 11.8 | −37.8 | −6.61 | 249.86 | 1428.84 | 12.31 | 0.51 |
| Sum Σ | 1678 | 184.1 | | | 3287.82 | 20391.60 | | |
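The best-fit line is determined by the coefficients $\alpha_0$ and $\alpha_1$, which are calculated as above with
$$\alpha_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \alpha_0 = \bar{y} - \alpha_1 \bar{x}.$$
The constants $\bar{x}$ and $\bar{y}$ are the mean values of the $x$ and $y$ measurements, respectively, i.e.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1678}{10} = 167.8, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{184.1}{10} = 18.41.$$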
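As a first intermediate step, the deviation from the mean can now be calculated for each warship: $x_i^* = x_i - \bar{x}$ and $y_i^* = y_i - \bar{y}$. These values are entered in the fourth and fifth columns of the table above. The formula for $\alpha_1$ thus simplifies to
$$\alpha_1 = \frac{\sum_{i=1}^{n} x_i^* \cdot y_i^*}{\sum_{i=1}^{n} x_i^* \cdot x_i^*}.$$
As a second intermediate step, the products $x_i^* \cdot y_i^*$ and $x_i^* \cdot x_i^*$ are calculated for each warship. These values are entered in the sixth and seventh columns of the table and can now be easily summed. Thus $\alpha_1$ is calculated as
$$\alpha_1 = \frac{3287.82}{20391.60} \approx 0.1612.$$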
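The value of $\alpha_1$ can already be interpreted: assuming that the data are linearly related and can be described by the calculated best-fit line, the width of a warship increases by about 0.16 meters for every additional meter of length.
The intercept $\alpha_0$ is then
$$\alpha_0 = \bar{y} - \alpha_1 \bar{x} = 18.41 - \frac{3287.82}{20391.60} \cdot 167.8 \approx -8.645.$$
Thus, the equation of the best-fit line is $f(x) = -8.645 + 0.1612\,x$.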
For illustration, the data can be plotted as a scatter plot with the best-fit line drawn in. The plot suggests that for our sample data there is indeed a linear relationship between the length and width of a warship. The fit of the points is quite good. As a measure, we can also consider the deviations $f(x_i) - y_i$ of the values $f(x_i)$ predicted by the straight line from the measured values $y_i$. The corresponding values are entered in the eighth and ninth columns of the table. The deviation is 2.1 m on average. The coefficient of determination, as a normalized measure, likewise gives a value of about 92.2 % (100 % would correspond to a mean deviation of 0 m); for the calculation see the example on the coefficient of determination.
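The numbers of this example can be reproduced with a short script. It is a sketch under one stated assumption: "on average" is read here as the root mean square of the residuals (dividing by $n$), which is the reading that gives about 2.1 m for this data. The variable names are chosen only for this illustration.

```python
import math

# Length and width (m) of the ten warships from the table
lengths = [208, 152, 113, 227, 137, 238, 178, 104, 191, 130]
widths = [21.6, 15.5, 10.4, 31.0, 13.0, 32.4, 19.0, 10.4, 19.0, 11.8]

n = len(lengths)
x_mean = sum(lengths) / n
y_mean = sum(widths) / n

# Sums of deviation products and squared deviations
sp_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(lengths, widths))
sq_x = sum((x - x_mean) ** 2 for x in lengths)
sq_y = sum((y - y_mean) ** 2 for y in widths)

alpha1 = sp_xy / sq_x
alpha0 = y_mean - alpha1 * x_mean

# Residuals f(x_i) - y_i, their root mean square (assumed meaning of "on average"),
# and the coefficient of determination
residuals = [alpha0 + alpha1 * x - y for x, y in zip(lengths, widths)]
ssr = sum(r * r for r in residuals)
rms_deviation = math.sqrt(ssr / n)
r_squared = 1.0 - ssr / sq_y

print(f"alpha1 = {alpha1:.4f}, alpha0 = {alpha0:.3f}")
print(f"RMS deviation = {rms_deviation:.2f} m, R^2 = {r_squared:.3f}")
```

Because the script works with unrounded intermediate values, individual table entries may differ from its output in the last digit.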
However, the negative intercept $\alpha_0$ means that in our linear model a warship with a length of 0 meters has a negative width, or that warships only begin to exist above a certain minimum length. Compared to reality, this is obviously wrong, which should be taken into account when evaluating a statistical analysis. It is likely that the model is only valid for the range for which measured values are actually available (in this case for warship lengths between 100 m and 240 m), and that outside this range a linear function is no longer suitable to represent the data.
Simple polynomial best-fit curves
More general than a linear best-fit line are best-fit polynomials
$$y(x) \approx \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \dots + \alpha_q x^q,$$
which are now illustrated by an example (such polynomial approaches can also be solved analytically via an extreme value approach, in addition to the iterative solution).
As results of the microcensus survey by the Federal Statistical Office, the average weights of men by age class are given (source: Federal Statistical Office, Wiesbaden 2009). For the analysis, the age classes were replaced by the class midpoints. The dependence of the variable weight ($y$) on the variable age ($x$) is to be analyzed.
The scatter plot suggests an approximately parabolic relationship between $x$ and $y$, which can often be well approximated by a polynomial. A polynomial approach of the form
$$y(x) \approx \alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3 + \alpha_4 x^4$$
is tried. For the resulting 4th-degree polynomial, the measurement points deviate on average (standard deviation) by 0.19 kg from the model function. Reducing the degree of the polynomial to 3 gives a solution with a mean deviation of 0.22 kg, and polynomial degree 2 a solution with a mean deviation of 0.42 kg. When the higher terms are dropped, the coefficients of the lower terms change: the method makes the best of each situation, so the missing higher terms are compensated as well as possible by the lower terms until the mathematical optimum is reached. The second-degree polynomial (parabola) still describes the course of the measurement points very well (see figure).
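How the lower coefficients shift when the degree is reduced can be illustrated with a small sketch. It does not use the microcensus values, which are not reproduced here: the data are made-up samples of $y = 1 + x^3$, and the helper names `solve_linear_system` and `fit_polynomial` are hypothetical. The fit sets up the normal equations from sums of powers of $x$ and solves them by Gaussian elimination, which is adequate for low degrees but becomes ill-conditioned for high ones.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [alpha_0, ..., alpha_degree] via the normal equations."""
    m = degree + 1
    # Entries of the normal-equation matrix are sums of powers of x
    ata = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    aty = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m)]
    return solve_linear_system(ata, aty)


# Made-up data (NOT the microcensus values): samples of y = 1 + x^3
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0 + x ** 3 for x in xs]

cubic = fit_polynomial(xs, ys, 3)      # recovers approximately [1, 0, 0, 1]
quadratic = fit_polynomial(xs, ys, 2)  # approximately [0.25, 1.0, 1.5]
print(cubic)
print(quadratic)
```

In this made-up case the constant term moves from 1 to 0.25 and the linear term from 0 to 1 once the cubic term is no longer available.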
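Special case of a linear best-fit function with several variables
If the model function is a multidimensional first-order polynomial, i.e. if it has several independent model variables $x_1, \dots, x_N$ instead of only one variable $x$, one obtains a linear function of the form
$$f(x_1, \dots, x_N; \alpha_0, \alpha_1, \dots, \alpha_N) = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_N x_N,$$
which leads to the residuals
$$r_i = \alpha_0 + \alpha_1 x_{1,i} + \dots + \alpha_N x_{N,i} - y_i, \qquad i = 1, \dots, n,$$
and can be solved via the minimization approach
$$\min_{\alpha} \sum_{i=1}^{n} r_i^2.$$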
The general linear case
In the following, the general case of arbitrary linear model functions of arbitrary dimension is shown. For a given measured value function
$$y(x_1, x_2, \dots, x_N)$$
with $N$ independent variables, an optimally fitted linear model function
$$f(x_1, \dots, x_N; \alpha_1, \dots, \alpha_m) = \sum_{j=1}^{m} \alpha_j \varphi_j(x_1, \dots, x_N)$$
is sought whose quadratic deviation from it is minimal. Here the $x_i$ are the function coordinates, the $\alpha_j$ the linear parameters to be determined, and the $\varphi_j$ arbitrary linearly independent functions chosen to fit the problem.
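With $n$ given measuring points
$$(x_{1,1}, \dots, x_{N,1}; y_1), \dots, (x_{1,n}, \dots, x_{N,n}; y_n)$$
one obtains the fitting errors
$$r_i = \sum_{j=1}^{m} \alpha_j \varphi_j(x_{1,i}, \dots, x_{N,i}) - y_i, \qquad i = 1, \dots, n,$$
or in matrix notation
$$r = A\alpha - y,$$
where the vector $r \in \mathbb{R}^n$ collects the $r_i$, the matrix $A \in \mathbb{R}^{n \times m}$ the basis function values $A_{ij} = \varphi_j(x_{1,i}, \dots, x_{N,i})$, the parameter vector $\alpha \in \mathbb{R}^m$ the parameters $\alpha_j$, and the vector $y \in \mathbb{R}^n$ the observations $y_i$, where $n \geq m$.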
The minimization problem, which can be formulated with the help of the Euclidean norm as
$$\min_{\alpha} \|r\|_2^2 = \min_{\alpha} \|A\alpha - y\|_2^2,$$
can be solved uniquely and analytically in the regular case (i.e. $A$ has full column rank, so $A^T A$ is regular and thus invertible) with the formula
$$\alpha = (A^T A)^{-1} A^T y,$$
as explained in the next section. In the singular case, when $A$ does not have full rank, the system of normal equations is not uniquely solvable, i.e. the parameter $\alpha$ is not identifiable (see Theorem of Gauss-Markov#Singular Case, Estimable Functions).
Solution of the minimization problem
Derivation and method
The minimization problem, as shown in the general linear case, reads
$$\min_{\alpha} \|A\alpha - y\|_2^2 = \min_{\alpha} (A\alpha - y)^T (A\alpha - y) = \min_{\alpha} \left( \alpha^T A^T A \alpha - 2 y^T A \alpha + y^T y \right).$$
This problem is always solvable. If the matrix $A$ has full rank, the solution is even unique. To determine the extremal point, setting the partial derivatives with respect to the $\alpha_j$ to zero,
$$\nabla_{\alpha} \|A\alpha - y\|_2^2 = 2 A^T (A\alpha - y) = 0,$$
yields a linear system of normal equations (also called Gaussian normal equations)
$$A^T A \alpha = A^T y,$$
which provides the solution to the minimization problem and, in general, must be solved numerically. If $A$ has full rank and $n \geq m$, then the matrix $A^T A$ is positive definite, so that the extremum found is indeed a minimum. Thus, solving the minimization problem can be reduced to solving a system of equations. In the simple case of a best-fit line, its solution can even be given directly as a simple formula, as has been shown.
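Alternatively, the normal equations can be written out as
$$A^T A \alpha - A^T y = \begin{pmatrix} \langle \varphi_1, \varphi_1 \rangle & \cdots & \langle \varphi_1, \varphi_m \rangle \\ \vdots & \ddots & \vdots \\ \langle \varphi_m, \varphi_1 \rangle & \cdots & \langle \varphi_m, \varphi_m \rangle \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix} - \begin{pmatrix} \langle y, \varphi_1 \rangle \\ \vdots \\ \langle y, \varphi_m \rangle \end{pmatrix} = 0,$$
where $\langle \cdot, \cdot \rangle$ symbolizes the standard scalar product and can also be understood as the integral of the overlap of the basis functions. The basis functions $\varphi_j$ are to be read as vectors $\vec{\varphi}_j = (\varphi_j(x_{1,1}, \dots, x_{N,1}), \dots, \varphi_j(x_{1,n}, \dots, x_{N,n}))$ with the $n$ discrete grid points at the locations of the observations $y = (y_1, \dots, y_n)$.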
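The scalar-product form of the normal equations can be made concrete with a small sketch. The basis functions, the observation points, and the generating parameters below are all made up for illustration. Each basis function is turned into the vector of its values at the observation points, the matrix of pairwise scalar products and the right-hand side are assembled, and the script then checks that the parameters used to generate the data satisfy the resulting system.

```python
import math


def dot(u, v):
    """Standard scalar product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical basis functions, chosen only for illustration
basis = [lambda x: 1.0, lambda x: x, lambda x: math.sin(x)]

# Made-up observations generated from 2 - 0.5 x + 3 sin(x), so the fit is exact
alpha_true = [2.0, -0.5, 3.0]
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 - 0.5 * x + 3.0 * math.sin(x) for x in xs]

# Each basis function is read as the vector of its values at the observation points
phi = [[f(x) for x in xs] for f in basis]

gram = [[dot(phi_j, phi_k) for phi_k in phi] for phi_j in phi]  # <phi_j, phi_k>
rhs = [dot(ys, phi_j) for phi_j in phi]                         # <y, phi_j>

# Left-hand side of the normal equations evaluated at the generating parameters
lhs = [dot(row, alpha_true) for row in gram]
print(max(abs(l - r) for l, r in zip(lhs, rhs)))  # close to zero (rounding only)
```

With noisy data the generating parameters would no longer satisfy the system exactly; the solution of the system would then be the least-squares estimate.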
Furthermore, the minimization problem can be analyzed well with a singular value decomposition. This also motivated the concept of the pseudoinverse, a generalization of the ordinary inverse of a matrix. It provides a view of non-square linear systems of equations that allows a notion of solution motivated algebraically rather than stochastically.
Numerical treatment of the solution
There are two ways to solve the problem numerically. On the one hand, one can solve the normal equations
$$A^T A \alpha = A^T y,$$
which are uniquely solvable if the matrix $A$ has full rank. Furthermore, the product sum matrix $A^T A$ is then positive definite, so its eigenvalues are all positive. Together with the symmetry of $A^T A$, this can be exploited by numerical solution methods such as the Cholesky decomposition or the CG method. Since both methods are strongly influenced by the condition of the matrix, this is sometimes not a recommended approach: if $A$ is already ill-conditioned, then $A^T A$ is quadratically ill-conditioned, i.e. its condition number in the Euclidean norm is the square of that of $A$. As a result, rounding errors can be amplified to the point of rendering the result useless. However, regularization methods can improve the condition.
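The normal-equation route with a Cholesky decomposition can be sketched as follows. This is a plain-Python illustration with hypothetical function names (`cholesky`, `solve_normal_equations`) and made-up data; it assumes that $A$ has full column rank, so that $A^T A$ is symmetric positive definite, and it does nothing to guard against ill-conditioning.

```python
import math


def cholesky(spd):
    """Return lower-triangular L with L * L^T = spd (symmetric positive definite input)."""
    n = len(spd)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = spd[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_normal_equations(a, y):
    """Solve A^T A alpha = A^T y using the Cholesky factor of A^T A."""
    n_rows, m = len(a), len(a[0])
    ata = [[sum(a[i][j] * a[i][k] for i in range(n_rows)) for k in range(m)]
           for j in range(m)]
    aty = [sum(a[i][j] * y[i] for i in range(n_rows)) for j in range(m)]
    low = cholesky(ata)
    # Forward substitution: L z = A^T y
    z = [0.0] * m
    for i in range(m):
        z[i] = (aty[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    # Back substitution: L^T alpha = z
    alpha = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(low[k][i] * alpha[k] for k in range(i + 1, m))
        alpha[i] = (z[i] - tail) / low[i][i]
    return alpha


# Made-up example: design matrix of the line model (columns 1 and x)
xs = [0.0, 1.0, 2.0, 3.0]
a = [[1.0, x] for x in xs]
y = [1.0, 3.0, 5.0, 7.0]
alpha = solve_normal_equations(a, y)
print(alpha)  # approximately [1.0, 2.0]
```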
One such method is the so-called ridge regression, which goes back to Hoerl and Kennard (1970). Here, instead of the poorly conditioned matrix $A^T A$, the better conditioned matrix $A^T A + \delta I_m$ is used, where $I_m$ is the $m$-dimensional identity matrix. The art lies in choosing a suitable $\delta$: too small a $\delta$ improves the conditioning only slightly, while too large a $\delta$ leads to a biased fit.
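The effect of the shift $\delta$ can be seen in a minimal sketch for the two-parameter line model. The function name `ridge_line_fit` and the data are made up for this illustration; following the formula in the text, $\delta$ is added to both diagonal entries of the $2 \times 2$ product sum matrix, so the intercept is shifted as well.

```python
def ridge_line_fit(xs, ys, delta):
    """Solve (A^T A + delta * I) alpha = A^T y for the line model with columns 1 and x."""
    n = len(xs)
    # Entries of the symmetric 2x2 matrix A^T A + delta * I
    m00 = n + delta
    m01 = sum(xs)
    m11 = sum(x * x for x in xs) + delta
    # Right-hand side A^T y
    b0 = sum(ys)
    b1 = sum(x * y for x, y in zip(xs, ys))
    # Cramer's rule for the 2x2 system
    det = m00 * m11 - m01 * m01
    alpha0 = (b0 * m11 - m01 * b1) / det
    alpha1 = (m00 * b1 - m01 * b0) / det
    return alpha0, alpha1


# Made-up points lying exactly on f(x) = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

unregularized = ridge_line_fit(xs, ys, 0.0)   # (1.0, 2.0): the ordinary fit
regularized = ridge_line_fit(xs, ys, 10.0)    # both coefficients pulled toward zero
print(unregularized, regularized)
```

Even though the data lie exactly on a line, the large $\delta$ no longer reproduces it, which is the bias mentioned above.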
On the other hand, the original minimization problem provides a more stable alternative, since for a small value of the minimum its condition is of the order of the condition of $A$, and for large values of the order of the square of the condition of $A$. To compute the solution, a QR decomposition is used, generated with Householder transformations or Givens rotations. The basic idea is that orthogonal transformations do not change the Euclidean norm of a vector. Thus
$$\|A\alpha - y\|_2 = \|Q(A\alpha - y)\|_2$$
for any orthogonal matrix $Q$. To solve the problem, a QR decomposition of $A$ can therefore be computed, with the right-hand side transformed along directly. This leads to the form
$$\|R\alpha - Q^T y\|_2$$
with
$$R = \begin{pmatrix} \tilde{R} \\ 0 \end{pmatrix},$$
where $\tilde{R} \in \mathbb{R}^{m \times m}$ is an upper triangular matrix. The solution of the problem is thus obtained by solving the system of equations
$$\tilde{R} \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_m \end{pmatrix} = \begin{pmatrix} (Q^T y)_1 \\ \vdots \\ (Q^T y)_m \end{pmatrix}.$$
The norm of the minimum is then obtained from the remaining components of the transformed right-hand side, $(Q^T y)_{m+1}, \dots, (Q^T y)_n$, since the associated equations can never be satisfied due to the zero rows in $R$.
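The QR route can be sketched with Givens rotations in plain Python. The function name `least_squares_givens` and the data are made up for this illustration, and the sketch assumes that $A$ has full column rank. The rotations are applied to the matrix and to the right-hand side together, so the orthogonal matrix $Q$ never has to be formed explicitly.

```python
import math


def least_squares_givens(a, y):
    """Minimize ||A alpha - y||_2 with Givens rotations applied to A and y together.

    a is a list of n rows with m entries each (n >= m, full column rank assumed).
    Returns (alpha, residual_norm).
    """
    n, m = len(a), len(a[0])
    r = [row[:] for row in a]   # becomes the upper triangular R with zero rows below
    q_t_y = list(y)             # becomes the transformed right-hand side Q^T y
    for col in range(m):
        for row in range(n - 1, col, -1):
            # Rotate rows (row - 1, row) so that r[row][col] becomes zero
            x, z = r[row - 1][col], r[row][col]
            h = math.hypot(x, z)
            if h == 0.0:
                continue
            c, s = x / h, z / h
            for k in range(col, m):
                upper, lower = r[row - 1][k], r[row][k]
                r[row - 1][k] = c * upper + s * lower
                r[row][k] = -s * upper + c * lower
            upper, lower = q_t_y[row - 1], q_t_y[row]
            q_t_y[row - 1] = c * upper + s * lower
            q_t_y[row] = -s * upper + c * lower
    # Back substitution on the leading m x m triangle
    alpha = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(r[i][k] * alpha[k] for k in range(i + 1, m))
        alpha[i] = (q_t_y[i] - tail) / r[i][i]
    # The remaining components of Q^T y cannot be matched; their norm is the minimum
    residual_norm = math.sqrt(sum(v * v for v in q_t_y[m:]))
    return alpha, residual_norm


# Made-up noisy points near a straight line
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a = [[1.0, x] for x in xs]
alpha, residual_norm = least_squares_givens(a, ys)
print(alpha, residual_norm)
```

For larger dense problems, Householder transformations are typically preferred for efficiency; Givens rotations are used here only because they are short to write down.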
In statistical regression analysis, when there are several given variables $x_1, \dots, x_N$, we speak of multiple linear regression. The most common approach to estimating a multiple linear model is known as ordinary least squares (OLS) estimation. In contrast to the ordinary least squares method, the generalized least squares (GLS) method is used in a generalized linear regression model. In this model, the error terms deviate from distributional assumptions such as uncorrelatedness and/or homoscedasticity. In multivariate regression, by contrast, there are $r$ many $y$ values for each observation $i$ ($i = 1, \dots, n$), so that instead of a vector there is an $n \times r$ matrix $Y$ (see General Linear Model). Linear regression models have been studied extensively in statistics from the point of view of probability theory. In econometrics in particular, for example, complex recursively defined linear structural equations are analyzed to model economic systems.
Problems with constraints
Often additional information about the parameters is known, which is formulated as constraints in the form of equations or inequalities. Equations appear, for example, when certain data points are to be interpolated. Inequalities appear more frequently, usually in the form of intervals for individual parameters. The spring constant mentioned in the introductory example, for instance, is always greater than zero and can always be bounded from above for the concrete case considered.
In the equation case, given a reasonably posed problem, these can be used to transform the original minimization problem into one of a lower dimension whose solution automatically satisfies the constraints.
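A simple instance of the equation case is a best-fit line that must pass exactly through one prescribed point $(x_0, y_0)$. The sketch below is only an illustration under that assumption; the function name `fit_line_through_point` and the data are made up. Substituting $\alpha_0 = y_0 - \alpha_1 x_0$ into the model removes one parameter, and the remaining one-dimensional problem has a closed-form solution that satisfies the constraint automatically.

```python
def fit_line_through_point(xs, ys, x0, y0):
    """Least-squares line f(x) = alpha0 + alpha1 * x subject to f(x0) = y0."""
    # With alpha0 = y0 - alpha1 * x0 the residuals become
    # alpha1 * (x_i - x0) - (y_i - y0), a one-parameter problem in alpha1.
    numerator = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    denominator = sum((x - x0) ** 2 for x in xs)
    alpha1 = numerator / denominator
    alpha0 = y0 - alpha1 * x0
    return alpha0, alpha1


# Made-up measurements; the line is forced through the origin
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
alpha0, alpha1 = fit_line_through_point(xs, ys, 0.0, 0.0)
print(alpha0, alpha1)
```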
The inequality case is more difficult. With linear inequalities, the problem becomes
$$\min_{\alpha} \|A\alpha - y\|_2 \quad \text{with} \quad l \leq C\alpha \leq u,$$
where $C$ is a given matrix, $l$ and $u$ are given bound vectors, and the inequalities are meant component-wise. This problem is uniquely solvable as a convex quadratic optimization problem and can be approached, for example, with methods for solving such problems.
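For a single parameter confined to an interval, the constrained solution can be written down directly. The following sketch assumes a one-parameter model $y = k x$ (for example a spring constant $k$ with known bounds); the function name `fit_bounded_slope` and the data are made up. It relies on the objective being a convex parabola in $k$, so this shortcut does not carry over to several coupled parameters, where a quadratic programming method is needed.

```python
def fit_bounded_slope(xs, ys, lower, upper):
    """Minimize sum((k * x - y)^2) over lower <= k <= upper for the model y = k * x."""
    k_free = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    # The objective is a convex parabola in k, so the constrained minimizer
    # is the unconstrained one clipped to the interval.
    return min(max(k_free, lower), upper)


# Made-up data: the first set is consistent with k = 2, the second would give k < 0
k_inside = fit_bounded_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 0.0, 10.0)
k_clipped = fit_bounded_slope([1.0, 2.0, 3.0], [-0.2, 0.1, -0.3], 0.0, 10.0)
print(k_inside, k_clipped)
```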
Quadratic inequalities arise, for example, when using a Tikhonov regularization to solve integral equations. Solvability is not always guaranteed here. The numerical solution can be obtained, for example, with special QR decompositions.