Lecture 2-4: Least squares approximations

If the data set is corrupted by measurement noise, for example when the data values are coarsely rounded, numerical interpolation produces a rough curve with sharp corners and a large interpolation error. A different technique is used instead, based on numerical approximation of the given data values. The figure below shows a numerical approximation of more than a hundred data values by a cubic polynomial:

Numerical approximation works when the data values follow a simple trend, e.g. a low-order polynomial or an exponential function. Such trends are usually obtained as solutions of theoretical models (differential equations). The methods of numerical approximation therefore provide a tool for comparing theoretical models with real data samples.

We shall study the least squares numerical approximation. This is the problem of finding the best-fit function y = f(x) that passes close to the data sample (x₁,y₁), (x₂,y₂), ..., (x_n,y_n) such that the total square error between the data values and the approximation is minimized, where the total square error is

E = (y₁ - f(x₁))² + (y₂ - f(x₂))² + ... + (y_n - f(x_n))².
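As a minimal sketch of the quantity being minimized, the total square error E can be evaluated for any candidate function f. The function name `total_square_error` and the sample values below are made up for illustration and are not part of the lecture.

```python
def total_square_error(f, xs, ys):
    """Return E = sum over k of (y_k - f(x_k))**2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data sample and a candidate fit f(x) = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.5, 5.0, 6.5]
E = total_square_error(lambda x: 2.0 * x + 1.0, xs, ys)
print(E)
```

A smaller value of E for one candidate than for another means that the first candidate passes closer to the data in the least squares sense.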

Linear least squares fit

The linear least squares fit or linear regression is the linear function y = f(x) = ax + b, where the coefficients a and b are computed from statistical parameters of the data sample:

f(x) = y_mean + S_xy (x - x_mean) / S_xx,

where x_mean is the mean of the x-values, y_mean is the mean of the y-values, S_xx is the variance of the x-values, and S_xy is the covariance between the x- and y-values. Comparing with y = ax + b gives a = S_xy / S_xx and b = y_mean - a x_mean.
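The formula above can be sketched in a few lines of plain Python. The name `linear_fit` and the data values are illustrative assumptions. The sketch normalizes S_xx and S_xy by n, but the choice between n and n - 1 does not matter here because only the ratio S_xy / S_xx enters the fit.

```python
def linear_fit(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b.

    Assumes at least two distinct x-values, so that S_xx > 0.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Variance of x and covariance of x and y (both normalized by n)
    s_xx = sum((x - x_mean) ** 2 for x in xs) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    a = s_xy / s_xx
    b = y_mean - a * x_mean
    return a, b


# Hypothetical noisy samples scattered around the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = linear_fit(xs, ys)
print(a, b)
```

The fitted line always passes through the point (x_mean, y_mean), which is a quick sanity check on any implementation.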

Exponential least squares fit

The exponential function y = c exp(d x), where c and d are constants, can be equivalently rewritten as a linear function:

log(y) = d x + log(c)

If the data sample is given as (x_k,y_k) with all y_k > 0, then (i) compute the transformed data X_k = x_k and Y_k = log(y_k), (ii) find the coefficients a and b of the linear least squares fit through the data sample (X_k,Y_k), and (iii) compute the coefficients c and d as c = exp(b) and d = a (verify these elementary transformations!).
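Steps (i) to (iii) can be sketched as follows. The helper names `linear_fit` and `exponential_fit` and the test data are assumptions made for this illustration; the linear fit is repeated so that the block runs on its own.

```python
import math


def linear_fit(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    return a, y_mean - a * x_mean


def exponential_fit(xs, ys):
    """Return (c, d) of the fit y = c*exp(d*x) via the log transform."""
    if any(y <= 0 for y in ys):
        raise ValueError("log(y_k) requires all y_k > 0")
    # (i) transform the data: X_k = x_k, Y_k = log(y_k)
    log_ys = [math.log(y) for y in ys]
    # (ii) linear least squares fit through (X_k, Y_k)
    a, b = linear_fit(xs, log_ys)
    # (iii) recover c = exp(b) and d = a
    return math.exp(b), a


# Hypothetical noise-free samples of y = 2*exp(0.5*x)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
c, d = exponential_fit(xs, ys)
print(c, d)
```

Note that this procedure minimizes the square error in log(y) rather than in y itself, so for noisy data the result generally differs from a direct nonlinear least squares fit of c exp(d x).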

Polynomial least squares fit

The coefficients of the polynomial least squares fit can be found by solving a linear system, which is obtained by minimizing the total square error E. However, this linear system is ill-conditioned for large n (number of data points). A better method is therefore used in practice. It consists of two phases: (i) constructing a set of orthogonal polynomials through the given data points (x_k,y_k) and (ii) computing the coefficients of the linear combination of orthogonal polynomials from a well-conditioned linear system.
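One common way to realize the two phases is a three-term recurrence that builds polynomials orthogonal with respect to the discrete inner product <p,q> = p(x₁)q(x₁) + ... + p(x_n)q(x_n). The sketch below follows that approach; it is not necessarily the construction intended in the lecture. The function name `orthogonal_poly_fit` and the data are assumptions, and the code assumes distinct x-values and a degree smaller than n. Because the polynomials are orthogonal, the linear system of phase (ii) is diagonal and each coefficient is a single ratio.

```python
def orthogonal_poly_fit(xs, ys, degree):
    """Return a callable least squares polynomial fit of the given degree."""
    n = len(xs)
    if degree >= n:
        raise ValueError("degree must be smaller than the number of points")

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    p_prev = [0.0] * n   # values of p_{j-1} at the data points (p_{-1} = 0)
    p_curr = [1.0] * n   # values of p_j at the data points (p_0 = 1)
    norm_prev = 1.0
    alphas, betas, coeffs = [], [], []
    for j in range(degree + 1):
        norm_curr = dot(p_curr, p_curr)
        # Phase (ii): diagonal system, c_j = <y, p_j> / <p_j, p_j>
        coeffs.append(dot(ys, p_curr) / norm_curr)
        # Phase (i): p_{j+1} = (x - alpha_j) p_j - beta_j p_{j-1}
        alpha = dot([x * p for x, p in zip(xs, p_curr)], p_curr) / norm_curr
        beta = norm_curr / norm_prev if j > 0 else 0.0
        alphas.append(alpha)
        betas.append(beta)
        p_next = [(x - alpha) * pc - beta * pp
                  for x, pc, pp in zip(xs, p_curr, p_prev)]
        p_prev, p_curr, norm_prev = p_curr, p_next, norm_curr

    def f(x):
        # Re-run the same recurrence at the point x and sum c_j * p_j(x)
        pp, pc, total = 0.0, 1.0, 0.0
        for c, alpha, beta in zip(coeffs, alphas, betas):
            total += c * pc
            pp, pc = pc, (x - alpha) * pc - beta * pp
        return total

    return f


# Hypothetical samples of the cubic y = x**3 - 2*x + 1
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 3 - 2.0 * x + 1.0 for x in xs]
f = orthogonal_poly_fit(xs, ys, 3)
print(f(0.5))
```

Since the sample data lie exactly on a cubic, a degree-3 fit should reproduce that cubic up to rounding error, which makes this a convenient check of the recurrence.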