reconstruction_with_incomplete_data

**Figure 2:** Global temperature Distribution according to the German global weather model.

Measurements of the temperature are given either directly, for example by airplanes and radiosondes, or indirectly, by radiation measured by satellites, see for example NWP Measurements.

All these data are
**incomplete data** in the sense that they do not measure the full temperature distribution.

Wed, Aug 6, 2014. One of the key quesitons of many inverse problems is how to achieve
a reconstruction with **incomplete data**. This problem is important for basically
any inverse problem you might think of. Usually, we only have limited ability to
measure. In medical imaging, you might be able to take a lot of measurements, but
X-Rays are not good for the body, and you try to minimize the radiation humans
get exposed to. So you need to go to “sparse” or “incomplete” data, and the task
is to carry out reconstructions in this particular case.

Let us look at an example from weather forecasting. Figure 1 shows measurements of temperature in Kelvin at 850hPa height, which is approximately 1457 meters above the ground (when the standard atmosphere is used). These measurements are obtained from radiosondes, which are launched every 12 hours usually, from particular stations. You quickly realize that these stations are not at all uniformly distributed over the globe.

How do we calculate the temperature at other places from these measurements? A first
and simple idea might be to use **interpolation**. That has been the first idea which
researchers had in the 1950s of the last century. It leads to algorithms which were
called *optimal interpolation*, since they took all the temperature measurements and
many other types of measurements together and interpolated an “optimal” state estimate.

But is interpolation really good? It does **not** take into account the
distribution of the temperature over the globe. And it does **not** take into account
the dynamics of the atmospheric movement. When you have a cold front passing
through, for example, you do not want to linearly interpolate measurements, since
there is a jump of temperature at the front line, and

- a measurement on the warm side is valid only along the warm contour lines of the front,
- a meeasurement on the cold side is valid only along the colder contour lines!

Modern algorithms in the science of **data assimilation** solves this **dynamic
inverse problem**. It usually needs some dynamical system as a basis, to know the particular
fronts. Ensemble data assimilation as well as four-dimensional variational data assimilation
is able to properly incorporate the knowledge about the temperature distribution, when
the reconstruction of the atmospheric state is carried out.

**Example.** Lets have a look at a simple example. In our figure
we have some temperature front and two meassurements of the temperature.
The black lines are some “ensemble” of temperature distributions reflecting our knowledge
about what we think is there - usually calculated from measurements in the past.
But the measurements (blue diamonds) show that the temperatures at the left
part of the front are too high, and those on the warmer part of the front are too low. If we
would just interpolate, we would loose a lot of our knowledge about the distributions.

The bold red line is the reconstruction based on the knowledge which is implicitly given by all the different black lines here. It uses the covariance distribution which can be estimated using the ensemble of black curves, and then calculates an optimal reconstruction based on this covariance distribution. The red curve is a much better estimate for the “true” front than any interpolation can ever achieve.

**More Details.**
Now, you might want to know how exactly the red line is calculated. I assume that you know
how to deal with vectors and matrices. Then, the whole algorithm can be described
in some lines. Let $x$ be the vector of temperatures, i.e.
$$
x = (t_1, t_2, ..., t_n)^T = \left( \begin{array}{c} t_1 \\ \vdots \\ t_n \end{array} \right)
$$
with $T$ for
the transposed to get a column vector with
the temperatures at $n$ different points $z_1, ..., z_n$. The different ensemble distributions with
$L$ such temperature curves
are put into a matrix $Q = (x^{(1)} - x^{(b)}, ..., x^{(L)}-x^{(b)})$, where $x^{(b)}$ is the
mean of the ensemble members $x^{(\ell)}$, $\ell=1,...,L$.
Then, $Q$ is an $n \times L$ matrix, in detail it is
$$.
Q = \left( \begin{array}{cccc}
\delta t^{(1)}_{1} & \delta t^{(2)}_{1} & \dots & \delta t^{(L)}_{1}\\
\delta t^{(1)}_{2} & \delta t^{(2)}_{2} & \dots & \delta t^{(L)}_{2}\\
\delta t^{(1)}_{3} & \dots & \\
\vdots & &
\end{array} \right)
$$
with $\delta t^{(\ell)}_{j} = x^{(\ell)}_{j} - x^{(b)}_{j}$, $j=1,...,n$, $\ell=1,...,L$.
When a state $x$
is given, we assume that we have $m$ observations ($m=2$ in the image, see blue diamonds),
which can be simulated based on the state $x$ by a matrix $H$, i.e.
$$
y = H*x^{true} + error
$$
with some “true” state $x$ and some random error. Here, when measurements are taken at some locations
given by $z$, the matrix $H$ has either zeros or ones as entries, where the ones indicate the
measurements, i.e.
$$
H = \left( \begin{array}{ccccc}
0 & 1 & 0 & ... & 0 \\
0 & ...& 0 & 1 & 0 \end{array} \right).
$$
We assume that the error distribution of the measurements is described by the covariance matrix $R \in \mathbb{R}^{m \times m}$.
Now, we estimate the covariance matrix $B$ of the background states using the ensemble $Q$ by
$$
B = \frac{1}{L-1} Q Q^{T} \in \mathbb{R}^{n\times n}
$$
and then calculate the reconstruction $x^{(a)}$ by
\begin{equation}
\label{TikUpdate}
x^{(a)} := x^{(b)} + B H^{T} (R + H B H^{T})^{-1} ( y - H*x^{(b)})
\end{equation}
The bold red curve in the image above shows the variable $x^{(a)}$. It is also possible to update
the whole ensemble using the measurements, which is indicated by the thin red lines in the image.
The reconstruction (\ref{TikUpdate}) is also called “the analysis” in data assimilation. The
formula is also known as **Tikhonov Regularization** or **Kalman update formula**. It is the best
fit both to the background state $x^{(b)}$ and to the measurements $y$, the minimizer
of
$$
J(x) := || x - x^{(b)} ||^{2}_{B^{-1}} + || y - H x ||^{2}_{R^{-1}},
$$
where $|| w ||$ is the Euclidean metric or norm, respectively, and the letters $B^{-1}$ and
$R^{-1}$ indicate that we have a weight using these matrices, i.e.
$$
|| w ||^2_{B^{-1}} = w^{T} B^{-1} w.
$$

For more details you might look at the recent survey article by Freitag et al.[1].

Enjoy science!

Roland Potthast

for http://inverseproblems.info.

Roland Potthast, University of Reading (UK) and DWD (German Meteorological Service) r.w.e.potthast@reading.ac.uk

- Freitag, M. and Potthast, R.: Synergy of Inverse Problems and Data Assimilation Techniques in “Large Scale Inverse Problems - Computational Methods and Applications in the Earth Sciences”, Radon Series on Computational and Applied Mathematics 13, Hrsg. v. Cullen, Mike / Freitag, Melina A / Kindermann, Stefan / Scheichl, Robert http://www.degruyter.com/view/product/182025, PDF see http://people.bath.ac.uk/mamamf/FrPo2013.pdf

reconstruction_with_incomplete_data.txt · Last modified: 2014/08/06 21:17 by potthast