Convolution#
Definition#
Many image processing results come from a modification of each pixel with respect to its neighbors. When this modification is the same over the entire image \(g\), it can be defined mathematically using a second image \(h\) which encodes the neighbor relationships. This results in a third image \(f\). This is the so-called convolution [Jähne 2005, section 4] and it is denoted with \(*\):

\[
f(m,n) = (g * h)(m,n) = \sum_{i}\sum_{j} g(i,j)\, h(m-i,\, n-j)
\]
Intuitively, the convolution “spreads” each pixel \((m,n)\) of \(g\) following the pattern \(h\), proportionally to the intensity \(g(m,n)\). Fig. 29 gives an example of computing one particular pixel.
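This definition can be illustrated with a minimal NumPy sketch, written here as a direct, unoptimized implementation of the “spreading” interpretation (in practice one would use a library routine such as `scipy.signal.convolve2d`):

```python
import numpy as np

def convolve2d_full(g, h):
    """Direct 2D discrete convolution (full output):
    f[m, n] = sum_{i, j} g[i, j] * h[m - i, n - j]."""
    M, N = g.shape
    P, Q = h.shape
    f = np.zeros((M + P - 1, N + Q - 1))
    for i in range(M):
        for j in range(N):
            # each pixel g[i, j] "spreads" a copy of h weighted by its intensity
            f[i:i + P, j:j + Q] += g[i, j] * h
    return f

g = np.zeros((5, 5))
g[2, 2] = 1.0                        # an image with a single bright pixel
h = np.array([[0., 1., 0.],
              [1., 4., 1.],
              [0., 1., 0.]]) / 8.0   # a small blurry spot
f = convolve2d_full(g, h)
# the single pixel of g is spread in f according to the pattern h
```

Since \(g\) contains a single non-zero pixel, the result \(f\) is simply a copy of \(h\) placed at that pixel's position.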
For the sake of simplicity, the image \(h\) is often:
of odd size (\(3\times3\), \(5\times5\), \(7\times7\)…);
centered, i.e. the pixel with coordinates \((0,0)\) is at the center of the image \(h\).
The image \(h\) is called by different names, depending on the context: filter, mask, kernel, window, pattern or point spread function (PSF).
Some convolution examples are shown below.
\(g\) is an image composed of only four non-zero pixels. \(h\) is a blurry spot. The convolution of \(g\) by \(h\) clearly shows the “spreading” effect: the result \(f\) corresponds to each of the four pixels of \(g\), at the same position as on \(g\), spreading according to the pattern shown on \(h\). Notice that the “spreading” of the two nearby pixels adds up, thus giving a very bright area.
Here, \(h\) contains two non-zero pixels: the image \(g\) therefore appears twice in \(f\), once for each of them. Another interpretation can be made: each of the two pixels of \(h\) spreads while reproducing the pattern \(g\).
\(g\) is an image made up of pixels of different intensities. Each of these pixels spreads over the result \(f\) following the pattern \(h\). This results in a blurry image. We will see in Filtering that we have applied here a low-pass filter on the image \(g\).
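This blurring effect can be reproduced in a few lines, assuming SciPy is available; the \(3\times3\) averaging kernel used here is one simple example of a low-pass filter:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
g = rng.random((8, 8))        # an image with pixels of various intensities
h = np.ones((3, 3)) / 9.0     # averaging kernel: a simple low-pass filter
f = convolve2d(g, h, mode="same")
# f is a blurred version of g: each pixel becomes a local average
```

With this uniform kernel, every interior pixel of \(f\) is the mean of the \(3\times3\) neighborhood around the corresponding pixel of \(g\).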
Properties#
As a mathematical operation, the convolution has several properties.
The neutral element of the convolution is the image filled with zeros except for the pixel at the center, which equals 1 (a discrete impulse).
The convolution is commutative: \(g*h = h*g\).
The convolution is distributive with respect to the addition: \(g*(h_1+h_2) = g*h_1 + g*h_2\).
The convolution is bilinear: \(\alpha (g*h) = (\alpha g) * h = g * (\alpha h)\) (with \(\alpha\in\mathbb{C}\)).
The convolution is associative: \(h_1*(h_2*h_3) = (h_1*h_2)*h_3\).
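These properties can be checked numerically; the sketch below uses `scipy.signal.convolve2d` in “full” mode so that no truncation interferes with the comparisons:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
g  = rng.random((4, 4))
h1 = rng.random((3, 3))
h2 = rng.random((3, 3))

# neutral element: an impulse (zeros with a 1 at the center)
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
assert np.allclose(convolve2d(g, delta, mode="same"), g)

# commutativity: g * h1 == h1 * g
assert np.allclose(convolve2d(g, h1), convolve2d(h1, g))

# distributivity with respect to addition
assert np.allclose(convolve2d(g, h1 + h2),
                   convolve2d(g, h1) + convolve2d(g, h2))

# associativity
assert np.allclose(convolve2d(convolve2d(g, h1), h2),
                   convolve2d(g, convolve2d(h1, h2)))
```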
Boundary effects#
The convolution formula is not defined on the boundaries of the image: as an example, computing \(f_{1,1}\) in Fig. 29 requires the value of \(g_{0,0}\) which is not defined.
Therefore, one has to make assumptions about the pixel values outside the image. Fig. 33 shows an image with some possibilities to consider the external pixels, and Fig. 34 shows the convolution of these images by a Gaussian.
One can see in Fig. 34 that the three convolutions are essentially identical: only the pixels near the boundaries may differ (darker or brighter in this example). In any case, there is no perfect choice for setting the pixels outside the image, and each choice yields some errors. Therefore, the best practice is to ensure, when acquiring the image, that the objects of interest are far from the edges.
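The usual assumptions for the external pixels correspond to standard padding modes; a small sketch using `numpy.pad`:

```python
import numpy as np

g = np.arange(1.0, 10.0).reshape(3, 3)      # a tiny 3x3 image

# three common assumptions for the pixels outside the image
zeros     = np.pad(g, 1, mode="constant")   # outside pixels set to 0
replicate = np.pad(g, 1, mode="edge")       # the nearest image pixel is repeated
wrapped   = np.pad(g, 1, mode="wrap")       # the image is tiled periodically
```

Convolving the padded image and keeping only the central part is equivalent to convolving under the corresponding boundary assumption.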
Finally, note that the wrapping assumption yields a circular convolution. This is also the result given by a multiplication in the Fourier domain (see Fourier transform).
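The equivalence between the wrapping assumption and a multiplication in the Fourier domain can be verified with NumPy's FFT; in this sketch the kernel \(h\) is taken with the same size as the image, as if it had been zero-padded:

```python
import numpy as np

rng = np.random.default_rng(2)
g = rng.random((8, 8))
h = rng.random((8, 8))   # kernel of the same size as the image

# circular convolution via the 2D FFT: multiplication in the Fourier domain
f_fourier = np.fft.ifft2(np.fft.fft2(g) * np.fft.fft2(h)).real

# direct circular convolution: each pixel g[i, j] spreads a copy of h,
# and the indices wrap around the image boundaries
M, N = g.shape
f_direct = np.zeros_like(g)
for i in range(M):
    for j in range(N):
        f_direct += g[i, j] * np.roll(h, (i, j), axis=(0, 1))

assert np.allclose(f_fourier, f_direct)
```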
Separable Convolution#
A convolution is separable when the kernel \(h\) can be written as the convolution of two 1D filters (say \(h_1\) and \(h_2\)) defined along the two axes. For example:

\[
\underbrace{\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}}_{h}
=
\underbrace{\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}}_{h_1}
*
\underbrace{\begin{pmatrix} 1 & 2 & 1 \end{pmatrix}}_{h_2}
\]
Thus, the convolution of an image \(g\) by a separable filter \(h\) can be computed by first convolving \(g\) with \(h_1\), then convolving the result with \(h_2\) (or in the reverse order):

\[
g * h = g * (h_1 * h_2) = (g * h_1) * h_2
\]
Separability saves computation time because two 1D convolutions require fewer operations than one 2D convolution.
Proof
Consider two images \(g\) and \(h\) of size \(M \times N\).
On the one hand, computing one pixel with the 2D convolution requires \(MN\) multiplications and \(MN-1\) additions. Therefore, computing the convolved image requires \((MN+MN-1) \times MN\) operations.
On the other hand, each element of a 1D convolution along one column requires \(M\) multiplications and \(M-1\) additions. Similarly, each element of a 1D convolution along one row requires \(N\) multiplications and \(N-1\) additions. Therefore, computing the convolved image requires \(2(M+N-1) \times MN\) operations.
Since \(2(M+N-1) < 2MN-1\) as soon as \(M, N \geq 2\), the separable computation requires fewer operations, highlighting the efficiency of separability.
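The separability property itself is easy to verify numerically; this sketch (assuming SciPy is available) builds the \(3\times3\) kernel from two 1D filters \([1, 2, 1]\) and compares the two computation paths:

```python
import numpy as np
from scipy.signal import convolve2d

# a separable kernel: h = h1 * h2, with h1 a column filter and h2 a row filter
h1 = np.array([[1.], [2.], [1.]])          # vertical 1D filter (3x1)
h2 = np.array([[1., 2., 1.]])              # horizontal 1D filter (1x3)
h  = convolve2d(h1, h2)                    # the resulting 3x3 2D kernel

rng = np.random.default_rng(3)
g = rng.random((16, 16))

f_2d  = convolve2d(g, h)                   # one 2D convolution
f_sep = convolve2d(convolve2d(g, h1), h2)  # two successive 1D convolutions

assert np.allclose(f_2d, f_sep)
```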