Image denoising with wavelets using Markov random fields

Material Information

Image denoising with wavelets using Markov random fields
Ravindra, Vishal C
Publication Date:
Physical Description:
xi, 84 leaves : ; 28 cm

Thesis/Dissertation Information

Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Electrical Engineering, CU Denver
Degree Disciplines:
Electrical engineering


Subjects / Keywords:
Wavelets (Mathematics) ( lcsh )
Electronic noise ( lcsh )
Image processing -- Digital techniques ( lcsh )
Electronic noise ( fast )
Image processing -- Digital techniques ( fast )
Wavelets (Mathematics) ( fast )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )


Includes bibliographical references (leaf 84).
General Note:
Department of Electrical Engineering
Statement of Responsibility:
by Vishal C. Ravindra.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
63755067 ( OCLC )
LD1193.E54 2005m R38 ( lcc )

Full Text
Vishal C. Ravindra
B.E., University of Mysore, India, 2001
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering

This thesis for the Master of Science
degree by
Vishal C. Ravindra
has been approved
Jan Bialasiewicz

Ravindra, Vishal C. (M.S., Electrical Engineering)
Image Denoising with Wavelets using Markov Random Fields
Thesis directed by Professor Jan Bialasiewicz
This thesis describes a method for the suppression of noise in images via the
wavelet transform. The method relies on two measures. The first is a classic
measure of the smoothness of the image and is based on an approximation of
the local Lipschitz exponent obtained via the wavelet coefficients. The second,
relatively new measure takes into account a priori geometrical constraints of the
image, which are generally valid for natural images. The smoothness measure
and the geometrical constraints are combined in a Bayesian probabilistic
framework, and are implemented as a Markov Random Field image model. The
manipulation of the wavelet coefficients is consequently based on the obtained
probabilities.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.

I dedicate this thesis to my parents, my sister, and other family members,
without whose understanding and support I would not be in Graduate School.

My thanks to my advisor, Dr. Jan Bialasiewicz, for introducing me to the world of
wavelets, and for his guidance and patience with me this past year. I also wish
to thank the staff and faculty of the Graduate School for their support and

Figures............................................................ ix
Tables............................................................. xi
1. Introduction..................................................... 1
2. Brief Review on Wavelets......................................... 3
2.1 Transforms...................................................... 3
2.2 Basis Functions................................................ 3
2.3 Fourier Transforms and the Time-Frequency Wedding.............. 4
2.3.1 Windowed Fourier Transform.................................... 6
2.4 Wavelet Transforms............................................. 8
2.4.1 Wavelet Admissibility Condition............................... 9
2.4.2 Scaling Function.............................................. 9
2.5 Frames and Bases.............................................. 10
2.5.1 Pseudo Inverse............................................. 11
2.5.2 Partial Reconstruction....................................... 12
2.6 Wavelet Frames................................................ 12
2.7 Wavelet Bases................................................. 13
2.8 Continuous Wavelet Transform.................................. 14

2.9 Dyadic Wavelet Transform....................................... 14
2.10 Wavelet Design................................................ 15
2.10.1 Reconstruction Wavelets...................................... 15
2.11 Spline Dyadic Wavelets........................................ 16
3. Algorithme a Trous............................................. 20
4. Singularity Detection and Lipschitz Regularity................. 28
4.1 Wavelet Vanishing Moments.................................... 30
5. Wavelet Decomposition and Notations.............................. 33
5.1 Orthogonal Wavelet Bases and Multiresolution Approximations... 35
6. The Proposed Method.............................................. 38
6.1 Notations....................................................... 39
6.2 Bayesian Probability........................................... 40
7. Markov Random Fields............................................. 43
7.1 Introduction.................................................... 43
7.2 Markov Random Fields and Gibbs Random Fields.................... 44
7.2.1 Gibbs Random Fields........................................... 47
7.2.2 MRF-GRF Equivalence........................................... 48
7.3 Regularity Estimation........................................... 49
7.4 Lipschitz Threshold and Initial Mask Selection.................. 50
8. Conditional Probability Model.................................... 53

8.1 Stochastic Sampling.............................................. 53
9. Monte Carlo Methods for Stochastic Sampling....................... 55
9.1 Metropolis-Rosenbluth-Rosenbluth-Teller-Teller Algorithm......... 55
9.2 Image Reconstruction............................................ 63
10. Conclusion....................................................... 66
A. Algorithme a Trous MATLAB Program................................ 67
B. Metropolis Sampler MATLAB Program................................ 77
C. Variance for Lipschitz Estimator Thresholding MATLAB
Program........................................................... 79
D. Circular Convolution, Leftshift, and Zero-padding MATLAB
Programs.......................................................... 81
E. Notations Used................................................... 83
References........................................................... 84

2.1 Heisenberg box for time-frequency atoms....................... 5
2.2 Biorthogonal spline wavelet................................... 18
2.3 Biorthogonal cubic B-spline scaling function.................. 18
2.4 Decomposition and reconstruction filters for quadratic spline. 19
3.1 The original house 256×256 bitmap image...................... 23
3.2 The noisy house bitmap image (SNR=5dB)...................... 24
3.3 Approximations, horizontal and vertical details for wavelet
decomposition for levels 1 and 2............................... 25
3.4 Decomposition filter bank for the Algorithme a Trous.......... 26
3.5 Reconstruction filter bank for the Algorithme a Trous......... 27
7.1 (a) First order neighbourhood (b) second order (c) fifth order,
X is the center point (d) single state (e) two state (f) two state,
and (g) three states neighbourhoods.............................. 46
7.2 Initial masks for horizontal and vertical components obtained by
thresholding Lipschitz estimator with a threshold T= 1.25...... 52
9.1 Metropolis sampler............................................... 58
9.2 Horizontal mask after 1 iteration of Metropolis algorithm....... 59
9.3 Vertical mask after 1 iteration of Metropolis algorithm......... 60
9.4 Modified horizontal details after shrinkage factors have
been applied to each coefficient for levels 1 and 2............ 61

9.5 Modified vertical details after shrinkage factors have
been applied to each coefficient for levels 1 and 2............... 62
9.6 Approximations, horizontal and vertical details after wavelet
reconstruction for levels 1 and 2................................. 64
9.7 Input noisy image, reconstructed image, and the
original clean house image...................................... 65

E. 1. Notations Used............................................... 83

1. Introduction
Although image acquisition techniques yield an ever improving quality, there is a
need for post-processing methods to remove noise from images. The problem of
suppressing noise in digital images is based on the model

y = f + n,

where f denotes the true noise-free pixels, y denotes the observed noisy pixels,
and n the observed noise. The goal is to find an optimal approximation to f, based
on y and possibly on knowledge about the noise. The method used in this thesis is
not specific to a particular type of noise; however, it will deal with white noise
for simplicity. The ability to suppress noise is valuable in many applications, such
as the suppression of motion blur in aerial imaging, and of the noise in the visible
channel due to fog and clouds. Denoising (noise suppression) is done at the time
of acquisition or by post-processing methods.
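This additive model can be illustrated with a short sketch, given here in Python (the programs in the thesis appendices are in MATLAB; this fragment, including the function name add_white_noise, is only a hypothetical illustration). White Gaussian noise is scaled so that the observation y has a prescribed SNR:

```python
import numpy as np

def add_white_noise(f, snr_db, rng):
    """Return y = f + n, with white Gaussian noise n scaled to a target SNR in dB."""
    signal_power = np.mean(f ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    n = np.sqrt(noise_power) * rng.standard_normal(f.shape)
    return f + n

rng = np.random.default_rng(0)
f = rng.uniform(0.0, 255.0, size=(256, 256))     # stand-in for the clean pixels
y = add_white_noise(f, snr_db=5.0, rng=rng)

# Empirical SNR of the observation, in dB:
snr = 10.0 * np.log10(np.mean(f ** 2) / np.mean((y - f) ** 2))
print(round(snr, 1))                              # close to 5.0
```

With 256 × 256 samples, the empirical SNR matches the target to within a small fraction of a decibel.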
A number of methods for suppression of noise in signals and images based
on the wavelet transform have been proposed and worked on in the past few
years. The common idea is to compute the wavelet decomposition of the noisy
image, and then to modify the wavelet coefficients in a specific way, so that a
relatively clean image is obtained after reconstruction. The simplest approach is to
replace the coefficients that are supposed to be affected by noise with zero, or by
any other suitable value. In the wavelet shrinkage technique (Donoho and

Johnstone), the wavelet coefficients that have an absolute value below a threshold
are replaced by zero; the other coefficients are reduced in absolute value. In the
approach of (Xu, Weaver, et al.), the criterion to distinguish noise from
meaningful signal is based on the observation that wavelet coefficients of noise
have a much weaker correlation between scales than the coefficients of a noiseless
image. The method of (Mallat, et al.) is based on the assumption that the noiseless
image is regular and the noisy image is irregular. It exploits the function's local
regularity parameters that can be obtained from the wavelet coefficients.
This thesis describes a relatively new method that makes use of the
following two aspects:
The selection of coefficients is based on an estimation of local regularity. It
additionally takes into consideration a priori geometrical information of the
image.
Whereas most methods divide the coefficients into a noisy and a clean category
and apply a different manipulation to each category, this method computes for
each coefficient the probability that it is affected (unaffected) by noise. The
manipulation of the coefficients is then based on this probability. This method
combines the wavelet multiresolution concept with a Markov Random Field
(MRF) model. In addition, the method describes a novel Bayesian approach for
wavelet coefficients, (Malfait, Roose).

2. Brief Review on Wavelets
2.1 Transforms
When data of any kind is recorded, it hardly ever reveals its true nature from our
standpoint. Transforms are applied to the data to change this standpoint. A basic
transform in the physical world is, for example, the transform from the time
domain to the frequency domain and vice versa. Our ears are built in such a way
as to identify the frequencies and the phases of a sound signal. The frequencies
are in turn used to analyze the message, and the phases carry information which is
mostly used for spatial hearing (low frequencies).
Transforms thus have a high analytical value.
2.2 Basis Functions
The idea of having a basis function is that any signal can be expressed as a
weighted sum of a family of functions. These functions are the basis functions of
the transform. Depending upon the features of the transform, the basis functions
have to fulfill different requirements. Some of the requirements are listed as
follows.
Orthogonality: From an analysis point of view, it is practical for a set of
functions to be orthogonal. For a given set of functions, orthogonality on
a given interval [a, b] means that the inner product of any two different
functions is zero. Orthogonal basis functions describe the signal explicitly. In
other words, there exists only one possible transform for each signal when such
functions are used. They are a good choice for compression tasks.
Redundancy: Not all transforms aim to analyze the signal afterwards. If we want
to protect data against errors, it might be useful to add redundancy. In these cases,
the choice of basis functions leads to redundancy, where only certain basis
functions exist, and others are forbidden. In the case of an error, the basis function
can be corrected to the nearest legal one according to the distance metric of the
function space.
Compact support: Compact support means that the function is zero outside a
bounded interval. In numerical calculations this is an advantage, as the known
zeros speed up the computations. Compact support finds
applications in wavelet transforms, as it tells about the locality of wavelets in the
time domain.
2.3 Fourier Transforms and the Time-Frequency Wedding
The Fourier Transform is the most well-known frequency domain analysis tool.
Fourier transforms provide a way to convert samples of a standard time-series into
the frequency domain. This provides a dual representation of the function, in
which certain operations become easier than in the time domain. Applications of
Fourier transforms include filtering, image compression, convolution and
deconvolution, and computing the correlations of functions. The uncertainty

principle states that the energy spread of a function and its Fourier transform
cannot be arbitrarily small. In 1946, the physicist Gabor defined elementary time-
frequency atoms as waveforms that have a minimal spread in a time-frequency
plane. To measure time-frequency information content he proposed
decomposing signals over these elementary waveforms. Windowed Fourier
transforms and wavelet transforms (two prime examples of time-frequency
transforms) are computed by decomposing the signal over different families of
time-frequency atoms. A linear time-frequency transform correlates the signal
with a family of waveforms that are well concentrated in time and in frequency.
These waveforms are called time-frequency atoms, as illustrated in Figure 2.1.
Let us consider a general family of time-frequency atoms \{\phi_\gamma\}_{\gamma\in\Gamma}, where \gamma might
be a multi-index parameter. We also suppose that \phi_\gamma \in L^2(\mathbb{R}) and that \|\phi_\gamma\| = 1.
The corresponding linear time-frequency operator T associates to any f \in L^2(\mathbb{R})

Tf(\gamma) = \int_{-\infty}^{\infty} f(t)\, \phi_\gamma^*(t)\, dt. \qquad (2.1)

Parseval's formula (Mallat, 2001) proves that

Tf(\gamma) = \int_{-\infty}^{\infty} f(t)\, \phi_\gamma^*(t)\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega)\, \hat{\phi}_\gamma^*(\omega)\, d\omega. \qquad (2.2)

If \phi_\gamma(t) is nearly zero when t is outside a neighbourhood of an abscissa u, then
\langle f, \phi_\gamma\rangle depends only on the values of f in this neighbourhood. Similarly, if
\hat{\phi}_\gamma(\omega) is negligible for \omega far from \xi, then the right integral of (2.2) proves that
\langle f, \phi_\gamma\rangle reveals the properties of \hat{f} in the neighbourhood of \xi (Mallat, 2001).
2.3.1 Windowed Fourier Transform
A windowed Fourier atom is constructed by translating in time by u and
modulating at the frequency \xi a time window g:

g_{u,\xi}(t) = e^{i\xi t}\, g(t-u). \qquad (2.3)

It is normalized, \|g\| = 1, so that \|g_{u,\xi}\| = 1 for any (u, \xi). The resulting
windowed Fourier transform of f \in L^2(\mathbb{R}) is

Sf(u,\xi) = \langle f, g_{u,\xi}\rangle = \int_{-\infty}^{\infty} f(t)\, g(t-u)\, e^{-i\xi t}\, dt. \qquad (2.4)

This transform is called the windowed Fourier transform because the
multiplication by g(t-u) localizes the Fourier integral in a time window in the
neighbourhood of t = u. The resolution in time and frequency of the

Windowed Fourier transform depends on the spread of the window in time and
frequency (Mallat, 2001).
For image analysis, the Fourier transform identifies the spectral
components present in an image but it does not provide information as to where
certain components occur within the image. If we are only interested in stationary
signals, the Fourier transform provides complete spectral analysis because all
spectral components exist at all times. However, if we are interested in non-
stationary signals with transient phenomena, the Fourier transform would only
give us the spectral components within the image but not where they are located.
The wavelet transform or wavelet analysis is probably the most recent solution to
overcome the shortcomings of the Fourier transform. In wavelet analysis a fully
scalable modulated window is shifted along the signal and for every position the
spectrum is calculated. Then this process is repeated many times with a slightly
shorter (or longer) window for every new cycle. The result is a collection of time-
frequency representations of the signal, all with different resolutions or frequency
bands. This collection of resolutions results in multi-resolution analysis. In case of
wavelets we normally do not speak about time-frequency representations but of
time-scale representations, where small scales correspond to high frequencies and
large scales correspond to low frequencies. In fact, the human visual system
performs hierarchical edge detection at multiple levels of resolution and wavelet
transforms perform a similar multi-resolution analysis.

2.4 Wavelet Transforms
Like a windowed Fourier transform, a wavelet transform can measure the time-
frequency variations of spectral components, but it has a different time-frequency
resolution. To analyze signal structures of very different sizes, it is necessary to
use time-frequency atoms with different time supports; this is achieved by varying
the scale s. The wavelet transform decomposes signals over dilated and translated
wavelets. A wavelet is a function \psi \in L^2(\mathbb{R}) with a zero average:

\int_{-\infty}^{\infty} \psi(t)\, dt = 0.

It is normalized, \|\psi\| = 1, and centered in the neighbourhood of t = 0. A family of
time-frequency atoms is obtained by scaling \psi by s and translating it by u:

\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right).

These atoms remain normalized for all scales. The wavelet transform of f at the
time u and scale s is

Wf(u,s) = \langle f, \psi_{u,s}\rangle = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t-u}{s}\right) dt,

which can be rewritten as a convolution product:

Wf(u,s) = f \star \bar{\psi}_s(u), \qquad \bar{\psi}_s(t) = \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{-t}{s}\right).

Larger scales correspond to dilated wavelets and therefore lower frequencies.
Smaller scales correspond to compressed wavelets and therefore higher
frequencies. Since \psi has a zero average and unit norm, the above wavelet
integral measures the variation of f in a neighbourhood of u, whose size is
proportional to s. When the scale s goes to zero, it can be proved that the decay of
the wavelet coefficients characterizes the regularity of f in the neighbourhood of u
(Mallat, 2001). The resulting wavelet coefficients are known as the details of the
signal because the wavelet is considered as a high pass filter. In order to perfectly
reconstruct the signal from the wavelet coefficients, the low-frequency
approximation must also be added to the high-frequency details (Mallat, 2001).
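The zooming behaviour of the wavelet integral can be seen in a minimal numerical illustration, sketched here in Python with a discrete Haar-like wavelet (chosen only for simplicity; the thesis itself uses spline wavelets): because the wavelet has a zero average, its coefficients vanish where the signal is constant and concentrate at a step singularity.

```python
import numpy as np

# A step signal: smooth everywhere except at index 64.
f = np.zeros(128)
f[64:] = 1.0

def haar_coefficients(f, k):
    """Wavelet coefficients at scale s = 2k, computed as a convolution with a
    discrete Haar-like wavelet (zero average, unit norm)."""
    psi = np.concatenate([np.ones(k), -np.ones(k)]) / np.sqrt(2 * k)
    return np.convolve(f, psi, mode="same")

w = haar_coefficients(f, k=4)
print(int(np.argmax(np.abs(w))))   # 64: the largest coefficient sits at the edge
```

Away from the edge the coefficients are exactly zero, which is why the details of a piecewise-smooth signal are sparse.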
2.4.1 Wavelet Admissibility Condition
The wavelet transform is invertible provided that the admissibility condition

C_\psi = \int_0^{\infty} \frac{|\hat{\psi}(\omega)|^2}{\omega}\, d\omega < +\infty

is satisfied; this is a necessary and sufficient condition for a function
\psi \in L^2(\mathbb{R}) to be a wavelet (Mallat, 2001).
2.4.2 Scaling Function
When Wf(u,s) is known only for s < s_0, i.e., for high frequencies, to recover f we
need a complement of information corresponding to s > s_0, i.e., for low
frequencies. This is obtained by introducing a scaling function \phi that is an
aggregation of wavelets at scales larger than s_0, where s_0 corresponds to the
mother wavelet \psi. The scaling function can be interpreted as the impulse
response of a low-pass filter (a proof can be found in Mallat, 2001). The low-
frequency approximation of f at the scale s is

Lf(u,s) = \left\langle f(t), \frac{1}{\sqrt{s}}\, \phi\!\left(\frac{t-u}{s}\right) \right\rangle.
2.5 Frames and Bases
Frame theory analyzes the completeness, stability and redundancy of linear
discrete signal representations. A frame is a family of vectors \{\phi_n\}_{n\in\Gamma} that
characterizes any signal f from its inner products \{\langle f, \phi_n\rangle\}_{n\in\Gamma}.
Definition (Mallat, 2001): The sequence \{\phi_n\}_{n\in\Gamma} is a frame of H if there exist
two constants A > 0 and B > 0 such that for any f \in H

A\,\|f\|^2 \le \sum_{n\in\Gamma} \left|\langle f, \phi_n\rangle\right|^2 \le B\,\|f\|^2.

The original signal can be recovered from its continuous wavelet transform through

f(t) = \frac{1}{C_\psi} \int_0^{\infty}\!\int_{-\infty}^{\infty} Wf(u,s)\, \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right) du\, \frac{ds}{s^2},

where C_\psi is the constant of the admissibility condition.
When A = B the frame is said to be tight.

If the frame condition is satisfied, then the operator U defined by

\forall n \in \Gamma, \quad Uf[n] = \langle f, \phi_n\rangle \qquad (2.15)

is called a frame operator.
A frame thus defines a complete and stable signal representation, which may also
be redundant. When the frame vectors are normalized, \|\phi_n\| = 1, this redundancy is
measured by the frame bounds A and B. If the frame vectors \{\phi_n\}_{n\in\Gamma} are linearly
independent then it can be proved that

A \le 1 \le B.

The frame is an orthonormal basis if and only if A = B = 1. This can be verified by
inserting f = \phi_n in equation (2.15) of the definition of a frame (Mallat, 2001).
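A minimal finite-dimensional example (a sketch, not taken from the thesis) is the tight frame of three unit vectors at 120° in H = R²: the frame bound A = B = 3/2 > 1 measures the redundancy, and f is recovered as f = (1/A) Σ_n ⟨f, φ_n⟩ φ_n.

```python
import numpy as np

# Three unit vectors at 120 degrees form a tight frame of R^2.
phi = np.array([[0.0, 1.0],
                [-np.sqrt(3) / 2.0, -0.5],
                [np.sqrt(3) / 2.0, -0.5]])

f = np.array([2.0, -1.0])          # an arbitrary signal in H = R^2
coeffs = phi @ f                   # frame coefficients Uf[n] = <f, phi_n>

# Frame bound: sum_n |<f, phi_n>|^2 = A ||f||^2 with A = B = 3/2.
A = np.sum(coeffs ** 2) / np.dot(f, f)

# Tight-frame reconstruction: f = (1/A) sum_n <f, phi_n> phi_n.
f_rec = (phi.T @ coeffs) / A
print(A, f_rec)                    # 1.5 and the original f
```

Since A = B but A ≠ 1, this frame is complete and stable yet redundant, exactly as the inequality A ≤ 1 ≤ B predicts for normalized, linearly dependent vectors.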
2.5.1 Pseudo Inverse
The reconstruction of f from its frame coefficients Uf[n] is calculated with a
pseudo inverse. This pseudo inverse is a bounded operator that is expressed with a
dual frame. Let \ell^2(\Gamma) be the space of vectors such that \|x\|^2 = \sum_{n\in\Gamma} |x[n]|^2 < \infty,
and let \mathrm{Im}\,U be the image space of all Uf with f \in H.
Proposition (Mallat, 2001): If \{\phi_n\}_{n\in\Gamma} is a frame whose vectors are linearly
dependent, then \mathrm{Im}\,U is strictly included in \ell^2(\Gamma), and U admits an infinite
number of left inverses U^{-1}:

\forall f \in H, \quad U^{-1} U f = f. \qquad (2.16)

The pseudo inverse of a frame operator is related to a dual frame family, which is
specified by the following theorem.
Theorem 2.1 (Mallat, 2001): Suppose that \{\phi_n\}_{n\in\Gamma} is a frame with frame bounds
A and B. Let \tilde{\phi}_n = (U^* U)^{-1} \phi_n. For any f \in H,

f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \tilde{\phi}_n = \sum_{n\in\Gamma} \langle f, \tilde{\phi}_n\rangle\, \phi_n,

and \{\tilde{\phi}_n\}_{n\in\Gamma} is a dual frame with frame bounds B^{-1} and A^{-1}.
2.5.2 Partial Reconstruction
Suppose that \{\phi_n\}_{n\in\Gamma} is a frame of a subspace V of H. The
inner products Uf[n] = \langle f, \phi_n\rangle give partial information on f that does not allow us
to fully recover f. The best linear mean-square approximation of f computed from
these inner products is the orthogonal projection of f on the space V (Mallat, 2001):

P_V f = U^{-1} U f = \sum_{n\in\Gamma} \langle f, \phi_n\rangle\, \tilde{\phi}_n.

2.6 Wavelet Frames
Wavelet frames are constructed by sampling the time and scale parameters of a
continuous wavelet transform. A real continuous wavelet transform of f \in L^2(\mathbb{R})
was defined in previous sections by

Wf(u,s) = \langle f, \psi_{u,s}\rangle = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right) dt,

where \psi is a real wavelet and \psi_{u,s}(t) = \frac{1}{\sqrt{s}}\, \psi\!\left(\frac{t-u}{s}\right).
Intuitively, to construct a frame we need to cover the time-frequency plane with
the Heisenberg boxes of the discrete wavelet family. A wavelet \psi_{u,s} has energy in
time that is centered at u over a domain proportional to s. Over positive
frequencies, its Fourier transform \hat{\psi}_{u,s} has a support centered at the frequency \eta/s,
with a spread proportional to 1/s. To obtain a full cover, we sample s along an
exponential sequence \{a^j\}_{j\in\mathbb{Z}}, with a sufficiently small dilation step a > 1. In this
thesis, a = 2 is used. The time translation u is sampled uniformly at intervals
u_0 a^j, proportional to the scale a^j. Let us denote

\psi_{j,n}(t) = \frac{1}{\sqrt{a^j}}\, \psi\!\left(\frac{t - n u_0 a^j}{a^j}\right).

2.7 Wavelet Bases
One can construct wavelets \psi such that the dilated and translated family

\left\{\psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t - 2^j n}{2^j}\right)\right\}_{(j,n)\in\mathbb{Z}^2}

is an orthonormal basis of L^2(\mathbb{R}). Orthogonal wavelets dilated by 2^j carry signal
variations at the resolution 2^{-j} (Mallat, 2001).
2.8 Continuous Wavelet Transform
The wavelet transform can be written as a convolution product:

Wf(u,s) = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{t-u}{s}\right) dt = f \star \bar{\psi}_s(u),

with \bar{\psi}_s(t) = \frac{1}{\sqrt{s}}\, \psi^*\!\left(\frac{-t}{s}\right). Continuous wavelet transforms and windowed Fourier
transforms provide translation-invariant representations (Mallat, 2001).
2.9 Dyadic Wavelet Transform
In computer programs, an input image is a discrete signal made up of pixels
defined within a certain resolution. Therefore, the wavelet transform cannot be
computed at an arbitrary fine scale. To construct a translation-invariant wavelet
representation, the scale s is discretized but not the translation parameter u. The
scale is sampled along a dyadic sequence \{2^j\}_{j\in\mathbb{Z}} to simplify the numerical
calculations (Mallat, 2001). The dyadic wavelet transform of f \in L^2(\mathbb{R}) is defined
by

Wf(u, 2^j) = \int_{-\infty}^{\infty} f(t)\, \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{t-u}{2^j}\right) dt = f \star \bar{\psi}_{2^j}(u),

with

\bar{\psi}_{2^j}(t) = \psi_{2^j}(-t) = \frac{1}{\sqrt{2^j}}\, \psi\!\left(\frac{-t}{2^j}\right).
2.10 Wavelet Design
A discrete dyadic wavelet transform can be computed with a fast filter bank
algorithm if the wavelet is appropriately designed. Let h and g be a pair of finite
impulse response (FIR) filters. Suppose that h is a low-pass filter whose transfer
function satisfies \hat{h}(0) = \sqrt{2}. We construct a scaling function whose Fourier
transform is

\hat{\phi}(\omega) = \prod_{p=1}^{\infty} \frac{\hat{h}(2^{-p}\omega)}{\sqrt{2}} = \frac{1}{\sqrt{2}}\, \hat{h}\!\left(\frac{\omega}{2}\right) \hat{\phi}\!\left(\frac{\omega}{2}\right).

The corresponding wavelet \psi has a Fourier transform defined by (Mallat, 2001)

\hat{\psi}(\omega) = \frac{1}{\sqrt{2}}\, \hat{g}\!\left(\frac{\omega}{2}\right) \hat{\phi}\!\left(\frac{\omega}{2}\right).
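The infinite-product formula for the scaling function can be checked numerically. The following Python sketch (an illustration, not part of the thesis programs) uses the quadratic spline low-pass filter ĥ(ω) = √2 cos³(ω/2) e^{-iω/2} of Section 2.11 as an example and compares a truncated product against the closed-form box-spline Fourier transform:

```python
import numpy as np

def h_hat(w):
    # Quadratic spline low-pass filter: h^(w) = sqrt(2) cos^3(w/2) e^{-iw/2}.
    return np.sqrt(2.0) * np.cos(w / 2.0) ** 3 * np.exp(-1j * w / 2.0)

def phi_hat_product(w, P=40):
    # Truncated infinite product  prod_{p=1}^{P} h^(2^{-p} w) / sqrt(2).
    out = 1.0 + 0.0j
    for p in range(1, P + 1):
        out *= h_hat(w / 2.0 ** p) / np.sqrt(2.0)
    return out

def phi_hat_closed(w):
    # Closed-form Fourier transform of the quadratic box spline (m = 2, eps = 1).
    return (np.sin(w / 2.0) / (w / 2.0)) ** 3 * np.exp(-1j * w / 2.0)

print(abs(phi_hat_product(1.7) - phi_hat_closed(1.7)))   # close to machine precision
```

The factors of the product approach 1 very quickly, so forty terms already reproduce the closed form to machine precision.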
2.10.1 Reconstruction Wavelets
Reconstruction wavelets are calculated with a pair of finite impulse response
filters \tilde{h} and \tilde{g}. The corresponding dual scaling function and dual wavelet have
the Fourier transforms

\hat{\tilde{\phi}}(\omega) = \prod_{p=1}^{\infty} \frac{\hat{\tilde{h}}(2^{-p}\omega)}{\sqrt{2}}, \qquad \hat{\tilde{\psi}}(\omega) = \frac{1}{\sqrt{2}}\, \hat{\tilde{g}}\!\left(\frac{\omega}{2}\right) \hat{\tilde{\phi}}\!\left(\frac{\omega}{2}\right).
2.11 Spline Dyadic Wavelets
In the Conjugate Mirror Filter (CMF) algorithm, the continuous wavelet and
scaling functions are replaced with discrete high-pass and low-pass filters,
respectively. The following two-scale relations connect the discrete filters with
their continuous counterparts (Mallat, 2001):

\frac{1}{\sqrt{2}}\, \phi\!\left(\frac{t}{2}\right) = \sum_{n} h[n]\, \phi(t-n), \qquad \frac{1}{\sqrt{2}}\, \psi\!\left(\frac{t}{2}\right) = \sum_{n} g[n]\, \phi(t-n).

The high-pass and the low-pass filters g[n] and h[n], respectively, are related in the
following way:

g[n] = (-1)^{1-n}\, h[1-n].

A box spline of degree m is defined as m+1 convolutions of \mathbf{1}_{[0,1]}
with itself. It is centered at t = 1/2 if m is even and at t = 0 if m is odd. Its Fourier
transform is

\hat{\phi}(\omega) = \left(\frac{\sin(\omega/2)}{\omega/2}\right)^{m+1} \exp\!\left(\frac{-i\varepsilon\omega}{2}\right),

with \varepsilon = 1 if m is even, and \varepsilon = 0 if m is odd. (2.33)
So the Fourier transform of the low-pass filter h is

\hat{h}(\omega) = \sqrt{2}\, \frac{\hat{\phi}(2\omega)}{\hat{\phi}(\omega)} = \sqrt{2}\, \left(\cos\frac{\omega}{2}\right)^{m+1} \exp\!\left(\frac{-i\varepsilon\omega}{2}\right).

We construct a wavelet that has one vanishing moment by choosing
\hat{g}(\omega) = O(\omega) in the neighbourhood of \omega = 0. The Fourier transform of the high-
pass filter is

\hat{g}(\omega) = -i\sqrt{2}\, \exp\!\left(\frac{-i\varepsilon\omega}{2}\right) \sin\frac{\omega}{2}.

The Fourier transform of the resulting wavelet is

\hat{\psi}(\omega) = \frac{1}{\sqrt{2}}\, \hat{g}\!\left(\frac{\omega}{2}\right) \hat{\phi}\!\left(\frac{\omega}{2}\right) = \frac{-i\omega}{4} \left(\frac{\sin(\omega/4)}{\omega/4}\right)^{m+2} \exp\!\left(\frac{-i\varepsilon\omega}{2}\right).

To design dual scaling functions \tilde{\phi} and wavelets \tilde{\psi} which are splines, we
choose \tilde{h} = h. As a consequence \tilde{\phi} = \phi, and \hat{\tilde{g}}(\omega) follows from the
reconstruction condition \hat{h}^*(\omega)\,\hat{\tilde{h}}(\omega) + \hat{g}^*(\omega)\,\hat{\tilde{g}}(\omega) = 2
(Mallat, 2001).

Figures 2.2 and 2.3 show the resulting cubic spline wavelet \psi and the
corresponding scaling function for m = 3:
Figure 2.2 Biorthogonal spline wavelet
Figure 2.3 Biorthogonal cubic B-spline scaling function
In this thesis, quadratic spline wavelets have been used for the wavelet
decomposition, for reasons explained in Chapter 4. Figure 2.4 gives the
corresponding filters for m=2. The program in Appendix A uses the filters
illustrated in Figure 2.4, for wavelet decomposition and reconstruction, for levels
1 and 2. Appendix A also illustrates lucidly how the filters are used for circular
convolutions with rows and columns of the image.

 n     h[n]/√2    h~[n]/√2    g[n]/√2    g~[n]/√2
-2                                       -0.03125
-1     0.125      0.125                  -0.21875
 0     0.375      0.375      -0.5        -0.6875
 1     0.375      0.375       0.5         0.6875
 2     0.125      0.125                   0.21875
 3                                        0.03125
Figure 2.4 Decomposition and reconstruction filters for quadratic spline

3. Algorithme a Trous
A fast dyadic wavelet transform is calculated with a filter bank algorithm called in
French the Algorithme a Trous (introduced by Holschneider et al.). It is similar
to a fast biorthogonal wavelet transform, but without subsampling. It was originally
developed for music synthesis using non-orthogonal wavelets, but it is now of
particular interest in image processing applications (Shensa, 1992). The main
difference between the Conjugate Mirror Filter (CMF; Mallat, 2001) algorithm
and the Algorithme a Trous, besides the type of filter, is in the implementation.
During decomposition, at every scale 2^j, the filters are dilated by inserting 2^j - 1
zeros between each sample of the filter. Inserting zeros in the filters creates holes
(trous in French). This procedure is illustrated in Appendix A by the dyadup and
dyaddown MATLAB functions.
Suppose that the scaling functions and wavelets \phi, \psi, \tilde{\phi} and \tilde{\psi} are designed
with the filters h, g, \tilde{h} and \tilde{g}. The fast dyadic wavelet transform is implemented
using filter banks. The samples of the discrete input signal a_0[n] are considered as
averages of some function f weighted by the scaling kernels \phi(t-n):

a_0[n] = \langle f(t), \phi(t-n)\rangle = \int f(t)\, \phi(t-n)\, dt. \qquad (3.1)

For any j > 0, we denote

a_j[n] = \langle f(t), \phi_{2^j}(t-n)\rangle. \qquad (3.2)

The dyadic wavelet coefficients are computed for j > 0 over the integer grid:

d_j[n] = Wf(n, 2^j) = \langle f(t), \psi_{2^j}(t-n)\rangle. \qquad (3.3)

For any given filter x[n], we denote by x_j[n] the filter obtained by inserting
2^j - 1 zeros between each sample of x[n], and by \bar{x}[n] = x[-n] its time reverse.
The Algorithme a Trous computes the fast dyadic wavelet transform in the
following way. For any j \ge 0,

a_{j+1}[n] = a_j \star \bar{h}_j[n], \qquad d_{j+1}[n] = a_j \star \bar{g}_j[n], \qquad (3.4)

a_j[n] = \frac{1}{2}\left(a_{j+1} \star \tilde{h}_j[n] + d_{j+1} \star \tilde{g}_j[n]\right). \qquad (3.5)

Only horizontal and vertical details d_j^h and d_j^v, respectively, and an
approximation a_j of the image at each level are created. The dyadic wavelet
representation of a_0 consists of the set of wavelet coefficients up to a scale 2^J plus
the remaining low-frequency information a_J:

\left\{ \{d_j^h, d_j^v\}_{1 \le j \le J},\; a_J \right\}.
It is computed from a_0 by cascading the convolutions in equations (3.4) and
(3.5) for 0 \le j < J, as illustrated in Figure 3.4. The dyadic wavelet transform is
calculated with this filter bank algorithm. The original signal a_0 is recovered
from its wavelet representation by iterating (3.5) for J > j \ge 0, as illustrated in
Figure 3.5 (Mallat, 2001). Separable circular convolutions are used instead of
separable straight convolutions to minimize border problems. This has the effect
of periodizing the image both in the horizontal and vertical directions (Mallat,
2001). Appendix A illustrates the procedure for wavelet decomposition of the
image to obtain horizontal and vertical details for levels 1 and 2. The test image
used is the "house" bitmap image, shown in Figure 3.1. Other image formats,
such as .jpg and .gif, may also be used in this program. Noise, generated by
software, is then added to this original clean image, as shown in the program in
Appendix A; in this example, the added noise yields a Signal-to-Noise Ratio
(SNR) of 5 dB. As Figure 3.2 shows, the original clean
image is definitely affected by noise. The goal of this thesis is the noise
suppression of this noisy image, which is described in the ensuing sections. The
horizontal and vertical details of the noisy "house" bitmap image, obtained after
decomposition for 2 levels, can be seen in Figure 3.3.
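A one-dimensional sketch of equations (3.4) and (3.5), written here in Python rather than MATLAB, uses the quadratic spline filters of Figure 2.4 (with the decomposition filters time-reversed, following Mallat, 2001) and implements the circular convolutions through the FFT; perfect reconstruction can then be checked on a random signal:

```python
import numpy as np

def kernel_fft(taps, offset, N):
    """FFT of a filter whose tap i sits at circular index (offset + i) mod N."""
    k = np.zeros(N)
    for i, t in enumerate(taps):
        k[(offset + i) % N] += t
    return np.fft.fft(k)

N = 64
rng = np.random.default_rng(1)
a0 = rng.standard_normal(N)

s2 = np.sqrt(2.0)
h = s2 * np.array([0.125, 0.375, 0.375, 0.125])        # h[n],  n = -1..2
g = s2 * np.array([-0.5, 0.5])                          # g[n],  n = 0..1
gt = s2 * np.array([-0.03125, -0.21875, -0.6875,
                    0.6875, 0.21875, 0.03125])          # g~[n], n = -2..3

H_bar = kernel_fft(h[::-1], -2, N)    # h[-n], taps over n = -2..1
G_bar = kernel_fft(g[::-1], -1, N)    # g[-n], taps over n = -1..0
H_til = kernel_fft(h, -1, N)          # h~ = h for the spline wavelet
G_til = kernel_fft(gt, -2, N)

A0 = np.fft.fft(a0)
a1 = np.real(np.fft.ifft(A0 * H_bar))                   # approximation, eq. (3.4)
d1 = np.real(np.fft.ifft(A0 * G_bar))                   # detail, eq. (3.4)
rec = 0.5 * np.real(np.fft.ifft(np.fft.fft(a1) * H_til
                                + np.fft.fft(d1) * G_til))   # eq. (3.5)
print(np.max(np.abs(rec - a0)))       # near machine precision
```

Because circular convolution corresponds to a pointwise product of DFTs, the sketch also makes explicit how periodizing the signal avoids border problems.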

Figure 3.1 The original "house" 256×256 bitmap image.
(It is a good test image, as it gives an elegant illustration of horizontal and vertical
details.)

Figure 3.2 The noisy "house" bitmap image (SNR = 5 dB).
(This image clearly illustrates the effect of the software-generated noise added to
the clean image in Figure 3.1, and all the wavelet decompositions and
reconstructions are henceforth performed on this image.)

Approximation: level 1
Horizontal: level 1 Vertical: level 1
Approximation: level 2
Horizontal: level 2
Vertical: level 2
Figure 3.3 Approximations, horizontal and vertical details for wavelet
decomposition for levels 1 and 2.

Figure 3.4 Decomposition filter bank for the Algorithme a Trous
(The dyadic wavelet coefficients are computed by cascading convolutions with
dilated filters h and g)

Figure 3.5 Reconstruction filter bank for the Algorithme a Trous
(The original signal is reconstructed through convolutions with \tilde{h}_j and \tilde{g}_j. A
multiplication by 1/4 is necessary to recover the next finer-scale signal a_j.)

4. Singularity Detection and Lipschitz Regularity
We now focus our attention on the detection of noisy wavelet coefficients. A
wavelet transform can focus on localized signal structures with a zooming
procedure that progressively reduces the scale parameter. Singularities and
irregular structures often carry essential information in a signal. For example,
discontinuities in the intensity of an image indicate the presence of edges in the
scene. In electrocardiograms or radar signals, interesting information also lies in
sharp transitions. Local signal regularity is characterized by the decay of the
wavelet transform amplitudes across scales. Singularities and edges are thereby
detected by following the wavelet transform local maxima at fine scales.
To characterize singular signal structures, it is necessary to precisely
quantify the local regularity of a signal f(t). Lipschitz exponents provide
regularity measurements not only uniformly over time intervals, but also at any
point v. If f has a singularity at v, which means that it is not differentiable at v,
then the Lipschitz exponent at v characterizes this singular behaviour.
Definition of the Lipschitz exponent
The Taylor formula relates the differentiability of a signal to local polynomial
approximations. Suppose that f is m times differentiable in a neighbourhood
[v-h, v+h] of v. Let p_v be the Taylor polynomial expansion of f in the
neighbourhood of v:

p_v(t) = \sum_{k=0}^{m-1} \frac{f^{(k)}(v)}{k!}\, (t-v)^k.

The approximation error is

\varepsilon_v(t) = f(t) - p_v(t).

Lipschitz exponents are also known as Hölder exponents in some literature.
Definition (Lipschitz):
A function f is pointwise Lipschitz \alpha \ge 0 at v if there exist K > 0 and a
polynomial p_v of degree m = \lfloor\alpha\rfloor such that

\forall t \in \mathbb{R}, \quad |f(t) - p_v(t)| \le K\,|t-v|^{\alpha}. \qquad (4.3)

A function f is uniformly Lipschitz \alpha over [a,b] if it satisfies equation (4.3) for
all v \in [a,b], with a constant K that is independent of v.
The Lipschitz regularity of f at v, or over [a,b], is the sup of the \alpha such that f is
Lipschitz \alpha.
Uniform Lipschitz exponents provide a global measurement of regularity, which applies to
a whole interval. If f is uniformly Lipschitz \alpha > m in the neighbourhood of v, then
one can verify that f is necessarily m times continuously differentiable in that
neighbourhood. If 0 \le \alpha < 1 then p_v(t) = f(v) and the Lipschitz condition (4.3)
becomes

\forall t \in \mathbb{R}, \quad |f(t) - f(v)| \le K\,|t-v|^{\alpha}.

A function that is bounded but discontinuous at v is Lipschitz 0 at v. If the
Lipschitz regularity is a < 1 at v, then / is not differentiable at v and a
characterizes the singularity type. In general, one can say that the more regular the
function is the higher is its Lipschitz exponent (Mallat, 2001).
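The link between regularity and wavelet coefficient size can be illustrated numerically. The sketch below is a minimal Python example using a one-level orthonormal Haar transform rather than the quadratic-spline wavelet adopted later in this thesis, and the test signals are invented for illustration: a smooth (Lipschitz 1) ramp produces uniformly small detail coefficients, while a bounded discontinuity (Lipschitz 0) produces one large coefficient exactly at the jump.

```python
import math

def haar_detail(x):
    """One level of the orthonormal Haar transform; returns the detail coefficients."""
    return [(x[2 * k] - x[2 * k + 1]) / math.sqrt(2) for k in range(len(x) // 2)]

n = 64
smooth = [i / n for i in range(n)]                  # Lipschitz-1 ramp
step = [0.0 if i < 33 else 1.0 for i in range(n)]   # bounded discontinuity (Lipschitz 0)

d_smooth = haar_detail(smooth)
d_step = haar_detail(step)

# the ramp yields uniformly tiny details; the step yields one large detail at the jump
print(max(abs(v) for v in d_smooth))   # ~0.011
print(max(abs(v) for v in d_step))     # ~0.707, at pair index 16 (samples 32 and 33)
```

The irregular point is the only location where the fine-scale coefficient is large, which is exactly the behaviour the denoising method exploits.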
4.1 Wavelet Vanishing Moments
To measure the local regularity of a signal, it is not so important to use a wavelet
with a narrow frequency support; vanishing moments, however, are crucial. If
the wavelet has n vanishing moments, then it can be shown that the wavelet
transform can be interpreted as a multiscale differential operator of order n. The
Lipschitz property approximates f with a polynomial p_v in the neighbourhood of v:

    f(t) = p_v(t) + ε_v(t)  with  |ε_v(t)| ≤ K |t - v|^α

A wavelet transform estimates the exponent α by ignoring the polynomial p_v. For
this purpose, we use a wavelet that has n > α vanishing moments:

    ∫ t^k ψ(t) dt = 0  for 0 ≤ k < n

The decay of the wavelet transform amplitude across scales is related to the
uniform and pointwise Lipschitz regularity of the signal. Measuring the
asymptotic decay is equivalent to zooming into signal structures with a scale that
goes to zero (Mallat, 2001).
When the algorithm in this thesis is implemented, it is important to
choose the wavelet type carefully. Every wavelet from classic wavelet theory has
at least one vanishing moment, since its integral must equal zero (the average
of a mother wavelet is zero). The following Theorem 4.1, proven in Mallat, 2001,
is used for denoising; it demonstrates that it is possible to characterize local
Lipschitz regularity with an exponent of zero or lower.
Theorem 4.1 (Daubechies): Suppose that ψ has n ≥ α + 1 vanishing moments and
that ∫|ψ(t)| dt and ∫|t||ψ(t)| dt exist. If for some ε > 0 the wavelet coefficients
of f satisfy |d_{j,k}| ≤ C 2^{j(α+1/2)} (1 + |2^{-j}v - k|^{α-ε}),
then f is Lipschitz α at the point v.
This theorem implies that the local Lipschitz exponent of a function can be
estimated from the behaviour of the wavelet coefficients through successive scales.
For instance, if the wavelet coefficients tend to grow as the scale decreases, then
the Lipschitz exponent is less than -0.5. This principle is used in our algorithm to
obtain the Lipschitz estimator (Section 7.3), to initially detect noisy coefficients.
Since the goal is to separate noise from the noise-free image and the worst
irregularities in a typical noise-free image are bounded discontinuities, with a zero

Lipschitz exponent, there is no need for more vanishing moments. However, other
properties may be desirable. If the method insufficiently suppresses a noisy
coefficient or insufficiently restores a noise-free coefficient, the error will appear in the
processed image as a distortion in the shape of the wavelet used. This requires the
use of a smooth wavelet and scaling function to minimize the visual distortion.
Since the method is locally adaptive, and uses the local Lipschitz exponent and
small neighbourhood systems, it is also important that the wavelet has a small
support or fast decay. These are some of the main reasons why quadratic splines
are the chosen wavelets in this method (Malfait, Roose).

5. Wavelet Decomposition and Notations
A discrete wavelet decomposition of a function is a decomposition in a particular
basis of functions, called wavelets. A one-dimensional function f is decomposed as

    f = Σ_{j,k} d_{j,k} ψ_{j,k}    (5.1)

which defines the transform of f to its wavelet coefficients (d_{j,k}). The wavelets
are defined in terms of a family of time-frequency atoms obtained by scaling the
mother wavelet ψ by s = 2^j and translating it by k:

    ψ_{j,k}(t) = (1/√(2^j)) ψ((t - k)/2^j)

For wavelets that are orthonormal with respect to the inner product ⟨·,·⟩, the
coefficients are d_{j,k} = ⟨f, ψ_{j,k}⟩. The factor 2^j is the characteristic scale of all
wavelets at level j. A wavelet oscillates faster, or it has a higher frequency when
the scale is low, and vice-versa. The scale is hence inversely proportional to the
frequency. The higher the level is, the larger the scale is and the lower the frequency.
When the wavelet transform of f computed using (5.1) has a
non-zero coefficient, say d_{j,k}, it means the function f has a contribution in the

frequency band corresponding to the scale 2^j. The location index k of the coefficient
also indicates the position of this particular frequency contribution in the domain
of f: it is the time instant for one-dimensional signals and the (x,y) position or
the pixel co-ordinates (or the pixel number) for two-dimensional signals (images).
Wavelet coefficients thus provide information about both the frequency
contributions of f and their positions.
Since our input signal is an image, say f(x,y), we apply the corresponding two-
dimensional wavelet transform.
A level corresponding to the scale 2^j now consists of two components, that is, the
horizontal and the vertical components. The coefficients d^h_{j,k} represent
contributions that are high-frequency in the y-direction and low-frequency in the x-
direction. These coefficients especially appear where f has step edges in the
horizontal direction. Similarly, the coefficients d^v_{j,k} represent contributions that
are high-frequency in the x-direction and low-frequency in the y-direction. These
coefficients especially appear where f has step edges in the vertical direction. The
resultant wavelet coefficients are known as the horizontal and vertical details of
the image because the wavelet acts as a high-pass filter. The wavelet
decomposition is carried out using the Algorithme a Trous described in Chapter
3 to obtain, for each level, a smooth low-frequency image called an approximation
image a_{j+1} and two high-frequency detail images d^h_{j+1} and d^v_{j+1}. These images
are referred to as the horizontal and vertical details respectively. In the notations d^h_{j,k}
and d^v_{j,k}, the index k replaces (x,y), the co-ordinates of a specific image pixel, for
simplicity. Since the processing of the horizontal and vertical components of an
image will be the same, d_{j,k} will denote either of these components at level j in this
text. Also, since the processing at any scale will be the same, we shall also
suppress the index j and use d_k to denote a wavelet coefficient for either
component (horizontal or vertical) at any fixed resolution level j. The procedure
discussed in this thesis will introduce proper modification of the wavelet coefficients
d_k of a noisy image.
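The separation of an image into two detail components can be illustrated with a minimal separable 2-D Haar step in Python; this is a simplified stand-in for the Algorithme a Trous quadratic-spline filters used in the thesis, and the 4×4 test images are invented. A step edge along one axis puts all of its detail energy into a single component:

```python
import math

def haar_pair(v):
    s2 = math.sqrt(2)
    a = [(v[2 * i] + v[2 * i + 1]) / s2 for i in range(len(v) // 2)]
    d = [(v[2 * i] - v[2 * i + 1]) / s2 for i in range(len(v) // 2)]
    return a, d

def transform_cols(rows):
    cols = [list(c) for c in zip(*rows)]
    a, d = zip(*(haar_pair(c) for c in cols))
    back = lambda m: [list(r) for r in zip(*m)]
    return back(a), back(d)

def haar2d(img):
    """One separable 2-D level: returns (approximation, component 1, component 2)."""
    ra, rd = zip(*(haar_pair(r) for r in img))
    aa, ad = transform_cols(list(ra))   # details oscillating across rows
    da, _ = transform_cols(list(rd))    # details oscillating across columns
    return aa, ad, da

def energy(m):
    return sum(v * v for row in m for v in row)

edge_x = [[0, 1, 1, 1] for _ in range(4)]        # step edge crossing the x-direction
edge_y = [[0] * 4] + [[1] * 4] * 3               # step edge crossing the y-direction

_, ad_x, da_x = haar2d(edge_x)
_, ad_y, da_y = haar2d(edge_y)
print(energy(da_x) > 0 and energy(ad_x) == 0)    # True: one edge, one component
print(energy(ad_y) > 0 and energy(da_y) == 0)    # True: the other edge, the other
```

Each edge orientation excites exactly one of the two detail components, which is why the two components can be processed separately but identically.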
5.1 Orthogonal Wavelet Bases and Multiresolution
The partial sum of wavelet coefficients Σ_k d_{j,k} ψ_{j,k} can be interpreted as the
difference between two approximations of f at the resolutions 2^{-(j-1)} and 2^{-j}.
Multiresolution approximations compute the approximation of signals at various
resolutions with orthogonal projections on different spaces {V_j}_{j∈Z}. The
approximation of a function f at a resolution 2^{-j} is defined as an orthogonal
projection on a space V_j ⊂ L²(ℝ). The space V_j groups together all possible
approximations at the resolution 2^{-j}. The orthogonal projection of f is the
function f_j ∈ V_j that minimizes ‖f - f_j‖. To compute this projection, we must
find an orthonormal basis of V_j. Theorem 5.1 orthogonalizes the Riesz basis
{θ(t - k)}_{k∈Z} and constructs an orthonormal basis of each space V_j by dilating and
translating a single function φ, called the scaling function.

Theorem 5.1 (proved in Mallat, 2001): Let {V_j}_{j∈Z} be a multiresolution
approximation and φ be the scaling function whose Fourier transform is

    φ̂(ω) = θ̂(ω) / ( Σ_{k=-∞}^{+∞} |θ̂(ω + 2kπ)|² )^{1/2}

Let us denote

    φ_{j,k}(t) = (1/√(2^j)) φ((t - k)/2^j)

The family {φ_{j,k}}_{k∈Z} is an orthonormal basis of V_j for all j ∈ Z.

The orthogonal projection f_j of f over V_j is obtained with an expansion in the
scaling orthogonal basis

    f_j = P_{V_j} f = Σ_{k∈Z} ⟨f, φ_{j,k}⟩ φ_{j,k}

The inner products

    a_j[k] = ⟨f, φ_{j,k}⟩

provide a discrete approximation at the scale 2^j. The space V_{j-1} can also be represented as

    V_{j-1} = V_j ⊕ W_j

This relation is illustrated by the following decomposition projection diagram:

    V_j → V_{j+1} → V_{j+2} → ... → V_{J-1} → V_J
          ↘ W_{j+1}  ↘ W_{j+2}      ...       ↘ W_J
    (finest scale 2^{j+1})          (coarsest scale 2^J)

Now we can write the wavelet decomposition as

    f_j = Σ_{j'=j+1}^{J} Σ_{k∈Z} ⟨f, ψ_{j',k}⟩ ψ_{j',k} + Σ_{k∈Z} ⟨f, φ_{J,k}⟩ φ_{J,k}

and the approximation error as

    f - f_j = Σ_{j'=-∞}^{j} Σ_{k∈Z} ⟨f, ψ_{j',k}⟩ ψ_{j',k}
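The decomposition into detail coefficients plus a coarse approximation is exactly invertible. The compact Python sketch below uses the orthonormal Haar filters (the thesis itself uses the redundant Algorithme a Trous, but the reconstruction identity is analogous), with an arbitrary invented test signal:

```python
import math

def haar_fwd(x):
    s2 = math.sqrt(2)
    a = [(x[2 * i] + x[2 * i + 1]) / s2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / s2 for i in range(len(x) // 2)]
    return a, d

def haar_inv(a, d):
    s2 = math.sqrt(2)
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / s2, (ai - di) / s2])
    return x

f = [float((7 * i) % 5) for i in range(8)]      # arbitrary test signal
a1, d1 = haar_fwd(f)                            # level 1 (finest details)
a2, d2 = haar_fwd(a1)                           # level 2 (coarser)
rec = haar_inv(haar_inv(a2, d2), d1)            # coarse approximation + all details
print(max(abs(u - v) for u, v in zip(f, rec)))  # ~0: perfect reconstruction
```

Because the transform is invertible, any modification of the detail coefficients (the denoising step of the next chapter) translates directly into a modification of the reconstructed image.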

6. The Proposed Method
Typical wavelet-based noise suppression algorithms proceed according to the
following scheme:
(1) The wavelet decomposition of the image is computed.
(2) The obtained wavelet coefficients are modified.
(3) The cleaned image is then reconstructed from the modified coefficients.
Step (1) has been extensively dealt with in the preceding sections describing
wavelet based decomposition. Commonly used techniques rely on a binary
decision in step (2): the coefficient is classified as either noisy or clean and is
accordingly modified in one of two possible ways. When a particular wavelet
coefficient is denoted by d_k, the corresponding binary decision can be represented
as a binary label x_k, with x_k ∈ {0,1}, and the binary operation as d_k^mod = x_k d_k, where

    x_k = 0, if d_k is dominated by noise,
    x_k = 1, if d_k is clean.
The binary values xk are assigned to each image element at each representation
level j and can be considered as a multidimensional random variable whose values
depend upon both signal and noise characteristics.
The problem with binary decisions is that some noisy coefficients can get
restored, and clean coefficients can get suppressed. The proposed method

incorporates a criterion based on two measures and it does not use a binary
decision. The first measure of the criterion is a simple approximation to the local
Lipschitz exponent. The second measure is used to improve the first measure, by
introducing a priori geometrical knowledge about the image, such as the
knowledge that meaningful wavelet coefficients for noisy natural images are
clustered. These two measures are joined together in a Bayesian probabilistic
framework. In addition, the method avoids binary decision and instead works with
probabilities on the wavelet coefficients.
6.1 Notations
We denote the set of binary labels by X = {x_k | k ∈ K}, which correspond to the
wavelet coefficients d_k, where K is the set of indices of the coefficients within a
level. Sets X of this type will be referred to as masks. Similarly, the
coefficients m_k, considered as a measure of the noise content in the image
representation D = {d_k} at a given level j, will be represented as a random
variable M = {m_k}. The coefficients m_k are defined so that they approximate the
local Lipschitz exponents at position k. The local Lipschitz exponents represent
image regularity. It is known that the noiseless image is regular and the noisy
image is irregular.

6.2 Bayesian Probability
In contrast to other image denoising methods, this thesis does not attempt to find a
set of masks for the given noisy image that are optimal according to a single
criterion. This method is based on the following approach. For any particular X,
one can specify how probable it is, taking into account the given image and the
chosen measure.
According to Bayes' rule, the a posteriori probability equals

    P(X / M) = P(M / X) · P(X) / P(M)    (6.1)

and, as we shall later explain, we can assume that the a priori probability
distribution P(M) is uniform and therefore

    P(X / M) ∝ P(M / X) · P(X)    (6.2)
The first factor is the conditional probability and the second factor is the a priori
probability. The a priori probability P(X) represents the fact that when the value
of the random variable M (representing a measure of the noise content and related
to the local Lipschitz exponents) is unknown, the different masks X are not
equally probable. Any kind of a priori geometrical knowledge about masks can in
this way be introduced and exploited. The large wavelet coefficients (that carry
the essential information of an image with little noise) tend to be clustered around
the locations of important features in the image, such as the edge discontinuities,
peaks and corners. The excellent quality of the image reconstructed from these
large coefficients demonstrates that they represent the essential image

information. Thus, we can assign a higher probability to masks in which the 1 and
0 mask labels appear in well-separated continuous clusters.
The second element in (6.2) that should be determined is P(M / X), but
the algorithm actually computes, for each coefficient d_k, the marginal probability
P(x_k = 1 / M) that it is a clean coefficient. Thus the relation d_k^mod = x_k d_k can be
replaced by

    d_k^mod = x̄_k d_k,  with  x̄_k = P(x_k = 1 / M)    (6.3)

The shrinkage factor x̄_k tells us how large the noise influence is for each particular pixel. The
identification and modification of wavelet coefficients dominated by noise, as well
as of clean coefficients, is more selective and more adaptive than in other methods.
This method involves shrinkage of wavelet coefficients with a factor that is
different for each coefficient. To implement this algorithm, we have to compute
the measures M and the marginal probabilities P(x_k = 1 / M) for all the
resolution levels j, all image elements and both components (horizontal
and vertical). The next step is to modify the coefficients d_k according to
(6.3) and then reconstruct the image (Malfait, Roose).
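The contrast between a binary decision and the shrinkage of equation (6.3) can be sketched with hypothetical numbers; the coefficients and probabilities below are invented for illustration and are not produced by the actual sampling procedure:

```python
# hypothetical coefficients and marginal probabilities P(x_k = 1 / M);
# the values below are invented for illustration only
d = [4.0, 0.3, -2.5, 0.1]
p_clean = [0.95, 0.10, 0.80, 0.05]

binary = [dk * round(pk) for dk, pk in zip(d, p_clean)]   # hard 0/1 decision
shrunk = [dk * pk for dk, pk in zip(d, p_clean)]          # shrinkage as in (6.3)

print(binary)   # [4.0, 0.0, -2.5, 0.0]  -- all-or-nothing
print(shrunk)   # gradual: strong coefficients barely change, weak ones nearly vanish
```

A coefficient with a 95% probability of being clean keeps almost all of its value, while one with a 5% probability is almost, but not completely, suppressed; the binary rule would have forced both decisions to extremes.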
Both the a priori and the conditional probability will be modeled as Gibbs
probability functions. This has the advantage that the variables xk and mk can be
directly described with an image model, called a Markov Random Field (MRF)
model. All the computations to obtain the marginal probabilities or the shrinkage

factors for each wavelet coefficient will be performed using this MRF model. The
link between the probability functions and MRFs will be described in the next

7. Markov Random Fields
This chapter contains the fundamental definitions and properties of Markov
Random Fields (MRF) and links them to the formalism of Gibbs Random Fields
(GRF). This is then extended to image processing methods. The MRF is assumed
as a model describing the fields X and M. With each image element, we associate
two state vectors. The set of values for one of them is considered as a realization
of the random variable X and a set of values for the other one is considered as a
realization of the random variable M. The MRF is such a field of state vectors that
each of its elements depends only upon its neighbours in some small
neighbourhood (defined later in this chapter) of this element. So, we have two
two-dimensional MRFs: X and M.
Definition: A Markov Random Field is a field of state vectors with the property
that a state vector depends only upon some of the surrounding state vectors that
are located in a small neighbourhood, but not on the whole field (Malfait, Roose).

7.1 Introduction
The theory of MRFs is part of the larger field of probability theory and describes
the statistical properties of (physical) systems by modeling the local interactions
of the system variables (Li). Local, in this case, means that the states are

connected to each other in a neighbourhood system defined for the topology of
the whole system. The aim is to simplify the system's form and complex
behaviour by using strictly local, simple models, which require fewer parameters.
From the probabilistic point of view, the requirement is to find a priori
probabilities and parameters that assign higher probabilities to the more
probable states. This is not an easy task. However, due to an equivalence between
MRFs and GRFs it is possible to define these probabilities through clique
potentials containing physical quantities like energy, coupling constants, and
temperature. This allows us to exploit the well-studied systems found in statistical
physics for image processing purposes (Besag).
7.2 Markov Random Fields and Gibbs Random Fields
Digital images are in most cases stored as two-dimensional arrays, with each array
element representing a pixel, i.e., the brightness at that particular position. Hence the
system is described as a regular, discrete lattice, where each lattice point can take
one or several states (degrees of brightness). The number of states is typically
finite, but in other applications a continuous state variable might be allowed. The
lattice (or grid) G is used to index a finite set of N elements (pixels). It has a two-
dimensional topology, which may be summarized as

    G = {(i, j) : 1 ≤ i ≤ n_x, 1 ≤ j ≤ n_y}    (7.1)

where n_x and n_y are the number of pixels in the horizontal and vertical directions,
respectively, in our case 256 and 256 (as the input image is a 256×256 image), for

example. The state of a lattice point is described by a label, which, in general, is a
member of the set of all possible labels L (binary labels 1 and 0, in our case for
example). For the purpose of image processing

    L = {l_1, l_2, ..., l_q}    (7.2)

where l_i is a brightness level and q is the number of levels (in the case of 8-bit
images, q = 256). An important property of the lattice is that the states are all
ordered and a relation like l_1 < l_2 < ... < l_q exists. To combine the lattice definition
and the labeling, each point on the lattice G is assigned one state in L to give the
system state, which finally represents the image X. We have

    X = {X_1, X_2, ..., X_N}    (7.3)
to denote all possible states of the image, i.e., for all the N pixels. X can also be
seen as a mapping from G to L. As has been noted above, Markov Random Fields are
described by a system of neighbourhoods, namely

    N = {N_i | ∀i ∈ G}    (7.4)

where N_i is the set of all the neighbouring points of i and the neighbourhood
relation satisfies
1. i ∉ N_i, that is, a point is not a neighbour of itself, and
2. i ∈ N_j ⇔ j ∈ N_i, i.e., if a point i is a neighbour of j, then the point j is a
neighbour of i.

For a two-dimensional grid the environment of the point defines the
neighbourhood. The neighbourhood system defines which states are neighbours of
each other. For a given neighbourhood system, a clique is defined as a set of sites
such that any two elements in the set are neighbours of each other. Cliques will be
denoted by the symbol c and the set of all cliques for the neighbourhood system
by C. Some of the often-used neighbourhood systems are displayed in Figure 7.1.

Figure 7.1 (a) First order neighbourhood (b) second order (c) fifth order, X is the
center point (d) single state (e) two state (f) two state, and (g) three states

A 3-by-3 clique is used in this algorithm.
Let X = {X_1, X_2, ..., X_N} be a set of random variables defined on the lattice
G, with each variable X_i taking a value x_i out of the set L; X is then called a
Random Field with elements x_k.
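A first-order neighbourhood system on a small lattice, together with a check of the two defining properties above, can be sketched as follows (the 4×4 lattice size is an arbitrary illustrative choice):

```python
def first_order_neighbours(site, nx, ny):
    """First-order (4-connected) neighbourhood N_i of a lattice point."""
    i, j = site
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {(a, b) for a, b in cand if 0 <= a < nx and 0 <= b < ny}

nx = ny = 4
sites = [(i, j) for i in range(nx) for j in range(ny)]
N = {s: first_order_neighbours(s, nx, ny) for s in sites}

# property 1: a point is not a neighbour of itself
print(all(s not in N[s] for s in sites))              # True
# property 2: i in N_j  <=>  j in N_i (symmetry)
print(all(s in N[t] for s in sites for t in N[s]))    # True
```

Corner points have two neighbours and interior points four, which is why boundary handling (zero padding, discussed below) matters when a complete set of cliques is required.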
7.2.1 Gibbs Random Fields
A random field X is called a Gibbs Random Field on the lattice G, with the
neighbourhood system N, if the system configuration of the lattice follows a
Gibbs distribution of the form

    P(X) = (1/Z) exp[-U(X)],  with  Z = Σ_X exp[-U(X)]    (7.5)

    U(X) = Σ_{c∈C} V_c(X)    (7.6)
The energy U(X) is the sum of the local potentials, Vc, over all points and cliques.
The value of U depends upon the local configuration of the point and its
neighbouring points. MRFs can thus be conveniently specified using the clique
potentials. As can be seen from the first equation, the probability of a
configuration denoted by P(X) depends upon the energy U. The lower the energy
for a state the higher is the probability for finding the system in this state. Hence,

the model and the parameters have to be chosen to yield low energy values to
obtain more probable configurations (Malfait, Roose).
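The Gibbs distribution (7.5)-(7.6) can be made concrete by exhaustive enumeration on a toy 2×2 binary lattice with an Ising-like potential that favours equal neighbouring labels; the potential form and the coupling value are illustrative choices, not the thesis parameters:

```python
import itertools, math

# toy 2x2 binary lattice; neighbouring pairs prefer equal labels (Ising-like)
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
pairs = [((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1))]
lam = 1.0   # illustrative coupling constant

def U(x):
    """Energy: -lam per agreeing neighbouring pair, +lam per disagreeing pair."""
    return sum(-lam if x[a] == x[b] else lam for a, b in pairs)

configs = [dict(zip(sites, lab)) for lab in itertools.product((0, 1), repeat=4)]
Z = sum(math.exp(-U(x)) for x in configs)     # partition function as in (7.5)
P = [math.exp(-U(x)) / Z for x in configs]

print(abs(sum(P) - 1.0) < 1e-12)              # True: properly normalized
best = max(range(len(configs)), key=lambda i: P[i])
print(U(configs[best]))                       # -4.0: a uniform labeling wins
```

The lowest-energy configurations (all labels equal) receive the highest probability, which is exactly the clustering behaviour the a priori mask model is designed to encode.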
7.2.2 MRF-GRF Equivalence
The equivalence of these two properties is established in the Hammersly-Clifford
theorem. X is an MRF on G with a neighbourhood system N if and only if X is a
GRF on G with a nearest neighbourhood Gibbs potential (Besag). The main
benefit of this equivalence is that it provides a simple way to specify MRFs by
specifying potentials instead of local characteristics, which is usually very
difficult. Three probability functions are now considered for every set D of
wavelet coefficients in which noise is to be suppressed: the a posteriori
probability P(X / M), the a priori probability/*^), and the conditional
probability P(M / X). Each of these is a Gibbs probability function with a
corresponding MRF model. For the a posteriori and a priori probabilities, the state
vectors are the label variables xk in the mask X. For the conditional probability,
the state vectors are the values mk in the set M. They correspond to wavelet
coefficients dk in D. Consider the MRF X whose states are the binary mask
elements and assume that the cliques are 3×3 neighbourhoods, centered around x_k
and denoted by N_k. V_{N_k}(X) denotes the clique potential. The four edges of the
image have been zero-padded in order to have a complete set of 256×256 cliques
(Appendix D). The a priori probability expresses that masks in which

neighbouring state vectors have the same label value (binary value), are more
probable than those with different values. The clique potentials are therefore
based on a comparison of the central state x_k with its neighbours x_j. We then
repeat this procedure for all cliques. The a priori probability is thus

    P(X) = (1/Z) exp(-V(X))  with  V(X) = Σ_k V_{N_k}(X)    (7.7)

and the clique potentials are defined as

    V_{N_k}(X) = λ Σ_{j∈N_k} [1 - 2δ(x_k, x_j)]    (7.8)

where δ(x_k, x_j) = 1 if x_k = x_j and 0 otherwise, and λ is a properly chosen
parameter (Malfait, Roose).

7.3 Regularity Estimation
The measure that determines to what extent a wavelet coefficient is affected by
noise is based on the local Lipschitz exponents of the image. The local Lipschitz
exponent can be estimated from the wavelet coefficients. The estimator m_k of the
local Lipschitz exponent, corresponding to the wavelet coefficient d_k, is defined as
the ratio |d_{j+1,k} / d_{j,k}| averaged over a certain number, γ, of resolution levels j,
called the depth of the estimator:

    m_k = (1/γ) Σ_j |d_{j+1,k} / d_{j,k}|    (7.9)

where α is the Lipschitz exponent. When the ratios |d_{j+1,k}/d_{j,k}| are large, there
is a rapid decrease of d_{j,k} with the increase in resolution (2^{-j}), or decrease of the
scale (2^j). In this case, the values of the Lipschitz exponent α and its estimator
m_k are large. Estimated Lipschitz exponents m_k are the values of the multivalued
parameter M. This parameter, together with the set of wavelet coefficients D and
the mask X describes the analyzed image. The range of the Lipschitz exponents,
and thus the domain of the distribution P(M) are bounded in any practical sense,
since both the smoothness and the irregularities of a discretized image are limited.
Thus, the assumption about the uniform distribution of P(M) is justified
(Malfait, Roose).
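The interscale-ratio estimator can be sketched with invented coefficients at two adjacent levels (depth 2, as used later in this thesis); the numbers and the division guard are illustrative assumptions:

```python
# invented wavelet coefficients at two adjacent levels (estimator depth 2)
d1 = [0.8, 0.05, 1.2, 0.4]    # finer level 1
d2 = [1.6, 0.01, 2.3, 0.1]    # coarser level 2

eps = 1e-12                    # guard against division by zero
m = [abs(b) / (abs(a) + eps) for a, b in zip(d1, d2)]   # interscale ratios
print([round(v, 2) for v in m])        # [2.0, 0.2, 1.92, 0.25]

# structure grows toward coarser scales (large m_k); noise decays (small m_k)
T = 1.25
mask = [1 if v > T else 0 for v in m]
print(mask)                            # [1, 0, 1, 0]
```

Positions where the coefficient magnitude grows toward the coarser level are labeled clean; positions where it collapses are labeled noisy, giving the kind of initial binary mask used in Section 7.4.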
7.4 Lipschitz Threshold and Initial Mask Selection
In a classic deterministic method one would determine a threshold T and classify
coefficients d_k corresponding to Lipschitz estimates m_k below T as noisy and
those above T as clean. Although the method illustrated in this thesis does not rely
on this type of binary decisions, thresholds for Lipschitz estimators are needed as
the method needs an initial binary mask to start the computations that will be
described in Chapter 9. The energy of the suppressed coefficients d_k in each
scale and component (horizontal and vertical) should match as closely as possible
the supposed noise energy in that scale and component. This will be accomplished
by selecting the initial Lipschitz thresholds accordingly. Estimates of the amounts

of noise energy in the different components are needed. A simple way to derive
them, as illustrated in Appendix C, is by computing the decomposition of a
background part of the image, where only noise is present. In this thesis,
decomposition of the top right comer of the image is obtained for this purpose
(Appendix C). The procedure used in this thesis is as follows:
(1) Determine σ_d² (the noise variance) for all the components D.
(2) Define the energy of the suppressed coefficients as a function of the unknown threshold T:

    E(T) = Σ_{k: m_k < T} d_k²    (7.10)

(3) Assuming that

    E(T) = σ_d² (1 + ε),  where ε is the assumed tolerance (0.05)    (7.11)

solve for T.
The threshold T determines the initial mask for a given resolution level j.
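The threshold search can be sketched on made-up data as follows; the coefficients, Lipschitz estimates, noise variance, and the simple sweep over candidate thresholds are all illustrative assumptions rather than the thesis implementation:

```python
# invented coefficients d_k and Lipschitz estimates m_k for one component
d = [0.8, 0.05, 1.2, 0.4, 0.3, 2.0, 0.2, 0.6]
m = [2.0, 0.2, 1.9, 0.3, 0.4, 2.2, 0.25, 1.7]
sigma2 = 0.27          # assumed noise energy for this component
eps = 0.05             # tolerance as in (7.11)

def E(T):
    """Energy of the coefficients that the threshold T would mark as noisy."""
    return sum(dk * dk for dk, mk in zip(d, m) if mk < T)

# sweep candidate thresholds at the observed estimator values, smallest first
target = sigma2 * (1 + eps)
T = min((mk for mk in sorted(m) if E(mk + 1e-9) >= target), default=max(m))
print(T, round(E(T + 1e-9), 4))    # first T whose suppressed energy reaches target
```

Because E(T) is a step function of T, the sweep only needs to test the observed estimator values; the smallest candidate whose suppressed energy reaches the noise target is selected.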
Figure 7.2 shows the thresholding results for the Lipschitz estimator m_k obtained
by implementing equation (7.9), with the depth γ equal to 2. Levels 1 and 2 are
used again. Appendix A illustrates this procedure. The wavelet coefficients of
level 2 (for both components separately) are divided by the corresponding
coefficients of level 1, and the absolute value of each ratio is taken; these ratios
are then combined over the levels according to (7.9). Figure 7.2 illustrates the initial masks
so obtained, for both horizontal and vertical components, for a threshold T = 1.25. It
clearly shows the edges of the image.

Lipschitz mask for horizontal component
Lipschitz mask for vertical component
Figure 7.2 Initial masks for horizontal and vertical components obtained by
thresholding the Lipschitz estimator with a threshold T = 1.25. (The black pixels
correspond to the clean coefficients)

8. Conditional Probability Model
For the conditional probability, there is no need for interaction between
neighbours in the neighbourhood system. Thus in (7.5) the sum with respect to
cliques is replaced by the sum with respect to all image elements. The cliques are
thus the individual sites of the wavelet coefficients. The conditional probability is
    P(M / X) = Π_k P(m_k / x_k) = exp(-Σ_k V(m_k / x_k))    (8.1)

P(m_k / x_k) is the probability that m_k assumes a value greater than T if x_k = 1.
If x_k = 0, then Lipschitz exponents m_k below T are probable. Since the prior
distributions P(m_k) are assumed to be uniform, the conditional probabilities
should meet the condition

    P(m_k / x_k = 0) + P(m_k / x_k = 1) = K    (8.2)

where K is a constant identical for all values m_k.
8.1 Stochastic Sampling
Equation (6.3) showed how the method uses the marginal probabilities P(x_k = 1 / M)
to shrink the wavelet coefficients. These marginal probabilities are derived
from the joint probability function P(X,M), in theory by calculating a weighted
sum over the masks, in which the weights are the a posteriori probabilities. The
calculation is performed as follows:

    P(x_k = 1 / M) = Σ_X x_k P(X / M)    (8.3)

The summation is over all possible masks for a particular pixel x_k. The above
formula demonstrates that the method does not use a binary mask, but a
(marginally) averaged mask. The sum of equation (8.3) is practically intractable
because of the enormous number of possible masks and the large number of pixels
x_k (it has to be calculated for every pixel). Therefore, we use the estimators,
called stochastic samplers. The idea is to compute the estimates by selecting
appropriate samples from the space of all possible masks. The sample masks
denoted by X. are generated according to the a posteriori distribution P(Xj / M).
The stochastic sampler in the denoising method in this thesis is the classic
Metropolis algorithm described in section 9.1.

9. Monte Carlo Methods for Stochastic Sampling
As explained in section 7.2.1, the model and the parameters have to be chosen to
yield low energy values for more probable configurations. The minimization of
the energy is a combinatorial optimization problem and special algorithms are
required for this purpose. The most widely used algorithms are based on Monte
Carlo methods, which lead to stochastic optimization, for example the Metropolis
algorithm (Metropolis, et al.) or simulated annealing. Other methods like genetic
algorithms have also found large fields of applications. This thesis concentrates
on the standard Metropolis algorithm, which gives good results for the image
denoising problem. The idea is to compute the estimates by selecting samples
from the space of possible masks. However, the drawback is, for high levels of
noise many iterations are necessary, which leads to excessive computational time.
9.1 Metropolis-Rosenbluth-Rosenbluth-Teller-Teller
Importance sampling for Markov models is typically realized via Monte Carlo
Markov Chain (MCMC) samplers, such as the Metropolis sampler. The term Markov
Chain refers to the fact that the sampler generates the sample masks
consecutively, in a Markov Chain. A Markov Chain is a sequence of random
values in which the probability of each value depends only upon the value
at the previous step. Each sample is generated from the previous one, such that

the transition between the samples meets certain properties. Moreover, the
generation of candidate samples is based on a random number generator, such that
it is a Monte Carlo method. The sample masks X_j are not uniformly selected as
in ordinary sampling but in proportion to their posterior probability P(X_j / M).
Masks that have a higher posterior probability thus have a higher probability of being
selected as sample masks (Metropolis, et al.). This can be done as follows:
We select a set of sample masks X_j in proportion to their posterior
probability. The number of masks sampled in this set is N_sampl. The estimate of
P(x_k = 1 / M) is based on the N_sampl sample masks:
    P(x_k = 1 / M) ≈ (1 / N_sampl) Σ_{j=1}^{N_sampl} f_k(X_j)    (9.1)

    f_k(X) = 0 if x_k = 0,  and  f_k(X) = 1 if x_k = 1    (9.2)

The number N_sampl can be chosen freely. Equation (9.1) generates a shrinkage factor between 0
and 1 for each coefficient for both horizontal and vertical components. The
sampler starts from an initial mask X0 determined via thresholding of the
Lipschitz exponent as described in section 7.4. This is a good starting point for the
sampling process. The generation of the sample masks then proceeds as follows.

A new candidate sample mask X'_{j+1} is proposed from the previous sample X_j by
random perturbation at one or several positions k. This means flipping
binary labels from 1 to 0 and vice-versa at a randomly selected state (pixel).
Whether it is accepted depends upon the probability ratio r:

    r = P(X'_{j+1} / M) / P(X_j / M) = exp( -Σ_{c∈C} [V_c(X'_{j+1}) - V_c(X_j)] )    (9.3)

where C denotes the set of cliques affected by the random perturbation. If
r ≥ 1, the candidate sample mask X'_{j+1} is accepted and becomes the actual sample
mask X_{j+1}. Considering equation (9.3), we further denote the potential of a
clique c of the MRF X by V_c(X). Our sample mask X_j is determined for a given M
for each component. Here, even though we actually deal with
X_j / M, we abbreviate X_j / M as X_j. We replace X_j or X'_{j+1} by X to further
simplify our notation. Also, we replace V_c(X) by V_{N_k}(X), which is calculated as
shown in equation (7.8). The Metropolis algorithm is illustrated in Figure 9.1.
Figures 9.2 and 9.3 illustrate the evolution of the initial masks for both
horizontal and vertical components, after the first iteration of the Metropolis
algorithm. 10 such iterations have been performed to obtain the shrinkage
factors. This procedure is performed as shown in Appendix B.

Metropolis Sampler
Choose a starting mask X_0 from the space of masks.
Set the shrinkage factors to 0.
M = number of iterations
for j = 0 to M-1 do
    Generate a candidate state X'_{j+1} from X_j by random perturbation, i.e., flip
    the binary label at a randomly chosen position.
    Compute r = P(X'_{j+1} / M) / P(X_j / M)
    if r ≥ 1 then
        set X_{j+1} = X'_{j+1}
    else
        accept X'_{j+1} as X_{j+1} with probability r:
        generate a uniformly distributed random number u between 0 and 1;
        if r > u, set X_{j+1} = X'_{j+1}; otherwise set X_{j+1} = X_j
end for
Figure 9.1 Metropolis sampler
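A runnable Python sketch of a sampler in the spirit of Figure 9.1 is given below (the thesis implementation is in MATLAB). It samples binary masks from an Ising-like a priori model only, with no data term, and accumulates a marginally averaged mask in the manner of equation (9.1); the lattice size, coupling value, and iteration counts are illustrative assumptions:

```python
import math, random

random.seed(0)                      # deterministic run
nx = ny = 8                         # illustrative lattice, not 256x256
lam = 0.7                           # illustrative coupling parameter

def neighbours(i, j):
    return [(a, b) for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
            if 0 <= a < nx and 0 <= b < ny]

def delta_U(X, i, j):
    """Energy change caused by flipping X[i][j]; only affected cliques enter."""
    s = sum(1 if X[a][b] == X[i][j] else -1 for a, b in neighbours(i, j))
    return 2 * lam * s              # each agreement becomes a disagreement

X = [[random.randint(0, 1) for _ in range(ny)] for _ in range(nx)]
acc = [[0.0] * ny for _ in range(nx)]
n_sweeps = 200
for _ in range(n_sweeps):
    for _ in range(nx * ny):        # one random-scan sweep of single-site flips
        i, j = random.randrange(nx), random.randrange(ny)
        r = math.exp(-delta_U(X, i, j))          # acceptance ratio, cf. (9.3)
        if r >= 1 or random.random() < r:
            X[i][j] ^= 1                         # accept the flip
    for i in range(nx):                          # accumulate the averaged mask
        for j in range(ny):
            acc[i][j] += X[i][j]

marg = [[v / n_sweeps for v in row] for row in acc]   # estimates, cf. (9.1)
print(all(0.0 <= v <= 1.0 for row in marg for v in row))   # True
```

Only the cliques touching the flipped site enter the acceptance ratio, which is what keeps each Metropolis step cheap even though the full posterior over masks is intractable.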

Figure 9.2 Horizontal mask after 1 iteration of Metropolis algorithm

Figure 9.3 Vertical mask after 1 iteration of Metropolis algorithm
Finally, in equation (9.4), the coefficients d_k are modified by shrinkage with the
probabilities P(x_k = 1 / M) obtained from the Metropolis sampling process:

    d_k^new = d_k · P(x_k = 1 / M)  for all k, and all transform components.    (9.4)
Thus, we end up with a set of modified wavelet coefficients for both the
horizontal and vertical components, which are then reconstructed to obtain an
image with suppressed noise. The results of this step of the algorithm, after the
Metropolis sampling has been performed, and the modified wavelet details

(coefficients) have been obtained, are shown in Figures 9.4 and 9.5. They show
the resulting modified horizontal and vertical details obtained after the Metropolis
sampling has been performed for 10 iterations. These details are then applied for
reconstruction using the Algorithme a Trous, as can be seen in Appendix A, to
obtain a reconstructed denoised image with an improved Signal to Noise Ratio
(SNR).
Figure 9.4 Modified horizontal details after shrinkage factors have been applied
to each coefficient for levels 1 and 2

Figure 9.5 Modified vertical details after shrinkage factors have been applied to
each coefficient for levels 1 and 2

9.2 Image Reconstruction
The approximations, horizontal and vertical details for levels 1 and 2 of
reconstruction are displayed in Figure 9.6. These are obtained after the
reconstruction procedure shown in Appendix A has been performed. This
procedure has also been explained in Chapter 3. The modified details and
approximations are then applied to the reconstruction procedure of Algorithme a
Trous, and the reconstructed denoised image is obtained. In order to obtain the
final reconstructed image, the MATLAB code used is as follows:
%Add the new approximation image to the new horizontal
%and vertical details and multiply the result by 1/4.
%This generates the new approximation for the next level
%up. The final approximation will be the reconstructed image.
y=(1/4)*(rec_a{k}+rec_d1{k}+rec_d2{k});
Figure 9.7 shows the final reconstructed image y that is obtained by adding up the
reconstructed approximation, horizontal and vertical details for the last level of
reconstruction, i.e., level 1. The reconstructed image is found to have an SNR of
9.7093 dB compared to the SNR of the input noisy image, which is 5dB.
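The quoted SNR values follow the usual definition, ten times the base-10 logarithm of the ratio of signal variance to noise variance; a short Python sketch with a synthetic signal (not the house image) illustrates the computation:

```python
import numpy as np

def snr_db(clean, noisy):
    """SNR in dB: ratio of signal variance to residual-noise variance."""
    noise = noisy - clean
    return 10 * np.log10(np.var(clean) / np.var(noise))

rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 8 * np.pi, 1024))       # synthetic clean signal
noisy = clean + 0.1 * rng.standard_normal(1024)       # additive Gaussian noise
print(round(snr_db(clean, noisy), 1))                 # roughly 17 dB here
```

The same formula, applied before and after denoising, gives the 5 dB input and 9.7093 dB output figures reported above.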

Figure 9.6 Approximations, horizontal and vertical details after wavelet
reconstruction for levels 1 and 2.

Figure 9.7 Input noisy image, reconstructed image, and the original clean
house image.

10. Conclusion
This concludes the overview of the complete noise suppression process. Two
relatively new features to improve techniques for image noise suppression via the
wavelet transform were presented. First, a priori geometrical constraints were
added to the criterion to detect noisy coefficients. Second, the method does not
make an artificial distinction between noisy and clean coefficients, but
instead favours coefficients that are not likely to be noisy, and suppresses
coefficients that are likely to be noisy. The manipulation of the coefficients is
based on this probability. The ability to treat each coefficient individually, and the
possibility to replace the binary decision with this more gradual suppression of
coefficients (shrinkage factors), is precisely the advantage of this probabilistic
approach. This experiment for the noisy test house image shows that this
improves the noise suppression, both qualitatively and quantitatively. Thus, it
demonstrates the feasibility and usefulness of a consistent probabilistic approach
to wavelet-based image processing.

"Algorithme a Trous" MATLAB Program
The following program decomposes a noisy bitmap image house.bmp into
separate horizontal and vertical components, using the quadratic spline wavelet
and the Algorithme a Trous explained in Chapter 3. It then obtains the initial
masks by thresholding the Lipschitz estimator, applies the Metropolis algorithm
function in order to obtain the modified coefficients, and finally reconstructs
these modified coefficients to obtain a denoised image.
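Before the listing, it may help to see the core à trous idea in isolation: each level filters the previous approximation with a circularly applied filter whose taps are spread apart by dyadic upsampling, and the signal itself is never downsampled. A 1-D Python sketch (illustrative only; the MATLAB listing below is the thesis code) using the same quadratic-spline lowpass taps as hf:

```python
import numpy as np

def atrous_level(x, h, level):
    """One a trous filtering step: insert 2^level - 1 zeros ("holes")
    between filter taps, then circularly convolve with the signal."""
    dilated = np.zeros((len(h) - 1) * 2**level + 1)
    dilated[::2**level] = h
    n = len(x)
    # circular convolution via index wrapping
    return np.array([sum(dilated[k] * x[(i - k) % n] for k in range(len(dilated)))
                     for i in range(n)])

hf = np.array([0.125, 0.375, 0.375, 0.125]) * np.sqrt(2)  # lowpass taps from the listing
x = np.random.default_rng(0).standard_normal(16)
a1 = atrous_level(x, hf, 0)   # level-1 approximation (undilated filter)
a2 = atrous_level(a1, hf, 1)  # level-2: same filter with holes
```

Because there is no downsampling, every level keeps the full image size, which is what lets the Lipschitz estimator compare coefficients at the same pixel across scales.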
function atrous_test(image, SNR, level)
% Spline wavelet filters
hf=[.125 .375 .375 .125]*sqrt(2); %Lowpass decomposition and reconstruction filter
gf=[.5 -.5]*sqrt(2); %Highpass decomposition filter
%Highpass reconstruction filter
gprime=[-0.03125 -0.21875 -0.6875 0.6875 0.21875 0.03125]*sqrt(2);
%Read in the image
%Normalize image
% figure;

% imshow(z);
% title('Original Image');
%Calculate noise from the SNR input specified, and add it to the image
var_s=(std2(X))^2; %variance of the signal
var_n=10^(log10(var_s)-(SNR/10)); %variance of the noise added
eta=sqrt(var_n)*randn(M,N);
y=X+eta;
SNR=10*log10(var_s/var_n);
save noisy_house y map2 %noisy image is stored in y.
% figure;
% image(y);
% colormap(map2) %,colorbar
% xlabel(['Noisy Image, SNR=',num2str(SNR),'dB']);
%L=Level of detail for decomposition and reconstruction
L=level;% specify levels of decomposition desired
%Loop for decomposing image using Algorithme a Trous
for k=1:L
    %Calculate approximations
    %Shift either image or latest approximation vertically
    for i=1:nc
        a{k}(1:nr,i)=leftshift(y(1:nr,i)',2^(k-1))';
        %Circular convolution of lowpass filter,hf,with columns from
        %vertically shifted image
        a{k}(1:nr,i)=cconv(hf,a{k}(1:nr,i)')';
    end
    %Shift either image or latest approximation horizontally
    for i=1:nr
        a{k}(i,1:nc)=leftshift(a{k}(i,1:nc),2^(k-1));
        %Circular convolution of lowpass filter,hf,with rows from
        %horizontally shifted image
        a{k}(i,1:nc)=cconv(hf,a{k}(i,1:nc));
    end
    %Calculate horizontal and vertical details
    %Circular convolution of high pass filter,gf,with columns from the
    %image or previous approximation
    for i=1:nc
        d1{k}(1:nr,i)=cconv(gf,y(1:nr,i)')';
        %Shift resultant image vertically. The result is a horizontal
        %detail image
        d1{k}(1:nr,i)=leftshift(d1{k}(1:nr,i)',2^k)';
    end
    %Circular convolution of the highpass filter gf with rows from the
    %image or previous approximations
    for i=1:nr
        d2{k}(i,1:nc)=cconv(gf,y(i,1:nc));
        %Shift resultant image horizontally. The result is a vertical
        %detail image
        d2{k}(i,1:nc)=leftshift(d2{k}(i,1:nc),2^k);
    end
    %Dilate filters by dyadic upsampling
    hf=[dyadup(hf,2) 0]; %a trous
    gf=[dyadup(gf,2) 0]; %a trous
    gprime=[dyadup(gprime,2) 0];
end
%Display approximations, horizontal and vertical details for 2 levels of
%decomposition
% subplot(2,3,1); imshow(a{1},[]);
% title('Approximation: level 1');
% subplot(2,3,4); imshow(a{2},[]);
% title('Approximation: level 2');
% subplot(2,3,2); imshow(d1{1},[]);
% title('Horizontal: level 1');
% subplot(2,3,5); imshow(d1{2},[]);
% title('Horizontal: level 2');
% subplot(2,3,3); imshow(d2{1},[]);
% title('Vertical: level 1');
% subplot(2,3,6); imshow(d2{2},[]);
% title('Vertical: level 2');
%Zeropad all the details so that the Metropolis sampler results can be
%applied to them
x_zp=zeropad(d1{1});
d1_new{1}=x_zp;
x_zp=zeropad(d1{2});
d1_new{2}=x_zp;
x_zp=zeropad(d2{1});
d2_new{1}=x_zp;
x_zp=zeropad(d2{2});
d2_new{2}=x_zp;
%Compute approximations to the local Lipschitz exponents mk
%for the horizontal components of the wavelet transform
m1=(1/2).*(abs(d1{2}./d1{1}));
%Find the variance of the horizontal component for scales 1 and 2 of the top
%right portion of the noisy image
% row=10;
% col=10;
% for i=1:row
%     for j=1:col
%         n(i,j)=y1(i,j+246);
%     end
% end
% [var1hor,var2hor]=variance(n,hf,gf);
% var1hor
% var2hor;
% To obtain the optimum threshold for the mask for the horizontal
% component level 1
% Lipthreshhor1=0.05;
% sumhor1=0;
% for i=1:256
%     for j=1:256
%         if (m1(i,j)<=Lipthreshhor1)
%             sumhor1=sumhor1+(abs(d1{1}(i,j)))^2;
%         else
%             sumhor1=sumhor1;
%         end
%     end
% end
% sumhor1
%To obtain the optimum threshold for the mask for the horizontal
% component level 2
% Lipthreshhor2=0.052;
% sumhor2=0;
% for i=1:256
%     for j=1:256
%         if (m1(i,j)<=Lipthreshhor2)
%             sumhor2=sumhor2+(abs(d1{2}(i,j)))^2;
%         else
%             sumhor2=sumhor2;
%         end
%     end
% end
% sumhor2
%Applying the derived threshold to form the initial horizontal mask
Lip=1.25;
for i=1:256
    for k=1:256
        if abs(m1(i,k))>=Lip %label as signal where the Lipschitz estimate exceeds the threshold
            md1(i,k)=256;
        else
            md1(i,k)=0;
        end
    end
end
% figure;
% subplot(1,2,1); imshow(md1,[]);
% title('Lipschitz mask for horizontal component');

%Compute approximations to the local Lipschitz exponents mk
%for the vertical components of the wavelet transform
m2=(1/2).*(abs(d2{2}./d2{1}));
%Find the variance of the vertical component for scales 1 and 2 of the top right
%portion of the noisy image
% row=10;
% col=10;
% for i=1:row
%     for j=1:col
%         n(i,j)=y1(i,j+246);
%     end
% end
% [var1hor,var2hor,var1ver,var2ver]=variance(n,hf,gf);
% var1ver
% var2ver;
% To obtain the optimum threshold for the mask for the vertical
% component level 1
% Lipthreshver1=0.05;
% sumver1=0;
% for i=1:256
%     for j=1:256
%         if (m2(i,j)<=Lipthreshver1)
%             sumver1=sumver1+(abs(d2{1}(i,j)))^2;
%         else
%             sumver1=sumver1;
%         end
%     end
% end
% sumver1
%To obtain the optimum threshold for the mask for the vertical
% component level 2
% Lipthreshver2=0.052;
% sumver2=0;
% for i=1:256
%     for j=1:256
%         if (m2(i,j)<=Lipthreshver2)
%             sumver2=sumver2+(abs(d2{2}(i,j)))^2;
%         else
%             sumver2=sumver2;
%         end
%     end
% end
% sumver2
%Applying the derived threshold to form the initial vertical mask
for i=1:256
    for k=1:256
        if abs(m2(i,k))>=Lip
            md2(i,k)=256;
        else
            md2(i,k)=0;
        end
    end
end
% subplot(1,2,2); imshow(md2,[]);
% title('Lipschitz mask for vertical component')
%Zeropad the masks to 258x258 so that 3x3 cliques exist for every pixel of the
%256x256 masks; the zero-padded matrices are the initial masks X0
x_zp=zeropad(md1);
x_init1=x_zp; %This is the initial horizontal mask for the first iteration
%of the sampling process
x_zp=zeropad(md2);
x_init2=x_zp; %This is the initial vertical mask for the first iteration
%of the sampling process
%Call the Metropolis sampler function, which generates the proposal masks
%and applies the acceptance rule
[x_init,x_final]=metropolis(x_init1);
x_final=x_final./10; %This is done to obtain the shrinkage factors for each
%coefficient by averaging the outputs of all the iterations of the Metropolis
%sampler
d1_new{1}=d1_new{1}.*x_final; %The shrinkage factors applied to the
%horizontal coefficients at scale 1
subplot(2,2,1); imshow(d1_new{1},[]);
title('Modified Horizontal details after Metropolis: level 1');
d1_new{2}=d1_new{2}.*x_final; %The shrinkage factors applied to the
%horizontal coefficients at scale 2
subplot(2,2,3); imshow(d1_new{2},[]);
title('Modified Horizontal details after Metropolis: level 2');
[x_init,x_final]=metropolis(x_init2);
x_final=x_final./10; %This is done to obtain the shrinkage factors for each
%coefficient by averaging the outputs of all the iterations of the Metropolis
%sampler
d2_new{1}=d2_new{1}.*x_final; %The shrinkage factors applied to the
%vertical coefficients at scale 1
subplot(2,2,2); imshow(d2_new{1},[]);
title('Modified Vertical details after Metropolis: level 1');
d2_new{2}=d2_new{2}.*x_final; %The shrinkage factors applied to the
%vertical coefficients at scale 2
subplot(2,2,4); imshow(d2_new{2},[]);
title('Modified Vertical details after Metropolis: level 2');
%Remove the zeropadding on both the new horizontal and vertical details
%so that reconstruction can be done
for k=1:L
    for i=1:256
        for j=1:256
            d1{k}(i,j)=d1_new{k}(i+1,j+1);
            d2{k}(i,j)=d2_new{k}(i+1,j+1);
        end
    end
end
%Loop for reconstructing the image
for k=L:-1:1
    %Decimate filters by dyadic downsampling
    gprime=dyaddown(gprime,3); %high pass filter
    hprime=dyaddown(hprime,3); %low pass filter
    %Circular convolution of rows of approximation,y,with the low pass
    %filter hprime
    for i=1:nr
        rec_a{k}(i,1:nc)=cconv(hprime,y(i,1:nc));
        %Shift resultant image horizontally
        rec_a{k}(i,1:nc)=leftshift(rec_a{k}(i,1:nc),2^k);
    end
    %Circular convolution of the columns of the resultant image with the
    %low pass filter hprime
    for i=1:nc
        rec_a{k}(1:nr,i)=cconv(hprime,rec_a{k}(1:nr,i)')';
        %Shift resultant image vertically
        rec_a{k}(1:nr,i)=leftshift(rec_a{k}(1:nr,i)',2^k)';
    end
    %Shift horizontal details vertically
    for i=1:nc
        rec_d1{k}(1:nr,i)=leftshift(d1{k}(1:nr,i)',2^(k-1))';
        %Circular convolution of the columns with high pass filter gprime
        rec_d1{k}(1:nr,i)=cconv(gprime,rec_d1{k}(1:nr,i)')';
    end
    %Shift vertical details horizontally
    for i=1:nr
        rec_d2{k}(i,1:nc)=leftshift(d2{k}(i,1:nc),2^(k-1));
        %Circular convolution of the rows with high pass filter gprime
        rec_d2{k}(i,1:nc)=cconv(gprime,rec_d2{k}(i,1:nc));
    end
    %Add the new approximation image to the new horizontal and vertical details
    %and multiply the result by 1/4. This generates the new approximation for
    %the next level up. The final approximation will be the reconstructed image
    y=(1/4)*(rec_a{k}+rec_d1{k}+rec_d2{k});
end
%Display all the reconstructed approximations, horizontal, and vertical details
subplot(2,3,1); imshow(rec_a{1},[]);
title('Approximation, Reconstruction: level 1');
subplot(2,3,4); imshow(rec_a{2},[]);
title('Approximation, Reconstruction: level 2');
subplot(2,3,2); imshow(rec_d1{1},[]);
title('Horizontal, Reconstruction: level 1');
subplot(2,3,5); imshow(rec_d1{2},[]);
title('Horizontal, Reconstruction: level 2');
subplot(2,3,3); imshow(rec_d2{1},[]);
title('Vertical, Reconstruction: level 1');
subplot(2,3,6); imshow(rec_d2{2},[]);
title('Vertical, Reconstruction: level 2');
%Calculate the SNR of the reconstructed image
SNR=10*log10(var_s/var_noise) %SNR of the reconstructed image
SNR_orig=10*log10(var_s/var_n) %Input SNR of the noisy image
subplot(1,3,1)
title(['Input Noisy Image, SNR=',num2str(SNR_orig),'dB']);
title(['Reconstructed Image, SNR=',num2str(SNR),'dB']);
title('Original Clean Image');

Metropolis Sampler MATLAB Program
This program performs the Metropolis sampler algorithm on the initial masks and
returns the shrinkage factors for the coefficients.
%Form the proposal matrix for the Metropolis sampler method by perturbing all
%the pixels one by one and apply the Metropolis condition to each sample
function [x_init,x_final]=metropolis(x_init)
x_final=0; %sum of all the accepted sample masks
n=1;
%Set while loop for number of iterations
while n<=10 %10 iterations required for convergence
    for i=2:257
        for j=2:257
            Vc=0; %clique potentials for initial mask
            Vc_prime=0; %clique potentials for proposal mask
            x_prop1=x_init; %the proposal configuration
            %Flip each pixel in the initial mask one by one and make that the
            %proposal matrix
            if x_prop1(i,j)==0
                x_prop1(i,j)=256;
            elseif x_prop1(i,j)==256
                x_prop1(i,j)=0;
            end
            %Find the clique potentials of the perturbed label in both the
            %initial and the proposal masks using 3x3 cliques
            for p=i-1:i+1
                for q=j-1:j+1
                    if x_init(i,j)~=x_init(p,q)
                        Vc=Vc+1; % gamma=1
                    end
                    if x_prop1(i,j)~=x_prop1(p,q)
                        Vc_prime=Vc_prime+1; % gamma=1
                    end
                end
            end
            r=exp(Vc-Vc_prime); %this is the Metropolis decision step
            u=rand; %uniformly distributed random number between 0 and 1
            if r>=1
                x_init=x_prop1; %accept the proposal
            elseif r>u
                x_init=x_prop1; %accept with probability r
            end
        end
    end
    x_final=x_final+x_init; %this returns the set of shrinkage factors to be
    %applied to the details, once it is averaged over the number of iterations
    n=n+1;
end

Variance for Lipschitz Estimator Thresholding MATLAB Program
This function obtains the variance of the wavelet coefficients for both
components, and for 2 levels of decomposition of a small noisy portion of the
image and returns the variance to the main program.
function [var1hor,var2hor,var1ver,var2ver]=variance(y,hf,gf)
[nr,nc]=size(y); %y is the noisy portion of the image
for k=1:2
    %Calculate approximations
    %Shift either image or latest approximation vertically
    for i=1:nc
        a{k}(1:nr,i)=leftshift(y(1:nr,i)',2^(k-1))';
        %Circular convolution of lowpass filter,hf,with columns from
        %vertically shifted image
        a{k}(1:nr,i)=cconv(hf,a{k}(1:nr,i)')';
    end
    %Shift either image or latest approximation horizontally
    for i=1:nr
        a{k}(i,1:nc)=leftshift(a{k}(i,1:nc),2^(k-1));
        %Circular convolution of lowpass filter,hf,with rows from
        %horizontally shifted image
        a{k}(i,1:nc)=cconv(hf,a{k}(i,1:nc));
    end
    %Calculate horizontal and vertical details
    %Circular convolution of high pass filter,gf,with columns from the
    %image or previous approximation
    for i=1:nc
        d1{k}(1:nr,i)=cconv(gf,y(1:nr,i)')';
        %Shift resultant image vertically. The result is a horizontal
        %detail image
        d1{k}(1:nr,i)=leftshift(d1{k}(1:nr,i)',2^k)';
    end
    %Circular convolution of the highpass filter gf with rows from the
    %image or previous approximations
    for i=1:nr
        d2{k}(i,1:nc)=cconv(gf,y(i,1:nc));
        %Shift resultant image horizontally. The result is a vertical
        %detail image
        d2{k}(i,1:nc)=leftshift(d2{k}(i,1:nc),2^k);
    end
    %Dilate filters
    hf=[dyadup(hf,2) 0]; %a trous
    gf=[dyadup(gf,2) 0]; %a trous
    %gprime=[dyadup(gprime,2) 0];
end
var1hor=(std2(d1{1}))^2; %variance of the horizontal component for level 1
var2hor=(std2(d1{2}))^2; %variance of the horizontal component for level 2
var1ver=(std2(d2{1}))^2; %variance of the vertical component for level 1
var2ver=(std2(d2{2}))^2; %variance of the vertical component for level 2

Circular Convolution, Leftshift, and Zero-padding MATLAB Programs
This is a collection of utility functions which perform the tasks of circular
convolution, and left shift (for the circular convolution process) in the wavelet
decomposition by the Algorithme a Trous method, described in Chapter 2. The
zeropad program zero-pads both the rows and columns of the image, giving it
2 extra rows and columns so that 3x3 cliques exist for all the pixels.
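As a cross-check of the helpers below, circular convolution and the left shift can be expressed compactly in Python (illustrative; for a filter no longer than the signal this reproduces the same periodic result as the MATLAB cconv helper):

```python
import numpy as np

def cconv(f, x):
    """Circular convolution of filter f with signal x, period len(x)."""
    n = len(x)
    full = np.convolve(f, x)          # ordinary linear convolution
    out = np.zeros(n)
    for i, v in enumerate(full):      # wrap the tail back, modulo n
        out[i % n] += v
    return out

def leftshift(x, k):
    """Rotate the first k samples to the end of the vector."""
    return np.concatenate([x[k:], x[:k]])

x = np.array([1.0, 2.0, 3.0, 4.0])
f = np.array([0.5, 0.5])
print(cconv(f, x))                    # -> [2.5 1.5 2.5 3.5]
print(leftshift(x, 1))                # -> [2. 3. 4. 1.]
```

Treating the signal as periodic is what allows every decomposition level to keep exactly the original image size, with no boundary growth.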
%This function does a circular convolution between an input vector, x,
%and a discrete filter, f.
function y = cconv(f,x)
m = length(x);
r = length(f);
if r <= m
    %If filter length is smaller than the input vector:
    x_extended = [x((m+1-r):m) x];
else
    %If filter length is bigger than the input vector:
    z = zeros(1,r);
    for i=1:r
        q = r*m - r + i - 1;
        imod = 1 + rem(q,m);
        z(i) = x(imod);
    end
    x_extended = [z x];
end
%Filter the signal
intermediate_y = filter(f,1,x_extended);
%Save only the part of the vector that has the circular convolution results:
y = intermediate_y(r+1:m+r);
%Left Shift
%Shift a vector value from position 1 to the end of the vector
function ShiftedVector=leftshift(Vector,LengthOfShift)
LengthOfVector=length(Vector);
for i=1:LengthOfShift
    Temp=Vector(1);
    TempVector=Vector(2:LengthOfVector);
    Vector=[TempVector Temp];
end
ShiftedVector=Vector;
%This function zeropads both the rows and the columns of the binary mask matrix
%in order to obtain a square matrix of size 258x258
function x_zp=zeropad(y)
[nr,nc]=size(y);
%Zero-padding the rows, that is, it adds two new rows, one at each end, with
%zeros as their elements
xzp_rows=[zeros(1,nc);y;zeros(1,nc)];
%Zero-padding the columns, adding two new columns, one on each side, with
%zeros as their elements
x_zp=[zeros(nr+2,1) xzp_rows zeros(nr+2,1)];

Notations Used
<f,g>       Inner product
||f||       Norm
f = O(g)    Order of: there exists K such that f[n] <= K g[n]
Z           Integers
R           Real numbers
1_[a,b]     Indicator function, which is 1 in [a,b] and 0 outside
L2(R)       Finite energy functions
l2(Z)       Finite energy discrete signals
H           Hilbert space
Table E.1 Notations used

[1] Julian Besag. Spatial Interaction and the Statistical Analysis of Lattice
Systems. Journal of the Royal Statistical Society, Series B, 36:192-236,
[2] Julian Besag. Statistical Analysis of Dirty Pictures. Journal of Applied
Statistics, 20:63-87.
[3] D.L. Donoho and I.M. Johnstone. Ideal Spatial Adaptation by Wavelet Shrinkage.
Biometrika, 81:425-455, December 1994.
[4] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian. A
Real-Time Algorithm for Signal Analysis with the Help of the Wavelet Transform.
In Wavelets, Time-Frequency Methods and Phase Space, pages 289-297.
Springer-Verlag, Berlin, 1989.
[5] S.Z. Li. Markov Random Field Modeling in Computer Vision. Computer
Science Workbench. Springer Verlag, London, 1995.
[6] S.Z. Li. On Discontinuity-Adaptive Smoothness Priors in Computer Vision.
IEEE Transactions on Pattern Analysis and Machine Intelligence 17(6), 576-
[7] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller.
Equation of State Calculations by Fast Computing Machines. Journal
of Chemical Physics, 21:1087-1091, 1953.
[8] S.G. Mallat. A Wavelet Tour of Signal Processing. Third Edition, Academic
Press, San Diego, 2001.
[9] M. Malfait, D. Roose. Wavelet based Image Denoising using a Markov
Random Field a priori Model. IEEE Transactions on Image Processing,
6 (4), April 1997.
[10] Y. Xu, J.B. Weaver, D.M. Healy, and J. Lu. Wavelet Transform Domain
Filters: a Spatially Selective Noise Filtration Technique. IEEE Transactions
on Image Processing, 3(6):747-758, 1994.