Citation
Improved sequential and parallel designs and implementations of the eight direction Prewitt edge detection

Material Information

Title:
Improved sequential and parallel designs and implementations of the eight direction Prewitt edge detection
Creator:
Mohammed, Mohammed B. ( author )
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
1 electronic file (130 pages)

Subjects

Subjects / Keywords:
Computer vision ( lcsh )
Pattern recognition systems ( lcsh )
Image processing ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Review:
The exponential growth of the world's technological industry has an important impact on our lives; we are witnessing an expansion in computer power combined with a noticeable development of digital camera capabilities. To keep up with the requirements of the digitalized world, the focus has been set on the computer vision field. One of the most popular computer vision applications is recognition. The word recognition can imply different computer vision areas such as object tracking, face and pattern recognition, human computer interaction, traffic monitoring, vehicle navigation, etc. Edge detection algorithms are widely used within the computer vision and the image processing field. Edge detection algorithms are at the center of the recognition process in computer vision and image processing. This work presents design and implementation of efficient sequential and parallel edge detection algorithms capable of producing high quality results and performing at high speed [1]. The parallel version, derived from our efficient sequential algorithm, is designed for the new shared memory MIMD multicore platforms. The edge detection algorithm presented here is designed to effectively work on images impacted with different noise percentages. This had been achieved through augmenting our edge detection algorithm with an improved median filter capable of suppressing impulse noise and other noises more effectively than the original standard Median filter. A global thresholding method augments our design to dynamically find a suitable thresholding value. In order to measure the quality and execution time, we test images with different sizes along with the original Prewitt and Canny edge detection algorithms already implemented in Matlab, to show the possibility of using our design within different applications. 
This work will demonstrate the ability to process relatively small and medium images in real-time as well as effectively processing extremely large images, useful for biomedical image processing, rapidly.
Thesis:
Thesis (M.S.)--University of Colorado Denver. Computer science
Bibliography:
Includes bibliographic references.
System Details:
System requirements: Adobe Reader.
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Mohammed B. Mohammed.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
891743425 ( OCLC )
ocn891743425

Full Text
IMPROVED SEQUENTIAL & PARALLEL DESIGNS AND
IMPLEMENTATIONS OF THE EIGHT DIRECTION PREWITT EDGE
DETECTION
by
MOHAMMED B. MOHAMMED
B.Sc., Mosul University, 2009
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
2013


This Thesis for the Master of Science degree
by
Mohammed B. Mohammed
has been approved for the
Computer Science Program
By
Gita Alaghband, Chair
Tom Altman
Bogdan Chlebus
September 6, 2013


Mohammed, Mohammed B. (Master of Science in Computer Science)
Improved Sequential & Parallel Designs and Implementations of the Eight Direction
Prewitt Edge Detection
Thesis directed by Professor Gita Alaghband
ABSTRACT
The exponential growth of the world's technological industry has an important impact
on our lives; we are witnessing an expansion in computer power combined with a
noticeable development of digital camera capabilities. To keep up with the requirements
of the digitalized world, the focus has been set on the computer vision field. One of the
most popular computer vision applications is recognition. The word recognition can
imply different computer vision areas such as object tracking, face and pattern
recognition, human computer interaction, traffic monitoring, vehicle navigation, etc.
Edge detection algorithms are widely used within the computer vision and the image
processing field. Edge detection algorithms are at the center of the recognition process in
computer vision and image processing. This work presents design and implementation of
efficient sequential and parallel edge detection algorithms capable of producing high
quality results and performing at high speed [1]. The parallel version, derived from our
efficient sequential algorithm, is designed for the new shared memory MIMD multicore
platforms. The edge detection algorithm presented here is designed to effectively work on
images impacted with different noise percentages. This has been achieved through
augmenting our edge detection algorithm with an improved median filter capable of
suppressing impulse noise and other noises more effectively than the original standard


Median filter. A global thresholding method augments our design to dynamically find a
suitable thresholding value. In order to measure the quality and execution time, we test
images with different sizes along with the original Prewitt and Canny edge detection
algorithms already implemented in Matlab, to show the possibility of using our design
within different applications. This work will demonstrate the ability to process relatively
small and medium images in real-time as well as effectively processing extremely large
images, useful for biomedical image processing, rapidly.
The form and content of this abstract are approved. I recommend its publication.
Approved: Gita Alaghband


DEDICATION
This work is dedicated to my generous mother and supportive father who have sacrificed
their lives in order for me to succeed. This work is dedicated to my wonderful sisters and my gentle brother, whose love has kept me focused. This work is dedicated to the
soul of a friend, Ahmed Al-Nuri, who will never be forgotten. This work is dedicated to
my only uncle who passed away without seeing me graduate, to those who are working to
rebuild my destroyed country, to the innocent being murdered on a daily basis and to all
my beloved friends who are my treasures, forever.


ACKNOWLEDGEMENTS
I acknowledge my fellowship from the Higher Committee for Education Development in
Iraq (HCED).
I want to express my gratefulness and gratitude to my consultant, advisor and mentor, Professor Gita Alaghband, for her guidance, support and encouragement throughout my study. Without her invaluable help, it would not have been possible for me to achieve this level of academic proficiency, nor would I have had the chance to collaborate with such knowledgeable and incredible researchers.
I would like to express my gratitude to all the faculty members in the computer science
department at the University of Colorado Denver and to all my professors at the
University of Mosul in Iraq.


TABLE OF CONTENTS
Chapter
1. Introduction..................................................................1
1.1 Edge Detection Stands in the Digitized World..............................2
1.2 Edge Detection and the Most Well-known Detection Algorithms...............3
1.2.1 Introduction...........................................................3
1.2.2 Prewitt Edge Detection and the Most Recent Improvements...............4
1.2.3 Canny Edge Detection as a Quality Reference............................8
1.3 Smoothing Mechanism.......................................................10
1.3.1 Introduction..........................................................10
1.3.2 Median Filter.........................................................11
1.3.3 Peak Signal-To-Noise Ratio (PSNR) as a Quality Metric.................11
1.4 Iterative Thresholding and the Most Well-known Iterative Thresholding.....13
1.4.1 Introduction..........................................................13
1.4.2 Otsu Thresholding.....................................................13
1.4.3 Basic Global Thresholding.............................................14
1.5 The Significance of the Study.............................................14
1.6 Statement of the Problem..................................................15
1.7 Thesis Organization.......................................................17
2. Background....................................................................19
2.1 Basic Definition..........................................................19
2.1.1 Most Common Types of a Digital Image..................................19
2.1.2 More about Edges......................................................20
2.1.3 Noise.................................................................21
2.2 Common Image Processing Operations........................................23
2.2.1 Linear and Nonlinear Spatial Filtering................................23
2.2.2 Denoising.............................................................25
2.2.3 Thresholding..........................................................26
2.2.4 Histogram.............................................................28
2.3 Some of the Challenges in the Field of Image Processing...................29
2.3.1 Real-time Image Processing............................................30
2.3.2 Parallel Image Processing.............................................31


2.3.3 Process High Resolution Images......................................33
2.4 Common Programming Languages Utilized for Image Processing and Computer
Vision Fields...............................................................34
2.4.1 Matlab..............................................................35
2.4.2 OpenCV..............................................................35
3. Experimental Setup..........................................................38
3.1 Main Multicore Shared Memory Architectures Employed to Run the Experiments......38
3.1.1 Opteron Module......................................................39
3.1.2 Bulldozer Module....................................................40
3.2 Main Accessories Implementations........................................41
3.2.1 Changing to Gray Scale Channel & Converting between Extensions......42
3.2.2 Adding Noise........................................................43
3.2.3 Peak Signal-To-Noise Ratio (PSNR)...................................43
4. Research Methodology........................................................45
4.1 Simulation & Evaluation of the Eight Direction Prewitt....................46
4.1.1 Bidirectional Prewitt vs. Eight Directions Prewitt..................46
4.1.2 Processing Different Color Spaces can Impact the Quality............50
4.1.3 Eight Directions Prewitt Working on Noisy Images....................53
4.2 Design..................................................................55
4.2.1 Selecting an Appropriate Filter to Use as Prefix Step to Eight-direction Prewitt......56
4.2.2 Dynamic Iterative Thresholding......................................58
4.3 The Implementation of the Improved Eight Direction Prewitt Edge Detection.59
4.3.1 Sequential Implementation...........................................60
4.3.2 Parallel Implementation to Accelerate the Process of Detection......66
5. Experimental Results And Analysis...........................................74
5.1 Quality of Proposed Algorithm...........................................74
5.1.1 The Improvement of Proposed Median filter...........................74
5.1.2 Visualizing the Quality of our Improved Eight Direction Prewitt Algorithm. 77
5.2 The Efficiency of our Proposed Implementations..........................81
5.2.1 The Efficiency of the Sequential Eight Direction Prewitt Edge Detection
Implementation............................................................82
5.2.2 The Efficiency of Our Parallel Eight Direction Prewitt Edge Detection
Implementation............................................................83
5.3 Answers for the Research Questions......................................86


5.3.1 Is it Possible to Come Up with an Improved Version of the Prewitt Edge Detection Algorithm?......86
5.3.2 How to Overcome the Additional Complexity Added to the Original
Bidirectional Prewitt Algorithm?................................................88
5.3.3 How Efficient is the Algorithm Compared to the Original Prewitt and the Well-known Canny Edge Detections?......89
5.3.4 Will the Improved Version of this Algorithm Perform Well Unconditionally?......90
5.3.5 How Suitable is the Improved Version of this Algorithm for Real-time Edge
Detection Application?....................................................90
6. Conclusion..................................................................93
References.....................................................................94
Appendix......................................................................100


LIST OF TABLES
Table
1: Comparison of quality between standard Median filter and our proposed one using
PSNR applied on Lena impacted with different percentage of impulse noise........75
2: Sequential run of the presented algorithm vs. the bidirectional Prewitt and Canny edge
detections implemented within Matlab............................................82
3: Execution time for the proposed algorithm running on 12-core compute nodes...83
4: Execution time for the proposed algorithm running on 64-core compute nodes...84


LIST OF FIGURES
Figure
1.1: Windows used with original Bidirectional Prewitt.............................5
1.2: Original two directions Prewitt edge detections Pseudocode..................6
1.3: Templates of different directions used in bidirectional improved eight direction
Prewitt.............................................................................7
2.1: Edge in images................................................................21
2.2: Lena and Camera man images corrupted with different types of noise...........22
2.3: Illustration of one-dimensional correlation and convolution..................24
2.4: Applying averaging filter on both Lena and Camera man........................26
2.5: Convert to a binary image using thresholding mechanism.......................27
2.6: Histograms plot for Baboon image...................................29
3.1: Pseudocode of changing to gray and converting between image extensions.......42
3.2: Pseudocode of adding different types of noise using Matlab...................43
3.3: PSNR Pseudocode...............................................................44
4.1: Flow chart of Eight Direction Prewitt edge detection algorithm................47
4.2: Eight Direction Prewitt edge detections Pseudocode...........................48
4.3: Results of applying different directions of the Prewitt algorithm............49
4.4: Original bidirectional Prewitt vs. Eight Direction Prewitt...................50
4.5: Processing different channels with eight direction Prewitt...................52
4.6: Demonstration of Prewitt's sensitivity working on an image impacted with impulse noise. 54
4.7: The procedure of the Basic Global Thresholding...............................58
4.8.a: Procedure of the sequential proposed algorithm.............................61


4.8.b: The Flow chart of the sequential version of the improved eight direction Prewitt
edge detection application...........................................................61
4.9: Snapshot of our application that allows processing a Region of Interest (ROI)......65
4.10: Processing life................................................................69
4.11: Procedure of the parallel version of the proposed algorithm...................71
4.12: Partitioning the image into sub images and applying eight direction Prewitt on each of
the sub images.......................................................................72
4.13: Flow chart of the proposed parallel application
5.1: Visualize the difference between our Improved Median filter and the Standard
Median filter.............................................................76
5.2: Quality of the Improved Median filter and the Standard Median filter.76
5.3: Output of Canny (industry standard) and bidirectional Prewitt vs. our proposed work
..................................................................................78
5.4: Demonstrate the noise sensitivity of Prewitt edge detection working on image
impacted with impulse noise........................................................81


1. Introduction
At the rate at which current processor technology is advancing, Moore's law is no longer an accurate measure of the computational output of processors [2, 3]. This has taken place as the industrial race for higher CPU clock frequency reaches its limits. The sequential world view needs to constantly adapt in order to keep up with accelerated development platforms [4]. This will be achieved through parallel implementations of algorithms based on multithreaded programming languages [4]. Image processing applications, whose usage has increased to become an essential part of our daily lives, require heavy computation [5]. Future multimedia systems need high performance architectures, and the required level of performance is no longer achieved by the single core architecture [6]. This limitation has affected the progression of architecture design, in which distributed and parallel systems are rapidly growing.
The exponential growth of the world's technological industry has an important impact on our lives; we are witnessing an expansion of computer power combined with a noticeable development of digital camera capabilities. In keeping up with the digitalized world, much focus has been set on the computer vision field. In this study, different concepts in image processing and computer vision are covered, starting from very fundamental concepts and ending with open-ended ideas for future work.
In this chapter, the significance of edge detection in image processing, and more precisely in the detection field, will be explained. After that, the most common edge detection algorithms relevant to our work, and the related research and developments that have contributed to them, will be covered. Next, we briefly define smoothing and describe the smoothing mechanisms most relevant to this work. After that, thresholding, the need for it, and the most common thresholding mechanisms will be covered. Finally, the significance of our study will be shown and the main research questions will be highlighted.
1.1 Edge Detection Stands in the Digitized World
Computer vision is the field in which digital images, transferred from a capturing device (camera or video), are processed using different image processing mechanisms that help achieve a particular goal [7]. The main aim of the computer vision field is to simulate human vision, where Artificial Intelligence (AI) can be used to make a decision, while image analysis, which falls between image processing and computer vision, is exploited in image understanding [8].
Edge detection is one of the most widely used operations within the computer vision and image processing fields [9, 10, 11, 12, 13, 14, 15, 16]. The goal of edge detection is to identify the pixels in an image where sharp changes of intensity take place [17]. Detection algorithms are central to image analysis and recognition applications [3, 8, 13]. The word recognition can imply different computer vision areas such as object tracking, face and pattern recognition, human computer interaction, traffic monitoring, vehicle navigation, image segmentation, etc. [13]. Therefore, having a recognition algorithm that is efficient both in terms of quality and execution time can implicitly push forward real-time recognition applications [1].


1.2 Edge Detection and the Most Well-known Detection Algorithms
In this section, some of the well-known edge detection algorithms will be covered, with the main focus set on the Prewitt edge detection and the most recent improvements that have been made to it. Also, the Canny edge detection, the most sophisticated edge detection operator, is highlighted, since it serves as the quality reference for the improved Prewitt algorithm presented in this work.
1.2.1 Introduction
Edge detection is one of the most widely used operations within the computer vision and image processing fields and is central to image analysis and recognition applications [3, 8, 9, 10, 11, 13, 14, 16]. Edge detection essentially looks for sharp changes in intensity [8, 17, 18]. Edge detection algorithms, as explained by Gonzalez and Woods, usually contain four steps, one of which is optional. The first step is smoothing, the mechanism of suppressing noise from the original image to decrease the number of falsely detected edges. The second step is enhancement (sharpening), in which an attempt is made to reconstruct some of the edges that were destroyed during the smoothing step; sharpening is the optional step, used by some algorithms but not others. After that, detection (localization) is applied to decide which non-zero pixels are considered an edge and which are not. Finally, a thresholding mechanism is applied to convert the gray-scale image resulting from the localization step into a binary image. To measure the quality of edge detection, we need to describe the criteria of an optimal edge detection algorithm, which are good localization and good detection [11, 19]. Good detection implies decreasing the probability of detecting false edges, while good localization means the detected edges are as close as possible to the true edges [11, 19].
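The four steps above can be sketched as a pipeline. The helpers below are deliberately toy stand-ins (a 1-D three-sample median and a horizontal first difference), not the thesis implementation; they only illustrate how the stages compose.

```python
# Illustrative sketch of the four-step pipeline: smoothing, optional
# enhancement (omitted here), localization, and thresholding.

def smooth(img):
    """Toy noise suppressor: 1-D three-sample median applied per row."""
    out = []
    for row in img:
        padded = [row[0]] + row + [row[-1]]            # replicate borders
        out.append([sorted(padded[i:i + 3])[1] for i in range(len(row))])
    return out

def localize(img):
    """Toy gradient: absolute horizontal first difference per pixel."""
    return [[abs(row[min(i + 1, len(row) - 1)] - row[i]) for i in range(len(row))]
            for row in img]

def detect_edges(img, threshold):
    g = localize(smooth(img))                          # steps 1 and 3
    # step 4: threshold the gray-scale gradient image into a binary image
    return [[1 if v >= threshold else 0 for v in g_row] for g_row in g]

edges = detect_edges([[0, 0, 0, 200, 200, 200]], threshold=100)
```

Running this on a single row containing a step edge marks exactly one pixel, at the location of the intensity jump.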
Edge detection algorithms are usually categorized, as explained by Phueakjeen, Jindapetch, Kuburat and Suvorn, into either the Gradient or the Laplacian method. With the Gradient method, edges are detected by looking for the maximum and minimum intensity values occurring in the first derivative. On the other hand, with the Laplacian method, edges are detected from zero crossings in the second derivative, where zero values represent the location of edges. Prewitt, Sobel, Roberts, and Canny are examples of gradient edge detection operators, while the Laplacian of Gaussian is an example of a Laplacian operator [18]. The Prewitt and Sobel gradient operators offer the ability to detect edges with certain orientations [16, 20, 21]. On the other hand, the best known industry standard algorithm is Canny; it is shown to have high computational complexity [16, 18]. In the present study, the Prewitt edge detection algorithm, known for its simplicity and computational efficiency, has been employed. Even though the Prewitt algorithm is known for its computational efficiency, its noise sensitivity, according to different analytical studies, has negatively affected its usability for the most common image processing applications [8, 16, 17, 22, 23]. Next, we will focus on two main edge detection algorithms: Prewitt, which is employed as the core of our work, and Canny, which serves as a quality reference.
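The distinction between the two categories can be seen on a toy 1-D step edge: the Gradient method looks for the extremum of the first derivative, while the Laplacian method looks for the zero crossing of the second derivative. The discrete differences below are illustrative, not the exact operators used later.

```python
# An ideal step edge between index 2 and 3.
signal = [0, 0, 0, 10, 10, 10]

# First derivative (forward difference): the Gradient method finds its extremum.
first = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]

# Second derivative: the Laplacian method finds where it crosses zero
# (here it changes sign from +10 to -10 around the edge).
second = [first[i + 1] - first[i] for i in range(len(first) - 1)]

edge_by_gradient = first.index(max(first))   # location of the first-derivative peak
```

For this signal, `first` is `[0, 0, 10, 0, 0]` and `second` is `[0, 10, -10, 0]`, so both methods locate the same edge position.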
1.2.2 Prewitt Edge Detection and the Most Recent Improvements
The Prewitt edge detection algorithm, as defined by Gonzalez, Woods and Eddins, is a discrete differentiation operator with the goal of calculating the gradient of the image intensity function, where the convolution operation is done on each pixel in both the horizontal and vertical directions (0 and 90 degrees) [8]. This gradient edge detection algorithm looks for the maximum and minimum derivatives of the image [13]. Prewitt, as well as the Sobel edge detection algorithm, offers the capability to detect edges with certain orientations [16, 21]. In order to detect edges in both horizontal and vertical directions, the input image is masked with two matrices (windows), each of size 3 by 3 (Figure 1.1).
Figure 1.1: Windows used with the original Bidirectional Prewitt.
Masking the input image with the windows in Figure 1.1 results in two intermediate templates (candidates). This means that for each pixel there are at most two candidates to choose from. The election strategy is to elect the candidate with the maximum intensity value (sharper change) to represent that pixel in the final output. However, limiting the algorithm to a choice between only two candidates is not sufficient to produce accurate results. It is possible to use all eight directions, as shown in Figure 1.3 and later in Section 4.1.1, to detect edges from all eight candidates. An improved version of the original bidirectional algorithm that handles eight directions was proposed in 2008 by Ci and Chen [17]. The pseudocode corresponding to the bidirectional Prewitt edge detection is presented below in Figure 1.2.


/* Where f(x,y) is the input image, k1(x,y) and k2(x,y) are the two mask windows in
   directions 0 and 90 respectively, and g(x,y) is the output image */
I[imgHeight, imgWidth] := imRead(image);      /* imgHeight: image's height, imgWidth: image's width */
k1[maskHeight, maskWidth] := imRead(Mask1);   /* maskHeight: mask's height, maskWidth: mask's width */
k2[maskHeight, maskWidth] := imRead(Mask2);
h := (maskHeight - 1)/2 ;
w := (maskWidth - 1)/2 ;
for y := 0 until imgHeight do
    for x := 0 until imgWidth do
    {
        sum1 := 0 ; sum2 := 0 ;                  /* initialization */
        for i := -h until h do
            for j := -w until w do
            {
                sum1 += k1(j,i) * I(x-j, y-i) ;  /* convolution in the first direction */
                sum2 += k2(j,i) * I(x-j, y-i) ;  /* convolution in the second direction */
            }
        g(x,y) := max(sum1, sum2) ;              /* select the maximum intensity for the result image */
    }
Figure 1.2: Original two direction Prewitt edge detection pseudocode.
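The pseudocode in Figure 1.2 can be transcribed into runnable Python. This is a sketch rather than the thesis implementation: border pixels are skipped instead of padded, and the absolute value of each directional response is taken before the election, two conventions the pseudocode leaves implicit.

```python
# Bidirectional (0 and 90 degree) Prewitt edge detection on a gray-scale
# image given as a 2-D list of intensity values.

K1 = [[-1, -1, -1],        # 0-degree window (responds to horizontal edges)
      [ 0,  0,  0],
      [ 1,  1,  1]]
K2 = [[-1, 0, 1],          # 90-degree window (responds to vertical edges)
      [-1, 0, 1],
      [-1, 0, 1]]

def prewitt2(img):
    h_img, w_img = len(img), len(img[0])
    g = [[0] * w_img for _ in range(h_img)]
    for y in range(1, h_img - 1):              # skip the one-pixel border
        for x in range(1, w_img - 1):
            s1 = s2 = 0
            for i in range(-1, 2):
                for j in range(-1, 2):
                    s1 += K1[i + 1][j + 1] * img[y + i][x + j]  # 0-degree convolution
                    s2 += K2[i + 1][j + 1] * img[y + i][x + j]  # 90-degree convolution
            g[y][x] = max(abs(s1), abs(s2))    # elect the stronger response
    return g
```

Applied to an image containing a vertical intensity step, the 90-degree window produces the dominant response along the step while the 0-degree window contributes nothing.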


[Figure 1.3 shows the 3-by-3 templates: the two windows of the original bidirectional algorithm together with the six additional directional windows that form the improved eight direction version.]
Figure 1.3: Templates of the different directions used in the bidirectional and the improved eight direction Prewitt.
Adding more directions to the original algorithm improves the quality of the original Prewitt algorithm. Additional improvements, such as incorporating thresholding and implicit smoothing techniques, were suggested by Lei, Dewei, Xiaoyu, and Hui in 2011 [23]. The significant contribution of their work is the use of an implicit smoothing technique to enable the algorithm to work on noisy images. This is particularly important, as the Prewitt edge detection algorithm is known to be sensitive to noise and therefore may not generate satisfactory results in the presence of noise. The implicit smoothing technique used in [23] calculates the gradient magnitude of the eight directions as the basis for finding the final magnitude, to reduce the effect of noise. Our experiments show that the gradient-based implicit smoothing approach is not very effective when the image is impacted with a high noise percentage. Instead, we have conducted research and experiments to come up with a more effective smoothing technique, as we will see in our experiments section (5.1.1). Also, the design presented in [23] is very complex compared to the original bidirectional algorithm, and complexity is an important factor which might affect the usability of any algorithm. Nevertheless, improving the quality of the algorithm obviously requires additional complexity. Thus, to overcome the additional complexity, we have come up with an efficient parallel implementation that is based on our solid sequential one, as shown in the design section of Chapter 4 [1].
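One common way to obtain the six additional templates from the original 0-degree window is to rotate the kernel's outer ring in 45-degree steps; the detection then elects the maximum response over all eight directions. The construction below is a sketch of that idea, not the thesis code.

```python
def rotate45(k):
    """Rotate the outer ring of a 3x3 kernel one 45-degree step clockwise."""
    r = [row[:] for row in k]
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [k[i][j] for i, j in ring]
    vals = vals[-1:] + vals[:-1]                 # shift the ring one position
    for (i, j), v in zip(ring, vals):
        r[i][j] = v
    return r

K0 = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]        # 0-degree window
kernels = [K0]
for _ in range(7):                               # 45, 90, ..., 315 degrees
    kernels.append(rotate45(kernels[-1]))

def prewitt8(img):
    """Eight direction Prewitt: elect the maximum response per pixel.
    Opposite kernels are negations of each other (e.g. the 180-degree
    window is -K0), so the signed maximum covers both polarities."""
    h, w = len(img), len(img[0])
    g = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g[y][x] = max(
                sum(k[i + 1][j + 1] * img[y + i][x + j]
                    for i in range(-1, 2) for j in range(-1, 2))
                for k in kernels)
    return g
```

Rotating K0 twice yields the standard 90-degree window, and four rotations yield its negation, confirming that the eight generated templates cover all directions in 45-degree increments.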
1.2.3 Canny Edge Detection as a Quality Reference
The Canny edge detection algorithm was first designed by John Canny in 1986. It is a multi-stage detection algorithm based on a computational approach [11]. Nowadays, Canny is considered to be the industry standard and the optimal edge detector used to evaluate other edge detection algorithms [16, 17, 18, 21, 24]. Despite the fact that Canny is considered to be the optimal and standard edge detector, many studies have highlighted some disadvantages of this operator which can negatively affect its applicability. The process of the Canny edge detection algorithm is briefly explained in different articles. Joshi and Koju have compared different edge detection algorithms and show the main steps followed in Canny [17]. These steps are smoothing, localization, direction calculation, non-maximum suppression, and hysteresis thresholding. It is also shown that the Canny edge detection algorithm outperforms all the other gradient operators, such as Roberts, Prewitt, and Sobel, in all the different scenarios in which images are impacted with different types of noise.
Zhao, Qin and Wang, in their 2010 study "Improvement of Canny Algorithm Based on Pavement Edge Detection", cover some of the weaknesses of the Canny edge detection algorithm. They mention that even though Canny is used extensively within the detection field, its failure to detect weak edges limits its applicability to many edge detection applications that require high quality detection capability. Also, the original Canny algorithm uses pre-specified thresholding values that require several trials before the optimal thresholding is obtained. However, much research has contributed iterative thresholding mechanisms to Canny to overcome these weaknesses [24, 25].
Even though Canny edge detection is known to be the standard edge detection algorithm, different studies have shown that this algorithm has high computational complexity [17]. This complexity is time consuming and is likely reflected in the usability of the operator. In other words, more complexity leads to unsuitability for applications that require rapid processing capability, such as real-time detection applications [16]. In this work, we have used Canny as a quality measurement (reference) to compare our improved version of the Prewitt edge detection algorithm with other edge detection operators that are already in use. The comparison between Canny and the improved proposed version covers both quality and execution time.


1.3 Smoothing Mechanism
In this section, a short introduction to noise suppression and the most well-known mechanisms is given. In particular, the Median filter is briefly explained, since it is an important part of our improved Prewitt edge detection algorithm. Finally, the Peak Signal-To-Noise Ratio (a quality measurement) will be highlighted.
1.3.1 Introduction
Smoothing functionality is at the center of digital image processing, and many researchers have worked to enhance some of the existing smoothing mechanisms. The reason is that digital images are usually corrupted with different types of noise [26]. Therefore, choosing a suitable smoothing mechanism is a very challenging process, especially when smoothing becomes part of an edge detection algorithm. A suitable smoothing mechanism implies not only being effective in noise suppression but also keeping the important details of an image unchanged. Many smoothing mechanisms, such as the averaging filter, are very suitable for suppressing different types of noise, but at the same time they are not guaranteed to keep the details of an image unchanged. In other words, it is significant to ensure that the smoothing filter used (for the sake of noise suppression) will not destroy real edges, which are the most valuable information for edge detection algorithms. Having shown the importance of a good smoothing mechanism for edge detection, we present in this work a suitable smoothing mechanism that can be augmented with our design of the improved edge detection algorithm based on the eight direction Prewitt. The smoothing mechanism presented in this study is referred to as the Improved Median filter, which has been proven (based on our results) to be more capable of suppressing impulse and other types of noise than the well-known Standard Median filter, which will be explained next.
1.3.2 Median Filter
The performance evaluation study by Kumar, Srinivasan, and Rajaram [26] has
shown that most digital images should be smoothed with a linear filter known as a
low-pass filter. However, a low-pass filter blurs edges and destroys lines, corners and
other important details found in an image. For this reason, researchers have concentrated
on non-linear filters, which have less negative impact on real edges.
The median filter is a nonlinear filter used to remove impulse noise with the least
negative impact on real edges [8, 26, 27, 28]. For each pixel, the median filter arranges
all the pixels within the specified window in ascending order and chooses the median
value [8, 13, 26, 29, 30]. This approach keeps some real values unchanged, as opposed to
the blurring technique, which takes the average of all pixels within the neighborhood.
The median filter is often used in image and pattern recognition applications due to its
ability to suppress impulse and other types of noise very effectively with little negative
impact on real edges [8, 13, 26, 28, 29].
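The window-sorting step described above can be sketched as follows. This is a minimal Python illustration of the standard 3x3 median filter, not the thesis implementation: the function name `median_filter_3x3`, the list-of-lists image representation, and the copy-the-borders policy are our assumptions.

```python
def median_filter_3x3(img):
    """Apply a standard 3x3 median filter to a grayscale image.

    img is a list of rows of integer intensities; border pixels are
    copied unchanged for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # start from a copy so borders are kept
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # collect the 3x3 neighborhood, sort ascending, take the middle
            window = sorted(img[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = window[4]  # median of 9 sorted values
    return out
```

On a flat region of intensity 10 containing a single impulse of 255, the filter restores the corrupted pixel to 10, which is exactly the behavior that makes it attractive for impulse noise.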
1.3.3 Peak Signal-To-Noise Ratio (PSNR) as a Quality Metric
Applying image processing operations to images causes the loss of important
information and often changes the quality of the image [31]. It has become necessary
to evaluate the quality of an image within different image processing applications such
as information hiding, art, and pattern recognition, to name a few [32, 33]. The Peak
Signal-to-Noise Ratio (PSNR) is one of the most common objective quality estimation
techniques; it is based on the mean squared error (MSE) between the processed and
the original image [26, 30, 31, 34, 35, 36, 37, 38]. Next, the study shows how this
well-known quality measurement technique works. Consider an image of height M, width
N, and B bits per pixel (B determines the maximum possible intensity):
PSNR(Img1, Img2) = 10 log10( (2^B - 1)^2 / MSE(Img1, Img2) )
MSE(Img1, Img2) = (1 / (M*N)) * SUM_{i=1..M} SUM_{j=1..N} ( Img1(i, j) - Img2(i, j) )^2
Simplicity is one of the main reasons behind the popularity of PSNR [38]. From the
equations above, it can be observed that the main idea is to find the mean squared error
(MSE): the mean of the squared differences between all the intensity values of the
original image and the corresponding intensities of the processed one. A higher PSNR
result (in dB) implies better quality (closer to the original image). This technique is
also used to judge the efficiency of image compression methods. In the present work, we
use its Matlab implementation as a proof of the quality gained from applying our
improved smoothing mechanism compared to the result of the original standard median
filter.
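The two formulas above translate almost directly into code. The following Python sketch is ours (function names `mse` and `psnr` are assumptions); it assumes same-sized grayscale images stored as lists of rows.

```python
import math

def mse(img1, img2):
    """Mean squared error between two same-sized grayscale images."""
    h, w = len(img1), len(img1[0])
    total = sum((img1[i][j] - img2[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)

def psnr(img1, img2, bits=8):
    """Peak signal-to-noise ratio in dB; higher means closer images."""
    err = mse(img1, img2)
    if err == 0:
        return float('inf')  # identical images
    peak = (2 ** bits - 1) ** 2  # squared maximum possible intensity
    return 10 * math.log10(peak / err)
```

For two identical images the MSE is 0 and the PSNR is unbounded, which is why PSNR is usually reported only for processed (non-identical) images.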


1.4 Iterative Thresholding and the Most Well-known Iterative Thresholding Mechanisms
In this part, a short introduction to the importance of thresholding is given. Also,
some of the main iterative thresholding mechanisms, which are heavily employed in the
image segmentation and detection fields, are covered.
1.4.1 Introduction
As mentioned earlier, the thresholding mechanism has become one of the main
cores of various image processing applications due to its intuitive properties [8].
Thresholding is widely used in image segmentation due to its simplicity and the ease of
its implementation [39, 40]. Choosing the threshold value iteratively, based on the
image data itself, is a preferred approach in image processing. In the next parts, we
explain some of the well-known iterative thresholding mechanisms that are currently in
use and employed in our improved Prewitt design.
1.4.2 Otsu Thresholding
Otsu is one of the most popular thresholding mechanisms, known for its
effectiveness [8, 39]. This histogram-based method maximizes the between-class
variance in order to choose an optimal threshold value that segments the image properly
[8, 41, 42]. Otsu is designed to rely on a one-dimensional (1D) histogram, which,
according to some analytical studies, fails when an obvious valley lies between two
peaks [39]. To overcome this problem, different studies have contributed improved
versions of Otsu, such as the 2D Otsu explained briefly in [39]. On the other hand,
there are other global iterative thresholding mechanisms that can be less complex than
Otsu while still producing satisfactory results, such as the basic global thresholding
explained below.
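The between-class-variance search at the heart of Otsu can be sketched compactly. This Python illustration is ours (the function name `otsu_threshold` and the running-sum formulation are assumptions); it takes a 1D histogram and returns the threshold maximizing the between-class variance.

```python
def otsu_threshold(hist):
    """Choose the threshold that maximizes the between-class variance.

    hist[k] is the number of pixels with intensity k (1D histogram).
    """
    total = sum(hist)
    total_sum = sum(k * hist[k] for k in range(len(hist)))
    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count of the background class
    sum0 = 0.0  # intensity mass of the background class
    for t in range(len(hist)):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0                      # background mean
        mu1 = (total_sum - sum0) / w1        # foreground mean
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t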
1.4.3 Basic Global Thresholding
The basic global thresholding is a well-known thresholding mechanism famed for
its computational simplicity, acceptable quality and applicability to a variety of
applications [8]. The algorithm works iteratively to find the optimal threshold value
from the data of the image itself. We have implemented this algorithm and augmented our
improved design with it, as briefly explained in 4.2.b.
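The iterative procedure described in [8] starts from the global mean, splits the pixels into two classes around the current threshold, and moves the threshold to the midpoint of the two class means until it stabilizes. A minimal Python sketch of that loop (the function name `basic_global_threshold` and the flat pixel-list input are our assumptions, not the thesis implementation):

```python
def basic_global_threshold(pixels, eps=0.5):
    """Iteratively estimate a global threshold from the image data.

    Starts at the overall mean, then repeatedly sets T to the midpoint
    of the mean intensities of the two classes T induces, stopping when
    the change falls below eps.
    """
    t = sum(pixels) / len(pixels)  # initial guess: global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t  # degenerate split: keep the current estimate
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:   # converged
            return new_t
        t = new_t
```

On data with two well-separated intensity groups, the loop converges in one or two iterations to the midpoint of the two group means.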
1.5 The Significance of the Study
The need for efficient recognition applications has motivated us to come up with a
new, improved design of the existing edge detection algorithm known as Prewitt. An
effective design and implementation of this algorithm can expedite the move towards
real-time recognition applications. Our long-term goal is to eventually have optimized,
high-quality parallel and sequential versions of the algorithm that are capable of
detecting edges efficiently. While the best-known industry-standard algorithm is Canny,
it has been shown to have high computational complexity [16]. The well-known Prewitt
edge detection algorithm has been used in the present study for its simplicity and
computational efficiency. However, Prewitt's drawback is its sensitivity to noise,
making it unsuitable for most common image processing applications [8, 16, 22, 23]. For
this reason, the present work sets out to strengthen the Prewitt edge detection
algorithm so that it is more capable of working in a practical environment while
performing at a rapid speed.
1.6 Statement of the Problem
The original bidirectional Prewitt algorithm is known for its good computational
complexity but also for its noise sensitivity; the lack of an effective smoothing
mechanism has mostly disqualified its usage. Analytical studies have indicated that the
Prewitt edge detection algorithm cannot be used on practical images, which are often
corrupted with impulse noise as well as Gaussian and Poisson noise [8, 16, 22, 23]. The
majority of images are inflicted with varying degrees of noise caused by transmission
channels and camera sensors [22]. Furthermore, the Prewitt algorithm does not
incorporate a thresholding mechanism to produce binary images. Most object detection
applications rely on background subtraction, and the presence of only two values in the
resulting binary image, one representing the object and the other representing the
background, is desirable in such applications [40]. Through this work, our goal is also
to answer the research questions below:
1. Is it possible to come up with an improved version of the Prewitt edge detection
algorithm that results in better quality?
2. A better-quality algorithm obviously implies additional complexity, so how can we
overcome the additional complexity added to the original bidirectional Prewitt
algorithm?


3. If we come up with an improved version of Prewitt, how efficient will the
algorithm be, in terms of both quality and execution time, compared to the original
Prewitt and the well-known Canny edge detector?
4. Will the improved version of this algorithm perform well unconditionally?
5. How suitable is the improved version of the algorithm for real-time edge detection
applications?
To answer the first question, the goal of the present study is to enhance the quality
of the bidirectional algorithm and overcome the weaknesses affecting its practical usage.
The main plan is to augment the original algorithm with additional functionality:
detecting edges in all eight directions (instead of only two), and adding both an
effective smoothing mechanism and an iterative thresholding mechanism. However, adding
functionality to the original Prewitt conflicts with its main characteristic, namely its
good computational complexity. For this reason, if we come up with an improved version
of the Prewitt algorithm, we will also be required to solve the problem of the added
complexity. More computational complexity implies more time consumed, which can affect
the usability of the algorithm. To answer the second question, efficient sequential and
parallel versions of the algorithm should be implemented to overcome the complexity
added to the original Prewitt.
Measuring the efficiency of the improved version of the algorithm requires
comparing it with an optimal edge detector used to evaluate other edge detection
algorithms, namely Canny. Therefore, to answer the third question, both the quality and
the execution time of the improved Prewitt and of Canny (already implemented in Matlab)
will be compared.
Next, performing well unconditionally implies overcoming many well-known
challenges in the field of image processing. One example is effectively processing
different image sizes, including extremely high resolution images; processing high
resolution images is very common, especially in the field of biomedical image
processing. Flexibility is another challenge, where only a specific region of the image
needs to be processed instead of the whole image.
Finally, if the presented algorithm is efficient, the remaining question is whether it
is efficient enough to serve in real-time edge detection applications. In other words,
real-time edge detection requires high processing capability to allow detecting edges in
real time. Many studies, as shown in 2.4.1, have contributed to specifying the
requirements an algorithm (implementation) must meet to qualify for real-time
applications. Based on our experimental results, we will show whether it is possible for
our improved Prewitt edge detection application to satisfy all the requirements of
real-time applications.
1.7 Thesis Organization
The rest of the thesis is organized as follows. Chapter 2 gives a brief background of
the terminology and main concepts that help in understanding the later work. This
includes a brief explanation of commonly used image processing operations such as
smoothing, localization and thresholding. This chapter also covers related work on
parallel image processing, how to detect parallelism in a sequential algorithm, and the
programming languages employed in our work. Next, in the experimental setup
(Chapter 3), the architectures of the multicore platforms are briefly explained, along
with the main accessory applications employed to support our main implementation.
Chapter 4 gives a brief description (pseudocode, flow charts) of our augmented design
and implementation of the improved eight-direction Prewitt algorithm, covering both the
sequential and the parallel versions. The results are presented in Chapter 5; they show
the quality of our improved smoothing mechanism, the quality of the improved
eight-direction Prewitt, the execution times of the sequential and parallel applications,
and the answers to the research questions. Finally, Chapter 6 summarizes the conclusions
and proposes future work.


2. Background
In this chapter a brief introduction to the terminology used in this work is given to
help in understanding the later work. We start by explaining some of the fundamental
concepts of the digital image processing and computer vision fields, together with some
of the related work that has taken place in these fields. We also highlight some ideas
of parallel image processing, detecting parallelism in sequential algorithms, and the
main challenges in the field of image processing.
2.1 Basic Definition
In this section, the basic and standard notation is defined in order to simplify
reading the rest of this work. Most of the notation is taken from the books Digital
Image Processing Using MATLAB by Gonzalez, Woods and Eddins [8] and Learning OpenCV by
Bradski and Kaehler [7].
2.1.1 Most Common Types of a Digital Image
A digital image is defined as a two-dimensional function f(x, y), where x and y are
spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called
the intensity or gray level of the image at that point [8]. Each element of a digital
image is usually called a pixel, and the range of its values depends on the number of
bits used to construct the image [43]. For instance, the value of any pixel in an 8-bit
gray-scale image is an integer in the range 0 to 2^8 - 1 = 255. These values represent
shades of gray, where 0 is the darkest (black) and 2^8 - 1 is the lightest (white). In
other words, the number of bits used to represent an image determines the possible
colors (distinct intensities) that may appear within that image. Moreover, images come
in other forms such as binary, indexed, and RGB images. A binary image, also known as a
logical image, has only two intensities, either 0 (black) or 1 (white). Color images,
such as RGB images, are the result of combining different gray-scale channels, where
each channel represents a specific color domain. It is common practice in the field of
image processing to convert from one image type to another. Digital images can be saved
in different formats such as .JPEG, .PNG, .BMP, .TIFF, etc. Some of these formats are
lossy compressed, such as .JPEG (Joint Photographic Experts Group), while .TIFF (Tagged
Image File Format) is a lossless, uncompressed file format. Applying the same algorithm
to different image formats can therefore produce different output, especially for
algorithms that depend heavily on the underlying data. Nevertheless, there are some
well-known images considered a standard dataset for testing image processing
algorithms, such as Lena, Baboon, Cameraman, Peppers, Living room, and House. These
images have good textures that qualify them for testing various image processing
operations. In our work, we rely on some of these images to allow comparing our results
with other related works.
2.1.2 More about Edges
Edges are mainly caused by sharp changes of intensity, resulting from either
geometric or non-geometric events [17]. These intensity discontinuities can be either
step or ridge (line) discontinuities [19]. As explained in the book Digital Image
Processing, 3rd ed., a step discontinuity refers to an instantaneous change in
intensity, as seen in Figure 2.1.a. A ridge (line) discontinuity occurs when the
intensity changes over a short distance and then returns to the original value, as seen
in Figure 2.1.b. It is also possible for the intensity to change over a finite distance
rather than sharply, which produces ramp and roof discontinuities, as presented in the
figure below [11, 17, 19].
Figure 2.1: Edge models in images: (a) step, (b) ridge, (c) ramp, and (d) roof
discontinuities.
2.1.3 Noise
The majority of digital images are inflicted with varying degrees of noise caused by
transmission channels and camera sensors [22, 26, 44]. Noise consists of artifacts
introduced by the capturing device or the transmission medium [7]. There are various
types of noise, namely impulse, Gaussian, Poisson, Speckle, etc. Because most images are
corrupted with some type of noise, denoising (smoothing) is one of the most common
operations within the image processing field and its applications [26]. Impulse noise is
a noise model in which pixels are replaced by either the maximum or the minimum of the
allowed intensity range. For instance, in an 8-bit gray-scale image the noise value is
either 0 (black) or 255 (white) [45]. A very common impulse noise is known as salt and
pepper (S&P). As shown in different analytical studies, this type of noise cannot be
handled by the original Prewitt algorithm, which is known for its noise sensitivity.
Therefore, in this work, our focus is set on enhancing Prewitt's capability of
suppressing impulse noise and other noise types. The figure below exhibits images
impacted with different types of noise.
Figure 2.2: Lena and Cameraman images corrupted with different types of noise.
From Figure 2.2, we can see a noise percentage associated with most noise types;
this value reflects the noise density. In order to guarantee the integrity of the noise
density, we used Matlab to add these noises to the original image. Matlab provides the
function imnoise for this purpose. This function accepts parameters as the following
example shows: J = imnoise(I, type, parameters), where I is the input image, type is the
type of noise, the remaining parameter is the noise density, and the result is saved in
J [8]. This implementation is one of the accessories supporting our main implementation,
as shown in 3.2.2. Generally, noise can be classified into two groups, additive and
multiplicative: additive implies that the noise is independent of the original data,
while multiplicative noise depends on it [19, 46].


Different smoothing techniques are used to suppress various types of noise.
Suppressing impulse noise is known to be very challenging. Therefore, in this work we
improve a smoothing mechanism that is capable of suppressing impulse and other types of
noise effectively.
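The impulse-noise model described above (each corrupted pixel set to the minimum or maximum intensity) can be reproduced outside Matlab with a short sketch. This Python illustration is ours, loosely mirroring imnoise(I, 'salt & pepper', d); the function name `add_salt_and_pepper` and the seeded random generator are assumptions for reproducibility, not part of the thesis implementation.

```python
import random

def add_salt_and_pepper(img, density, seed=None):
    """Corrupt a grayscale image with salt & pepper (impulse) noise.

    Each pixel is replaced, with probability `density`, by either 0
    (pepper) or 255 (salt).
    """
    rng = random.Random(seed)
    out = [row[:] for row in img]
    for i in range(len(img)):
        for j in range(len(img[0])):
            if rng.random() < density:
                # corrupted pixel: half salt, half pepper on average
                out[i][j] = 0 if rng.random() < 0.5 else 255
    return out
```

With density 0.1, roughly 10% of the pixels are replaced, and every pixel is either unchanged or driven to one of the two intensity extremes.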
2.2 Common Image Processing Operations
In order to understand the rest of the presented work, it is necessary to cover the
main classifications and operations within the image processing field that are related
to our work. Image processing operations are classified into three levels: low,
intermediate and high. Low level implies operations applied at the pixel level; the
intermediate level, derived from low-level operations, allows making decisions such as
object tracing and face recognition; and at the high level, images can be classified and
different relationships among them can be drawn [3, 8].
Filters are usually divided into two main classes, linear and nonlinear filters.
Both are commonly used in signal and image processing. Linear filtering was originally
used with the Fourier transform in the field of signal processing, working in the
frequency domain [8]. In our work, however, we focus on the spatial domain, processing
the intensity values of the digital image directly. This is known as linear spatial
filtering.
2.2.1 Linear and Nonlinear Spatial Filtering
Linear filtering is a form of neighborhood processing, where operations are
applied over a predefined neighborhood window. The linear operation multiplies the
intensity of each pixel by the corresponding coefficient of the window matrix and then
sums the results to obtain the value of the anchor pixel [8]. The coefficients are
arranged into a matrix called a window, filter, mask, kernel, filter mask or template,
and the linear operation is known as a convolution filter, convolution mask, or
convolution kernel [8]. There are two main operations related to linear spatial
filtering, namely correlation and convolution. The two operations are almost identical,
except that with convolution the window w is rotated by 180 degrees before applying the
step mentioned above to the input image [8, 19]. A figure similar to the one below is
given in the book Digital Image Processing Using MATLAB by Gonzalez, Woods, and Eddins
(3rd edition, 2010) to exhibit one-dimensional correlation and convolution:
Correlation (w = 1 2 3)                   Convolution (w rotated 180 deg: 3 2 1)
f:           0 0 1 0 0                    f:           0 0 1 0 0
zero-padded: 0 0 0 0 1 0 0 0 0            zero-padded: 0 0 0 0 1 0 0 0 0
(align w at the left edge, then shift     (align the rotated w the same way and
it one position at a time to the end)     shift it one position at a time)
Correlation result: 0 0 3 2 1 0 0         Convolution result: 0 0 1 2 3 0 0
Figure 2.3: Illustration of one-dimensional correlation and convolution.
Having presented an example of linear operations, it is important to show what
distinguishes a linear filter from a nonlinear one. Naming filters linear or nonlinear
is based on the type of operations they perform. For instance, a linear filter can be a
sum of products, while a nonlinear filter implies nonlinear operations applied to a set
of pixels [8]. Thus, letting each anchor point be the maximum among all neighbors
located in a 3x3 window is a nonlinear operation.
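The one-dimensional correlation and convolution of Figure 2.3 can be reproduced with a short sketch. This Python illustration is ours (the function names and the full zero-padding convention are assumptions); it makes explicit that convolution is simply correlation with the kernel rotated by 180 degrees.

```python
def correlate1d(f, w):
    """Full 1D correlation: slide w across f, padding f with zeros."""
    pad = len(w) - 1
    g = [0] * pad + list(f) + [0] * pad  # zero-pad both ends
    return [sum(w[k] * g[s + k] for k in range(len(w)))
            for s in range(len(f) + pad)]

def convolve1d(f, w):
    """Convolution is correlation with the kernel reversed (rotated 180 deg)."""
    return correlate1d(f, w[::-1])
```

For f = [0, 0, 1, 0, 0] and w = [1, 2, 3], correlation yields [0, 0, 3, 2, 1, 0, 0] and convolution yields [0, 0, 1, 2, 3, 0, 0], matching the figure: correlating with an impulse reverses the kernel, while convolving reproduces it.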
2.2.2 Denoising
As explained in section 2.1.3, noise consists of unwanted artifacts caused by the
capturing device or the transmission medium. In order to suppress these different types
of noise, various smoothing mechanisms (filters) are utilized within the field of image
processing. As presented previously, spatial filters are either linear or nonlinear.
There are many linear smoothing techniques, such as the Gaussian, averaging (blurring),
disk, Laplacian, motion, and Sobel filters; these linear filters usually have a
pre-specified window that is convolved with the corrupted image [8]. On the other hand,
there are nonlinear filters, such as the order-statistic, min, and median filters, in
which nonlinear operations are embedded.
Averaging, also known as blurring, is usually used to suppress different types of
noise. The main idea of this filter is to let each pixel be the average of a specific
number of the neighbors surrounding that pixel. Some edge detection algorithms use
blurring as a smoothing mechanism prior to the localization step. However, blurring the
image has the side effect of destroying many real edges during noise suppression.
Looking at Figure 2.4, we can observe the result of applying the averaging filter to
images impacted with 10% impulse noise. This filter is not very effective at suppressing
impulse noise if our goal is to avoid destroying edges.


Figure 2.4: Applying the averaging filter to Lena and Cameraman: (a) Lena
impacted with 10% salt & pepper noise, (b) image (a) smoothed with averaging
filtering in Matlab, (c) Cameraman impacted with 10% salt & pepper noise, (d)
image (c) smoothed with averaging filtering in Matlab.
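The edge-destroying behavior of averaging can be seen in a few lines. This Python sketch is ours (the function name `average_filter_3x3`, the integer mean, and the copy-the-borders policy are assumptions); compare it with the median filter sketched in section 1.3.2, which removes an impulse instead of smearing it.

```python
def average_filter_3x3(img):
    """3x3 averaging (blurring) filter; borders are copied unchanged.

    Each interior pixel becomes the mean of its 3x3 neighborhood, which
    suppresses noise but also smears impulses and real edges.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            s = sum(img[i + di][j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = s // 9  # integer mean of the 9 neighbors
    return out
```

On a flat region of 10s with one 255 impulse, the impulse is not removed: it is spread as the value (8*10 + 255) // 9 = 37 over the whole neighborhood, illustrating why blurring damages edges while suppressing noise.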
2.2.3 Thresholding
The thresholding mechanism has become one of the main cores of various image
processing applications due to its intuitive properties [8]. Thresholding can be used to
convert an image to a binary image in which only two intensities are used. However, this
operation requires a decision specifying which pixels to keep as object and which to
discard as background; thresholding is employed to make that decision for each pixel
[7]. This idea is commonly employed in different applications that require background
subtraction functionality. Choosing a suitable threshold value benefits the decision
making process. For instance, let T be the threshold value, and for each pixel check the
following two conditions:
b(x, y) = a    if f(x, y) > T
b(x, y) = b    if f(x, y) <= T
The output b(x, y) holds the binary result, whose values are either a or b, dependent on the
optimal thresholding value T. This raises another problem: how to determine the optimal
threshold value, which is known not to be an obvious process. For instance, choosing too
high a value for T results in disconnected edges and misses many real edges, while too
low a value detects more false edges [39]. For this reason, different techniques are
currently utilized to help select suitable threshold values. A good threshold is
described as the value that maximizes the between-class variance [8]. Choosing the
threshold value can be a very basic process judged visually by trial and error, or a
very advanced mechanism such as adaptive thresholding. One of the methods to choose the
threshold value is through the histogram, which we describe next.
Figure 2.5: Converting to a binary image using the thresholding mechanism: (a)
Baboon RGB image, (b) Baboon binary image.


In the above figure, a color RGB image (Baboon) is converted to a binary image
using the thresholding mechanism with threshold value T = 120. It is necessary to
mention that choosing a very low or very high threshold value T can result in an overly
dark or light image. In our work, we have augmented our improved Prewitt edge detection
algorithm with an iterative thresholding mechanism that elects the optimal threshold
dynamically based on the input data.
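The pixel-wise rule above amounts to one comparison per pixel. A minimal Python sketch (the function name `threshold_image` and the default output values a = 255, b = 0 are our assumptions):

```python
def threshold_image(img, t, a=255, b=0):
    """Pixel-wise thresholding: b(x, y) = a if f(x, y) > T else b."""
    return [[a if p > t else b for p in row] for row in img]
```

With T = 120, every intensity above 120 maps to the object value and every other intensity to the background value, producing the two-valued image that background-subtraction applications need.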
2.2.4 Histogram
The histogram is defined in [8] as a discrete function based on the intensity
information abstracted from the image. For a digital image with L possible intensities
in the range [0, G], the histogram is the discrete function h(r_k), where r_k is the
k-th intensity level. To find h(r_k), for each possible intensity level k in the range
[0, G], the number of pixels in the image with that intensity value is counted. In other
words, the histogram gives, for each possible intensity value, the number of times it
appears within the image. The following figure exposes the histogram of an 8-bit
gray-scale image; as mentioned above, the number of bits specifies the range of
intensity values, which is [0, 255] for an 8-bit image.


Figure 2.6: Histogram plot for the Baboon image: (a) gray-scale image of Baboon,
(b) histogram of the image in (a).
Looking at Figure 2.6, we can observe that the intensities with the most
representatives in the Baboon image are in the range 110 to 125. The user can also
decide which threshold value to initially start with that would probably maximize the
between-class variance, and then, according to visual judgment, increase or decrease the
initial guess. The above figure was generated with Matlab 7.10.0 (R2010a) using the
imhist(I) function. Moreover, some adaptive thresholding methods are directly dependent
on the histogram, such as Otsu [8]. The Canny edge detection we use as a quality
reference uses Otsu thresholding, which is based on a one-dimensional (1D) histogram.
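The counting step just described (analogous to Matlab's imhist) is a single pass over the pixels. A minimal Python sketch (the function name `histogram` and the 256-level default are our assumptions):

```python
def histogram(img, levels=256):
    """Count how many pixels take each intensity value (1D histogram)."""
    h = [0] * levels
    for row in img:
        for p in row:
            h[p] += 1  # one increment per pixel of that intensity
    return h
```

The sum of all bins always equals the number of pixels, and the resulting list is exactly the 1D histogram consumed by methods such as Otsu.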
2.3 Some of the Challenges in the Field of Image Processing
Even though the field of digital image processing rests on a mathematical
foundation, visual judgment is still a very effective way to decide whether to accept a
result or not. In image processing, it is expected that a process will be repeated over
and over until an acceptable solution is achieved [26]. On the other hand, some
applications require processing a specific portion of the image, which should be taken
into consideration at the design stage. Image processing also becomes time consuming
when many operations are applied in sequence to the same data set, and even more so when
the data are very large, as in the case of biomedical image processing. Meanwhile,
real-time image processing, such as an object detection alarm system, expects rapid
processing. Having observed some of the common challenges in the digital image
processing field, our focus now turns to the challenges that are tightly coupled with
the presented work. An increase in problem complexity is clearly reflected in the
applicability of an algorithm because of the added time consumption. In our work,
however, we design and implement techniques that aim to narrow the gap between being
complex and being time consuming. When the performance of an algorithm, in terms of
execution time, is an essential factor in its applicability, choosing the right design
and implementation becomes a real necessity. One of the acceleration methods to
eliminate or shrink the gap between complexity and execution time is parallel image
processing, which will be covered later.
2.3.1 Real-time Image Processing
Different applications, such as tracking and detection systems, require real-time
processing capability to handle vast amounts of data in a timely manner [47]. Several
factors should be considered when working on a real-time image processing application,
such as the size of the image and the platform it will run on. Real-time applications,
as described by Hiren Patel in [2], require the ability to process at least 30 frames
per second (fps). In general, typical applications require the capability of processing
about 30 to 100 relatively small frames per second; however, 100 fps is very challenging
to achieve, especially for a complex algorithm. For this reason, the efficiency of an
image processing system depends on how good both the hardware and the software are [47].
The parallel image processing model is an essential approach employed in real-time image
processing. Therefore, next we explain the main models of parallel image processing.
2.3.2 Parallel Image Processing
The field of image processing and computer vision is growing tremendously. The
majority of image processing applications that are based on edge detection functionality
require performing complex operations rapidly [48]. An example of such an application is
object tracking, which is used for security purposes. Nowadays, different applications
require more sophisticated functionality, such as tracking multiple objects
simultaneously [47]. However, producing a high quality edge detection algorithm adds
complexity, which makes it time consuming. In other words, some challenges, and probably
some conflicting goals, must be met to get a high quality edge detection algorithm that
is capable of performing at a rapid speed. The concept of parallel image processing is
adopted by designers who need to implement fast image processing applications. A full
understanding of the hardware architecture is required to design algorithms that fit the
given resources and perform efficiently [47].
Flynn's taxonomy, as explained in [4], divides computer architectures into four categories
according to their instruction and data streams. These categories are SISD (single
instruction, single data), SIMD (single instruction, multiple data), MISD (multiple
instruction, single data), and MIMD (multiple instruction, multiple data) [3]. Vector
computers such as multicore GPUs (Graphics Processing Units) rely on the SIMD
architecture, where the same operation is applied to multiple data elements
simultaneously. Our focus in this work is on the shared memory multicore MIMD model,
in which each processor is a complete CPU, possibly with more than one core located on
the same chip; this allows different tasks to be processed concurrently. Another way to
increase throughput is to have these units cooperate to finish a single problem faster,
though this adds challenges to the implementation. Processors in the shared memory
MIMD model communicate through memory, whose access cost varies depending on the
type of memory used, such as L2, L3, or main memory. In order to exploit multicore
shared memory, we need parallel programming tools that make it possible to implement
parallel algorithms instead of the traditional sequential ones. An example of these tools is
OpenMP, a library that can be plugged in to allow multithreaded implementations for
shared memory architectures [57].
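As a concrete illustration of the shared memory model, the sketch below uses an OpenMP work-sharing pragma to split a pixel loop among cores. This is our own minimal example, not code from [57]; if the compiler is invoked without OpenMP support, the pragma is simply ignored and the loop runs sequentially with identical results.

```cpp
#include <vector>

// Invert an 8-bit image. With OpenMP enabled (e.g. g++ -fopenmp), the
// iterations of the loop are divided among the available cores; each
// iteration is independent, so no synchronization is needed.
std::vector<unsigned char> invert(const std::vector<unsigned char>& img) {
    std::vector<unsigned char> out(img.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(img.size()); ++i)
        out[i] = static_cast<unsigned char>(255 - img[i]);
    return out;
}
```

Per-pixel independence of this kind is exactly the fine-grained data parallelism discussed in the next section.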
Parallel image processing can be applied in three different ways: data parallel,
task parallel, and pipeline parallel, which are briefly discussed in [3, 23]. The two main
categorizations of parallel image processing are task-level parallelism, based on
pipelining, and fine-grained data-level parallelism [3, 5]. Task-level parallelism can be
classified as fine-grained or coarse-grained [5]. An example of fine-grained parallelism is
performing a relatively small independent operation on pixels in parallel, while applying
a smoothing mechanism to two images in parallel is coarse-grained parallelism [5].
Coarse-grained parallelism is known to be more suitable for MIMD or SPMD
environments, in which the image can be divided into sub-images that are processed
individually [49]. The simplest way to detect parallelism in a sequential algorithm is to
look for operations that are independent of each other [4]. Specifying how instructions
depend on each other shows how much parallelism exists in the algorithm [50].
Independent operations are the ones that can be executed concurrently, and possibly out
of order, while the integrity of the results is still guaranteed. Data parallelism and task
parallelism, the parallel image processing models employed in our work, are explained
next.
Data partitioning means that the image data are divided among the
computational units (execution cores), while task parallelism implies that different tasks
are assigned to different computational units. In order to use parallel tasks, the tasks
should be as independent as possible to decrease the communication overhead. Finally,
the idea of pipeline parallelism is to combine data and task parallelism, so that different
images are processed simultaneously [3, 49, 51].
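The data-partitioning idea can be sketched in a few lines. The example below is our own illustration using C++ std::thread rather than OpenMP: the rows of an image are split between two workers, each of which brightens its own half.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Data-parallel sketch: each worker processes its own disjoint range of
// rows, so the threads need no synchronization beyond the final join.
void brighten_rows(std::vector<int>& img, int width, int row_begin, int row_end) {
    for (int r = row_begin; r < row_end; ++r)
        for (int c = 0; c < width; ++c)
            img[r * width + c] += 10;
}

void brighten_parallel(std::vector<int>& img, int width, int height) {
    std::thread top(brighten_rows, std::ref(img), width, 0, height / 2);
    std::thread bottom(brighten_rows, std::ref(img), width, height / 2, height);
    top.join();
    bottom.join();
}
```

Under task parallelism the two threads would instead run different operations (say, a histogram and a smoothing pass); pipeline parallelism would stream successive images through such stages.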
2.3.3 Processing High Resolution Images
Biomedical image processing and other fields usually require not only
processing a huge number of images but also handling images that can be extremely
large (high resolution). For instance, such images can result from super-resolution
microscopy and high resolution X-ray fluorescence microscopy, where very detailed,
very large pictures are captured. Processing very large images is known to be very
challenging and time consuming, especially when the process is repeated several times
until the result becomes acceptable.
Processing high resolution images implies huge amounts of data whose processing might
require frequent memory allocation and de-allocation [1]. Frequently allocating and de-
allocating memory adds complexity and can conflict with the goal of implementing a fast
application. Also, in biomedical image processing it is common to have a relatively small
region of interest (ROI) that needs to be processed individually instead of processing the
whole image. In such a case, processing the whole image when only the ROI is required
adds useless computation that is, again, time consuming. As we will see in this work,
various biomedical image sizes are tested; they grow to more than 1 GB for a single
image. Hence, implementing image processing applications requires suitable tools.
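The ROI argument can be made concrete. The sketch below is our own illustration (the buffer layout and the per-pixel operation are hypothetical): only the pixels inside a small rectangle of a preallocated image are visited, so the cost is proportional to the ROI, not the full frame, and no temporary buffers are allocated.

```cpp
#include <vector>

struct Roi { int x, y, w, h; };  // hypothetical rectangle, in pixels

// Apply a placeholder per-pixel operation (inversion) to the ROI only and
// return how many pixels were touched; the rest of the image is untouched.
long long invert_roi(std::vector<unsigned char>& img, int img_width, Roi r) {
    long long visited = 0;
    for (int y = r.y; y < r.y + r.h; ++y)
        for (int x = r.x; x < r.x + r.w; ++x) {
            unsigned char& px = img[static_cast<long long>(y) * img_width + x];
            px = static_cast<unsigned char>(255 - px);
            ++visited;
        }
    return visited;
}
```

For a 100 x 100 ROI in a 1000 x 1000 image, only 1% of the pixels are visited per pass.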
2.4 Common Programming Languages Utilized for Image Processing and Computer
Vision Fields
In this section we introduce some of the main programming tools employed to
implement image processing algorithms. We established earlier that the efficiency of an
image processing system is highly dependent on how compatible its hardware and
software are. Relying on easier programming languages won't necessarily guarantee
more efficient implementations. Thus, in this work we will show how lower-level
programming languages, such as C/C++ with OpenCV, can provide more efficient
applications than more convenient environments (such as Matlab).
2.4.1 Matlab
Matlab, which stands for matrix laboratory, is a numerical computing environment
that combines an application development environment with a scientific programming
language [52]. Matlab can be employed in different fields, such as mathematics and
digital signal and image processing (to name a few), where the matrix is the basic
element [8]. This high-level programming language has gained popularity due to its
intuitive environment and its highly abstracted syntax, which increase its usability by
non-programmers [52]. On the other hand, different studies have been conducted to
determine how suitable Matlab is for real-time image processing operations, and many
concluded that Matlab is not a very efficient environment for such operations [53, 54].
Also, biomedical image processing is currently a very hot topic in which not only a huge
amount of computation is required, but frequent repetition of the whole process might
also be needed. A more efficient programming language will therefore result in
applications that can be employed for complex processing operations, as will be shown
in Chapter 4. Accordingly, we aim to utilize alternative programming languages to come
up with more efficient implementations. Comparing the performance efficiency of
different implementations is based on the execution time required to process the same
data set on the same platform.
2.4.2 OpenCV
OpenCV is an open source computer vision library written in C and C++ that runs
on different platforms such as Linux, Windows, and Mac OS X [7, 15]. This open
computer vision library supports hundreds of optimized image processing algorithms,
mainly designed for real-time applications, contributing specifically to the field of
computer vision [7, 43]. These libraries lend themselves well to parallel processing [7,
54]; OpenCV can use automatic parallelization by plugging in Intel's Integrated
Performance Primitives (IPP) libraries to run on multicore Intel platforms [55].
Nowadays, the OpenCV library contains more than 2500 optimized algorithms which can
be used by professional developers, graduate students, and researchers interested in the
image processing and computer vision fields [43]. OpenCV offers tools for solving
computer vision problems, in addition to high-level functionality sufficient to solve
complex computer vision problems [7]. Work on this library started in 1999 with an Intel
team, yet only little progress was achieved between 2002 and 2008; by 2009 the major
development, including an improvement of the C++ interfaces, had taken place [43].
NVIDIA joined the development process recently, and OpenCV is now highly employed
in robotics and real-time applications. Gary Bradski and Adrian Kaehler started this
work and lead the development team. Bradski and Kaehler wrote a book entitled
Learning OpenCV, which we use as a guide to help in our development process. Despite
the fact that OpenCV is not as well known as other image processing environments, such
as Matlab, many well-known projects are supported by OpenCV. Projects such as Google
Maps, Google Street View, and Google Earth (to name a few) rely heavily on OpenCV
[7]. Likewise, OpenCV is employed in many security systems and in image processing,
machine learning, video search, and robotics applications.
As described by Gary Bradski and Adrian Kaehler in the Learning OpenCV book,
the library is basically structured into four components. First, the CV component
contains basic and advanced, higher-level computer vision algorithms. Second, the ML
(machine learning) library includes different statistical classifiers and clustering tools.
HighGUI, the third component, contains I/O routines used for loading and saving videos
and images. Finally, the CXCore component holds the basic data structures and content
[7].
Our implementation is based on the combination of two open source libraries:
OpenMP and OpenCV. OpenMP is mainly designed to allow implementing applications
that run on multicore shared memory platforms [56]. Moreover, OpenCV provides solid
support for image processing operations [7]. Therefore, employing both open source
libraries allows implementing efficient parallel image processing operations, as we will
demonstrate throughout this work.
3. Experimental Setup
In this work, we have implemented sequential and parallel applications, each of
which is dedicated to either the Windows or the Linux operating system. Our
implementations are based on the C/C++ programming language, while the competitors
(the quality reference implementations) are already implemented in Matlab 7.10.0.
Thus, the goal is to measure the performance efficiency (in terms of execution time) of
our implementations. Within the sequential design, the open computer vision library
(OpenCV) is employed, while an additional library (OpenMP) is added for the parallel
version.
One of the platforms employed to test our algorithm has the following
specifications: Intel Core i7-3720QM CPU @ 2.60 GHz, 16 GB RAM, 64-bit Windows 7;
L1 D: 4 x 32 KB, 8-way set associative; L1 I: 4 x 32 KB, 8-way set associative; L2: 4 x
256 KB, 8-way set associative; L3: 6 MB, 12-way set associative. More details regarding
the architectures of the platforms used to run our parallel application are covered next.
Moreover, some supporting programs (accessory applications) we have implemented as
companion tools to our main implementations will be explained.
3.1 Main Multicore Shared Memory Architectures Employed to Run the
Experiments
In the shared memory form, we have multiple processors and memories. The
processors are connected through an interconnection network that allows any processor
to access any memory. In the distributed memory form, by contrast, there are multiple
processors, each with its own dedicated memory, and data are transmitted between
processors [4, 57]. Our main focus in this thesis is on the shared memory multicore
architecture. The idea behind a multicore processor is to place more than one core on the
same chip instead of having only a single core. These cores can cooperate to get more
work done concurrently. Therefore, in this part, the main characteristics of the cluster
architectures used to run our experiments are briefly explained. This allows a better
understanding of the behavior of our implementations.
3.1.1 Opteron Module
The Opteron module, as presented in [58], is designed by AMD to increase
throughput through relatively large first- and second-level instruction caches. Having
large caches decreases memory latency, which enhances the overall performance of an
algorithm. Addressing within an Opteron module can be either 32- or 64-bit. AMD adds
more registers to the Opteron which, according to some benchmarks, enhances
performance for the majority of applications. Three instructions, each of size 64 bits, can
be fetched and decoded in each cycle. Also, all instructions used with the Opteron are
fixed in length, unlike the previous AMD Athlon, where instruction lengths varied. There
are two schedulers, one for integer and the other for floating point operations. The
pipelines contain 12 stages for integer and 17 stages for floating point, 7 of which are
needed for the fetch and decode process. A unique strategy is used in which both L1 and
L2 are accessed in parallel on each request: the request to L2 is discarded when there is
a cache hit in L1, while in case of a miss, the time required to fetch data from L2 is
saved. Another improvement of the Opteron is the enhanced branch prediction unit,
which can choose between static prediction and a history table. A good prediction unit
means fewer mispredictions, each of which can cost 11 cycles.
Our implementations are tested on different multicore shared memory platforms,
one of which is a 12-core compute node with cores of the AMD Opteron type and the
following specifications: processors @ 2.2 GHz (AMD Opteron), 6 x 64 KB L1
instruction cache per processor, 6 x 64 KB L1 data cache per processor, 6 x 512 KB L2
per processor, 6 MB L3 per processor, 24 GB RAM, 64-bit Linux version 2.6.18.
3.1.2 Bulldozer Module
The Bulldozer module, as briefly explained in [59], is designed by AMD to
improve both throughput and power efficiency. Each Bulldozer module has two cores,
with some resources shared between them and others private to each core. Sharing some
of the resources, such as the Floating Point Unit (FPU), comes in handy to improve
power efficiency. On the other hand, other resources, such as the integer execution core,
are dedicated to each execution unit. Many other techniques are used to improve the
throughput of this module, such as prefetching for both instructions and data, as well as
prediction and dynamic scheduling, which are both part of the instruction life cycle. Four
instructions can be fetched and decoded per cycle. An instruction's life starts with branch
prediction; then each fetched instruction is decoded in one of four individual decoders.
The prediction, fetch, and decode units are all shared between the two execution cores
within each Bulldozer module. Once instructions are decoded, they are dispatched
among the execution cores. If an instruction requires a floating point operation, it is sent
to the FPU shared by the two execution cores. Moreover, to solve some dependency
issues, each execution core is provided with an instruction retirement mechanism in
which instructions are scheduled fully out of order. Each execution core has four
execution pipelines: two ALUs and two Address Generation units (AGen), each with
some increment and decrement operations. Moreover, each core is capable of serving
two loads per cycle, for a total of 128 bits, while the FPU is able to perform two 128-bit
loads. AMD has also provided a different memory hierarchy to support power efficiency.
Each Bulldozer module has one 64 KB L1 instruction cache shared between both cores,
with a fully associative 24-entry translation lookaside buffer (TLB), while each of the two
cores has its own 16 KB L1 data cache with a fully associative 32-entry TLB. The L2
cache, shared between the two execution cores, has a 1024-entry, 8-way associative
TLB.
One of the nodes used to run our implementations has 64 cores of the Bulldozer
type with the following specifications: 64-core compute node (shared memory
multiprocessor), processors @ 2.2 GHz (AMD Bulldozer), 1 x 64 KB L1 instruction
cache per module (each module contains two execution cores), 2 x 16 KB L1 data cache
per module, 1 x 2 MB L2 per module, 16 MB L3 shared by four modules (located on the
same chip), 128 GB RAM, 64-bit Linux version 2.6.18.
3.2 Main Accessories Implementations
Some accessory applications have been implemented in the present study to
support the main experiments. The main accessories, shown next, are implemented with
either Matlab or with C/C++ and the OpenCV library.
3.2.1 Changing to Gray Scale Channel & Converting between Extensions
This multifunctional application, implemented by us, is designed to support some
common practices needed within edge detection applications. With this application we
can not only convert to a gray scale image but also split the image into its originating
channels and save them if needed. Furthermore, images can come with different
extensions, as explained in 2.2.1; therefore, this application can also be employed to
convert between extensions easily. This application is based on C/C++ and OpenCV, as
shown below.
/* Change to gray & convert between extensions,
implemented with C/C++ and OpenCV */
Img := imRead(argv[1]); /* read image */
Extension1 := argv[2]; /* get target extension */
ToGray := argv[3];
ToSplit := argv[4];
/* change to gray */
If (ToGray)
{
ImgGray := cvLoadImage(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
cvSaveImage(GrayChannel.Extension1, ImgGray);
}
endif
If (ToSplit)
{ /* split RGB image into its main channels */
SplitColorImage(Img, ImgRed, ImgGreen, ImgBlue);
cvSaveImage(RedChannel.Extension1, ImgRed);
cvSaveImage(GreenChannel.Extension1, ImgGreen);
cvSaveImage(BlueChannel.Extension1, ImgBlue);
}
endif
cvSaveImage(ExtensionChanged.Extension1, Img); /* change extension */
Figure 3.1: Pseudocode of changing to gray and converting between image
extensions.
3.2.2 Adding Noise
As explained in 2.2.3, and as will be shown later in 4.1.3, images can be
corrupted with different types of noise that might require different noise suppression
techniques. It is very important to work on an accurate data set, yet accuracy might not
be guaranteed when the data set is downloaded, especially if we are looking for detailed
information such as the noise percentage. Hence, we have implemented an application
using Matlab that can be employed to add different types of noise, as shown next.
/* Add noise, implemented with Matlab */
Img := imread(argv[1]); /* read image */
/* add 30% salt and pepper noise */
ImgSaltPepper30 := imnoise(Img, 'salt & pepper', 0.3);
/* save noisy image */
imwrite(ImgSaltPepper30, 'SaltPepper30.tif', 'tif');
/* add 20% Gaussian noise */
Gaussian20 := imnoise(Img, 'gaussian', 0.2);
/* save noisy image */
imwrite(Gaussian20, 'Gaussian20.tif', 'tif');
/* add 10% speckle noise */
Speckle10 := imnoise(Img, 'speckle', 0.1);
/* save noisy image */
imwrite(Speckle10, 'Speckle10.tif', 'tif');
Figure 3.2: Pseudocode of adding different types of noise using Matlab.
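For readers who prefer a non-Matlab version, here is our own C++ sketch of the same salt-and-pepper corruption (density d sets a pixel to 0 or 255 with probability d/2 each); explicit seeding makes runs reproducible.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Corrupt an 8-bit image with salt-and-pepper noise of the given density,
// mirroring imnoise(I, 'salt & pepper', density): each pixel independently
// becomes pepper (0) or salt (255) with probability density/2 each.
std::vector<std::uint8_t> add_salt_pepper(std::vector<std::uint8_t> img,
                                          double density, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (std::uint8_t& px : img) {
        double r = u(gen);
        if (r < density / 2.0)  px = 0;    // pepper
        else if (r < density)   px = 255;  // salt
    }
    return img;
}
```

On average a density of 0.3 corrupts about 30% of the pixels, matching the 30% figure used in the experiments.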
3.2.3 Peak Signal-To-Noise Ratio (PSNR)
As explained in 1.3.3, the Peak Signal-to-Noise Ratio (PSNR) is one of the most
common objective quality estimation techniques. This technique is based on the mean
squared error (MSE) between the processed and the original image. Thus, we have
implemented this technique using Matlab to use in measuring quality; a similar
procedure is given in the Matlab Help. PSNR is implemented as a function that takes two
arguments, as shown below.
PSNR(Img1, Img2) = 10 log10( 255^2 / MSE(Img1, Img2) )
MSE(Img1, Img2) = (1 / (rows * columns)) * sum over i,j of ( Img1(i,j) - Img2(i,j) )^2
/* PSNR pseudocode implemented with Matlab */
Img1 := imread(argv[1]);
Img2 := imread(argv[2]);
function psnr = PSNR(Img1, Img2)
/* print an error message if Image 1 is the same as Image 2 */
if (Img1 == Img2)
{ print('Images are identical: PSNR has infinite value'); }
end
/* compute the squared-difference image */
mseImage := (double(Img1) - double(Img2)) .^ 2;
/* get the rows and columns of Img1 */
[rows, columns] := size(Img1);
/* apply the equations shown above */
mse := sum(mseImage(:)) / (rows * columns);
psnr := 10 * log10( 255^2 / mse );
/* print the dB value, 5 digits only */
print('PSNR = %5.5f dB', psnr);
Figure 3.3: PSNR Pseudocode.
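The same computation ports directly to C/C++. The sketch below is our own translation, following the standard 8-bit definition with MAX = 255 and returning infinity for identical images.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// PSNR between two equal-sized 8-bit images:
// PSNR = 10 * log10(255^2 / MSE), MSE = mean of squared pixel differences.
double psnr(const std::vector<unsigned char>& a,
            const std::vector<unsigned char>& b) {
    double mse = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        mse += d * d;
    }
    mse /= static_cast<double>(a.size());
    if (mse == 0.0)  // identical images: PSNR is unbounded
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

For example, two images differing by 10 gray levels at every pixel have MSE = 100 and a PSNR of about 28.13 dB.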
4. Research Methodology
This research is conducted to come up with a new design and implementation of
the eight direction Prewitt edge detection algorithm [1]. The original Prewitt algorithm
is known for its noise sensitivity and its lack of both smoothing and thresholding
mechanisms. Our long-term goal is an efficient design and implementation mainly
targeted at multicore shared memory architectures. This chapter contains three main
parts: simulation and evaluation of Prewitt edge detection, our improved design of the
Prewitt algorithm, and finally the implementation of our improved design [1].
Simulating and evaluating the Prewitt edge detection algorithm will highlight
some of the weaknesses that arise in common scenarios. This assists in a better
understanding of the main pros and cons of the algorithm. Visualizing these main
weaknesses of Prewitt helps draw a clearer picture of how significant these problems
are; solving such significant problems is a valuable achievement that can push the
detection field forward. In the second part, our design is chosen mainly to overcome
some of the problems found in part 4.1. Finally, in the third part, the implementation of
our augmented design (4.2) is covered. These three parts demonstrate the main target
problems that we will overcome and solve through this work. Augmenting the sub-
solutions, as we will show, leads to an efficient, high quality Prewitt edge detection
algorithm that performs rapidly in different scenarios.
4.1 Simulation & Evaluation of the Eight Direction Prewitt
In this part we simulate and evaluate the eight direction Prewitt edge detection
algorithm to highlight its main pros and cons. This stage assists with the process of
coming up with an improved version of the algorithm that is based on solid foundations.
The first part of this section portrays the main weaknesses that different Prewitt versions
suffer from, clarifying the shortcomings of Prewitt that we will attempt to overcome
through our improved version of the algorithm.
4.1.1 Bidirectional Prewitt vs. Eight Directions Prewitt
The original bidirectional Prewitt algorithm, as mentioned earlier, detects edges
in two directions, at 90 and 0 degrees. The idea of adding six more directions to the
original bidirectional Prewitt was first proposed in 2008 by Ci and Chen [17]. However,
before augmenting our improved design with this idea, the goal was to evaluate the
improvement proposed in [17]. For this evaluation, we implemented the eight direction
Prewitt edge detection (shown in Figure 4.1) to simulate and assess the improvement of
adding additional directions to the original algorithm. Our evaluation is significant not
only for visualizing the quality of the improved Prewitt but also for judging the
additional complexity it adds.
As shown in both the flow chart and the pseudocode (Figures 4.1 and 4.2,
respectively), the eight direction Prewitt edge detection convolves the image with eight
different kernels (windows) in order to allow edge detection in all directions. Convolving
the image with the eight kernels yields eight candidates, and the maximum of these eight
candidates is chosen to represent the pixel in the final destination image.
In order to illustrate the difference between detecting edges at different angles,
Figure 4.3 includes samples of these results. It is important to restate the criteria for
optimal edge detection. For a good edge detection algorithm, mainly two requirements
should be satisfied. First, the probability of both missing real edges and detecting false
edges should be minimized. Second, a good edge detection algorithm should provide
good localization, which implies that detected edges should be as close as possible to the
true edges.
Figure 4.1: Flow chart of Eight Direction Prewitt edge detection algorithm.
The pseudocode of the eight direction Prewitt is as follows:
/* f(x,y) is the input image, k1 .. k8 are the eight mask windows, one per
direction, and g(x,y) is the output image */
I[imgHeight, imgWidth] := imRead(image); /* imgHeight: image's height, imgWidth: image's
width */
maskHeight := 3; /* maskHeight: mask's height */
maskWidth := 3; /* maskWidth: mask's width */
k1[maskHeight,maskWidth] := [ 1, 0,-1 ;  1, 0,-1 ;  1, 0,-1]; /* 0 */
k2[maskHeight,maskWidth] := [ 0,-1,-1 ;  1, 0,-1 ;  1, 1, 0]; /* 45 */
k3[maskHeight,maskWidth] := [-1,-1,-1 ;  0, 0, 0 ;  1, 1, 1]; /* 90 */
k4[maskHeight,maskWidth] := [-1,-1, 0 ; -1, 0, 1 ;  0, 1, 1]; /* 135 */
k5[maskHeight,maskWidth] := [-1, 0, 1 ; -1, 0, 1 ; -1, 0, 1]; /* 180 */
k6[maskHeight,maskWidth] := [ 0, 1, 1 ; -1, 0, 1 ; -1,-1, 0]; /* 225 */
k7[maskHeight,maskWidth] := [ 1, 1, 1 ;  0, 0, 0 ; -1,-1,-1]; /* 270 */
k8[maskHeight,maskWidth] := [ 1, 1, 0 ;  1, 0,-1 ;  0,-1,-1]; /* 315 */
h := (maskHeight-1)/2 ;
w := (maskWidth-1)/2 ;
for y := 0 until imgHeight do
for x := 0 until imgWidth do
{ /* initialization */
suml := 0 ; sum2 := 0 ; sum3 := 0 ; sum4 := 0;
sum5 := 0 ; sum6 := 0 ; sum7 := 0 ; sum8 := 0;
for i := -h until h do
for j := -w until w do
{
sum1 += k1(j,i)*I(x-j,y-i); /* convolution in the first direction */
sum2 += k2(j,i)*I(x-j,y-i); /* convolution in the second direction */
sum3 += k3(j,i)*I(x-j,y-i); /* convolution in the third direction */
sum4 += k4(j,i)*I(x-j,y-i); /* convolution in the fourth direction */
sum5 += k5(j,i)*I(x-j,y-i); /* convolution in the fifth direction */
sum6 += k6(j,i)*I(x-j,y-i); /* convolution in the sixth direction */
sum7 += k7(j,i)*I(x-j,y-i); /* convolution in the seventh direction */
sum8 += k8(j,i)*I(x-j,y-i); /* convolution in the eighth direction */
}
/* select the maximum intensity among the eight directions */
g(x,y) := Max(sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8); /* result image */
}
Figure 4.2: Eight Direction Prewitt edge detection's pseudocode.
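The pseudocode in Figure 4.2 can be turned into a compact, runnable routine. The following is our own straightforward C++ translation (correlation form, interior pixels only, borders left at zero), not the thesis implementation itself.

```cpp
#include <vector>

// The eight 3x3 Prewitt kernels, one per direction (0, 45, ..., 315 degrees).
static const int K[8][3][3] = {
    {{ 1, 0,-1}, { 1, 0,-1}, { 1, 0,-1}},  //   0
    {{ 0,-1,-1}, { 1, 0,-1}, { 1, 1, 0}},  //  45
    {{-1,-1,-1}, { 0, 0, 0}, { 1, 1, 1}},  //  90
    {{-1,-1, 0}, {-1, 0, 1}, { 0, 1, 1}},  // 135
    {{-1, 0, 1}, {-1, 0, 1}, {-1, 0, 1}},  // 180
    {{ 0, 1, 1}, {-1, 0, 1}, {-1,-1, 0}},  // 225
    {{ 1, 1, 1}, { 0, 0, 0}, {-1,-1,-1}},  // 270
    {{ 1, 1, 0}, { 1, 0,-1}, { 0,-1,-1}}   // 315
};

// For each interior pixel, correlate with all eight kernels and keep the
// maximum response; flat regions yield 0 because every kernel sums to zero.
std::vector<int> prewitt8(const std::vector<int>& img, int W, int H) {
    std::vector<int> out(W * H, 0);
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            int best = 0;
            for (int k = 0; k < 8; ++k) {
                int s = 0;
                for (int i = -1; i <= 1; ++i)
                    for (int j = -1; j <= 1; ++j)
                        s += K[k][i + 1][j + 1] * img[(y + i) * W + (x + j)];
                if (s > best) best = s;
            }
            out[y * W + x] = best;
        }
    return out;
}
```

On a vertical step edge of height 255, the winning kernel responds with 3 x 255 = 765, while flat neighborhoods respond with 0.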
Figure 4.3: Results of applying different directions of the Prewitt algorithm. (a)
House original image, (b) Prewitt detecting at 90 and 180 degrees, (c) Prewitt
detecting at 135 and 225 degrees, (d) Prewitt detecting at 270 and 0 degrees, (e)
Prewitt detecting at 315 and 45 degrees, (f) Eight Direction Prewitt.
Looking at Figure 4.3, we can clearly see the difference between applying the
various kernels to the same image. It is very noticeable that some real edges are missed
when only two directional kernels are applied. The result of the eight direction Prewitt
indeed exhibits better detection and localization than all the other presented cases. The
same algorithm has been applied to many images to confirm the generality of this
observation. Figure 4.4 clearly emphasizes the difference between the original
bidirectional Prewitt and the improved version of the algorithm.
Figure 4.4: Original bidirectional Prewitt vs. Eight Direction Prewitt. (a) Lena
original image, (b) Original bidirectional Prewitt applied to Lena, (c) Eight
direction Prewitt applied to Lena.
On the other hand, convolving the image with six more kernels adds operations
beyond the original bidirectional Prewitt. The number of operations for applying the
eight direction Prewitt to an image of size N x N, as calculated in [60], is as follows:
convolving the image with eight kernels requires 64N addition and 8N multiplication
operations, while the convolution used in the bidirectional Prewitt needs only 8N
addition and 2N multiplication operations. This larger number of operations negatively
affects processing speed. Through this work, we provide solutions, based on parallel
image processing, to handle the additional complexity added to the original algorithm.
4.1.2 Processing Different Color Spaces can Impact the Quality
Color images, as mentioned earlier, are combinations of individual
monochrome images. For instance, an RGB image is a color image constructed from
three individual channels: (R) red, (G) green, and (B) blue. Most edge detection
algorithms are applied to the gray scale image instead of the individual channels. There
are many well-known techniques currently employed to convert between color spaces. A
gray scale image is a monochrome channel that represents the average of all channels.
Data reduction is the main reason for choosing the gray scale image instead of all the
individual channels [8]. Many analytical studies have been conducted to highlight the
difference between gray and color image processing [8, 12, 41, 61]. Novak and Shafer
in [61] claim that about 90 percent of the edges are the same between the gray and color
versions of an image. However, the remaining 10 percent will not be found in the gray
scale image, which can be essential for edge detection applications that rely on very high
quality results [12]. For this reason, in this part we want to visualize the difference
between applying the eight direction algorithm to each of the individual channels (red,
green, blue) and to the gray scale one. Showing a difference means a possibility of
obtaining a better quality algorithm. In order to demonstrate and visualize the difference
in quality, we started by splitting the color image into its originating monochrome
channels in addition to the gray scale one. Then we processed each channel individually
with our implementation of the eight direction Prewitt algorithm. The results of these
simulations, applying the same algorithm to different channels of the Jellyfish color
image, are presented below (Figure 4.5).
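To make the channel reduction concrete, here are two common RGB-to-gray mappings in a short C++ sketch of our own: the simple channel average described above, and the luminosity weighting (ITU-R BT.601 coefficients) that typical rgb2gray-style routines use instead.

```cpp
struct Rgb { unsigned char r, g, b; };

// Plain average: each channel contributes equally to the gray value.
inline unsigned char gray_average(Rgb p) {
    return static_cast<unsigned char>((p.r + p.g + p.b) / 3);
}

// Luminosity method (BT.601 weights): green dominates because the eye is
// most sensitive to it, which often preserves perceived edges better.
inline unsigned char gray_luma(Rgb p) {
    return static_cast<unsigned char>(0.299 * p.r + 0.587 * p.g + 0.114 * p.b);
}
```

Either mapping discards chromatic contrast, which is exactly why some color edges (the 10 percent mentioned above) disappear in the gray image.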
Figure 4.5: Processing different channels with the eight direction Prewitt. (a)
Processing the red channel, (b) Processing the green channel, (c) Jellyfish original
color image, (d) Processing the blue channel, (e) Processing the gray channel.
Looking at Figure 4.5, the difference between applying the same eight direction
Prewitt algorithm to different channels is obvious. Having agreed that processing
different channels can lead to different results, it is clear that many real edges are
missed when processing the green channel (b). On the other hand, the edges in (d) look
very clear compared to all the others, while the result of processing the gray channel is
still acceptable. Nevertheless, there is no true and general answer as to which channel
will always provide better quality across different data sets. Therefore, many algorithms
have been designed to combine all channels to produce better quality edge detection.
Again, improving the quality of an image processing algorithm is not free and usually
conflicts with the goal of having a fast application that performs at a rapid speed. For
instance, as shown in section 4.1.1, the eight direction Prewitt requires 8x the addition
and 2x the multiplication operations of the original Prewitt, and these estimates come
from applying the algorithm to the gray scale channel only (one channel). To process all
three channels, we would require 24x more addition operations and 8x more
multiplication operations than the original bidirectional algorithm; such an algorithm
would no longer perform rapidly compared to the original Prewitt. In our design and
implementation we suggest some solutions to decrease the impact of the additional
complexity added to the algorithm.
4.1.3 Eight Directions Prewitt Working on Noisy Images
The Prewitt edge detection algorithm is known for its noise sensitivity. The implicit smoothing technique used in [23] calculates the gradient magnitude of the eight directions as a basis to find the final magnitude, reducing the effect of noise. Our experiments show that this implicit smoothing approach is not very effective when the image is impacted with a high noise percentage, and it fails to suppress impulse noise. In this section, we compare the results of applying eight direction edge detection to an image impacted with impulse noise against the same algorithm applied to the noise-free image. This supports our evaluation of the eight direction algorithm by visualizing Prewitt's noise sensitivity and exhibiting how major this problem is.


Figure 4.6: Demonstration of Prewitt's sensitivity working on an image impacted with impulse noise (a) Original Living Room image, (b) Eight direction Prewitt applied to the image in (a), (c) Living Room image impacted with 3% Salt and Pepper noise, (d) Eight direction Prewitt applied to the image in (c).
By examining Figure 4.6, we can clearly recognize the big difference between the results of processing noisy and noiseless images using the same eight direction Prewitt. The image in (b) indicates good detection results, while many false edges, caused by the noise, are shown in (d). Moreover, the image in (c) is impacted with only a very small percentage (3%), while in practice images can easily be corrupted with a higher noise percentage. This clearly demonstrates the dilemma of high noise sensitivity, which has been a very serious problem for Prewitt edge detection. Therefore, we will revisit this figure in Section 5.1.2 to demonstrate the solution achieved by augmenting the eight direction Prewitt with a better noise suppression mechanism.
4.2 Design
After describing the main pros and cons of the Prewitt edge detection algorithm, in this section we cover the design of our proposed augmented algorithm. The strategy we have followed is to solve each of the problems mentioned in Section 4.1 individually; by combining all solutions we arrive at an improved design of the Prewitt algorithm, as we will demonstrate throughout this work. Our plan is to group the problems as follows. First, in order to overcome the problem of noise sensitivity, we select an appropriate smoothing mechanism to apply as a prefix step to the localization of the eight direction Prewitt. Second, we employ the eight direction Prewitt instead of the bidirectional version for its better quality, as shown in Section 4.1.1. Third, we augment the improved algorithm with iterative thresholding functionality that works dynamically and is used only when needed. Finally, we deal with all the complexities added to the algorithm in its worst case scenario. This worst case includes noise suppression at level 5 (the highest supported noise percentage, up to 60% impulse noise), detecting in eight directions and applying iterative thresholding. Enhancing the performance of the algorithm in its worst case scenario not only guarantees a performance level it will not go below, but also provides faster performance on the general cases that are less complex.


4.2.1 Selecting an Appropriate Filter to Use as Prefix Step to Eight-direction Prewitt
The stand-alone Prewitt algorithm does not incorporate any mechanism to deal with noisy images. The Canny edge detection algorithm uses a Gaussian filter or Blurring as a prefix smoothing mechanism [8, 17] to address the noise issue. Blurring can destroy some real edges during the process of noise suppression. The Bilateral filter is known to be a better choice because it has less negative impact on real edges than Blurring. In our work, we propose to use an explicit smoothing technique that can be applied to noisy images as a prefix step to the eight direction Prewitt edge detection algorithm. To select a suitable smoothing method, we conducted several experiments using Blurring, Median Blurring, Gaussian, Bilateral, Median filtering and an improved version of the Median filter. Our improved Median filter, added as a prior step to the localization in the eight direction Prewitt algorithm, shows the best results for suppressing impulse noise such as Salt and Pepper and at the same time is able to suppress other types of noise such as Gaussian and Poisson. The Median filter is a nonlinear filter used to remove impulse noise with the least negative impact on real edges [8, 26, 28]. This filter arranges all the pixels within the range of the specified window in ascending order and chooses the median value. This approach helps keep some real values unchanged, as opposed to the blurring technique, which takes the average of all pixels within that neighborhood. The main idea behind our improved Median filter is to allow a variable window size instead of the fixed one employed in the standard Median [26]. Larger window sizes can be used effectively in processing images with a higher noise percentage while relying on the same mechanism as the Median filter [1]. The original Median filter has a fixed window size, which does not differentiate whether the image is noiseless or corrupted with a very high noise percentage.


In fact, the standard Median filter usually uses a window of size 3x3, while using a larger window size requires more operations, as we explain below.

Applying a Median filter with a window of size 3x3 requires (for each pixel) sorting 9 intensities. In other words, for an image of size MxN we need 9MN operations to apply the Median filter. On the other hand, applying a Median filter with a window of size 9x9 requires (for each pixel) sorting 81 intensities, about 9 times more than the standard 3x3 Median filter. Choosing an unsuitable window size can lead to two conflicting outcomes: either adding useless computation or employing a less effective smoothing mechanism. To avoid both cases, our idea is to tie the noise level the image is corrupted with to the size of the window used by the Median filter, by allowing flexibility in the window size employed. In the current version, we give users full control to judge the intensity of the noise in order to determine the desired window size. We recommend starting with a window of size 3x3, considered the standard case (level 1). If the resulting output reflects detection of false edges, users can increase the size of the window to 5x5 (level 2), 7x7 (level 3), 9x9 (level 4), or a double 9x9 pass (level 5), respectively. Our results demonstrate clear detection of edges in the presence of noise when smoothing is enabled. The experimental results are presented in the results section (Chapter 5, Table 1 and Figure 5.3). We have measured the quality of our improved Median filter (variable window size) and compared it with the standard Median filter (window size 3x3) working on different images impacted with different types of noise.


4.2.2 Dynamic Iterative Thresholding
Thresholding is desirable in many applications; we have added this functionality
to the algorithm to be used when needed. There are many thresholding techniques
available ranging from visual judgment using trial and error to reliance on global or local
methods [7, 8, 39], In this work, we have chosen to add the basic global thresholding
method due to its computational simplicity, acceptable quality and applicability to a
variety of applications [8], The technique works iteratively to find the thresholding value
as described in Figure 4.7.
Step 1: Start with an initial guess for the thresholding value T.
/* For faster convergence, choose the initial T to be the average of all intensities of the
assigned work [1] */
Step 2: Mark each pixel as group 1 (G1) or group 2 (G2):
a. if img[i,j] > T, then img[i,j] ∈ G1
b. else img[i,j] ∈ G2
Step 3: For each group, find the average of intensities, Av1 and Av2 respectively.
Step 4: Find the new threshold value: Tnew = (Av1 + Av2)/2.
Step 5: Stop if |Tnew - T| <= tolerance value; otherwise set T = Tnew and repeat the
process from Step 2.
Figure 4.7: The procedure of the Basic Global Thresholding.
As explained by Gonzalez, Woods and Eddins, the basic global thresholding procedure starts by assuming an initial guess of the threshold value T. In order to have a good guess, and to offer faster convergence, the initial T should be the average of all intensities in the original image (img). Then, the pixels of the original image are divided into two groups, G1 and G2. G1 contains all the intensities larger than T, while the remaining intensities become part of G2. In the next step, in order to calculate the new threshold value Tnew, we take the average of the two group averages: let Av1 be the average of G1 and Av2 be the average of G2. The process stops if the absolute difference between the old threshold (T) and the new threshold (Tnew) is less than or equal to the tolerance value. Otherwise, we set T = Tnew and start over from the second step. We have implemented this algorithm and added it to our improved design of the Prewitt edge detection algorithm to use it when needed. The implementation section exhibits how dividing the image into sub images improves the result of this thresholding. The reason is that when the basic global thresholding mechanism is applied to different sub images, it becomes possible to obtain more than one optimal threshold that can be used to segment the image more effectively. The results of this iterative thresholding are shown in the results section (5.3, column 6).
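The iterative procedure of Figure 4.7 can be sketched in a few lines of C++. This is an illustrative stand-alone version; the function name and the default tolerance are ours, not the thesis code.

```cpp
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Basic global thresholding sketch: start from the mean intensity, split
// pixels into two groups around T, and iterate T = (Av1 + Av2) / 2 until
// the change falls within the tolerance.
double iterativeThreshold(const std::vector<uint8_t>& img, double tol = 0.5) {
    double T = std::accumulate(img.begin(), img.end(), 0.0) / img.size();
    for (;;) {
        double sum1 = 0, sum2 = 0;
        long   n1 = 0,   n2 = 0;
        for (uint8_t p : img) {
            if (p > T) { sum1 += p; ++n1; }   // group G1: above T
            else       { sum2 += p; ++n2; }   // group G2: at or below T
        }
        // guard against an empty group on degenerate images
        double av1 = n1 ? sum1 / n1 : T;
        double av2 = n2 ? sum2 / n2 : T;
        double Tnew = (av1 + av2) / 2.0;
        if (std::fabs(Tnew - T) <= tol) return Tnew;
        T = Tnew;
    }
}
```

Applying this per sub image, as described above, yields one threshold per sub image rather than a single global value.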
4.3 The Implementation of the Improved Eight Direction Prewitt Edge Detection
Having gone through the main parts of our improved augmented design of the algorithm, in this section we cover the implementation of the proposed design. The main goal is to implement efficient sequential and parallel versions of the algorithm that perform well on different platforms. In order to provide a fair comparison of the efficiency of our algorithms and implementation choices, we compare the sequential execution times of our proposed method with the existing sequential Matlab implementations of the bidirectional Prewitt and Canny algorithms, running on the same platform [1]. Canny, the industrial standard, is known for its complexity, while the bidirectional Prewitt is known for its simplicity. Therefore, comparing both runs gives a clue about how efficient our implementation is with all the complexities added. Our applications are implemented using C/C++ as the base language with the open computer vision library (OpenCV), and an additional library (OpenMP) is


employed with the parallel version. Despite the fact that Matlab is a more convenient programming language for image processing than C/C++, we chose a solid base language (C/C++) that supports our main motivation of implementing a high performance edge detection algorithm. Also, OpenCV, which is based on C++, contributes to the computer vision field by providing hundreds of optimized image processing algorithms that lend themselves well to parallel processing [7]. This open source library, as explained in Chapter 2, is mainly designed to support real-time applications by having efficient implementations of the most used image processing operations such as convolution, correlation, and reading and writing images. Our parallel implementation, which is based on the efficient sequential one, not only contributes to overcoming the additional complexities added to the algorithm but also allows performing at a rapid speed, which might be utilized by some real-time edge detection applications. We offer two implementations for each of the sequential and parallel versions, which work on two operating systems, Windows and Linux. Next, we explain our sequential implementation, which becomes the solid foundation for the parallel version of the algorithm.
4.3.1 Sequential Implementation
As the flow chart and the procedure below show, the main operations applied to the images take place in sequence. First, the image can be smoothed with the improved median filter. Next, the smoothed image (or the input image, if smoothing is not enabled) is localized with the eight direction Prewitt algorithm. Finally, the result of the localization can be processed with the basic global thresholding mechanism, which works dynamically to return the detected edges of that image. We have given the user full control over running our application; hence the application can be adjusted easily to satisfy the main needs. In other words, for the sake of data and operation reduction, users can choose to enable or disable the additional functionalities added to the eight direction Prewitt edge detection algorithm and use them only when needed. In the results section, we present both the quality and the execution time of all the different case scenarios. In Figures 4.8.a and 4.8.b below, the procedure and the flow chart of our proposed sequential algorithm are presented.
Procedure of the sequential proposed algorithm:
1) Initialization
a) Input Image
b) User specifies the following options:
i) Enable or disable: smoothing technique, thresholding. [toSmooth], [toThreshold];
ii) Noise level of the image (visual guess): 1 (very low) to 5 (very high). [levelOfNoise];
2) If not gray, convert the image to gray, then do the following:
a) De-noising
i) If smoothing is enabled:
(1) Apply the improved median filter with a specific window size to the image.
(2) The size of the window depends on the visual guess specified by the user in 1.b.ii.
b) Localization
Eight-direction Prewitt (as described in Sections 1.2.2, Figure 1.3, and 4.1.1, Figure 4.2)
c) Thresholding
i) If thresholding is enabled :
(1) Apply the basic iterative global thresholding (as described in Section 4.2.2, Figure 4.7)
d) Return processed image
Figure 4.8.a: Procedure of the sequential proposed algorithm.


Figure 4.8.b: The Flow chart of the sequential version of the improved eight
direction Prewitt edge detection application.


In order to measure the efficiency (in terms of execution time) of our improved version of the eight direction Prewitt edge detection algorithm, we have compared it with the Canny and bidirectional Prewitt implementations already used in Matlab. The results showing the efficiency of our algorithm are presented in Table 2 in the results section. As explained in Section 4.1.1, performing the eight direction Prewitt algorithm requires 8 x (N x M) multiplication operations. Yet, for the sake of operation reduction, OpenCV allows us to employ a lookup table (LUT) approach. This mechanism pre-computes all possible outputs and assigns a unique index to each possible configuration. For instance, when we have an 8-bit image and eight fixed 3x3 windows to convolve the image with, we will have a maximum of 2^8 x 9 x 8 = 18432 possible outputs, which looks a bit expensive, especially when processing relatively small images. However, in our case of the eight direction Prewitt, as Figure 1.3 shows, only three different values are used in all eight windows, namely -1, 0 and 1. For this reason, we need to index only 2^8 x 3 = 768 values. This small number of values comes in handy by saving the many multiplication operations that would be needed if the LUT were not in use. To clarify this idea, we calculate the number of multiplications required to localize an 8-bit 256 x 256 gray scale image with the eight direction Prewitt. In the normal case, it is required to perform 8 x (N x M) = 524288 multiplications. However, with the LUT we need to index the products of only 768 values. Then, for each value, we find the corresponding entry in the indexed lookup table and use it without performing an actual multiplication, which is known to be expensive in terms of the number of cycles. The given example was on an extremely small image size. Not surprisingly,


increasing the dimensions of the image does not increase the number of values that need to be pre-computed, while this is not the case when the LUT is not employed, as we will see next. In one of our experiments, we applied the eight direction Prewitt edge detection to a biomedical image of size 29649 x 22008, which means the number of multiplications needed would be about 5.2 billion (8 x 29649 x 22008), obviously a time consuming process.
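A minimal sketch of the 768-entry lookup table described above (illustrative, not the thesis code, which uses OpenCV's LUT facilities): since the eight Prewitt kernels only contain the coefficients -1, 0 and 1, each product coefficient x intensity for an 8-bit image is one of 2^8 x 3 = 768 precomputed values.

```cpp
#include <array>
#include <cstdint>

// Precomputed multiplication table for Prewitt coefficients {-1, 0, 1}
// applied to 8-bit intensities: 3 * 256 = 768 entries in total, indexed
// as lut[coeff + 1][intensity].
struct PrewittLUT {
    std::array<std::array<int, 256>, 3> lut{};
    PrewittLUT() {
        for (int c = -1; c <= 1; ++c)
            for (int v = 0; v < 256; ++v)
                lut[c + 1][v] = c * v;      // precompute once
    }
    int mul(int coeff, uint8_t v) const {   // replaces a runtime multiply
        return lut[coeff + 1][v];
    }
};
```

Note the table size is independent of the image dimensions, which is exactly why the approach scales to very large images.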
In addition to the efficiency gained from employing a powerful programming language (C/C++) and a powerful open computer vision library (OpenCV), data and operation reduction has been one of the main cores of this design. In particular, data and operation reduction in our design is categorized into four main parts. First, users can enable the additional functionalities (smoothing and thresholding) only when needed. Second, having the user visually estimate the noise percentage allows choosing a suitable window size for our improved filter, avoiding unnecessarily large window sizes that imply the need for more operations. Third, the lookup table (LUT) approach assists in reducing the number of multiplication operations needed, as explained previously. Finally, to satisfy both the needs of some biomedical image processing applications (processing a region of interest) and the goal of data reduction, we provide a neat functionality in our design, namely processing a region of interest, since it helps avoid useless computation. To illustrate the application of our improved algorithm, the interface of our application is presented and its operation is explained.


Figure 4.9: Snapshot of our application that allows processing an ROI (a) Original image impacted with 10% impulse noise, (b) Cropped sub image (ROI) to process, (c) Localizing the sub image in (b) with eight direction Prewitt, (d) Sub image in (b) smoothed with the improved median filter, (e) Localizing the smoothed image in (d) with eight direction Prewitt, (f) Thresholding the localized sub image in (e) with the basic global iterative thresholding, (g) Terminal showing the iterative steps followed to choose the optimal thresholding value.
This application works in real time: the user chooses the name of the image, no matter which extension it has. Then the user can specify the region of interest by pressing the left mouse button and dragging the ROI. The results, as shown in Figure 4.9, are five interfaces. The sub image is smoothed, localized and thresholded as shown in the figure above. Our application responds immediately, without any delay, to process any ROI specified. Pressing the right mouse button clears the previous ROI choice so we can start again. This application is not only provided to process a specific region of the image, but also helps the user to choose the best configuration, such as the noise level, and to decide whether to use thresholding or not.
Finally, by looking at the runtime results in the results section, we can observe that our sequential implementation outperforms both the bidirectional Prewitt and Canny, which are already implemented in Matlab. In other words, our C/C++ implementation is very efficient, especially considering how complex it is compared to the original bidirectional Prewitt already implemented in Matlab. Showing the efficiency of our sequential implementation is very significant in our study: we want to show that our improved quality eight direction Prewitt edge detection can be employed in real-time edge detection applications. Our next step in improving the Prewitt edge detection algorithm is to implement a parallel version of it, mainly designed for shared memory MIMD multicore platforms.
4.3.2 Parallel Implementation to Accelerate the Process of Detection
As we have mentioned earlier, Prewitt edge detection is known for its simplicity in terms of the number of computations. Adding a smoothing mechanism with variable window sizes, six additional directions in the localization step and a global thresholding mechanism adds computational complexity to the original bidirectional Prewitt algorithm [1]. These additional computations affect the overall performance. A parallel version of


the proposed algorithm for the new shared memory MIMD multicore platforms is designed and implemented in order to speed up the computation. Our parallel application is based on the efficient sequential implementation, which outperforms both the Canny and bidirectional Prewitt edge detection algorithms already used in Matlab. To measure the performance of the parallel algorithm, we compare its execution time with that of our sequential implementation, whose efficiency is already proven. The parallel algorithm is designed and implemented using C/C++ as the base language and two open source libraries, OpenMP and OpenCV, to overcome the complexities added to the original algorithm. The OpenCV library supports hundreds of optimized image processing algorithms mainly designed for real-time applications, contributing specifically to the field of computer vision [7, 43]. These libraries lend themselves well to parallel processing [7, 54].
As we mentioned in Section 2.4.2, fully understanding the architecture is required to design a high performance application. Moreover, parallel image processing can be applied in three different ways, as data parallelism, task parallelism and pipeline parallelism, briefly discussed in [3, 23]. Task-level parallelism and fine-grained data-level parallelism are the two main categorizations of parallel image processing. In our design we tried both approaches, yet coarse-grained data parallelism proved, according to different experiments, to better match the MIMD multicore shared memory platform. With this approach we aim to process different parts of the image individually and simultaneously. Therefore, we require a data partitioning mechanism in which the image is divided among the computation units. Task parallelism does not match the design of our algorithm, since the main operations, smoothing, localization and thresholding, are dependent on each other.


In other words, thresholding is applied to the results of the localization, which should have been smoothed previously, so this chain of operations must be processed in sequence in order to provide correct output. On the other hand, dividing the image into suitably sized sub images provides independent chunks of data that can be processed in parallel. Yet, one of the main challenges of working on shared memory multiprocessors is work distribution [3, 4].
For good distribution, work can be divided into rows, columns or blocks, as presented in Figure 4.12. Users can specify the partitioning mechanism and the number of partitions (number of sub images). For instance, for an M x N image, the sub images are specified as follows. If the number of sub images equals x, then row partitioning yields x sub images each of size M/x x N, while partitioning into columns yields x sub images each of size M x N/x. Partitioning into blocks results in x^2 sub images each of size M/x x N/x. The sub images can be processed in parallel using multicore processors, and the results can then be merged into one final image. Using appropriate mechanisms for data partitioning not only provides independent chunks of data that can be processed concurrently but also adds flexibility in tuning the algorithm. In particular, the tuning mechanism allows the algorithm to work more efficiently on different architectural platforms, and has been proven to enhance data locality, which in turn reduces the number of time-consuming cache misses.
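As a concrete sketch of the tile sizes above (illustrative helper functions, assuming x evenly divides both dimensions; otherwise edge tiles would absorb the remainder):

```cpp
#include <utility>

// (height, width) of one sub-image for each partitioning method of an
// M x N image split by factor x. Blocks produce x * x tiles in total.
std::pair<int, int> rowTile  (int M, int N, int x) { return {M / x, N};     }
std::pair<int, int> colTile  (int M, int N, int x) { return {M, N / x};     }
std::pair<int, int> blockTile(int M, int N, int x) { return {M / x, N / x}; }
```

For example, a 512 x 512 image with x = 4 yields 128 x 512 row tiles, 512 x 128 column tiles, or sixteen 128 x 128 blocks.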
Work distribution starts by allowing each processor to copy its assigned private data to its local memory. To gain efficiency, it is desirable to decrease the number of times data must be copied from the slower shared memory to the local memory of each computational unit [1]. The algorithm starts by dividing the image into a number of equal-sized tiles (sub-images). Parallel processes work independently on different sub-images using a self-scheduling technique for work distribution. Each processor applies smoothing, localization and iterative thresholding before writing the processed data to its final location, as shown in Figure 4.10.
Figure 4.10: Processing life cycle.
As seen in Figure 4.10, the parallelism is applied at the highest possible level: all work between loading the main image and merging the sub-images is done in parallel. The original input image is defined as an object (Region Of Interest, ROI) in the OpenCV library, resulting in the need for some synchronization when the final result is written back to this object. This is a requirement enforced by OpenCV to guarantee data integrity; therefore, each processor writes the data to its original region of interest atomically. It is best to choose self-scheduling instead of pre-scheduling for good work balance [56]. This prevents the problem of having some processors idle while others have excess work due to the varying amount of work required in each region of the image. It is important to note that sub-images are treated completely individually and


are processed within the iterative thresholding mechanism separately. This strategy, according to our experiments, can result in better detection, where more real edges are detected, since each of the sub-images can have its own optimal thresholding value. Self-scheduling overcomes the differences in time required by applying the iterative thresholding mechanism to the assigned data, as the number of computations may vary depending on the data itself. The procedure and pseudocode of the parallel application are shown below in Figure 4.11. The partitioning methods that can be employed with our application are exhibited in Figure 4.12. Finally, the flow chart of the complete design of our parallel application is presented in Figure 4.13.
Procedure of the proposed algorithm:
1) Initialization
a) Input Image
b) User specifies the following options:
i) Partitioning method: rows, columns, or blocks
ii) Number of desired sub images. [numOfSubImage];
iii) Number of processes. [numOfProc];
iv) Enable or disable: smoothing technique, thresholding. [toSmooth], [toThreshold];
v) Noise level of the image (visual guess): 1 (very low) to 5 (very high). [levelOfNoise];
2) Partitioning data
a) Divide into sub works as specified by the user's chosen partitioning mechanism.
b) Divide the work among the assigned processors. Processes self-schedule to obtain the next
available work.
3) On each of the sub works (if any) do the following:
a) De-noising
i) If smoothing is enabled:
(1) Apply the improved median filter with a specific window size to the image.
(2) The size of the window depends on the visual guess specified by the user in 1.b.v.
b) Localization
Eight-direction Prewitt (as described in Sections 1.2.2, Figure 1.3, and 4.1.1, Figure 4.2)
c) Thresholding
(1) If thresholding is enabled, apply the basic iterative global thresholding (as described
in Section 4.2.2, Figure 4.7)
d) Merging data
i) Merge the processed sub work to its final destination.
e) Return processed image


Pseudocode of partitioning into rows (the pseudocode conventions are taken from [4]):
procedure parmin(initialization as specified in step 1)
img[imgHeight, imgWidth] := imRead[imgName]; /* Loading the image */
workLoad := imgHeight / numOfSubImage;
shared img, destImage, numOfSubImage, levelOfNoise, numOfProc, toThreshold, toSmooth,
workLoad, imgWidth, imgHeight;
private i, subImage, x, y;
Self-Scheduled forall i := 0 until numOfSubImage /* Dynamic scheduling of parallel processes */
begin /* partition the data into sub-images, each with the same imgWidth but a specific
height */
x := 0;
y := i * workLoad;
subImage := Rectangle(img, Rect(x, y, imgWidth, imgHeight/numOfSubImage)); /* copy the
specified rect to subImage */
if (toSmooth) then /* De-noising: if smoothing is enabled */
subImage := smoothing(subImage, levelOfNoise);
end
/* Localization: eight direction Prewitt */
subImage := eightDirections(subImage);
if (toThreshold) then /* Thresholding: if thresholding is enabled */
subImage := iterativeThresholding(subImage);
end
critical /* Merging data: writing data to its final destination */
mergeSubImage(destImage, subImage, Rect(x, y, imgWidth, imgHeight/numOfSubImage));
end critical;
release(subImage);
end
Return(destImage); /* Return processed image */
End Procedure
Figure 4.11: Procedure of the parallel version of the proposed algorithm.
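Under the assumptions of Figure 4.11, the row-partitioning loop with self-scheduling maps naturally onto OpenMP's dynamic schedule. The sketch below is illustrative, not the thesis code: it uses a plain byte buffer instead of OpenCV objects, replaces the smoothing/localization/thresholding chain with a stand-in per-pixel operation, and needs no critical section because each task writes to a disjoint row range (the critical section in the pseudocode comes from writing back into OpenCV's shared ROI object).

```cpp
#include <cstdint>
#include <vector>

// Row-partitioned processing with OpenMP dynamic (self-) scheduling.
// Each task owns rows [y0, y1) of a row-major grayscale image; the last
// task absorbs any remainder rows. Compiled without OpenMP, the pragma
// is ignored and the loop runs correctly in sequence.
void processByRows(std::vector<uint8_t>& img, int rows, int cols,
                   int numOfSubImage) {
    const int workLoad = rows / numOfSubImage;   // rows per sub-image
    #pragma omp parallel for schedule(dynamic)   // self-scheduling
    for (int i = 0; i < numOfSubImage; ++i) {
        const int y0 = i * workLoad;
        const int y1 = (i == numOfSubImage - 1) ? rows : y0 + workLoad;
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < cols; ++x)
                // stand-in for smoothing + localization + thresholding
                // applied to this sub-image
                img[y * cols + x] = 255 - img[y * cols + x];
    }
}
```

The `schedule(dynamic)` clause gives the load balancing the text attributes to self-scheduling: a task that finishes its rows early immediately claims the next unassigned sub-image.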


Figure 4.12: Partitioning the image into sub images and applying eight direction Prewitt to each of the sub images (a) Original image, (b) Divided into 4 rows, (c) Divided into 4 columns, (d) Divided into 4x4 blocks.


Figure 4.13: Flow chart of the proposed parallel application.


5. Experimental Results and Analysis
In this section, we go through the main experimental results and analysis of our work. This section is divided into three main parts. In the first part we exhibit the improvement in quality of our proposed algorithm. In the second part, we expose the performance of our algorithm in terms of execution time. Finally, we answer each of the research questions (stated in 1.6) individually, based on both the design and the results covered in Chapters 4 and 5.
5.1 Quality of Proposed Algorithm
In this part, we demonstrate the improvement in quality of our proposed algorithm. This part is divided into three main subparts. In the first one, we show (through dB numbers) and visualize (with images) that our proposed Improved Median filter (which uses a variable window size and which we have augmented into our improved design of the algorithm) is more capable of suppressing impulse noise than the Standard Median filter (which uses a fixed window of size 3x3). To quantify the difference in quality, we use the well-known Peak Signal-to-Noise Ratio (PSNR), briefly explained in 1.3.3. Also, we visualize the difference between the Standard and Improved filters by applying them to images impacted with up to 70% impulse noise. In the second subsection of part one, we visualize the difference in quality between our improved algorithm and both Canny (the industrial standard) and the bidirectional Prewitt, where both are implemented in Matlab.
5.1.1 The Improvement of Proposed Median filter
To compare the quality of the proposed Improved Median filter, which uses variable-
sized windows, with the standard Median filter, which uses a fixed window of size 3x3,
we use the well-known Peak Signal-to-Noise Ratio PSNR (dB) [13]. The results are shown
in Table 1, in which the test image (Lena; see Figure 5.1 and Figure 5.3) is corrupted
with varying degrees of impulse noise. As shown in Table 1, the improved Median filter
generates significantly higher quality output than the standard filter in every case
[1]. A higher dB value indicates an image closer to the original. The effect of various
window sizes on various noise levels and types can also be seen in Figure 5.3 by
comparing its first column (the input image, labeled with noise type and level) with
its fourth column (the proposed method with the Improved Median filter and the
corresponding window size); each row of that figure shows different algorithms applied
to the input image in column 1. To visually demonstrate the quality of the proposed
Median filter compared to the standard filter, the output of both filters applied to an
image corrupted with 70% impulse noise is shown in Figure 5.1.
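For reference, the PSNR metric behind the dB values in Table 1 can be computed as in the following sketch. This is a generic illustration of the formula summarized in 1.3.3, not the thesis implementation; the function name and the toy 2x2 image are our own:

```python
import math

def psnr(original, processed, max_val=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized
    8-bit grayscale images given as flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, processed)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

# Toy example: a 2x2 "image" and a slightly perturbed copy
clean = [10, 20, 30, 40]
noisy = [12, 20, 28, 40]
print(round(psnr(clean, noisy), 2))
```

For 8-bit images, a higher PSNR means the filtered image is closer to the original; identical images give an infinite PSNR.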
Table 1: Comparison of quality between the standard Median filter and our proposed one, using PSNR, applied on Lena impacted with different percentages of impulse noise
Image: Lena 512 x 512 (see Figure 5.1)

Impulse Noise %    Standard Median PSNR (dB)    Improved Median PSNR (dB)    Window size    Noise level
10                 33.5798                      33.5798                      3x3            level 1
20                 29.5107                      30.2259                      5x5            level 2
30                 24.1262                      29.3523                      5x5            level 2
40                 19.1859                      27.8228                      5x5            level 2
50                 15.3806                      26.646                       7x7            level 3
60                 12.4786                      25.2654                      9x9            level 4
70                 10.098                       23.2942                      Double 9x9     level 5
Figure 5.1: Visualizing the difference between our Improved Median filter and the
Standard Median filter. (a) Original Lena image, (b) Lena image corrupted with
70% impulse noise, (c) image in (b) smoothed with the Standard Median filter, (d)
image in (b) smoothed with our Improved Median filter (noise level 5).
Figure 5.2: Quality of the Improved Median filter and the Standard Median filter
Looking at Figure 5.1, we can clearly observe the difference in quality between our
Improved Median filter and the Standard one. The image in (b) is an extreme case in
which 70% of the data is corrupted. The Standard Median filter cannot handle such a
case because of the small fixed window size used in the smoothing process: when a
small window is used to suppress such a high noise percentage, a noisy pixel is very
likely to be replaced with another corrupted one, as clearly shown in (c). On the
other hand, our Improved Median filter shows a much better capability of suppressing
impulse noise even in extreme cases, such as an image corrupted with 70% impulse
noise. Likewise, in Figure 5.2 we can clearly see that the dB decreases only from
about 33 to 23 when the Improved Median filter is applied, while the noise increases
from 10% to 70%. In contrast, when the Standard Median filter is used, the dB drops
from about 33 to 10 (Table 1); this shows how quality degrades as the noise
percentage increases while the 3x3 window size is kept fixed.
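The variable-window idea can be sketched as follows. This is our illustrative reconstruction of an adaptive median filter, not the thesis implementation: the window grows (3x3, 5x5, ...) until its median is not itself an extreme value, loosely mirroring the window-size/noise-level pairing in Table 1.

```python
from statistics import median

def adaptive_median(img, r, c, max_radius=4):
    """Median over a growing square window centered at (r, c).
    The window expands (3x3, 5x5, ...) while its median still looks
    like impulse noise, i.e. equals the window's min or max.
    Illustrative sketch only -- not the thesis implementation."""
    rows, cols = len(img), len(img[0])
    for radius in range(1, max_radius + 1):
        window = [img[i][j]
                  for i in range(max(0, r - radius), min(rows, r + radius + 1))
                  for j in range(max(0, c - radius), min(cols, c + radius + 1))]
        med = median(window)
        if min(window) < med < max(window):
            return med           # window contains enough clean pixels
    return med                   # fall back to the largest window's median

def smooth(img, max_radius=4):
    """Apply the adaptive median to every pixel of a 2D image."""
    return [[adaptive_median(img, r, c, max_radius)
             for c in range(len(img[0]))] for r in range(len(img))]
```

Applied to a small image with a single 255 impulse in the center, the impulse is replaced by the neighborhood median, while flat regions pass through unchanged.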
5.1.2 Visualizing the Quality of our Improved Eight Direction Prewitt Algorithm
In the previous section, 5.1.1, we demonstrated the improvement in quality achieved
by applying the Improved Median filter. This Improved Median filter, as explained in
the design section (Chapter 4), is part of our augmented design of the improved eight
direction Prewitt edge detection algorithm. Figure 5.3 is presented to indicate the
quality of our improved eight direction Prewitt edge detection algorithm. From Figure
5.3 we can visualize the effect of applying our improved edge detection algorithm to
images impacted with various types of noise [1]. From the same figure, we can also
compare the quality of our proposed edge detection algorithm with both the
bidirectional Prewitt and Canny (the industry standard). The first column of Figure
5.3 contains the noisy image we want to process. Column 2 shows the output of the
standard Canny, which refers to the default Matlab run without any tuning, while the
best Canny (column 3) shows the best results that can be produced by tuning the
supported parameters.
[Figure 5.3 is an image grid: each row shows the Lena 512 x 512 test image under a
given noise condition processed five ways: Canny default run (Matlab 7.10.0), Canny
tuned (Matlab 7.10.0), bidirectional Prewitt default run (Matlab 7.10.0), the
proposed method with the Improved Median filter and no thresholding, and the proposed
method with thresholding. The rows range from the original noiseless image through
images corrupted with impulse and Gaussian noise, smoothed with windows from 3x3
(level 1) up to double 9x9 (level 5).]
Figure 5.3: Output of Canny (industry standard) and bidirectional Prewitt vs. our
proposed work.
From Figure 5.3, we can see the capability of the proposed algorithm to suppress
Salt & Pepper (impulse) noise even when the image is impacted with a very high noise
percentage. Furthermore, the algorithm is able to suppress different types of noise
with less impact on the real edges than the bidirectional Prewitt implemented in
Matlab. In fact, our proposed algorithm outperforms the Matlab implementation of the
bidirectional Prewitt algorithm in all cases. Thus, our answer to the first research
question is yes: our Prewitt edge detection algorithm is indeed an improved version
of Prewitt, more capable of working on images impacted with various types of noise.
Since additional complexities have been added to our improved algorithm, this brings
us to the second question: how to overcome the additional complexity added to the
original bidirectional Prewitt algorithm? In Chapter 4, we briefly explained our idea
of building an efficient parallel implementation on top of an efficient sequential
one to overcome the additional complexities added to our design. Therefore, we next
cover the efficiency of our algorithm in terms of execution time and compare it with
other edge detection applications that are in use.
In addition, in Figure 4.6 we showed the result of applying the original eight
direction Prewitt to an image impacted with impulse noise. Now we can clearly
visualize the difference between the two cases, smoothed with our improved filter and
unsmoothed, as shown in Figure 5.4 below.
Figure 5.4: Demonstration of the noise sensitivity of Prewitt edge detection on an
image impacted with impulse noise. (a) Original Living Room image, (b) eight
direction Prewitt applied to the image in (a), (c) Living Room image impacted with
3% Salt and Pepper noise, (d) eight direction Prewitt applied to the image in (c),
(e) image in (c) smoothed with our Improved Median filter (noise level 1), then the
eight direction Prewitt applied to the result, (f) the basic global iterative
thresholding mechanism applied to the image in (e).
From the above figure, we can clearly observe the importance of our smoothing
technique, which strengthens the original eight direction Prewitt when it is applied
to the noisy image in (c). The result in (e), smoothed with our improved filter, is
almost identical to the image in (b), where we applied the eight direction Prewitt to
the noiseless image. Finally, the image in (f) helps visualize the result of the
basic global iterative thresholding mechanism that is employed to produce a binary
image.
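The basic global iterative thresholding that produces the binary image in (f) can be sketched as follows. This is a generic version of the textbook procedure referenced in 4.2.2, not the thesis code; function names are our own:

```python
def iterative_threshold(pixels, eps=0.5):
    """Basic global thresholding: start from the mean intensity, split the
    pixels into two groups, and move the threshold to the midpoint of the
    two group means until it stabilizes."""
    t = sum(pixels) / len(pixels)
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t             # degenerate split: keep current threshold
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:
            return new_t
        t = new_t

def binarize(pixels, t):
    """Map pixels above the threshold to object (255), the rest to 0."""
    return [255 if p > t else 0 for p in pixels]
```

Starting from the mean intensity, the threshold settles between the two cluster means; pixels above it become object (255) and the rest background (0).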
5.2 The Efficiency of our Proposed Implementations
Having shown the performance of our augmented design in terms of quality, in this
section we cover the performance of our implementation in terms of execution time. In
fact, the two main goals of this study, enhancing the quality of Prewitt while
performing rapidly, conflict: enhancing the quality implies additional complexity
that slows the application down. For this reason, in the first part we show how
efficient our sequential implementation is compared to some well-known edge
detection algorithms that are already in use. In the second part, we present the
efficiency of our parallel version of the algorithm, which is based on the efficient
sequential one.
5.2.1 The Efficiency of the Sequential Eight Direction Prewitt Edge Detection
Implementation
In order to provide a fair comparison of the efficiency of our algorithms and
implementation choices, we compare the sequential execution times of our proposed
method with the existing implementations of the bidirectional Prewitt and Canny
algorithms that ship with Matlab; the results are shown in Table 2, which presents
all test cases run sequentially on the same platform. As mentioned in Chapter 4, our
algorithms are implemented in C/C++ as the base language with OpenCV.
Table 2: Sequential run of the presented algorithm vs. the bidirectional Prewitt and Canny edge detections implemented within Matlab
Platform: Intel Core i7-3720QM CPU @ 2.60 GHz, 16 GB RAM, 64-bit Windows 7; L1 D: 4 x 32 KB 8-way set associative, L1 I: 4 x 32 KB 8-way set associative; L2: 4 x 256 KB 8-way set associative; L3: 6 MB 12-way set associative

Image size               Proposed 8-direction    Proposed: smoothing (level 1) &      Matlab bidirectional    Matlab Canny
                         Prewitt alone           8-direction Prewitt & thresholding   Prewitt                 edge detection
15315 x 11624 (169 MB)   2.456 sec               3.51 sec                             8.719067 sec            55.23517 sec
2250 x 2250 (14.4 MB)    0.063 sec               0.171 sec                            0.8991 sec              2.411595 sec
1536 x 2048 (5.91 MB)    0.047 sec               0.109 sec                            0.263233 sec            1.494402 sec
512 x 512 (768 KB)       0.0145 sec              0.016 sec                            0.137101 sec            0.334044 sec
256 x 256 (192 KB)       0.001 sec               0.0025 sec                           0.026171 sec            0.2432 sec
As shown in Table 2, our sequential implementation outperforms the bidirectional
Prewitt and Canny edge detections. It is important to note that the execution times
reported for our implementation in the table include smoothing (with different
levels), localization and the iterative thresholding technique. Examining Table 2
(running on the Intel i7), processing an image of dimension 512 x 512 takes 16 ms,
resulting in 62.5 fps (frames per second) with level 1 smoothing [43 ms (23.2 fps)
when smoothed with level 5]. Obviously, a higher noise percentage requires more
computation for good suppression. However, we are able to accelerate the previous
runs using 4 processors on the same platform to 6 ms at 166 fps for level 1 smoothing
[and 11 ms at 90.9 fps for level 5]. Also, relatively small images (256 x 256) run
about 10 times faster than the bidirectional Prewitt and about 100 times faster than
the Canny edge detection implemented in Matlab.
5.2.2 The Efficiency of Our Parallel Eight Direction Prewitt Edge Detection
Implementation
Having shown how efficiently the algorithm works on images of small dimensions, we
present the execution times of our parallel implementation on larger images using two
different multicore platforms, shown in Table 3 and Table 4 respectively [1]. Tables
3 and 4 report the results of four different scenarios (cases).
Table 3: Execution time for the proposed algorithm running on 12-core compute nodes
Case 1: eight direction Prewitt; Case 2: smoothing (level 1) & eight direction Prewitt; Case 3: smoothing (level 1) & eight direction Prewitt & thresholding; Case 4: smoothing (level 5) & eight direction Prewitt & thresholding

         Image of size 29649 x 22008                              Image of size 15315 x 11624
         N cores   Best TN (sec)   Sequential T1 (sec)   Speedup  N cores   Best TN (sec)   Sequential T1 (sec)   Speedup
Case 1   12        2.890           22.850                7.91     12        0.803           6.021                 7.50
Case 2   12        3.145           23.798                7.57     12        0.843           6.534                 7.75
Case 3   12        4.521           33.532                7.42     12        1.381           8.854                 6.41
Case 4   12        13.015          143.52                11.03    12        3.724           39.45                 10.59
         Image of size 2250 x 2250                                Image of size 1536 x 2048
Case 1   12        0.028           0.180                 6.43     8         0.024           0.108                 4.50
Case 2   12        0.032           0.189                 5.91     8         0.026           0.114                 4.38
Case 3   12        0.067           0.491                 7.33     12        0.051           0.305                 5.98
Case 4   12        0.147           1.589                 10.81    12        0.130           1.016                 7.82
         Image of size 2048 x 2048                                Image of size 2704 x 4064
Case 1   8         0.031           0.147                 4.74     8         0.069           0.374                 5.42
Case 2   8         0.031           0.156                 5.03     8         0.075           0.403                 5.39
Case 3   12        0.051           0.372                 7.29     12        0.160           1.297                 8.11
Case 4   12        0.132           1.201                 9.10     12        0.348           3.959                 11.38

Platform: 12-core compute node (shared-memory multiprocessor), 2.2 GHz AMD Opteron processors; 6 x 64 KB L1 instruction cache per processor, 6 x 64 KB L1 data cache per processor, 6 x 512 KB L2 per processor, 6 MB L3 per processor; 24 GB RAM; 64-bit Linux 2.6.18.
Table 4: Execution time for the proposed algorithm running on 64-core compute nodes
Case 1: eight direction Prewitt; Case 2: smoothing (level 1) & eight direction Prewitt; Case 3: smoothing (level 1) & eight direction Prewitt & thresholding; Case 4: smoothing (level 5) & eight direction Prewitt & thresholding

         Image of size 29649 x 22008                              Image of size 15315 x 11624
         N cores   Best TN (sec)   Sequential T1 (sec)   Speedup  N cores   Best TN (sec)   Sequential T1 (sec)   Speedup
Case 1   28        1.35            19.07                 14.13    28        0.39            5.22                  13.38
Case 2   32        1.35            19.89                 14.73    24        0.42            5.46                  13.00
Case 3   32        1.86            28.14                 15.13    32        0.65            7.64                  11.75
Case 4   44        4.35            115.56                26.57    40        1.32            31.41                 23.80
         Image of size 2250 x 2250                                Image of size 1536 x 2048
Case 1   12        0.027           0.155                 5.74     12        0.019           0.096                 5.05
Case 2   12        0.029           0.163                 5.62     12        0.023           0.102                 4.43
Case 3   16        0.051           0.460                 9.02     16        0.040           0.284                 7.10
Case 4   24        0.094           1.461                 15.54    24        0.073           0.979                 13.41
         Image of size 2048 x 2048                                Image of size 2704 x 4064
Case 1   12        0.023           0.126                 5.48     16        0.040           0.324                 8.10
Case 2   12        0.025           0.139                 5.56     16        0.043           0.342                 7.95
Case 3   12        0.043           0.336                 7.81     24        0.117           1.135                 9.70
Case 4   24        0.092           1.067                 11.60    32        0.184           3.364                 18.28

Platform: 64-core compute node (shared-memory multiprocessor), 2.2 GHz AMD Bulldozer processors; 1 x 64 KB L1 instruction cache per module (each module contains two execution cores), 2 x 16 KB L1 data cache per module, 1 x 2 MB L2 per module, 16 MB L3 shared by four modules on the same chip; 128 GB RAM; 64-bit Linux 2.6.18.
The two multicore platforms used for our experiments are a 12-core AMD Opteron node
(Table 3) and a 64-core AMD Bulldozer node (Table 4). The organizations of the two
processors are quite different, as briefly described in 3.1.1 and 3.1.2.
The Opteron node consists of two chips, each with 6 cores. Each Opteron core has its
own private first-level instruction and data caches of 64 KB each and a private
unified L2 cache of 512 KB. The 6 cores on a chip share 6 MB of L3 cache.
The 64-core Bulldozer node places eight Bulldozer modules on one chip (16 cores per
chip). A Bulldozer module consists of two integer execution cores that share a 64 KB
first-level instruction cache and a floating point unit. Each core has a private
16 KB first-level data cache. The two cores within one Bulldozer module share a
unified 2 MB L2 cache. Every four modules share 16 MB of L3 cache.
The L1 data cache in the Opteron is four times larger than the Bulldozer's L1 data
cache. Partitioning the image into a number of equal-sized tiles helps enhance data
locality. The main difference between Opteron and Bulldozer is that more resources
are shared within a Bulldozer module. For instance, when we process relatively large
images (29649 x 22008) using 32 cores on the Bulldozer, the best results are achieved
when the parallel tasks are distributed so that a task executes on every other core.
This means that 32 Bulldozer modules, with one active core per module, are working,
compared to the case in which only 16 Bulldozer modules, with two active cores each,
are used. Using more Bulldozer modules not only provides a larger L3 cache, a total
of 128 MB on the four chips, but also allows each core to use the entire shared L2
cache as a dedicated second-level cache and to have exclusive use of the floating
point unit that is otherwise shared by the two integer cores. Although communication
is costly, especially when cores do not share cache space (i.e., the cost of one core
accessing cache memory located on a different chip), the better data locality gained
from the larger caches enhances the overall performance of the application. This will
not apply to all applications, but in our design we aimed to divide the input image
into independent tiles in order to decrease the amount of communication required and,
at the same time, to increase the data locality gained from using larger cache space.
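The tile decomposition described above can be illustrated with the following sketch. It is not the thesis's C/C++/OpenCV implementation; it merely shows how an image can be split into independent, roughly equal-sized tiles that workers process without synchronization (tile size and worker count here are arbitrary):

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_tiles(rows, cols, tile):
    """Partition an image of rows x cols pixels into independent,
    roughly equal-sized tiles given as (top, left, height, width)."""
    return [(r, c, min(tile, rows - r), min(tile, cols - c))
            for r in range(0, rows, tile)
            for c in range(0, cols, tile)]

def process_tile(args):
    """Stand-in for the per-tile work (smoothing, eight direction
    Prewitt, thresholding); here it just returns the tile's area."""
    r, c, h, w = args
    return h * w

def process_image(rows, cols, tile=256, workers=4):
    """Distribute the tiles over a pool of workers and combine results."""
    tiles = split_into_tiles(rows, cols, tile)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_tile, tiles))
```

Because each tile covers a disjoint region of the image, the workers need no synchronization beyond joining at the end, which is the property the design above exploits to reduce communication.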
The results in Table 3, collected on the Opteron, and Table 4, collected on the
Bulldozer, reveal that our parallel algorithm performs very well. Now let us look at
the different cases in which these results were collected. In Case 1, images are
processed by applying the proposed eight direction Prewitt edge detection algorithm
without the smoothing and thresholding mechanisms. In Case 2, level 1 smoothing is
used prior to the eight direction Prewitt, while in Case 3, the proposed thresholding
is added to Case 2. Finally, in Case 4, the images are processed with level 5
smoothing, the eight direction Prewitt and the proposed thresholding mechanism. The
number of cores used in each case is given in the N cores column of Tables 3 and 4,
while the Best execution time TN column shows the time taken in seconds by the best
parallel run using N cores. The column Sequential T1 contains the execution time of
each case on a single core. The Speedup column is calculated as T1/TN, which
represents the number of times the parallel run is faster than the sequential one.
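As a quick check of this definition, the speedup and the corresponding parallel efficiency can be recomputed from a Table 3 row (Case 1, 29649 x 22008 image); the `efficiency` helper is our addition, not a metric reported in the tables:

```python
def speedup(t_sequential, t_parallel):
    """Speedup = T1 / TN."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_cores):
    """Fraction of ideal linear speedup achieved on n_cores."""
    return speedup(t_sequential, t_parallel) / n_cores

# Case 1, 29649 x 22008 image, Table 3: T1 = 22.850 s, T12 = 2.890 s
print(round(speedup(22.850, 2.890), 2))        # 7.91, matching the table
print(round(efficiency(22.850, 2.890, 12), 2)) # 0.66 of ideal on 12 cores
```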
In order to understand how each case performs on the two platforms, we compare the
cases across various image sizes. For each image size, we run all four cases on both
platforms, Bulldozer and Opteron.
5.3 Answers for the Research Questions
In the introduction chapter, we specified the main research questions we aim to
answer. Based on the results presented in this chapter, we now answer those
questions.
5.3.1 Is it Possible to Come Up with Improved Version of Prewitt Edge Detection
Algorithm?
The answer to this question is shown in Figures 4.4 and 5.3 and explained briefly in
Chapters 4 and 5. Figure 4.4 shows the improvement in quality of the eight direction
Prewitt over the bidirectional Prewitt, while Figure 5.3 demonstrates the improved
quality of our proposed Prewitt, which is indeed proven to be an improved version of
the Prewitt edge detection algorithm. The overall improvement of our proposed Prewitt
is based on sub-improvements that are grouped together to make the final version of
the algorithm. These sub-improvements are as follows:
a. First, detecting edges in eight directions instead of two; the difference is shown
in Figures 4.3 and 4.4. However, as we have shown in 4.1.3 and Figure 4.6, the eight
direction Prewitt by itself fails on images that are impacted with impulse and other
types of noise.
b. In order to decrease the noise sensitivity of Prewitt, we augmented our improved
design with a smoothing mechanism we call the improved Median filter, briefly covered
in 4.2.1. In Table 1, Figure 5.1 and Figure 5.2 we can observe the capability of our
improved Median filter to suppress impulse noise and other types of noise using the
idea of a flexible window size.
c. The Prewitt algorithm does not incorporate a thresholding mechanism to produce
binary images. A binary image, with one value representing the object and the other
representing the background, is desirable in many object detection applications. For
this reason, we added an iterative thresholding functionality (sub-improvement),
known as basic global thresholding and briefly covered in 4.2.2, to our improved
Prewitt edge detection. The output of this thresholding can be seen in column six of
Figure 5.3.
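Sub-improvement (a) can be made concrete with the following sketch of a compass-style eight direction Prewitt. The base template and its rotations are illustrative stand-ins for the templates of Figure 1.3, not the thesis code; each pixel takes the maximum absolute response over the eight directions:

```python
def rotate_ring(kernel):
    """Rotate the eight border cells of a 3x3 kernel one step clockwise,
    producing the template for the next compass direction."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [kernel[r][c] for r, c in ring]
    vals = vals[-1:] + vals[:-1]
    out = [row[:] for row in kernel]
    for (r, c), v in zip(ring, vals):
        out[r][c] = v
    return out

def eight_direction_prewitt(img, r, c):
    """Maximum absolute response of eight rotated Prewitt-style templates
    at interior pixel (r, c). Template values are illustrative."""
    k = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]  # base (vertical-edge) template
    best = 0
    for _ in range(8):
        resp = sum(k[i][j] * img[r - 1 + i][c - 1 + j]
                   for i in range(3) for j in range(3))
        best = max(best, abs(resp))
        k = rotate_ring(k)
    return best
```

On a sharp vertical step edge the base template dominates, while on a flat region every template sums to zero, so the response vanishes; taking the maximum over all eight rotations is what lets edges of any orientation be caught.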
5.3.2 How to Overcome the Additional Complexity Added to the Original
Bidirectional Prewitt Algorithm?
As explained in the answer to the previous question, our improved version of the
Prewitt edge detection augments the original with additional functionalities that
work together to produce a better quality detection algorithm. These additional
functionalities add complexity to the original bidirectional algorithm. Accordingly,
we produced efficient implementations of the algorithm to overcome this added cost,
as follows:
a. Our application design allows flexibility in running the application so that
functionalities are used only when needed, reducing the number of operations. For
instance, as shown in Figure 4.13, the user can disable or enable both smoothing and
thresholding, in addition to providing the noise level guess. This flexibility in the
design saves operations that might not be needed in some scenarios.
b. The design of our efficient sequential implementation is comprehensively covered
in 4.3.1. Our sequential implementation, based on the C/C++ programming language with
the OpenCV library, outperforms both the bidirectional Prewitt and Canny (the
industry standard) as implemented in Matlab. The comparison of execution times for
various image sizes between our improved Prewitt edge detection and the bidirectional
Prewitt and Canny is shown in Table 2 in 5.2.1. Indeed, our algorithm in its general
case (smoothing (level 1) & eight direction Prewitt & iterative thresholding) is
between 2.4 and 10.47 times faster than the original bidirectional Prewitt and
between 13.7 and 97.28 times faster than Canny. This is very impressive due to the fact
Full Text

PAGE 1

IMPROVED SEQUENTIAL & PARALLEL DESIGN S AND IMPLEMENTATION S OF THE EIGHT DIRECTI ON PREWITT EDGE DETECTION by MOHAMMED B. MOHAMMED B.Sc., Mosul University, 2009 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science Computer Science 2013

PAGE 2

ii This Thesis for the Master of Science degree by Mohammed B Mohammed has been approved for the Computer Science Program By Gita Alaghband, Chair Tom Altman Bogdan Chlebus September 6 2013

PAGE 3

iii Mohammed, Mohammed, B. (Master of Science in Computer Science) Improved Sequential & Parallel Design s and Implementation s of the Eight Direction Prewitt Edge Detection Thesis directed by Professor Gita Alaghband ABSTRACT on our lives; we are witnessing an expansion in computer power combined with a noticeable development of digital camera capabilities. To keep up with the requirements of the digitalized world, the focus has been set on the computer vision field. One of the most popular co mputer vision applications is recognition. The word recognition can imply different computer vision areas such as object tracking, face and pattern recognition, human computer interaction, traffic monitoring, vehicle navigation, etc. Edge detection algorit hms are widely used within the computer vision and the image processing field. Edge detection algorithms are at the center of the recognition process in computer vision and image processing. This work presents design and implementation of efficient sequent ial and parallel edge detection algorithms capable of producing high quality results and performing at high speed [1] The parallel version, derived from our efficient sequential algorithm, is designed for the new shared memory MIMD multicore platforms. Th e edge detection algorithm presented here is designed to effectively work on images impacted with different noise percentage s. This has been achieved through augmenting our edge detection algorithm with a n improved median filter capable of suppressing impulse noise and other noises more effectively than the original standard

PAGE 4

iv Median filter A global thresholding method augments our design to dynamically find a suitable thresholding value. In order to measure the quality and execution time, we test image s with different sizes along with the original Prewitt and Canny edge detection algorithm s already implemented in Matlab, to show the possibility of using our design within different applications. This work will demonstrate the ability to process relativel y small and medium images in real time as well as effectively processing extremely large images useful for biomedical image processing rapidly The form and content of this abstract are approved. I recommend its publication. Approved: Gita Alag hband

PAGE 5

v DEDICATION This work is dedicated to my generous mother and supportive father who have sacrificed their lives in order for me to succeed. This work is dedicated to my wonderful sisters and my gentle brother in which their love has kept me focused. This work is dedicated to the soul of a friend, Ahmed Al Nuri, who will never be forgotten. This work is dedicated to my only uncle who passed away without seeing me graduate, to those who are working to rebuild my destroyed coun try, to the innocent being murdered on a daily basis and to all my beloved friends who are my treasures, forever.

PAGE 6

vi AKCNOWLEDGEMENTS I acknowledge my fellowship from the Higher Committee for Education Development in Iraq (HCED). I want to show my gratefulness and gratitude to my consultant, advisor and mentor Professor Gita Alaghband for her guidance, support and encouragement throughout my study. Without her invaluable help, it would not be possible for me to achieve this level of academic proficiency; I would not have the chance to collaborate with such knowledgeable and incredible researchers. I would like to express my gratitude to all the faculty members in the computer science department at the University of Colorado Denver and to all my professors at the University of Mosul in Iraq.

PAGE 7

vii TABLE OF CONTENTS C hapter 1 Introduction ................................ ................................ ................................ ..................... 1 1.1 Edge Detection Stands in the Digitized World ................................ ......................... 2 1.2 Edge Detection and the Most Well known Detection Algorithms ........................... 3 1.2.1 Introduction ................................ ................................ ................................ ........ 3 1.2.2 Prewitt Edge Detection and the Most Recent Improvements ............................ 4 1.2.3 Canny Edge Detection as a Quality Reference ................................ .................. 8 1.3 Smoothing Mechanism ................................ ................................ ........................... 10 1.3.1 Introduction ................................ ................................ ................................ ...... 10 1.3.2 Medi an Filter ................................ ................................ ................................ .... 11 1.3.3 Peak Signal To Noise Ratio (PSNR) as a Quality Metric ............................... 11 1.4 Iterative Thresholding and the Most Well known Iterative Thresholding ............. 13 1.4.1 Introduction ................................ ................................ ................................ ...... 13 1.4.2 Otsu Thresholding ................................ ................................ ............................ 13 1.4.3 Basic Global Thresholding ................................ ................................ .............. 14 1.5 The Significance of the Study ................................ ................................ ................. 14 1.6 Statement of the Problem ................................ ................................ ........................ 15 1.7 Thesis Organization ................................ ................................ ................................ 
17 2. Background ................................ ................................ ................................ ................... 19 2.1 Basic Definition ................................ ................................ ................................ ...... 19 2.1.1 Most Common Types of a Digital Image ................................ ........................ 19 2.1.2 More about Edges ................................ ................................ ............................ 20 2.1.3 Noise ................................ ................................ ................................ ................ 21 2.2 Common Image Processing Operations ................................ ................................ .. 23 2.2.1 Linear and Nonlinear Spatial Filtering ................................ ............................ 23 2.2.2 Denoising ................................ ................................ ................................ ......... 25 2.2.3 Thresholding ................................ ................................ ................................ .... 26 2.2.4 Histogram ................................ ................................ ................................ ......... 28 2.3 Some of the Challenges in the Field of Image Processing ................................ ...... 29 2.3.1 Real time Image Processing ................................ ................................ ............ 30 2.3.2 Parallel Image Processing ................................ ................................ ................ 31

PAGE 8

2.3.3 Process High Resolution Images ..... 33
2.4 Common Programming Languages Utilized for Image Processing and Computer Vision Fields ..... 34
2.4.1 Matlab ..... 35
2.4.2 OpenCV ..... 35
3. Experimental Setup ..... 38
3.1 Main Multicore Shared Memory Architectures Employed to Run the Experiments ..... 38
3.1.1 Opteron Module ..... 39
3.1.2 Bulldozer Module ..... 40
3.2 Main Accessories Implementations ..... 41
3.2.1 Changing to Gray Scale Channel & Converting between Extensions ..... 42
3.2.2 Adding Noise ..... 43
3.2.3 Peak Signal To Noise Ratio (PSNR) ..... 43
4. Research Methodology ..... 45
4.1 Simulation & Evaluation of the Eight Direction Prewitt ..... 46
4.1.1 Bidirectional Prewitt vs. Eight Directions Prewitt ..... 46
4.1.2 Processing Different Color Spaces can Impact the Quality ..... 50
4.1.3 Eight Directions Prewitt Working on Noisy Images ..... 53
4.2 Design ..... 55
4.2.1 Selecting an Appropriate Filter to Use as Prefix Step to Eight Direction Prewitt ..... 56
4.2.2 Dynamic Iterative Thresholding ..... 58
4.3 The Implementation of the Improved Eight Direction Prewitt Edge Detection ..... 59
4.3.1 Sequential Implementation ..... 60
4.3.2 Parallel Implementation to Accelerate the Process of Detection ..... 66
5. Experimental Results And Analysis ..... 74
5.1 Quality of Proposed Algorithm ..... 74
5.1.1 The Improvement of Proposed Median Filter ..... 74
5.1.2 Visualizing the Quality of our Improved Eight Direction Prewitt Algorithm ..... 77
5.2 The Efficiency of our Proposed Implementations ..... 81
5.2.1 The Efficiency of the Sequential Eight Direction Prewitt Edge Detection Implementation ..... 82
5.2.2 The Efficiency of Our Parallel Eight Direction Prewitt Edge Detection Implementation ..... 83
5.3 Answers for the Research Questions ..... 86


5.3.1 Is it Possible to Come Up with an Improved Version of the Prewitt Edge Detection Algorithm? ..... 86
5.3.2 How to Overcome the Additional Complexity Added to the Original Bidirectional Prewitt Algorithm? ..... 88
5.3.3 How Efficient is the Algorithm Compared to the Original Prewitt and the Well-known Canny Edge Detections? ..... 89
5.3.4 Will the Improved Version of this Algorithm Perform Well Unconditionally? ..... 90
5.3.5 How Suitable is the Improved Version of this Algorithm for Real-time Edge Detection Applications? ..... 90
6. Conclusion ..... 93
References ..... 94
Appendix ..... 100


LIST OF TABLES
Table 1: Comparison of quality between the standard Median filter and our proposed one using PSNR, applied on Lena impacted with different percentages of impulse noise ..... 75
2: Sequential run of the presented algorithm vs. the bidirectional Prewitt and Canny edge detections implemented within Matlab ..... 82
3: Execution time for the proposed algorithm running on 12-core compute nodes ..... 83
4: Execution time for the proposed algorithm running on 64-core compute nodes ..... 84


LIST OF FIGURES
Figure 1.1: Windows used with original Bidirectional Prewitt ..... 5
1.2: ..... 6
1.3: Templates of different directions used in the improved eight direction Prewitt ..... 7
2.1: Edge in images ..... 21
2.2: Lena and Cameraman images corrupted with different types of noise ..... 22
2.3: Illustration of one-dimensional correlation and convolution ..... 24
2.4: Applying averaging filter on both Lena and Cameraman ..... 26
2.5: Convert to a binary image using thresholding mechanism ..... 27
2.6: ..... 29
3.1: Pseudocode of changing to gray and converting between image extensions ..... 42
3.2: Pseudocode of adding different types of noise using Matlab ..... 43
3.3: PSNR Pseudocode ..... 44
4.1: Flow chart of Eight Direction Prewitt edge detection algorithm ..... 47
4.2: ..... 48
4.3: Results of applying different directions of the Prewitt algorithm ..... 49
4.4: Original bidirectional Prewitt vs. Eight Direction Prewitt ..... 50
4.5: Processing different channels with eight direction Prewitt ..... 52
4.6: ..... 54
4.7: The procedure of the Basic Global Thresholding ..... 58
4.8a: Procedure of the sequential proposed algorithm ..... 61


4.8b: The flow chart of the sequential version of the improved eight direction Prewitt edge detection application ..... 61
4.9: Snapshot of our application that allows processing a Region of Interest (ROI) ..... 65
4.10: Processing life ..... 69
4.11: Procedure of the parallel version of the proposed algorithm ..... 71
4.12: Partitioning the image into sub-images and applying eight direction Prewitt on each of the sub-images ..... 72
4.13: Flow chart of the proposed parallel application .....
5.1: Visualizing the difference between our Improved Median filter and the Standard Median filter ..... 76
5.2: Quality of the Improved Median filter and the Standard Median filter ..... 76
5.3: Output of Canny (industry standard) and bidirectional Prewitt vs. our proposed work ..... 78
5.4: Demonstration of the noise sensitivity of Prewitt edge detection working on an image impacted with impulse noise ..... 81


1. Introduction

At the rate at which current processor technology is advancing, Moore's law is no longer an accurate measure of the computational output of processors [2, 3]. This has taken place as the industrial race for higher CPU clock frequency reaches its limits. The sequential world view needs to constantly adapt in order to keep up with accelerated development platforms [4]. This will be achieved through parallel implementations of algorithms based on multithreaded programming languages [4]. Heavy computations are needed by image processing applications, whose usage has increased to become an essential part of our lives [5]. Future multimedia systems need to have high performance architectures. The required level of performance is no longer achievable by single core architectures [6]. This limitation has affected the progression of architecture design, in which distributed and parallel systems are rapidly growing.

The exponential growth of the world's technological industry has an important impact on our lives; we are witnessing an expansion of computer power combined with a noticeable development of digital camera capabilities. In keeping up with the digitalized world, much focus has been set on the computer vision field. In this study, different concepts in image processing and computer vision are covered, starting from very fundamental concepts and ending with future work ideas that are not limited by a boundary.

In this chapter, the significance of edge detection in image processing, and more precisely in the detection field, will be explained. After that, the most common edge detection algorithms relevant to our work, and the related research and developments which


have contributed to them, will be covered. Next, we briefly define smoothing and describe the most relevant smoothing mechanisms utilized in this work. After that, thresholding, the need for it, and the most common thresholding mechanisms will be covered. Finally, the significance of our study will be shown and the main research questions will be highlighted.

1.1 Edge Detection Stands in the Digitized World

Computer vision is the field in which digital images are mainly transferred from a capturing device (camera or video) to be processed using different image processing mechanisms that help achieve a particular goal [7]. The main aim of the computer vision field is to simulate human vision, where Artificial Intelligence (AI) can be used to make a decision, while image analysis, which falls between image processing and computer vision, is exploited in image understanding [8]. Edge detection is one of the most widely used operations within the computer vision and image processing fields [9, 10, 11, 12, 13, 14, 15, 16]. The goal of edge detection is to specify the pixels in an image where sharp changes of intensity take place [17]. Detection algorithms are central for image analysis and recognition applications [3, 8, 13]. The word recognition can imply different computer vision areas such as object tracking, face and pattern recognition, human computer interaction, traffic monitoring, vehicle navigation, image segmentation, etc. [13]. Therefore, having a recognition algorithm that is efficient both in terms of quality and execution time can implicitly push forward real-time recognition applications [1].


1.2 Edge Detection and the Most Well-known Detection Algorithms

In this section, some of the well-known edge detection algorithms will be covered, with the main focus set toward Prewitt edge detection and the most recent improvements that have taken place on it. Also, Canny edge detection, the most sophisticated edge detection operator, is highlighted since it serves as a quality reference for the improved Prewitt algorithm presented in this work.

1.2.1 Introduction

Edge detection is one of the most widely used operations within the computer vision and image processing fields, and has become central for image analysis and recognition applications [3, 8, 9, 10, 11, 13, 14, 16]. Edge detection essentially looks for sharp changes in the brightness of intensities [8, 17, 18]. Edge detection algorithms, as explained by Gonzalez and Woods, usually contain four steps, one of which is optional. The first step is smoothing, which is the mechanism of suppressing noise from the original image to decrease the number of falsely detected edges. The second step is enhancement (sharpening), in which an attempt is made to reconstruct some of the edges which have been destroyed during the smoothing step. However, sharpening is usually an optional operation that some algorithms use while others do not. After that, detection (localization) is applied to decide which non-zero pixels are considered an edge and which are not. Finally, a thresholding mechanism is applied on the gray-scale image resulting from the localization step to produce a binary image. To measure the quality of edge detection, we need to describe the criteria of an optimal edge detection algorithm, which are good localization and good detection [11, 19]. Good detection implies decreasing the


probability of detecting false edges, while good localization means the detected edges are as close as possible to the true edges [11, 19]. Edge detection algorithms are usually categorized, as explained by Phueakjeen, Jindapetch, Kuburat and Suvorn, into either Gradient or Laplacian methods. Within the Gradient method, edges are detected by looking for maximum and minimum intensity values in the first derivative. On the other hand, with Laplacian methods, edges are detected from zero crossings in the second derivative, where zero values represent the location of edges. Prewitt, Sobel, Robert, and Canny are examples of gradient edge detection operators, while the Laplacian of Gaussian is an example of a Laplacian operator [18]. The Prewitt and Sobel gradient operators offer the ability to detect edges with certain orientations [16, 20, 21]. On the other hand, the best known industry standard algorithm is Canny; it is shown to have high computational complexity [16, 18]. In our work, the Prewitt edge detection algorithm, which is known for its simplicity and computational efficiency, has been employed. Even though the Prewitt algorithm is known for its efficiency in the number of computations, its noise sensitivity, according to different analytical studies, has negatively affected its usability for most common image processing applications [8, 16, 17, 22, 23]. Next, we will focus on two main edge detection algorithms: Prewitt, which is employed as a core in our work, and Canny, which serves as a quality reference.

1.2.2 Prewitt Edge Detection and the Most Recent Improvements

The Prewitt edge detection algorithm, as defined by Gonzalez, Woods and Eddins, is a discrete differentiation operator with the goal of calculating the gradient of the image intensity function, where the convolution operation is done on each pixel in both


horizontal and vertical directions (0 and 90 degrees) [8]. This gradient edge detection algorithm looks for maximum and minimum derivatives of the image [13]. Prewitt, as well as the Sobel edge detection algorithm, offers the capability to detect edges with certain orientations [16, 21]. In order to detect edges in both horizontal and vertical directions, the input image is masked with two matrices (windows), each of size 3 by 3 (Figure 1.1).

Figure 1.1: Windows used with original Bidirectional Prewitt

Masking the input image with the windows in Figure 1.1 results in two intermediate templates (candidates). This means that for each pixel there are at most two candidates to choose from. The election strategy is to elect the pixel which has the maximum intensity value (sharper change) to represent that pixel in its final destination. However, limiting the algorithm to choose between only two candidates is not sufficient to produce accurate results. It is possible to use all eight possible directions, as shown in Figure 1.3 and later in Section 4.1.1, to detect edges from all eight candidates. An improved version of the original bidirectional algorithm that handles eight directions was proposed in 2008 by Ci and Chen [17]. The pseudocode corresponding to the bidirectional Prewitt edge detection is represented below in Figure 1.2.


/* I(x,y) is the input image of size imgWidth by imgHeight,
   and g(x,y) is the output image */
/* k1[maskHeight, maskWidth] := first Prewitt window (horizontal direction) */
/* k2[maskHeight, maskWidth] := second Prewitt window (vertical direction) */
h := (maskHeight - 1)/2 ;
w := (maskWidth - 1)/2 ;
for y := 0 until imgHeight do
  for x := 0 until imgWidth do {
    sum := 0 ; sum1 := 0 ; sum2 := 0 ;            /* initialization */
    for i := -h until h do
      for j := -w until w do {
        sum1 := sum1 + k1(j,i) * I(x-j, y-i) ;    /* convolution in the first direction */
        sum2 := sum2 + k2(j,i) * I(x-j, y-i) ;    /* convolution in the second direction */
      }
    sum := max(sum1, sum2) ;                      /* select the max intensity */
    g(x,y) := sum ;                               /* result image */
  }

Figure 1.2
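For readers who prefer runnable code, the pseudocode of Figure 1.2 can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the thesis implementation; the function and variable names are ours, and we take the absolute value of each directional response, a common convention for gradient magnitude.

```python
import numpy as np

# Prewitt windows in the spirit of Figure 1.1: K1 responds to horizontal
# edges (0 degrees) and K2 = K1 transposed responds to vertical edges.
K1 = np.array([[-1, -1, -1],
               [ 0,  0,  0],
               [ 1,  1,  1]], dtype=float)
K2 = K1.T

def prewitt_bidirectional(img):
    """Apply both windows at each pixel and elect the stronger response."""
    img = img.astype(float)
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):          # skip the 1-pixel border so the
        for x in range(1, w - 1):      # 3x3 window always fits
            win = img[y - 1:y + 2, x - 1:x + 2]
            s1 = abs(np.sum(K1 * win))  # response in the first direction
            s2 = abs(np.sum(K2 * win))  # response in the second direction
            out[y, x] = max(s1, s2)     # elect the sharper change
    return out
```

The eight direction variant discussed next elects among eight compass templates in exactly the same way, replacing max(s1, s2) with the maximum over eight responses.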


Figure 1.3: Templates of the different directions used in the improved eight direction Prewitt

Adding more directions to the original algorithm improves the quality of the original Prewitt algorithm. Additional improvements, such as incorporating thresholding and implicit smoothing techniques, were suggested by Lei, Dewei, Xiaoyu, and Hui in 2011 [23]. The significant contribution of their work is the ability to use an implicit smoothing technique to enable the algorithm to work on noisy images. This was particularly important as the Prewitt edge detection algorithm is known to be sensitive to noise and therefore may not generate satisfactory results in the presence of noise. The implicit smoothing technique used in [23] calculates the gradient magnitude of the eight directions as the basis to find the final magnitude, in order to reduce the effect of noise. Our


experiments show that the gradient based on the implicit smoothing approach will not be very effective when the image is impacted with a high noise percentage. Instead, we have conducted research and experiments to come up with a more effective smoothing technique, as we will see in our experiments section (5.1.1). Also, the design presented in [23] is very complex compared to the original bidirectional algorithm, and complexity is an important factor which might affect the usability of any algorithm. Nevertheless, improving the quality of the algorithm obviously requires additional complexity. Thus, to overcome the additional complexity, we have come up with an efficient parallel implementation that is based on our solid sequential one, as shown in Chapter 4 in the design section [1].

1.2.3 Canny Edge Detection as a Quality Reference

The Canny edge detection algorithm was first designed by John Canny in 1986. It is a multi-stage detection algorithm based on a computational approach [11]. Nowadays, Canny is considered the industry standard and also the optimal edge detector that is used to evaluate other edge detection algorithms [16, 17, 18, 21, 24]. Despite the fact that Canny is considered to be the optimal and standard edge detector, many studies have highlighted some disadvantages of this operator which can negatively affect its applicability. The process of the Canny edge detection algorithm is briefly explained in different articles. Joshi and Koju have compared different edge detection algorithms and show the main steps followed in Canny [17]. These steps are smoothing, localization, direction calculation, non-maximum suppression, and hysteresis thresholding. It is also shown that the Canny edge detection algorithm outperforms all the


other gradient operators, such as Robert, Prewitt, and Sobel, across all the different scenarios in which images are impacted with different types of noise. A study published in 2010 covers some of the weaknesses of the Canny edge detection algorithm. Its authors mention that even though Canny is extensively used within the detection field, its failure to detect weak edges limits its applicability to many edge detection applications that require high quality detection capability. Also, the original Canny algorithm uses pre-specified thresholding values that require several different trials before arriving at the optimal thresholding. However, much research has contributed to providing Canny with iterative thresholding mechanisms to overcome these weaknesses [24, 25]. Even though Canny edge detection is known to be the standard edge detection algorithm, different studies have shown that this algorithm has high computational complexity [17]. This complexity is time consuming and is probably reflected in the usability of such an operator. In other words, more complexity leads to unsuitability for applications that require rapid processing capability, such as real-time detection applications [16]. In this work, we have used Canny as a quality measurement (reference) to compare our improved version of the Prewitt edge detection algorithm with other edge detection operators that are already in use. The comparison between Canny and the improved proposed version will cover both quality and execution time.
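Among the Canny steps described in this section (smoothing, localization, direction calculation, non-maximum suppression, hysteresis thresholding), hysteresis thresholding is the one that most clearly separates Canny from Prewitt. A minimal sketch of that step is given below; it is illustrative only, and the function name, threshold values, and 8-connectivity choice are our assumptions, not the exact formulation used by any particular library.

```python
import numpy as np
from collections import deque

def hysteresis(mag, low, high):
    """Keep pixels above `high`, plus pixels above `low` connected to them."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    keep = strong.copy()
    h, w = mag.shape
    q = deque(zip(*np.nonzero(strong)))   # seed with all strong pixels
    while q:                              # grow edges through weak neighbors
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and weak[ny, nx] and not keep[ny, nx]:
                    keep[ny, nx] = True
                    q.append((ny, nx))
    return keep
```

Pixels above the high threshold are kept unconditionally; weak pixels survive only if a chain of neighbors connects them to a strong pixel, which is what lets Canny retain faint but genuine edge segments while discarding isolated noise responses.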


1.3 Smoothing Mechanism

In this section, a short introduction to noise suppression and the most well-known mechanisms is given. The Median filter is briefly explained, since it is an important part that is augmented with our improved Prewitt edge detection algorithm. Finally, Peak Signal to Noise Ratio (a quality measurement) will be highlighted.

1.3.1 Introduction

Smoothing functionality is at the center of digital image processing, and many researchers have worked on enhancing the existing smoothing mechanisms. The reason is that digital images are usually corrupted with different types of noise [26]. Therefore, choosing a suitable smoothing mechanism is a very challenging process, especially when smoothing becomes part of an edge detection algorithm. A suitable smoothing mechanism implies being effective not only in noise suppression but also in keeping important details of an image unchanged. It is known that many smoothing mechanisms, such as average filtering, are very suitable for suppressing different types of noise, but at the same time they are not guaranteed to keep the details of an image unchanged. In other words, it is significant to ensure that the smoothing filter used (for the sake of noise suppression) will not destroy real edges, which are the most valuable information for edge detection algorithms. By now we have shown the importance of a smoothing mechanism that can be augmented with our design of the improved edge detection algorithm based on the eight direction Prewitt. The smoothing mechanism presented in this study is referred to as the Improved Median filter, which has


been (based on our results) proven to be more capable of suppressing impulse and other types of noise than the well-known Standard Median filter, which will be explained next.

1.3.2 Median Filter

The performance evaluation study done by Kumar, Srinivasan, and Rajaram [26] has shown that most digital images should be smoothed with a linear filter known as a low-pass filter. However, a low-pass filter will blur the edges and destroy lines, corners and other important details found in an image. These reasons have led researchers to concentrate more on employing non-linear filters, which have less negative impact on real edges. The Median filter is a nonlinear filter used to remove impulse noise with the least negative impact on the real edges [8, 26, 27, 28]. For each pixel, the median filter arranges all the pixels within the specified window in ascending order and chooses the median value [8, 13, 26, 29, 30]. This approach allows keeping some real values unchanged, as opposed to the blurring technique that takes the average of all pixels within that neighborhood. The Median filter is often used in image and pattern recognition applications due to its ability to suppress impulse and other types of noise very effectively with less negative impact on the real edges [8, 13, 26, 28, 29].

1.3.3 Peak Signal to Noise Ratio (PSNR) as a Quality Metric

Applying image processing operations on images can cause the loss of important information and often changes the quality of the image [31]. Nowadays, it is becoming very necessary to be able to evaluate the quality of an image within different image


processing applications such as information hiding, art, and pattern recognition, to name a few [32, 33]. Peak Signal to Noise Ratio (PSNR) is one of the most common objective quality estimation techniques, and it is based on the mean squared error (MSE) between the processed and the original image [26, 30, 31, 34, 35, 36, 37, 38]. Next, the study will show how this well-known quality measurement technique works. Consider an image with height M, width N, and B bits used in the image construction (based on B, the maximum possible intensity is specified). Then:

PSNR(Img1, Img2) = 10 log10( (2^B - 1)^2 / MSE(Img1, Img2) )

MSE(Img1, Img2) = (1 / (M * N)) * sum over all (x, y) of (Img1(x, y) - Img2(x, y))^2

Simplicity is one of the main reasons behind the popularity of PSNR [38]. From the equations above it can be observed that the main idea is to find the mean squared error (MSE): the averaged square summation of the differences between all the intensity values of the original image and the corresponding intensities of the processed one. A higher PSNR result (in dB) implies better quality (closer to the original image). This technique is also utilized to judge the efficiency of image compression methods. In the present work, this technique is implemented in Matlab as a proof of the quality gained from applying our improved smoothing mechanism compared to the result of the original standard Median filter.
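The standard Median filter of Section 1.3.2 and the PSNR/MSE computation above can be sketched together in Python with NumPy. This is an illustration under our own naming, assuming an 8-bit grayscale image; the thesis' Improved Median filter and its Matlab PSNR implementation differ in detail.

```python
import numpy as np

def median_filter(img, k=3):
    """Standard median filter: replace each pixel by the median of its k x k window."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')   # replicate the border pixels
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

def psnr(img1, img2, bits=8):
    """PSNR in dB between two images; higher means closer to the reference."""
    mse = np.mean((img1.astype(float) - img2.astype(float)) ** 2)
    if mse == 0:
        return float('inf')                  # identical images
    peak = 2 ** bits - 1                     # maximum possible intensity, e.g. 255
    return 10 * np.log10(peak ** 2 / mse)
```

A single impulse ("salt") pixel in a flat region is completely removed by the median of its window, and the PSNR of the filtered result against the clean reference rises accordingly, which is exactly how Table 1 compares the two filters.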


1.4 Iterative Thresholding and the Most Well-known Iterative Thresholding Mechanisms

In this part, a short introduction regarding the importance of thresholding will be given. Also, some of the main iterative thresholding mechanisms, which are heavily employed in the image segmentation and detection fields, are covered.

1.4.1 Introduction

As mentioned earlier, thresholding has become one of the main cores of various image processing applications due to its intuitive properties [8]. Thresholding is widely used in image segmentation due to its simplicity and the ease of its implementation [39, 40]. The idea of having an iterative method of choosing the threshold value, based on the image data itself, is a preferred approach in image processing. In the next parts we will explain some of the well-known iterative thresholding mechanisms which are currently in use and employed in our improved Prewitt design.

1.4.2 Otsu Thresholding

Otsu is one of the most popular thresholding mechanisms, known for its effectiveness [8, 39]. This thresholding, which is based on the histogram, maximizes the between-class variance to allow choosing an optimal thresholding value to segment the image properly [8, 41, 42]. Otsu is mainly designed to rely on a 1-dimensional (1D) histogram, which, according to some analytical studies, is proven to fail in processing an obvious valley located between two peaks [39]. In order to overcome this problem, different research efforts have contributed improved versions of Otsu, such as the 2D


Otsu explained briefly in [39]. On the other hand, there are other global iterative thresholding mechanisms which can be less complex than Otsu while still producing satisfactory results, such as the basic global thresholding explained below.

1.4.3 Basic Global Thresholding

The basic global thresholding is a well-known thresholding mechanism that is famed for its computational simplicity, acceptable quality and applicability to a variety of applications [8]. This algorithm works iteratively to find the optimal thresholding value depending on the data of the image itself. We have implemented this algorithm and augmented it with our improved design of the algorithm, as briefly explained in Section 4.2.2.

1.5 The Significance of the Study

The need for an efficient recognition application has motivated us to come up with a new improved design of the existing edge detection algorithm known as Prewitt. Coming up with an effective design and implementation of this algorithm can expedite the move towards real-time recognition applications. Our long-term goal is to eventually have optimized high quality parallel and sequential versions of the algorithm that are capable of detecting efficiently. While the best known industry standard algorithm is Canny, it is shown to have high computational complexity [16]. The well-known Prewitt edge detection algorithm has been used in the present study for its simplicity, even though its noise sensitivity makes it unsuitable for most common image processing applications [8, 16, 22, 23]. For this reason, the present work is set toward strengthening the Prewitt edge detection algorithm to


let it be more capable of working in a practical environment while performing at a rapid speed.

1.6 Statement of the Problem

The original bidirectional Prewitt algorithm is known for its good computational complexity but also for its noise sensitivity; the lack of an effective smoothing mechanism has mostly disqualified its usage. Analytical studies have indicated that the Prewitt edge detection algorithm cannot be used with practical images, which are often corrupted with impulse noise as well as Gaussian and Poisson noise [8, 16, 22, 23]. The majority of images are often inflicted by varying degrees of noise caused by transmission channels and camera sensors [22]. Furthermore, the Prewitt algorithm does not incorporate a thresholding mechanism in order to produce binary images. Most object detection applications rely on background subtraction. The presence of only two values in the resulting binary image, one representing the object and the other representing the background, is desirable in object detection applications [40]. Also, through this work our goal is to answer the research questions shown below:

1. Is it possible to come up with an improved version of the Prewitt edge detection algorithm that results in a better quality?

2. Coming up with a better quality algorithm obviously implies additional complexity, so the question is how to overcome the additional complexity added to the original bidirectional Prewitt algorithm?


3. If we come up with an improved version of the Prewitt, then how efficient will the algorithm be, both in terms of quality and execution time, compared to the original Prewitt and the well-known Canny edge detections?

4. Will the improved version of this algorithm perform well unconditionally?

5. How suitable is the improved version of the algorithm for real-time edge detection applications?

To answer the first question, the goal of the present study is to enhance the quality of the bidirectional algorithm to overcome all the weaknesses affecting its practical usage. The main plan is to augment the original with additional functionalities. These functionalities should allow the algorithm to detect edges in all eight directions (instead of only two), adding both an effective smoothing and an iterative thresholding mechanism. However, augmenting the original Prewitt with more functionality conflicts with the main characteristic of Prewitt, which is its good computational complexity. For this reason, if we are able to come up with an improved version of the Prewitt algorithm, we will be required to solve the problem of adding more complexity to the original algorithm. More computational complexity implies more time consumed, which can affect the usability of such an algorithm. In order to provide an answer for the second question, efficient sequential and parallel versions of the algorithm should be implemented to overcome the complexities added to the original Prewitt. Measuring the efficiency of the improved version of the algorithm requires comparing it with an optimal edge detector used to evaluate other edge detection algorithms, which is Canny. Therefore, to answer the third question, both the quality and


execution time of the improved Prewitt and Canny (already implemented in Matlab) will be compared. Next, performing well implies overcoming many well-known challenges in the field of image processing. An example of these challenges is effectively processing different image sizes, including extremely high resolution images. Processing high resolution images is a very common practice, especially in the field of biomedical image processing. Also, the need for flexibility in processing is another challenge, in which only a specific region of the image needs to be processed instead of the whole image. Finally, if the presented algorithm is efficient, then the follow-up question will be: is the algorithm efficient enough to serve in real-time edge detection applications? In other words, real-time edge detection applications require high processing capability to allow detecting edges in real time. Much research, as shown in 2.4.1, has contributed to specifying the main requirements an algorithm (implementation) must meet to be qualified for real-time applications. Thus, based on our experimental results, we will show whether or not it is possible for our improved Prewitt edge detection application to satisfy all the requirements needed by real-time applications.

1.7 Thesis Organization

The rest of the thesis is divided as follows. Chapter 2 gives a brief background of the terminology and main concepts which help in the understanding of the later work. This includes a brief explanation of the image processing operations that are commonly used. Examples of these image processing operations are smoothing, localizing and thresholding. Also, in this chapter, we cover some of the related work


of parallel image processing and how to detect parallelism in a sequential algorithm, in addition to the programming languages that are employed in our work. Next, in the experimental setup (Chapter 3), the architectures of the multi-core platforms are briefly explained. Also, the main accessory applications employed to support our main implementation are covered. Later, in Chapter 4, a brief description (pseudocode, flow chart) of our augmented design and implementation of the improved eight direction Prewitt algorithm is given. The design presented includes both the sequential and the parallel versions of the improved algorithm. The results are presented in Chapter 5. These results show the quality of our improved smoothing mechanism, the quality of the improved eight direction Prewitt, the execution time taken by the sequential and the parallel applications, and the answers to the research questions. Finally, in Chapter 6, the conclusions and the proposed future work are summarized.


2. Background

In this chapter a brief introduction of the terminology used in this work is given to help in the understanding of the later work. We start by explaining some of the fundamental concepts of the digital image processing and computer vision fields, along with some of the related work that has taken place in these fields. Also, we will highlight some ideas of parallel image processing, detecting parallelism in sequential algorithms, and the main challenges in the field of image processing.

2.1 Basic Definitions

In this section, the basic and standard notation is defined in order to simplify the reading throughout the rest of our work. Most of the notation is adopted from Digital Image Processing Using MATLAB by Gonzalez, Woods, and Eddins [8] and from Learning OpenCV [7].

2.1.1 Most Common Types of a Digital Image

A digital image is defined as a two-dimensional function f(x, y), where x and y are spatial coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level [8]. Each element of a digital image is usually referred to by the term pixel, and the range of its value depends on the number of bits used to construct that image [43]. For instance, the value of any pixel that is part of a gray-scale 8-bit image should be an integer in the range 0 to 2^8 - 1. These values represent shades of gray, where 0 is the darkest (black) while 2^8 - 1 (255) is the lightest (white). In other words, the number of bits used in representing an image will


determine the possible colors (different intensities) that may appear within that image. Moreover, images can come in other forms such as binary, indexed, and RGB images. The binary image, also known as a logical image, has only two intensities, either 0 (black) or 1 (white). On the other hand, color images, such as RGB images, are the result of combining different gray-scale channels where each channel represents a specific color. It is a common practice in the field of image processing to convert from one image type to another. Digital images can be saved in different formats such as .JPEG, .PNG, .BMP, .TIFF, etc. Some of these formats use lossy compression, such as .JPEG (Joint Photographic Experts Group), while .TIFF (Tagged Image File Format) is a lossless file format. Hence, applying the same algorithm to different image formats may produce different output, especially with algorithms that are highly dependent on the abstracted data. Nevertheless, there are some well-known images which are considered a standard dataset for testing different image processing algorithms, such as Lena, Baboon, Cameraman, Peppers, Living room, House, etc. These images have good textures that qualify them for testing various image processing operations. In our work, we have depended on some of these images to allow comparing our results with other related works.

2.1.2 More about Edges

Edges are mainly caused when a sharp change of intensity takes place. This instant change in intensity results from either geometric or non-geometric events [17]. Edges are mainly caused by discontinuities in intensity, which can be either step or ridge (line) discontinuities [19], as classified in Digital Image


Processing, 3rd ed. Step discontinuities refer to instant changes caused by intensity discontinuities, as seen in Figure 2.1.a (the change happens instantly). On the other hand, we have ridge (line) discontinuities when the change of intensity occurs over a short distance and then returns to the original value, as seen in Figure 2.1.b. Also, it is possible to have the change of intensity over a finite distance instead of a sharp change, which we call ramp and roof discontinuities, as presented in the figure below [11, 17, 19].

Figure 2.1: Edges in images. (a) Step discontinuities. (b) Ridge discontinuities. (c) Ramp discontinuities. (d) Roof discontinuities.

2.1.3 Noise

The majority of digital images are inflicted by varying degrees of noise caused by transmission channels and camera sensors [22, 26, 44]. Noises are known to be artifacts caused by the capturing device or the transmission medium [7]. There are various types of noise, namely impulse, Gaussian, Poisson, Speckle, etc. Due to the fact that most images are corrupted with different types of noise, denoising (smoothing) has been known to be one of the most common operations within the image processing field and its applications [26]. Impulse noise is a noise model where pixels are replaced by either the maximum or minimum of the range of allowed intensity values. For instance, when we have an 8-bit gray scale, the noise value will be either 0 (black) or 255 (white) [45]. A very common impulse noise is known as Salt and Pepper (S&P). This type of noise, as shown


in different analytical studies, cannot be handled by the original Prewitt algorithm, which is known for its noise sensitivity. Therefore, in this work, our focus is set on enhancing this aspect. Figure 2.2 exhibits images which are impacted with different types of noise.

Figure 2.2: Lena and Cameraman images corrupted with different types of noise. For each image (Lena and Cameraman): original, Salt & Pepper 10% noise, Poisson noise, Gaussian 10%, and Speckle 10%.

From Figure 2.2 we can see that a noise percentage is attached to most noise types; this value reflects the noise density. In order to guarantee the integrity of the noise density, we have used Matlab to add these noises to the original image. The function imnoise is already provided by Matlab for this reason. This function accepts parameters as the following example shows: J = imnoise(I, type, parameters), in which I is the input image, type is the type of noise, and the other parameter is the density of the noise; the result is saved in J [8]. This implementation is one of the accessories supporting our main implementation, as shown in 3.2.2. Generally, noises can be classified into two groups, additive and multiplicative; additive implies that the noise is independent from the original data, while it is dependent in the case of multiplicative noise [19, 46].
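To make the impulse-noise model concrete, the following sketch mimics the salt-and-pepper behavior of Matlab's imnoise for an 8-bit gray-scale image. The function name add_salt_pepper and the even 50/50 split between salt and pepper are our illustrative assumptions, not the thesis implementation:

```python
import random

def add_salt_pepper(image, density, seed=None):
    """Corrupt an 8-bit gray-scale image (list of row lists) with
    salt-and-pepper noise: roughly `density` of the pixels are set
    to 0 (pepper) or 255 (salt), each with probability density/2."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]          # copy, keep the original intact
    for row in noisy:
        for x in range(len(row)):
            r = rng.random()
            if r < density / 2:
                row[x] = 0                     # pepper (minimum intensity)
            elif r < density:
                row[x] = 255                   # salt (maximum intensity)
    return noisy

# Example: a flat mid-gray 8x8 image with 10% noise density
img = [[128] * 8 for _ in range(8)]
noisy = add_salt_pepper(img, 0.10, seed=42)
```

Every corrupted pixel takes one of the two extreme values, which is exactly why linear averaging struggles with this noise model, as discussed in 2.2.2.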


Different smoothing techniques are used to suppress various types of noise. Suppressing impulse noise is known to be very challenging. Therefore, in this work we present an improved smoothing mechanism that is capable of suppressing impulse noise and other types of noise effectively.

2.2 Common Image Processing Operations

In order to understand the rest of the presented work, it is necessary to cover some of the main classifications and operations within the image processing field that are related to our work. Image processing operations are classified into three levels (low, intermediate, and high). Low level implies that operations are applied at the pixel level, while the intermediate level, which is derived from low-level operations, allows making some decisions such as object tracing and face recognition. Finally, at the high level, images can be classified and different relationships among them can be drawn [3, 8]. Filters are usually categorized into two main classes, linear and nonlinear filters; both are commonly used in signal and image processing. Linear filtering was originally used with the Fourier transform in the field of signal processing, working in the frequency domain [8]. Yet, in our work we are focusing on the spatial domain, that is, processing the intensity values of the digital image directly. This is known as linear spatial filtering.

2.2.1 Linear and Nonlinear Spatial Filtering

Linear filters perform neighborhood processing, where the operations are applied on a predefined neighborhood window. The linear operation starts by multiplying the intensity of each pixel by the corresponding coefficient of the window matrix and then


sums the results to obtain the value of the anchor pixel [8]. The coefficients are arranged into a matrix called a window, filter, mask, kernel, filter mask, or template, and the linear operation is known as a convolution filter, convolution mask, or convolution kernel [8]. There are two main operations related to linear spatial filtering, namely correlation and convolution. The two operations are almost identical, except that with convolution the window w is rotated by 180 degrees before applying the earlier mentioned steps to the input image [8, 19]. A figure similar to the one below appears in Digital Image Processing by Gonzalez, Woods, and Eddins (3rd edition, 2010) to exhibit one-dimensional correlation and convolution:

Correlation: f = 0 0 1 0 0, w = 1 2 3. After padding f with zeros, w is slid across f one position at a time and the products are summed at each shift, giving the full correlation result 0 0 3 2 1 0 0.

Convolution: w is first rotated by 180 degrees (3 2 1) and the same shifting procedure is applied, giving the convolution result 0 0 1 2 3 0 0.

Figure 2.3: Illustration of one-dimensional correlation and convolution

Since we have presented an example of linear operations, it is important to show what distinguishes a linear filter from a nonlinear one. Naming filters either linear or nonlinear is based on the type of operations they perform. For instance, a linear filter can be the summation of products, while a nonlinear filter implies nonlinear operations


applied on a set of pixels [8]. Thus, letting each anchor point be the maximum among all neighbors located in a 3x3 window is a nonlinear operation.

2.2.2 Denoising

As explained in section 2.1.3, noises are unwanted artifacts caused by the capturing device or the transmission medium. In order to suppress these different types of noise, various smoothing mechanisms (filters) are currently utilized within the field of image processing. As presented previously, spatial filters are either linear or nonlinear. There are many linear smoothing techniques such as Gaussian, averaging (blurring), disk, Laplacian, motion, Sobel, etc. These linear filters usually have some pre-specified window that can be convolved with the corrupted image [8]. On the other hand, there are some nonlinear filters, such as order-statistic, min, and median filters, in which nonlinear operations are embedded. Averaging, or what is known as blurring, is usually used to suppress different types of noise. The main idea of this filter is to let each pixel be the average of a specific number of the neighbors surrounding that pixel. Some edge detection algorithms use blurring as a smoothing mechanism prior to the localization step. However, blurring the image has a side effect: it destroys many of the real edges during the process of noise suppression. Looking at Figure 2.4, we can observe the result of applying the average filter on images impacted with 10% impulse noise. This filter does not seem to be very effective at suppressing impulse noise if our goal is to avoid destroying the edges.
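The side effect just described can be made concrete with a small one-dimensional sketch contrasting the linear averaging filter with the nonlinear median filter. The window size of 3 and the clamped border handling are our illustrative choices:

```python
def average_filter_1d(signal):
    """Linear smoothing: each sample becomes the mean of its 3-neighborhood
    (borders clamped). Impulse spikes are spread into the neighbors."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[max(0, min(n - 1, j))] for j in (i - 1, i, i + 1)]
        out.append(sum(window) / 3)
    return out

def median_filter_1d(signal):
    """Nonlinear smoothing: each sample becomes the median of its
    3-neighborhood. An isolated impulse is removed entirely."""
    n = len(signal)
    out = []
    for i in range(n):
        window = sorted(signal[max(0, min(n - 1, j))] for j in (i - 1, i, i + 1))
        out.append(window[1])                  # middle of three sorted values
    return out

# A step edge (0 -> 100) corrupted by one salt impulse (255)
sig = [0, 0, 255, 0, 100, 100]
```

On this signal the median filter yields [0, 0, 0, 100, 100, 100]: the 255 spike is gone and the step edge survives, while averaging smears the spike into its neighbors (values of 85 appear around the impulse), which is exactly the edge-destroying behavior visible in Figure 2.4.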


Figure 2.4: Applying the averaging filter on both Lena and Cameraman. (a) Lena impacted with 10% salt & pepper noise. (b) Image (a) smoothed with average filtering by Matlab. (c) Cameraman impacted with 10% salt & pepper noise. (d) Image (c) smoothed with average filtering by Matlab.

2.2.3 Thresholding

The thresholding mechanism has become one of the main cores of various image processing applications due to its intuitive properties [8]. Thresholding can be utilized to convert an image to a binary image in which only two intensities are used. However, this operation requires a decision to specify which pixels to keep as an object and which to discard as background. So thresholding is employed to make a decision whether to keep or discard a specific pixel [7]. This idea is commonly employed in different applications which require background subtraction functionality. Electing a suitable threshold value will


benefit the decision making process. For instance, let T be the threshold value, and for each pixel check the following two conditions:

b(x, y) = a if f(x, y) > T
b(x, y) = b if f(x, y) <= T

b(x, y) holds the binary result, whose value is either a or b depending on the thresholding value T. This brings another problem, which is how to determine the optimal threshold value; this is known not to be an obvious process. For instance, electing a high threshold value T will result in edges that are not connected, missing many real edges. On the other hand, more false edges will be detected when the threshold value is too low [39]. For this reason, different techniques are currently utilized to help select suitable threshold values. A good threshold is described as the value which maximizes the between-class variance [8]. Choosing the threshold value can be a very basic process judged visually using trial and error, or a very advanced mechanism such as adaptive thresholding. One of the methods to choose the threshold value is through the histogram, which we describe next.

Figure 2.5: Converting to a binary image using the thresholding mechanism. (a) Baboon RGB image. (b) Baboon binary image.


In the above figure, a color RGB image (Baboon) is converted to a binary image using the thresholding mechanism with threshold value T = 120. It is necessary to mention that choosing a very low or very high threshold value T can result in either a darker or a lighter image. In our work, we have augmented our improved Prewitt edge detection algorithm with an iterative thresholding mechanism that elects the optimal threshold dynamically based on the input data.

2.2.4 Histogram

The histogram is defined in [8] as an intensity transformation function that is based on information abstracted from the image. It is defined here for a digital image with L possible intensities in the range [0, G], where h(r_k) is a discrete function. To find h(r), for each possible intensity level k (which is in the range [0, G]), the number of pixels in the image which have that intensity value is counted. In other words, the histogram gives information about each possible intensity value and the number of times it appears within the image. The following figure exposes the histogram of a gray-scale 8-bit image. As mentioned above, the number of bits specifies the range of intensity values, which is [0, 255] for an 8-bit image.
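The thresholding rule from section 2.2.3 and the histogram counting described here can be sketched together as follows. The function names and the output values a = 1, b = 0 are our illustrative choices; T = 120 matches the Baboon example above:

```python
def histogram(image, levels=256):
    """Count how many pixels take each intensity value in [0, levels-1]."""
    h = [0] * levels
    for row in image:
        for pixel in row:
            h[pixel] += 1
    return h

def threshold(image, T, a=1, b=0):
    """Global thresholding: b(x, y) = a if f(x, y) > T, else b."""
    return [[a if pixel > T else b for pixel in row] for row in image]

# A tiny 2x2 "image"; its histogram has one count at each used intensity,
# and thresholding at T = 120 keeps only the brighter pixels.
img = [[10, 200], [120, 121]]
h = histogram(img)
binary = threshold(img, 120)
```

Here binary comes out as [[0, 1], [0, 1]]: pixel 120 is discarded because the rule uses a strict "greater than T". An iterative scheme like the one used in this work would inspect h, adjust T, and re-threshold until the result stabilizes.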


Figure: (a) Baboon gray-scale image. (b) Plot of the histogram for the image in (a).

From the histogram above, we can observe which intensities have the most representatives; in the Baboon image, they are in the range of 110 to 125. Also, the user can decide which threshold value to initially start with that would probably maximize the between-class variance. Then, according to visual judgment, we can either increase or decrease the initial guess of the threshold value. The above figure is generated using Matlab 7.10.0 (R2010a) using the imhist(I) function. Moreover, some adaptive thresholding methods are directly dependent on histogram functionality, such as Otsu's method [8]. Canny edge detection, which we use as a quality reference, uses Otsu thresholding based on a one-dimensional (1D) histogram.

2.3 Some of the Challenges in the Field of Image Processing

Even though the field of digital image processing is based on mathematical foundations, visual judgment is still a very effective way to decide whether to accept a result or not. In image processing, it is expected that a process will be repeated over and over until an acceptable solution is achieved [26]. On the other hand, some


applications require processing a specific portion of the image, which should be taken into consideration in the design stage. Sometimes image processing becomes time consuming, especially when many operations are applied in sequence on the same data set. This becomes even more complex when the data are very large, as in the case of biomedical image processing. On the other hand, real-time image processing, such as an object detection alarm system, expects rapid image processing. After observing some of the common challenges in the digital image processing field, our focus is set toward the challenges that are tightly coupled with our presented work. It is noticeable that an increase in problem complexity is reflected in the applicability of an algorithm due to the added time-consuming methodology. However, in our work we are designing and implementing some suggestions that aim to decrease the gap between being complex and being time consuming. When performance, in terms of the execution time of an algorithm, is an essential factor in its applicability, then choosing the right design and implementation is a real necessity. One of the acceleration methods to eliminate or shrink the gap between complexity and execution time is parallel image processing, which will be covered later.

2.3.1 Real-time Image Processing

Different applications such as tracking and detection systems require real-time processing capability to process vast amounts of data in a timely manner [47]. There are different factors that should be considered when working on a real-time image processing application, such as the size of the image and the platform it will run on. Real-time applications, as described by Hiren Patel in [2], require the ability of processing at least 30


frames per second (fps). In general, typical applications require the capability of processing about 30 to 100 relatively small-sized frames per second. However, 100 fps is very challenging to achieve, especially when processing a complex algorithm. For this reason, the efficiency of an image processing system depends on how well matched both hardware and software are [47]. The parallel image processing model is an essential approach employed in real-time image processing. Therefore, next we will explain the main models of parallel image processing.

2.3.2 Parallel Image Processing

The field of image processing and computer vision is growing tremendously. The majority of image processing applications that are based on edge detection functionality require performing complex operations rapidly [48]. An example of such an application is object tracking, which is used for security purposes. Nowadays, different applications require more sophisticated functionalities, such as tracking multiple objects simultaneously [47]. However, in order to come up with a high quality edge detection algorithm, more complexity will be added, which obviously makes it a time-consuming practice. In other words, it is required to meet some challenges, and probably some conflicting goals, to get a high quality edge detection algorithm that is capable of performing at rapid speed. The concept of parallel image processing is adopted by designers who need to implement fast image processing applications. Full understanding of the hardware architecture is required to design algorithms that fit the given resources and perform efficiently [47].
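The real-time requirement quoted above (at least 30 fps) translates directly into a per-frame time budget, a quick check we can express as follows (the helper names are ours):

```python
def frame_budget_ms(fps):
    """Maximum time (in milliseconds) available to process one frame
    if the pipeline must sustain `fps` frames per second."""
    return 1000.0 / fps

def meets_realtime(frame_time_ms, fps=30):
    """True if a measured per-frame processing time can sustain `fps`."""
    return frame_time_ms <= frame_budget_ms(fps)

# At 30 fps each frame must finish within ~33.3 ms; at 100 fps, within 10 ms,
# which is why 100 fps is so hard to reach with a complex algorithm.
```

This is the budget our measured sequential and parallel execution times are compared against when judging real-time suitability.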


The Von Neumann architecture, as explained in [4], is divided into four categories according to both instruction and data streams. These categories are SISD (single instruction, single data), SIMD (single instruction, multiple data), MISD (multiple instruction, single data), and MIMD (multiple instruction, multiple data) [3]. Vector computers, such as multicore GPUs (Graphics Processing Units), rely on the SIMD architecture, where the same operation is applied to multiple data simultaneously. Our focus in this work is set toward the shared-memory multicore MIMD model: each processor is a complete CPU, possibly with more than one core located on the same chip, which allows processing different tasks concurrently. Yet, the other way to increase the throughput is to allow these units to finish a single problem faster, which adds more challenges to the implementation. Processors in the shared MIMD model communicate through memory, whose cost varies depending on the type of memory used, such as L2, L3, or main memory. In order to employ multicore shared memory, we need parallel programming languages that make it possible to implement parallel algorithms instead of the traditional sequential ones. An example of these tools is OpenMP, a library which can be plugged in to allow multithreaded implementations for shared memory architectures [57]. Parallel image processing can be applied in three different ways, as data parallel, task parallel, and pipeline parallel, which are briefly discussed in [3, 23]. The two main categorizations of parallel image processing are task-level parallelism, based on pipelining, and fine-grained data-level parallelism [3, 5]. Task-level parallelism can be classified as fine-grained and coarse-grained threads of parallelism [5]. An example of fine-grained parallelism is performing a relatively small independent operation on pixels in parallel, while applying a smoothing mechanism on two images in parallel is coarse-grained parallelism


[5]. Coarse-grained threads of parallelism are known to be more suitable for MIMD or SPMD environments, in which the image can be divided into sub-images to be processed individually [49]. Yet, the simplest way to detect parallelism in a sequential algorithm is to look for operations that are independent of each other [4]. Specifying the way that instructions depend on each other shows how much parallelism exists in an algorithm [50]. Independent operations are the ones that can be executed concurrently, and possibly out of order, while the integrity of the results is still guaranteed. Data-parallel and task-parallel models are explained next. Data partitioning means that the data of the image are divided among the computational units (execution cores), while task parallel implies that different tasks are assigned to different computational units. In order to use parallel tasks, these tasks should be as independent as possible to decrease the communication overhead. Finally, the idea of pipeline parallelism is to combine data and task parallelism, yet different images are processed simultaneously [3, 49, 51].

2.3.3 Processing High Resolution Images

Biomedical image processing and other fields usually require not only processing a huge number of images, but each of the images can also be extremely large (high resolution). For instance, such images can result from super-resolution microscopy and high resolution X-ray fluorescence microscopy, where very detailed and big pictures are captured. Processing very large images has been known to be very


challenging and time consuming, especially when the process is repeated several times until the result becomes acceptable. Processing high resolution images implies huge amounts of data whose processing might require frequent memory allocation and de-allocation [1]. Hence, frequently allocating/de-allocating memory adds more complexity and might conflict with the goal of implementing a fast application (one required to run rapidly). Also, in biomedical image processing it is common to have a relatively small region of interest (ROI) which needs to be processed individually instead of processing the whole image. In such a case, processing the whole image while only the ROI is required adds useless computation that becomes time consuming as well. As we will see in this work, various biomedical image sizes are tested; they grow to be more than 1 GB for a single image. Hence, implementing such image processing applications requires appropriate tools.

2.4 Common Programming Languages Utilized in the Image Processing and Computer Vision Fields

In this section we will introduce some of the main programming tools employed to implement image processing algorithms. We agreed earlier that the efficiency of an image processing system is highly dependent on how compatible both hardware and software are. Yet, relying on easier programming languages does not necessarily guarantee more efficient implementations. Thus, in this work we will show how lower-level programming languages, such as C/C++ with OpenCV, can provide more efficient applications than more convenient environments (such as the one implemented with Matlab).
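As noted in section 2.3.3, restricting computation to a region of interest avoids processing the whole image. A minimal sketch of the idea follows; the helper name crop_roi and the (x, y, width, height) convention are our assumptions, not the thesis API:

```python
def crop_roi(image, x, y, width, height):
    """Return a copy of the rectangular region of interest so that later
    (possibly repeated) processing touches only width*height pixels
    instead of the whole image."""
    return [row[x:x + width] for row in image[y:y + height]]

# A 6x6 gradient "image"; process only the central 2x2 region
img = [[10 * r + c for c in range(6)] for r in range(6)]
roi = crop_roi(img, 2, 2, 2, 2)   # rows 2-3, columns 2-3
```

For a 1 GB biomedical image, cropping once and then running the (possibly iterative) edge detection pipeline on the small ROI saves both computation and the repeated allocation of full-size buffers.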


2.4.1 Matlab

Matlab, which stands for matrix laboratory, is a numerical computing environment that combines an application development environment with a scientific programming language [52]. Matlab can be employed in different fields such as mathematics and digital signal and image processing (to name a few), where the matrix is the basic element [8]. This high-level programming language has gained popularity due to its intuitive environment, in addition to its highly abstracted syntax, which increases usability by non-programmers [52]. On the other hand, different studies have been conducted to examine how suitable Matlab is for real-time image processing operations; many concluded that Matlab is not a very efficient environment for such operations [53, 54]. Also, biomedical image processing is currently a very hot topic in which not only a huge amount of computation is required, but frequent repetition of the whole process might also be needed. However, a more efficient programming language results in more applicable implementations that can be employed for complex processing operations, as will be shown in Chapter 4. Accordingly, we aim to utilize alternative programming languages to come up with more efficient implementations, comparing the time required to process the same data set on the same platform.

2.4.2 OpenCV

OpenCV is an open source computer vision library written in C and C++ to run on different platforms such as Linux, Windows, and Mac OS X [7, 15]. This open computer vision library supports hundreds of optimized image processing algorithms, mainly designed for real-time applications, contributing to the field of computer vision


specifically [7, 43]. These libraries lend themselves well to parallel processing [7, 54], and OpenCV can use Intel's Integrated Performance Primitives (IPP) libraries to run on multicore Intel platforms [55]. Nowadays, the OpenCV library contains more than 2500 optimized algorithms which can be used by professional developers, graduate students, and researchers who are interested in the image processing and computer vision fields [43]. OpenCV offers tools utilized to solve computer vision problems, in addition to high-level functionalities sufficient to solve complex computer vision problems [7]. Work on this library started in 1999 by an Intel team, yet only little progress was achieved between 2002 and 2008. By 2009 the major development, including an improvement of the C++ interfaces, took place [43]. NVIDIA joined the development process recently, and eventually OpenCV became highly employed in robotics and real-time applications. Gary Bradski and Adrian Kaehler started this work and are leading the development team. Bradski and Kaehler wrote a book entitled Learning OpenCV, which we use as a guide through our work to help in the development process. Despite the fact that OpenCV is not as well known as other image processing environments, such as Matlab, many well-known projects are supported by OpenCV. Well-known projects such as Google Maps, Google Street View, and Google Earth (to name a few) are mainly based on OpenCV [7]. Likewise, OpenCV is employed in many different security systems, image processing, machine learning, video searching, and robotics applications. As described by Gary Bradski and Adrian Kaehler in the Learning OpenCV book, this library is basically structured into four components. First, the CV component contains basic and advanced higher-level computer vision algorithms. Second, the ML (machine learning)


library includes different statistical classifiers and clustering tools. HighGUI, the third component, contains I/O kernels used for loading/saving videos and images. Finally, the CXCore component has the basic data structures and content [7]. Our implementation is based on the combination of two open source libraries, OpenMP and OpenCV. OpenMP is mainly designed to allow implementing applications that run on multicore shared memory platforms [56]. Moreover, OpenCV provides solid support for image processing operations [7]. Therefore, employing both open source libraries (OpenMP and OpenCV) allows implementing efficient parallel image processing operations, which we will demonstrate through this work.
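To illustrate the kind of shared-memory data parallelism that OpenMP enables, here is a minimal stand-in sketch using Python threads in place of OpenMP threads (the row-partitioning scheme, function names, and the invert operation are our illustrative choices, not the thesis implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def invert_rows(image, start, stop):
    """Worker: invert the intensities of rows [start, stop) in place.
    Row ranges are disjoint across workers, so no locking is needed."""
    for y in range(start, stop):
        image[y] = [255 - p for p in image[y]]

def parallel_invert(image, workers=4):
    """Split the image rows into `workers` contiguous chunks and process
    them concurrently: data parallelism on a shared-memory image, the
    pattern an OpenMP 'parallel for' over rows would express."""
    n = len(image)
    chunk = (n + workers - 1) // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for w in range(workers):
            pool.submit(invert_rows, image, w * chunk, min(n, (w + 1) * chunk))
    return image                     # pool waits for all workers on exit

img = [[y * 10 + x for x in range(4)] for y in range(8)]
parallel_invert(img)
```

Because each worker owns a disjoint slice of rows of the shared image, the operations are independent in exactly the sense described in 2.3.2, so the results are correct regardless of execution order.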


3. Experimental Setup

In this work, we have implemented sequential and parallel applications, each of which is dedicated to either the Windows or the Linux operating system. Our implementations are based on the C/C++ programming language. On the other hand, the competitors (the quality-reference implementations) are already implemented in Matlab 7.10.0; thus, the goal is to show the efficiency of our implementations. Within the sequential design, the open computer vision library (OpenCV) is employed, while an additional library (OpenMP) is added for the parallel version. One of the platforms which have been employed to test our algorithm is: Intel Core i7-3720QM CPU @ 2.60 GHz, 16 GB RAM, 64-bit Windows 7; L1 D: 4 x 32 Kbytes 8-way set associative, L1 I: 4 x 32 Kbytes 8-way set associative, L2: 4 x 256 Kbytes 8-way set associative, L3: 6 Mbytes 12-way set associative. More details regarding the architectures of the platforms used to run our parallel application will be covered next. Moreover, some of the supporting programs (accessory applications) we have implemented as cooperation tools for our main implementations will be explained.

3.1 Main Multicore Shared Memory Architectures Employed to Run the Experiments

In the shared memory form, we have multiple processors and memories. Processors are connected through an interconnection network which allows any processor to access any memory. Yet, with the distributed memory form, there are multiple processors, each with its own dedicated memory, and the data are transmitted between processors [4, 57]. However, our main focus in this thesis is on the shared memory multicore architecture. The idea behind


multicore processors is to place more than one processor core on the same chip instead of having a single core only. These cores can cooperate to get more work done concurrently. Therefore, in this part the main characteristics of each architecture used to run our experiments are briefly explained. This allows better understanding of the behavior of our implementations.

3.1.1 Opteron Module

The Opteron module, as presented in [58], is designed by AMD to increase throughput through employing relatively large first- and second-level instruction caches. Having large caches consequently decreases the memory latency, which enhances the overall performance of the algorithm. Addressing within an Opteron module can be either 32- or 64-bit. AMD adds more registers to the Opteron, which, according to some benchmarks, enhances the performance of the majority of applications. Three instructions, each of size 64 bits, can be fetched and decoded in each cycle. Also, all instructions used with the Opteron are fixed in length, unlike the previous version, the AMD Athlon, where instruction lengths varied. There are two schedulers, one for integers and the other for floating point. The pipelines contain 12 stages for integer and 17 stages for floating point, 7 of which are needed for the process of fetching and decoding. A unique strategy is used where both L1 and L2 are accessed in parallel on each request; the request to L2 is discarded when there is a cache hit in L1, while in case of a miss the time required to fetch data from L2 is saved. Another improvement of the Opteron is the enhanced branch prediction unit, which can choose between static prediction and a history table. A good prediction unit means a smaller number of mispredictions, each of which can cost 11 cycles.


Our implementations are tested on different multicore shared memory platforms, one of which is a 12-core compute node. Each core is of AMD Opteron type, with the following specification: processor @ 2.2 GHz of AMD (Opteron) type, 6 x 64 KB L1 instruction cache per processor, 6 x 64 KB L1 data cache per processor, 6 x 512 KB L2 per processor, 6 MB L3 per processor, 24 GB RAM, 64-bit Linux version 2.6.18.

3.1.2 Bulldozer Module

The Bulldozer module, as briefly explained in [59], is designed by AMD to improve both throughput and power efficiency. Each Bulldozer module has two cores, with some resources shared between them and others private to each core. Sharing resources such as the Floating Point Unit (FPU) comes in handy for improving power efficiency, while other resources, such as the integer execution core, are dedicated to each execution unit. Many other techniques are used to improve the throughput of this module, such as prefetching of both instructions and data, as well as branch prediction and dynamic scheduling, both part of an operation's life cycle. Four instructions can be fetched and decoded per cycle. An instruction's life starts with branch prediction; each fetched instruction is then decoded in one of four individual decoders. The prediction, fetch, and decode units are all shared between the two execution cores within each Bulldozer unit. Once instructions are decoded, they are dispatched among the execution cores; instructions requiring a floating-point operation are sent to the FPU shared by the two execution cores.


Moreover, to solve some dependency issues, each execution core is provided with an instruction retirement unit, and instructions are scheduled fully out of order. Each execution core has four execution pipelines: two ALUs and two Address Generation units (AGen), each supporting some increment and decrement operations. Each core is capable of serving two loads per cycle, for a total of 128 bits, while the FPU is able to perform two 128-bit operations per load. AMD has also provided a different memory hierarchy to support power efficiency. Each Bulldozer module has one 64 KB L1 instruction cache shared between both cores, whose translation lookaside buffer (TLB) is a fully associative 24-entry table. Each of the two cores has its own 16 KB L1 data cache with a fully associative 32-entry TLB. The L2 cache is shared between the two execution cores and has a 1024-entry, 8-way associative TLB. One of the nodes used to run our implementation has 64 cores of Bulldozer type, with the following specification: 64-core compute node (shared memory multiprocessor), processor @ 2.2 GHz of AMD (Bulldozer) type, 1 x 64 KB L1 instruction cache per module (shared by its two execution cores), 2 x 16 KB L1 data cache per module, 1 x 2 MB L2 per module, 16 MB L3 shared by four modules (located on the same chip), 128 GB RAM, 64-bit Linux version 2.6.18.

3.2 Main Accessories Implementations

Some accessory implementations have been developed in the present study to support the main experiments. The main accessories, shown next, are implemented either with Matlab or with C/C++ and the OpenCV library.


3.2.1 Changing to Gray Scale Channel & Converting between Extensions

This multifunctional application, implemented by us, supports some common practices needed within edge detection applications. With it we can not only change an image to gray scale but also split the image into its originator channels and save them if needed. Furthermore, since images can come with different extensions, as explained in 2.2.1, this application can be employed to convert between extensions easily. It is based on C/C++ and OpenCV, as shown below.

/* Change to Gray & Convert Between Extensions, implemented with C/C++ and OpenCV */
Img := imRead(argv[1]);        /* read image */
Extension1 := argv[2];         /* target extension */
ToGray := argv[3];
ToSplit := argv[4];
if (ToGray) {                  /* change to gray */
    ImgGray := cvLoadImage(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
} endif
if (ToSplit) {                 /* split RGB image into main channels */
    SplitColorImage(ImgRed, ImgGreen, ImgBlue);
} endif
imWrite(Img, Extension1);      /* change extension */

Figure 3.1: Pseudocode of changing to gray and converting between image extensions
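The channel-splitting and gray-conversion steps above can be sketched in a few lines of plain Python. This is an illustration only, not the thesis's OpenCV tool: the nested-list image representation, the (R, G, B) tuples, and the simple channel average are our own assumptions.

```python
def split_channels(img):
    """Return the three originator channels (R, G, B) as separate 2-D lists."""
    red   = [[px[0] for px in row] for row in img]
    green = [[px[1] for px in row] for row in img]
    blue  = [[px[2] for px in row] for row in img]
    return red, green, blue

def to_gray(img):
    """Average the three channels into one monochrome (gray scale) channel."""
    return [[round((r + g + b) / 3) for (r, g, b) in row] for row in img]

# Toy 2x2 RGB image: pure red, pure green, pure blue, and a gray pixel.
img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (90, 90, 90)]]
r, g, b = split_channels(img)
gray = to_gray(img)   # -> [[85, 85], [85, 90]]
```

A production version would of course use OpenCV's own split and color-conversion routines; the point here is only the data-reduction step from three channels to one.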


3.2.2 Adding Noise

As explained in 2.2.3 and shown later in 4.1.3, images can be corrupted with different types of noise that may require different noise suppression techniques. It is very important to work on an accurate data set, yet accuracy might not be guaranteed when the dataset is downloaded, especially if we need detailed information such as the noise percentage. Hence we have implemented a Matlab application that can be employed to add different types of noise, as shown next.

/* Add Noise, implemented with Matlab */
Img := imread(argv[1]);  /* read image */
/* add 30% Salt and Pepper noise and save the noisy image */
ImgSaltPepper30 := imnoise(Img, 'salt & pepper', 0.3);
imwrite(ImgSaltPepper30, 'Salt&Pepper30.tif', 'tif');
/* add Gaussian noise and save the noisy image */
Gaussian20 := imnoise(Img, 'gaussian', 0.2);
imwrite(Gaussian20, 'Gaussian20.tif', 'tif');
/* add Speckle noise and save the noisy image */
Speckle10 := imnoise(Img, 'speckle', 0.1);
imwrite(Speckle10, 'Speckle10.tif', 'tif');

Figure 3.2: Pseudocode of adding different types of noise

3.2.3 Peak Signal To Noise Ratio (PSNR)

As explained in 1.3.3, Peak Signal To Noise Ratio (PSNR) is one of the most common objective quality estimation techniques. It is based on the mean squared error (MSE) between the processed and the original image; we have implemented this technique using Matlab to measure quality.
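For readers without Matlab, the salt-and-pepper case of Figure 3.2 above can be imitated with a small pure-Python sketch. This is a hypothetical helper, not the thesis tool: the `percent` parameter (fraction of pixels to corrupt), the 50/50 salt/pepper split, and the seeded generator are our assumptions.

```python
import random

def add_salt_and_pepper(img, percent, seed=0):
    """Corrupt roughly `percent` of the pixels with impulse noise:
    an affected pixel becomes 255 (salt) or 0 (pepper) with equal chance."""
    rng = random.Random(seed)          # seeded so noisy test images are reproducible
    out = [row[:] for row in img]      # leave the input image untouched
    for row in out:
        for x in range(len(row)):
            if rng.random() < percent:
                row[x] = 255 if rng.random() < 0.5 else 0
    return out

# Corrupt ~30% of a flat 10x10 gray image, as in the 'Salt&Pepper30' case.
noisy = add_salt_and_pepper([[100] * 10 for _ in range(10)], 0.3)
```

Every pixel of the result is either its original value, 0, or 255, which is exactly the impulse pattern the improved Median filter of Section 4.2.1 targets.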


A similar procedure is given in the Matlab Help documentation. PSNR is implemented as a function that takes two images as arguments, following the equations below:

PSNR(Img1, Img2) = 10 * log10( maxValue^2 / MSE(Img1, Img2) )
MSE(Img1, Img2) = (1 / (rows * columns)) * SUM over (i,j) of (Img1(i,j) - Img2(i,j))^2

/* PSNR Pseudocode, implemented with Matlab */
Img1 := imread(argv[1]);
Img2 := imread(argv[2]);
function PSNR = PSNR(Img1, Img2)
/* print an error message if Image 1 is the same as Image 2 */
if (Img1 == Img2) { print('Images are identical: PSNR has infinite value'); } end
/* find the maximum intensity value in Img1 */
maxValue := double(max(Img1(:)));
/* squared error image */
mseImage := (double(Img1) - double(Img2)) .^ 2;
/* the rows and the columns of image Img1 */
[rows columns] := size(Img1);
/* apply the MSE equation shown above */
mse := sum(mseImage(:)) / (rows * columns);
PSNR := 10 * log10(maxValue^2 / mse);
/* print the dB value, 5 digits only */
print('PSNR = +%5.5f dB', PSNR);

Figure 3.3: PSNR Pseudocode
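The same MSE/PSNR computation can be written as a self-contained Python sketch for checking values by hand. It assumes 8-bit gray scale images stored as nested lists; the function names are ours.

```python
import math

def mse(img1, img2):
    """Mean squared error between two equal-size gray scale images."""
    rows, cols = len(img1), len(img1[0])
    total = sum((img1[i][j] - img2[i][j]) ** 2
                for i in range(rows) for j in range(cols))
    return total / (rows * cols)

def psnr(img1, img2, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    m = mse(img1, img2)
    if m == 0:
        return float('inf')   # identical images: PSNR has infinite value
    return 10 * math.log10(max_value ** 2 / m)

a = [[0, 0], [0, 0]]
b = [[10, 0], [0, 0]]
# mse(a, b) -> 25.0 (one pixel differs by 10 over 4 pixels)
```

A higher PSNR means the processed image is closer to the reference, which is how the filter quality comparisons of Chapter 5 are scored.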


4. Research Methodology

The present research is conducted to come up with a new design and implementation of the eight direction Prewitt edge detection algorithm [1]. The original Prewitt algorithm is known for its noise sensitivity and for lacking both smoothing and thresholding mechanisms. Our long-term goal is an efficient design and implementation proposed mainly for multicore shared memory architectures. This chapter contains three main parts: simulation and evaluation of Prewitt edge detection, our improved design of the Prewitt algorithm, and finally the implementation of our improved design [1]. Simulating and evaluating the Prewitt edge detection algorithm highlights weaknesses that arise in some common scenarios and assists in better understanding the main pros and cons of the algorithm. Visualizing these weaknesses draws a clearer picture of how significant the problems are, and solving such significant problems is a valuable achievement that can push the detection field forward. In the second part, our design is chosen mainly to overcome some of the problems found in part 4.1. Finally, in the third part, the implementation of our augmented design (4.2) is covered. These three parts demonstrate the main target problems that we overcome and solve through this work; augmenting the sub-solutions, as we will show, leads to an efficient, high quality Prewitt edge detection algorithm that performs rapidly in different scenarios.


4.1 Simulation & Evaluation of the Eight Direction Prewitt

In this part we simulate and evaluate the eight direction Prewitt edge detection algorithm to highlight its main pros and cons. This stage assists the process of arriving at an improved version of the algorithm that rests on solid foundations. The first part of this section portrays the main weaknesses that different Prewitt versions suffer from, clarifying the shortcomings of Prewitt that we attempt to overcome in our improved approach to the algorithm.

4.1.1 Bidirectional Prewitt vs. Eight Direction Prewitt

The original bidirectional Prewitt algorithm, as mentioned earlier, detects edges in two directions, at 90 and 0 degrees. The idea of adding six more directions to the original bidirectional Prewitt was first proposed in 2008 by Ci and Chen [17]. Before augmenting our improved design with this idea, however, the goal was to evaluate the improvement proposed in [17]. For the sake of evaluation we implemented the eight direction Prewitt edge detection (shown in Figure 4.1) to simulate and assess the benefit of adding additional directions to the original algorithm. This evaluation is important not only to visualize the quality of the improved Prewitt but also to judge the additional complexity it adds. As shown in the flow chart and the pseudocode (Figures 4.1 and 4.2 respectively), the eight direction Prewitt edge detection convolves the image with eight different kernels (windows) in order to detect edges in all directions. The result of


convolving the image with the eight kernels is eight candidates per pixel; the maximum of these eight candidates is chosen to represent that pixel in the final destination image. To illustrate the difference of detecting edges at different angles, Figure 4.3 includes samples of these results. It is important to restate the criteria for optimal edge detection: a good edge detection algorithm should mainly satisfy two requirements. First, it should minimize the probability of both missing real edges and detecting false edges. Second, it should provide good localization, which implies that detected edges should be as close as possible to the true edges.

Figure 4.1: Flow chart of the Eight Direction Prewitt edge detection algorithm (the input image is convolved with the eight kernels; the eight results are reduced pairwise by taking the pixel-wise maximum of pairs of images, then of pairs of those maxima, until a single processed image is returned)


The pseudocode of the eight direction Prewitt is as follows:

/* f(x,y) is the input image; k1 ... k8 are the eight mask windows, one per
   direction (only the values -1, 0 and 1 appear in the kernels). */
I[imgHeight, imgWidth] := imRead(image);  /* image height and width */
k1[maskHeight,maskWidth] := [ 1, 0,-1 ;  1, 0,-1 ;  1, 0,-1 ];
k2[maskHeight,maskWidth] := [ 0, 1, 1 ; -1, 0, 1 ; -1,-1, 0 ];
k3[maskHeight,maskWidth] := [-1,-1,-1 ;  0, 0, 0 ;  1, 1, 1 ];
k4[maskHeight,maskWidth] := [ 0,-1,-1 ;  1, 0,-1 ;  1, 1, 0 ];
k5[maskHeight,maskWidth] := [-1, 0, 1 ; -1, 0, 1 ; -1, 0, 1 ];
k6[maskHeight,maskWidth] := [-1,-1, 0 ; -1, 0, 1 ;  0, 1, 1 ];
k7[maskHeight,maskWidth] := [ 1, 1, 1 ;  0, 0, 0 ; -1,-1,-1 ];
k8[maskHeight,maskWidth] := [ 1, 1, 0 ;  1, 0,-1 ;  0,-1,-1 ];
h := (maskHeight - 1)/2 ;
w := (maskWidth - 1)/2 ;
for y := 0 until imgHeight do
  for x := 0 until imgWidth do {
    Temp := 0 ;  /* initialization */
    sum1 := 0 ; sum2 := 0 ; sum3 := 0 ; sum4 := 0 ;
    sum5 := 0 ; sum6 := 0 ; sum7 := 0 ; sum8 := 0 ;
    for i := -h until h do
      for j := -w until w do {
        sum1 +:= k1(j,i)*I(x-j, y-i) ;  /* convolution in the first direction */
        sum2 +:= k2(j,i)*I(x-j, y-i) ;  /* convolution in the second direction */
        sum3 +:= k3(j,i)*I(x-j, y-i) ;
        sum4 +:= k4(j,i)*I(x-j, y-i) ;
        sum5 +:= k5(j,i)*I(x-j, y-i) ;
        sum6 +:= k6(j,i)*I(x-j, y-i) ;
        sum7 +:= k7(j,i)*I(x-j, y-i) ;
        sum8 +:= k8(j,i)*I(x-j, y-i) ;  /* convolution in the eighth direction */
      }
    /* select the maximum directional response */
    Temp := Max(sum1, sum2, sum3, sum4, sum5, sum6, sum7, sum8) ;
    g(x,y) := Temp ;  /* result image */
  }

Figure 4.2: Pseudocode of the Eight Direction Prewitt edge detection algorithm
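The pseudocode above can be condensed into a runnable Python sketch. Note that this is our own construction: the eight kernels are generated as successive 45-degree rotations of a base Prewitt mask (consistent with the -1/0/1 values the kernels use, but not necessarily the exact masks of the thesis), and border pixels are simply left at zero.

```python
BASE = [[-1, -1, -1],
        [ 0,  0,  0],
        [ 1,  1,  1]]

def rotate45(k):
    """Rotate a 3x3 kernel 45 degrees by shifting its ring of 8 border cells one step."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [k[r][c] for r, c in ring]
    vals = vals[-1:] + vals[:-1]          # shift the ring by one position
    out = [row[:] for row in k]
    for (r, c), v in zip(ring, vals):
        out[r][c] = v
    return out

KERNELS = []
k = BASE
for _ in range(8):                        # the 8 compass directions
    KERNELS.append(k)
    k = rotate45(k)

def prewitt8(img):
    """Eight direction Prewitt: per pixel, keep the strongest directional response."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sums = [sum(ker[i][j] * img[y + i - 1][x + j - 1]
                        for i in range(3) for j in range(3))
                    for ker in KERNELS]
            out[y][x] = max(sums)
    return out
```

On a vertical step edge (0s on the left, 100s on the right) the column-oriented kernel dominates and the interior edge pixels respond with magnitude 300, while the horizontal kernel contributes nothing, which is the behavior Figure 4.3 illustrates visually.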


Figure 4.3: Results of applying different directions of the Prewitt algorithm. (a) House original image. (b) Prewitt detecting at 90 and 180 degrees. (c) Prewitt detecting at 135 and 225 degrees. (d) Prewitt detecting at 270 and 0 degrees. (e) Prewitt detecting at 315 and 45 degrees. (f) Eight Direction Prewitt.

Looking at Figure 4.3, we can clearly see the difference of applying various kernels to the same image. It is very noticeable that some real edges are missed when only two directional kernels are applied, while the result of the eight direction Prewitt indeed leads to a good edge detection algorithm with better detection and localization than all the other presented cases. The same algorithm has been applied to many images to confirm its generality. Figure 4.4 clearly emphasizes the difference between the original bidirectional Prewitt and the improved version of the algorithm.


Figure 4.4: Original bidirectional Prewitt vs. Eight Direction Prewitt. (a) Lena original image. (b) Original bidirectional Prewitt applied to Lena. (c) Eight direction Prewitt applied to Lena.

On the other hand, convolving the image with six more kernels implies additional operations compared to the original bidirectional Prewitt. The number of operations for applying the eight direction Prewitt to an image of size N x N, as calculated in [60], is as follows: convolving the image with eight kernels requires 64N^2 addition and 8N^2 multiplication operations, while the convolution used in the bidirectional Prewitt needs only 8N^2 addition and 2N^2 multiplication operations. The larger number of operations negatively affects processing speed. Through this work we provide solutions, based on parallel image processing, to handle the additional complexity added to the original algorithm.

4.1.2 Processing Different Color Spaces Can Impact the Quality

Color images, as mentioned earlier, are combinations of individual monochrome images. For instance, an RGB image is constructed from three individual channels: (R) red, (G) green and (B) blue. Most edge detection algorithms are applied to the gray scale image instead of the individual


channels. There are many well-known techniques currently employed to convert between color spaces. A gray scale image is a monochrome channel that represents the average of all channels; data reduction is the main reason for choosing the gray scale image over all the individual channels [8]. Many analytical studies have highlighted the difference between gray and color image processing [8, 12, 41, 61]. Novak and Shafer [61] claimed that about 90 percent of the edges are the same between the gray and color images; however, the remaining 10 percent will not be found in the gray scale image, which can be essential in edge detection applications that rely on very high quality results [12]. For this reason, in this part we want to visualize the difference between applying the eight direction algorithm to each individual channel (red, green, blue) and to the gray scale one: showing a difference means a possibility of obtaining a better quality algorithm. To demonstrate the difference in quality, we started by splitting the color image into its originator monochrome channels in addition to the gray scale one, and then processed each channel individually with our implemented eight direction Prewitt algorithm. The results of applying the same algorithm to the different channels of the Jellyfish color image are presented below (Figure 4.5).


Figure 4.5: Processing different channels with eight direction Prewitt. (a) Processing red channel. (b) Processing green channel. (c) Jellyfish original color image. (d) Processing blue channel. (e) Processing gray channel.

Figure 4.5 makes it easy to visualize the difference of applying the same eight direction Prewitt algorithm to different channels. Given that processing different channels can lead to different results, it is clear that many real edges are missed when processing the green channel (b), while the edges in (d) look very clear compared to all the others; the result of processing the gray channel is still acceptable. Nevertheless, there is no general answer as to which channel will always provide better quality across different datasets. Therefore, many algorithms have been designed to combine all channels to produce better quality edge detection. Again, improving the quality of an image processing algorithm is


not free, and usually conflicts with the goal of having a fast application. For instance, as shown in section 4.1.1, the eight direction Prewitt requires 8 times the additions and 4 times the multiplications of the original Prewitt. These estimates are for applying the algorithm to the gray scale channel only (one channel); to process all three channels we would require 24 times more addition operations and 12 times more multiplication operations than the original bidirectional algorithm. Such an algorithm will obviously no longer perform rapidly compared to the original Prewitt. In our design and implementation we suggest solutions to decrease the impact of the additional complexity added to the algorithm.

4.1.3 Eight Direction Prewitt Working on Noisy Images

The Prewitt edge detection algorithm is known for its noise sensitivity. The implicit smoothing technique used in [23] calculates the gradient magnitude of the eight directions as a basis for the final magnitude to reduce the effect of noise. Our experiments show that this implicit-smoothing gradient approach is not very effective when the image is impacted with a high noise percentage, and it fails to suppress impulse noise. In this section we compare applying eight direction edge detection to an image impacted with impulse noise against the same algorithm applied to a noise-free image. This supports our evaluation of the eight direction algorithm by visualizing Prewitt's noise sensitivity and exhibiting how major this problem is.


Figure 4.6: Impulse noise. (a) Original Living Room image. (b) Eight direction Prewitt applied to the image in (a). (c) Living Room image impacted with 3% Salt and Pepper noise. (d) Eight direction Prewitt applied to the image in (c).

Examining Figure 4.6, we can clearly recognize the big difference between the results of processing noisy and noiseless images with the same eight direction Prewitt. The image in (b) shows well-detected results, while many false edges caused by the noise appear in (d). Moreover, the image in (c) is impacted with a very small noise percentage (3%), while in practical cases images can easily be corrupted with a larger percentage. This


clearly exposes the dilemma of high noise sensitivity, which has long been a serious problem for Prewitt edge detection. We will therefore revisit this figure in section 5.1.2 to demonstrate the solution achieved by augmenting the eight direction Prewitt with a better noise suppression mechanism.

4.2 Design

Having described the main pros and cons of the Prewitt edge detection algorithm, in this section we cover the design of our proposed augmented algorithm. The strategy we followed is to solve each of the problems mentioned in 4.1 individually; by augmenting all the solutions we then arrive at an improved design of the Prewitt algorithm, as we demonstrate through this work. Our plan groups the problems as follows. First, to overcome the problem of noise sensitivity, we select an appropriate smoothing mechanism to apply as a prefix step to the localization of the eight direction Prewitt. Second, we employ the eight direction Prewitt instead of the bidirectional one for its better quality, as shown in 4.1.1. Third, we augment the improved algorithm with an iterative thresholding functionality that works dynamically and is used only when needed. Finally, we deal with all the complexity added to the algorithm in its worst-case scenario, which includes noise suppression at level 5 (the highest level, for noise percentages up to 60% impulse noise), detecting in eight directions, and applying iterative thresholding. Enhancing the performance of the algorithm in its worst-case scenario not only guarantees a floor below which performance will not drop, but also provides faster performance on the less complex general cases.


4.2.1 Selecting an Appropriate Filter to Use as a Prefix Step to Eight Direction Prewitt

The stand-alone Prewitt algorithm does not incorporate any mechanism for dealing with noisy images. The Canny edge detection algorithm uses a Gaussian filter, or blurring, as a prefix smoothing mechanism [8, 17] to address the noise issue, but blurring can destroy some real edges during noise suppression. The Bilateral filter is known to be a better choice because it has less negative impact on real edges than blurring. In our work, we propose an explicit smoothing technique that can be applied to noisy images as a prefix step to the eight direction Prewitt edge detection algorithm. To select a suitable smoothing method, we conducted several experiments using Blurring, Median Blurring, Gaussian, Bilateral, Median filtering, and an improved version of the Median filter. Our improved Median filter, added as a step prior to localization in the eight direction Prewitt algorithm, shows the best results for suppressing impulse noise such as Salt and Pepper, and at the same time is able to suppress other types of noise such as Gaussian and Poisson. The Median filter is a nonlinear filter used to remove impulse noise with the least negative impact on real edges [8, 26, 28]. The filter arranges all the pixels within the specified window in ascending order and chooses the median value. This approach helps keep some real values unchanged, as opposed to the blurring technique, which takes the average of all pixels in the neighborhood.
The main idea behind our improved Median filter is to allow a variable window size instead of the fixed one employed in the standard Median filter [26]. Larger window sizes can be used effectively in processing images with a higher noise percentage while relying on the same mechanism as the Median filter [1]. The original Median filter has a fixed window size, which does not differentiate between a noiseless image and one corrupted with a very high noise percentage.


In fact, the standard Median filter usually uses a window of size 3x3, while using a larger window requires more operations, as explained below. Applying the Median filter with a 3x3 window requires sorting 9 intensities for each pixel; in other words, for an image of size M x N we need on the order of 9MN operations. Applying a Median filter with a 9x9 window, on the other hand, requires sorting 81 intensities for each pixel, about 9 times more than the standard 3x3 Median filter. Choosing an unsuitable window size can lead to one of two conflicting outcomes: adding useless computation, or employing a less effective smoothing mechanism. To avoid both, our idea is to tie the window size used by the Median filter to the noise level the image is corrupted with, by making the window size flexible. In the current version, we give users full control to judge the intensity of the noise in order to determine the desired window size. We recommend starting with a window of size 3x3, considered the standard case (level 1). If the resulting output reflects detection of false edges, users can increase the window size to 5x5 (level 2), 7x7 (level 3), 9x9 (level 4), or a double 9x9 pass (level 5), respectively. Our results demonstrate clear detection of edges in the presence of noise when smoothing is enabled. The experimental results are presented in the results section (Chapter 5, Table 1 and Figure 5.3). We have measured the quality of our improved Median filter (at fixed window sizes) and compared it with the standard Median filter (window size 3x3) working on different images impacted with different types of noise.
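The level-driven window idea above can be sketched in a few lines of Python. This is our own simplification, not the thesis code: `level` 1 through 4 maps to 3x3 through 9x9 windows, borders are copied through unfiltered, and level 5's double pass would simply apply the filter twice.

```python
def median_filter(img, level=1):
    """Median filter whose window size is tied to the guessed noise level:
    level 1 -> 3x3, level 2 -> 5x5, level 3 -> 7x7, level 4 -> 9x9."""
    size = 2 * level + 1
    half = size // 2
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # border pixels keep their values
    for y in range(half, h - half):
        for x in range(half, w - half):
            # Sort the size*size neighborhood and take the middle value.
            window = sorted(img[y + i][x + j]
                            for i in range(-half, half + 1)
                            for j in range(-half, half + 1))
            out[y][x] = window[len(window) // 2]
    return out

# A lone impulse (255) in a flat image of 10s is removed at any level.
img = [[10] * 5 for _ in range(5)]
img[2][2] = 255
clean = median_filter(img, 1)   # clean[2][2] -> 10
```

Because the window always holds an odd number of intensities, the median is a value actually present in the neighborhood, which is why impulse outliers vanish while step edges survive.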


4.2.2 Dynamic Iterative Thresholding

Thresholding is desirable in many applications, so we have added this functionality to the algorithm to be used when needed. Many thresholding techniques are available, ranging from visual judgment using trial and error to reliance on global or local methods [7, 8, 39]. In this work we have chosen the basic global thresholding method for its computational simplicity, acceptable quality, and applicability to a variety of applications [8]. The technique works iteratively to find the thresholding value, as described in Figure 4.7. As explained by Gonzalez, Woods and Eddins, basic global thresholding starts by assuming an initial guess of the threshold value T; for a good guess and faster convergence, the initial T should be the average of all intensities in the original image (img). The pixels of the original image are then divided into two groups, G1 and G2: G1 contains all the intensities larger than T, while the remaining intensities belong to G2. In the next step, to calculate the new threshold value T_new, we take the average of the two group averages: let Av1 be the average of G1 and Av2 the average of G2.

Step 1: Start with an initial guess for the thresholding value T. /* For faster convergence, choose the initial T to be the average of all intensities of the assigned work [1] */
Step 2: Mark each pixel as group 1 (G1) or group 2 (G2):
  a. if img[i,j] > T, img[i,j] belongs to G1
  b. else img[i,j] belongs to G2
Step 3: For each group, find the average of intensities, Av1 and Av2 respectively.
Step 4: Find the new threshold value: T_new = (Av1 + Av2)/2.
Step 5: Stop if |T - T_new| is within the tolerance; otherwise set T = T_new and repeat the process from Step 2.

Figure 4.7: The procedure of the Basic Global Thresholding
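The five steps of Figure 4.7 transcribe directly into a short Python sketch; the tolerance default and the guard for an empty group are our own additions.

```python
def basic_global_threshold(img, tol=0.5):
    """Iterative basic global thresholding: start from the mean intensity,
    split pixels around T, average the two group means, and repeat until
    T moves by no more than `tol`."""
    pixels = [v for row in img for v in row]
    t = sum(pixels) / len(pixels)              # Step 1: initial guess = global mean
    while True:
        g1 = [v for v in pixels if v > t]      # Step 2: split around T
        g2 = [v for v in pixels if v <= t]
        av1 = sum(g1) / len(g1) if g1 else t   # Step 3: group averages
        av2 = sum(g2) / len(g2) if g2 else t
        t_new = (av1 + av2) / 2                # Step 4: new threshold
        if abs(t_new - t) <= tol:              # Step 5: converged?
            return t_new
        t = t_new

# A bimodal strip of pixels: threshold settles midway between the modes.
t = basic_global_threshold([[10, 10, 200, 200]])   # -> 105.0
```

On a clearly bimodal image the procedure converges in one or two iterations, which is why the thesis favors it for its computational simplicity.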


The process will stop if the absolute difference between the old threshold (T) and the new threshold (T_new) is less than or equal to the tolerance value; otherwise, let T = T_new and start over from the second step. We have implemented this algorithm and added it to our improved design of the Prewitt edge detection algorithm, to be used when needed. The implementation section shows how dividing the image into sub-images improves the result of this thresholding: when the basic global thresholding mechanism is applied to different sub-images, it becomes possible to obtain more than one optimal threshold, which can be used to segment the image more effectively. The results of this iterative thresholding are shown in the results section, 5.3, column 6.

4.3 The Implementation of the Improved Eight Direction Prewitt Edge Detection

Having gone through the main parts of our improved augmented design, in this section we cover the implementation of the proposed design. The main goal is to implement efficient sequential and parallel versions of the algorithm that perform well on different platforms. To provide a fair comparison of the efficiency of our algorithms and implementation choices, we compare the sequential execution times of our proposed method with the existing sequential Matlab implementations of the bidirectional Prewitt and Canny algorithms, running on the same platform [1]. Canny, the industry standard, is known for its complexity, while the bidirectional Prewitt is known for its simplicity; comparing runs against both therefore gives a good indication of how efficient our implementation is with all the added complexity. Our applications are implemented using C/C++ as the base language with the open computer vision library (OpenCV); an additional library (OpenMP) is


employed with the parallel version. Although Matlab is a more convenient programming language for image processing than C/C++, we chose a solid base language (C/C++) that supports our main motivation of implementing a high performance edge detection algorithm. OpenCV, which is based on C++, has contributed to the computer vision field by supplying hundreds of optimized image processing algorithms that lend themselves well to parallel processing [7]. This open source library, as explained in Chapter 2, is mainly designed to support real-time applications by providing efficient implementations of the most commonly used image processing operations, such as convolution, correlation, and reading and writing images. Our parallel implementation, based on the efficient sequential one, not only helps overcome the additional complexity added to the algorithm but also performs at a rapid speed, which might be utilized by some real-time edge detection applications. We offer two versions of each of the sequential and parallel implementations, covering the two operating systems Windows and Linux. Next, we explain our sequential implementation, which becomes the solid foundation for the parallel version of the algorithm.

4.3.1 Sequential Implementation

As the flow chart and the procedure below show, the main operations applied to the images take place in sequence. First, the image can be smoothed with the improved Median filter. Next, the smoothed image (or the input image, if smoothing is not enabled) is localized with the eight direction Prewitt algorithm. Finally, the result of the localization can be processed with the basic global thresholding mechanism


that works dynamically to return the detected edges of the image. We have given the user full control over running our application, allowing it to be adjusted easily to satisfy the main needs: for the sake of data and operation reduction, users can choose to enable or disable the additional functionalities added to the eight direction Prewitt edge detection algorithm and use them only when needed. In the results section we present both the quality and the execution time of all the different case scenarios. The procedure and the flow chart of our proposed sequential algorithm are presented in Figures 4.8a and 4.8b below.

Procedure of the sequential proposed algorithm:
1) Initialization
   a) Input image.
   b) User specifies the following options:
      i) Enabled or disabled: smoothing technique, thresholding. [toSmooth], [toThreshold];
      ii) Noise level of the image (visual guess): 1 very low to 5 very high. [levelOfNoise];
2) If not gray, convert the image to gray, then do the following:
   a) De-noising
      i) If smoothing is enabled:
         (1) Apply the improved Median filter on the image, with the window size matching the noise level specified in 1.b.ii.
   b) Localization: eight direction Prewitt (as described in Section 1.2.2, Figure 1.3, and Section 4.1.1, Figure 4.2).
   c) Thresholding
      i) If thresholding is enabled:
         (1) Apply the basic iterative global thresholding (as described in Section 4.2.2, Figure 4.7).
   d) Return the processed image.

Figure 4.8a: Procedure of the sequential proposed algorithm
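The control flow of the procedure above can be captured by a small plug-in skeleton in Python. This is a hypothetical sketch: the stage callables and flag names mirror, but are not, the thesis's C/C++ code, and the final 0/255 binarization is our own reading of the thresholding stage.

```python
def detect_edges(img, to_smooth=False, noise_level=1, to_threshold=False,
                 smooth=None, localize=None, threshold=None):
    """Sequential pipeline skeleton: optional smoothing, mandatory
    localization, optional thresholding; each stage is a plug-in callable."""
    if to_smooth and smooth:
        img = smooth(img, noise_level)         # improved Median filter stage
    img = localize(img)                        # eight direction Prewitt stage
    if to_threshold and threshold:
        t = threshold(img)                     # basic global thresholding stage
        img = [[255 if v > t else 0 for v in row] for row in img]
    return img

# Trivial stand-in stages: identity localization, fixed threshold of 2.
out = detect_edges([[1, 2], [3, 4]],
                   localize=lambda m: m,
                   to_threshold=True,
                   threshold=lambda m: 2)      # -> [[0, 0], [255, 255]]
```

Keeping the stages as parameters makes the enable/disable options of step 1(b) one-line decisions, which is essentially the user-control design the thesis describes.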
Figure 4.8.b: Flow chart of the sequential version of the improved eight direction Prewitt edge detection application.
In order to measure the efficiency (in terms of execution time) of our improved version of the eight direction Prewitt edge detection algorithm, we compared it with the Canny and the bidirectional Prewitt algorithms, both of which are already available in Matlab. The results demonstrating the efficiency of our algorithm are presented in Table 2 in the results section. As explained in Section 4.1.1, the eight direction Prewitt algorithm requires 8N² multiplication operations. For the sake of operation reduction, OpenCV allows us to employ the lookup table (LUT) approach. This mechanism pre-computes all possible outputs and assigns a unique index to each possible configuration. For instance, with an 8-bit image and eight fixed 3x3 windows to convolve the image with, there is a maximum of 2^8 x 9 x 8 = 18,432 possible outputs, which looks a bit expensive, especially if relatively small images are to be processed. However, in our case of the eight direction Prewitt, as Figure 1.3 shows, only three different coefficient values are used across all eight windows: -1, 0 and 1. For this reason, only 2^8 x 3 = 768 values need to be indexed. This small number of values comes in handy by saving the many multiplication operations that would otherwise be needed if the LUT were not in use. To clarify this idea, we calculate the number of multiplications required to localize an 8-bit 256 x 256 gray scale image with the eight direction Prewitt. In the normal case, 8 x (N x M) = 524,288 multiplications are required. With the LUT, however, we need to index the products of only 768 values. Then, for each value, we find the corresponding entry in the indexed lookup table and use it without performing an actual multiplication, which is known to be expensive in terms of number of cycles. The given example was for an extremely small image size.
Not surprisingly,
increasing the dimensions of the image will not increase the number of values that need to be pre-computed, while this is not the case when the LUT is not employed, as we will see next. In one of our experiments we applied the eight direction Prewitt edge detection to a biomedical image of size 29649 x 22008, for which the number of multiplications needed would be about 5 billion, obviously a time-consuming process. In addition to the efficiency gained from employing a powerful programming language (C/C++) and a powerful open computer vision library (OpenCV), data and operation reduction has been one of the main cores of this design. In particular, data and operation reduction in our design falls into four main parts. First, users may enable the additional functionalities (smoothing and thresholding) only when needed. Second, the noise percentage estimated visually by the user allows choosing a suitable window size for our improved filter, since a higher noise level implies a larger window size and therefore more operations. Third, the lookup table (LUT) approach assists in reducing the number of multiplication operations needed, as explained previously. Finally, to satisfy the needs of some biomedical image processing (processing a region of interest) and for the sake of data reduction, our design allows a neat functionality, namely processing a region of interest (ROI), since it helps avoid useless computation. To illustrate the implications of our improved algorithm, the interface of our application is presented next and its operation is explained.
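The 768-entry table described above can be sketched as follows. This is our own illustration of the idea, not the thesis code: since every mask coefficient is -1, 0 or +1 and pixels are 8-bit, all 3 x 256 = 768 possible products can be pre-computed once.

```cpp
#include <array>

// Build a 3x256 lookup table holding every product coeff * pixel for
// coeff in {-1, 0, +1}, so the eight direction Prewitt convolution can
// replace runtime multiplications with table lookups.
std::array<std::array<int, 256>, 3> buildPrewittLUT() {
    std::array<std::array<int, 256>, 3> lut{};
    const int coeffs[3] = {-1, 0, 1};   // the only values in all eight masks
    for (int c = 0; c < 3; ++c)
        for (int v = 0; v < 256; ++v)
            lut[c][v] = coeffs[c] * v;  // 3 * 256 = 768 pre-computed entries
    return lut;
}
```

Note the table size is independent of image dimensions, which is why the saving grows with the image: a 29649 x 22008 image reuses the same 768 entries.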
Figure 4.9: Snapshot of our application that allows processing an ROI. (a) Original image impacted with 10% impulse noise. (b) Cropped sub-image (ROI) to process. (c) Localizing the sub-image in b with the eight direction Prewitt. (d) Sub-image in b smoothed with the improved median filter. (e) Localizing the smoothed image in d with the eight direction Prewitt. (f) Thresholding the localized sub-image in e with the basic global iterative thresholding. (g) Terminal showing the iterative steps followed to choose the optimal thresholding value.

This application works in real time: the user chooses the name of the image, whatever its extension. Users can then specify the region of interest by pressing the left mouse button and dragging out the ROI. The results, as shown in Figure 4.9, are five interfaces. The sub-image will be smoothed, localized and thresholded as
shown in the above figure. Our application responds immediately, without any delay, to process any specified ROI. Pressing the right mouse button clears the previous ROI choice so we can start again. This application is not only provided to process a specific region of the image but also helps the user choose the best configuration, such as the noise level, and decide whether or not to use thresholding. Finally, by looking at the runtime results in the results section, we can observe that our sequential implementation outperforms both the bidirectional Prewitt and Canny already implemented in Matlab. In other words, our C/C++-based implementation is very efficient, especially considering how complex it is compared to the original bidirectional Prewitt implemented in Matlab. Showing the efficiency of our sequential implementation is very significant in our study. The main reason is that we want to show that our improved-quality eight direction Prewitt edge detection can be employed in real-time edge detection applications. Our next step in improving the Prewitt edge detection algorithm is to implement a parallel version of our improved algorithm. This version is mainly designed for shared memory MIMD multicore platforms.

4.3.2 Parallel Implementation to Accelerate the Process of Detection

As we have mentioned earlier, Prewitt edge detection is known for its simplicity in terms of number of computations. Adding a smoothing mechanism with variable window sizes, six additional directions in the localization step, and a global thresholding mechanism adds computational complexity to the original bidirectional Prewitt algorithm [1]. These additional computations affect the overall performance. A parallel version of
the proposed algorithm for the new shared memory MIMD multicore platforms is designed and implemented in order to speed up the computation. Our parallel application is based on the efficient sequential implementation, which outperforms both the Canny and the bidirectional Prewitt edge detection algorithms already in use in Matlab. To measure the performance of the parallel algorithm, we compare its execution time with that of our sequential implementation, whose efficiency is already proven. The parallel algorithm is designed and implemented using C/C++ as the base language together with two open source libraries, OpenMP and OpenCV, to overcome the complexities added to the original algorithm. The OpenCV library supports hundreds of optimized image processing algorithms, mainly designed for real-time applications, contributing specifically to the field of computer vision [7, 43]. These libraries lend themselves well to parallel processing [7, 54]. As we have mentioned in Section 2.4.2, fully understanding the architecture is required to design a high-performance application. Moreover, parallel image processing can be applied in three different ways (data parallel, task parallel and pipeline parallel), as briefly discussed in [3, 23]. Task-level parallelism and fine-grained data-level parallelism are the two main categorizations of parallel image processing. In our design we tried both approaches, yet the coarse-grained approach proved, according to different experiments, to better match the MIMD multicore shared memory platform. With this approach we aim to process different parts of the image individually and simultaneously. Therefore we require a data partitioning mechanism in which the image is divided among the computation units. Parallel tasks do not match the design of our algorithm, since the main operations (smoothing, localization and thresholding) depend on each other.
In other words, thresholding is applied to the results of the localization, which should be smoothed previously. So this chain of operations must be processed in sequence in order to produce correct output. On the other hand, dividing the image into suitably sized sub-images provides independent chunks of data that can be processed in parallel. Yet one of the main challenges of working on shared memory multiprocessors is work distribution [3, 4]. For good distribution, the work can be divided into rows, columns, or blocks, as presented in Figure 4.12. Users can specify the partitioning mechanism and the number of sub-partitions (number of sub-images). For instance, for an M x N image the sub-images are specified as follows. If the number of sub-images equals x, then partitioning into rows yields x sub-images, each of size (M/x) x N, while partitioning into columns yields x sub-images, each of size M x (N/x). Partitioning into blocks results in x² sub-images, each of size (M/x) x (N/x). The sub-images can then be processed in parallel using multicore processors, and the results merged into one final image. Using an appropriate mechanism for data partitioning not only provides independent chunks of data that can be processed concurrently but also adds flexibility in tuning the algorithm. In particular, the tuning mechanism allows the algorithm to work more efficiently on different architectural platforms, and has been proven to enhance data locality, which in turn reduces the number of time-consuming cache misses. Work distribution starts by allowing each of the processors to copy its assigned private data to its local memory. To gain efficiency, it is desirable to decrease the number of times data must be copied from the slower shared memory to the local memory of each computational unit [1]. The algorithm starts by dividing the image into a number of equal
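The tile geometry for the three partitioning schemes can be sketched as follows. The struct and function names are our own placeholders (the implementation uses OpenCV's rectangle type), and the sketch assumes M and N are divisible by x.

```cpp
#include <vector>

// Sub-image rectangles for an M x N image (M = height, N = width) under
// the three partitioning schemes; x is the partition count per dimension.
struct Rect { int x, y, w, h; };
enum class Scheme { Rows, Columns, Blocks };

std::vector<Rect> partition(int M, int N, int x, Scheme s) {
    std::vector<Rect> tiles;
    switch (s) {
    case Scheme::Rows:                  // x tiles of size (M/x) x N
        for (int i = 0; i < x; ++i)
            tiles.push_back({0, i * (M / x), N, M / x});
        break;
    case Scheme::Columns:               // x tiles of size M x (N/x)
        for (int j = 0; j < x; ++j)
            tiles.push_back({j * (N / x), 0, N / x, M});
        break;
    case Scheme::Blocks:                // x*x tiles of size (M/x) x (N/x)
        for (int i = 0; i < x; ++i)
            for (int j = 0; j < x; ++j)
                tiles.push_back({j * (N / x), i * (M / x), N / x, M / x});
        break;
    }
    return tiles;
}
```

Because each rectangle is disjoint, the tiles form the independent chunks of data discussed above.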
sized tiles (sub-images). Parallel processes work independently on different sub-images, using a self-scheduling technique for work distribution. Each processor applies smoothing, localization, and iterative thresholding before writing the processed data to its final location, as shown in Figure 4.10. As seen in Figure 4.10, parallelism is applied at the highest possible level, in which all work, from the time the main image is loaded until the sub-images are merged, is done in parallel. The original input image is defined as an object (Region Of Interest, ROI) in the OpenCV library, resulting in the need for some synchronization when the final result is written back to this object. This is a requirement enforced by OpenCV to guarantee data integrity; therefore, each processor writes the data to its original region of interest atomically. It is best to choose self-scheduling instead of pre-scheduling for good work balance [56]. This prevents the problem of having some processors idle while others have excess work, due to the varying amount of work required in each region of the image.

Figure 4.10: Processing life cycle of the parallel version (slices 1 to N are distributed to processors P1..Pn; each processor Pi applies smoothing, eight direction edge detection, and iterative thresholding, then writes its result back before merging).

It is important to note that sub-images are treated completely individually and
are processed within the iterative thresholding mechanism separately. This strategy, according to our experiments, can result in better detection, where more real edges are detected, since each of the sub-images can have its own optimal thresholding value. Self-scheduling overcomes the difference in time required to apply the iterative thresholding mechanism to the assigned data, as the number of computations may vary depending on the data itself. The procedure and the pseudocode of the parallel application are shown below in Figure 4.11. The partitioning methods that can be employed with our application are exhibited in Figure 4.12. Finally, the flow chart of the complete design of our parallel application is presented in Figure 4.13.

Procedure of the proposed algorithm:
1) Initialization
   a) Input Image
   b) User specifies the following options:
      i) Partitioning method: rows, columns, or blocks;
      ii) Number of desired sub-images, [numOfSubImage];
      iii) Number of processes, [numOfProc];
      iv) Enabled or disabled: Smoothing technique, Thresholding. [toSmooth], [toThreshold];
      v) Noise level of the image (visual guess): 1 very low to 5 very high. [levelOfNoise];
2) Partitioning data
   a) Divide the image into sub-works as specified.
   b) Divide the work among the assigned processors. Processes self-schedule to obtain the next available work.
3) On each of the sub-works (if any) do the following:
   a) De-noising
      i) If smoothing is enabled:
         (1) Apply the improved median filter with a specific window size on the image.
         (2) The size of the window depends on the visual guess specified by the user in step 1.b.v.
   b) Localization: Eight direction Prewitt (as described in Sections 1.2.2, Figure 1.3, and 4.1.1, Figure 4.2)
   c) Thresholding
      i) If thresholding is enabled, apply the basic iterative global thresholding (as described in Section 4.2.2, Figure 4.7)
   d) Merging data
      i) Merge the processed sub-work to its final destination.
   e) Return processed image
Pseudocode of partitioning into rows (the pseudocode conventions are taken from [4]):

procedure parmin( initialization as specified in step 1 )
    Img[imgHeight, imgWidth] := imRead[imgName];    /* Loading the image */
    workLoad := imgHeight / numOfSubImage;
    shared Img, destImage, numOfSubImage, levelOfNoise, numOfProc,
           toThreshold, toSmooth, workLoad, imgWidth, imgHeight;
    private i, subImage, x, y;
    Self-Scheduled forall i := 0 until numOfSubImage    /* Dynamic scheduling of parallel processes */
    begin
        /* Partition data into sub-images, each having the same imgWidth but a specific height */
        x := 0;
        y := i * workLoad;
        subImage := Rectangle(Img, Rect(x, y, imgWidth, workLoad));    /* Copy the specified rect to subImage */
        if (toSmooth) then    /* De-noising: if smoothing is enabled */
            subImage := Smoothing(subImage, levelOfNoise);
        end
        /* Localization: eight direction Prewitt */
        subImage := eightDirections(subImage);
        if (toThreshold) then    /* Thresholding: if thresholding is enabled */
            subImage := iterativeThresholding(subImage);
        end
        critical work;    /* Merging data: writing data to its final destination */
            mergeSubImage(destImage, subImage, Rect(x, y, imgWidth, workLoad));
        end critical;
        Release(subImage);
    end
    Return(destImage);    /* Return processed image */
End Procedure

Figure 4.11: Procedure of the parallel version of the proposed algorithm
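The self-scheduled forall in the pseudocode above corresponds closely to OpenMP's dynamic loop schedule, where each thread grabs the next unprocessed tile so that a slow tile does not leave other threads idle. A minimal sketch of this idea (our own, with a stub tile-processing body standing in for the smoothing/localization/thresholding chain, not the thesis code):

```cpp
#include <vector>

// Process a list of tiles with dynamic (self-scheduled) work distribution.
// With chunk size 1, each thread requests one tile at a time at runtime.
void processTiles(std::vector<int>& tiles) {
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < (int)tiles.size(); ++i) {
        // smoothing, localization and thresholding would run here per tile;
        // for the sketch we just mark the tile as processed.
        tiles[i] = 1;
    }
}
```

Compiled without OpenMP the pragma is ignored and the loop runs serially with identical results, which is one reason this style of parallelization is low-risk.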
Figure 4.12: Partitioning the image into sub-images and applying the eight direction Prewitt to each of the sub-images. (a) Original image. (b) Divided into 4 rows. (c) Divided into 4 columns. (d) Divided into 4x4 blocks.
Figure 4.13: Flow chart of the proposed parallel application.
5. Experimental Results and Analysis

In this section, we go through the main experimental results and analysis of our work. This section is divided into three main parts: in the first part we exhibit the improvement in quality of our proposed algorithm; in the second part, we present the performance of our algorithm in terms of execution time. Finally, we answer the research questions (stated in Section 1.6) individually, based on both the design and results covered in Chapters 4 and 5.

5.1 Quality of the Proposed Algorithm

In this part, we demonstrate the improvement in quality of our proposed algorithm. This part is divided into main subparts. In the first, we prove (through dB numbers) and visualize (with images) that our proposed Improved Median filter (which uses a variable window size and with which we augmented our improved design of the algorithm) is more capable of suppressing impulse noise than the Standard Median filter (which uses a fixed window of size 3x3). To show the difference in quality, we use the well-known Peak Signal-to-Noise Ratio (PSNR), briefly explained in Section 1.3.3. We also visualize the difference between the Standard and Improved filters by applying them to images impacted with up to 70% impulse noise. In the second subpart, we visualize the difference in quality between our improved algorithm and both Canny (the industrial standard) and the bidirectional Prewitt, where both are implemented in Matlab.

5.1.1 The Improvement of the Proposed Median Filter

To compare the quality of the proposed Improved Median filter, which uses variable-sized windows, with the standard Median filter, which uses a fixed window of size 3x3, we use the
well-known Peak Signal-to-Noise Ratio, PSNR (dB) [13]. The results are shown in Table 1, in which the test image (Lena; see Figure 5.1 and Figure 5.3) is corrupted with varying degrees of impulse noise. As shown in Table 1, the improved Median filter generates significantly higher quality output than the standard one in every case [1]. The comparison across the various window sizes, noise levels and types can be seen from the first column (the input image, indicating noise type and level) and the fourth column (the Improved Median filter and the corresponding window size). Each row of the figure represents the application of the different algorithms to the input image in column 1. A higher dB implies an image closer to the original. To visually demonstrate the quality of the proposed Median filter compared to the Standard filter, the output of both filters applied to an image corrupted with 70% impulse noise is shown in Figure 5.1.

Table 1: Comparison of quality between the standard Median filter and our proposed one, using PSNR, applied to Lena impacted with different percentages of impulse noise. Image: Lena 512 x 512 (see Figure 5.1 for image).

Impulse noise %   Standard Median PSNR (dB)   Improved Median PSNR (dB)   Window size   Noise level
10                33.5798                     33.5798                     3x3           level 1
20                29.5107                     30.2259                     5x5           level 2
30                24.1262                     29.3523                     5x5           level 2
40                19.1859                     27.8228                     5x5           level 2
50                15.3806                     26.646                      7x7           level 3
60                12.4786                     25.2654                     9x9           level 4
70                10.098                      23.2942                     double 9x9    level 5
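The PSNR numbers in Table 1 follow the standard formula for 8-bit images, PSNR = 10 log10(255² / MSE). A sketch of that computation (our helper, not the thesis code):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// PSNR in dB between a reference image and a processed image of equal size.
// Higher dB means the processed image is closer to the reference.
double psnr(const std::vector<uint8_t>& ref, const std::vector<uint8_t>& img) {
    double mse = 0.0;                        // mean squared error
    for (std::size_t i = 0; i < ref.size(); ++i) {
        double d = double(ref[i]) - double(img[i]);
        mse += d * d;
    }
    mse /= double(ref.size());
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

Note the formula diverges as MSE approaches zero, so identical images are reported as infinite PSNR and are excluded from such comparisons.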
Figure 5.1: Visualizing the difference between our Improved Median filter and the Standard Median filter. (a) Lena original image. (b) Lena image corrupted with 70% impulse noise. (c) Image in b smoothed with the Standard Median filter. (d) Image in b smoothed with our Improved Median filter (noise level 5).

Figure 5.2: Quality of the Improved Median filter vs. the Standard Median filter (PSNR in dB, higher implying better accuracy, plotted against noise percentage from 10 to 70).
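The last two columns of Table 1 suggest a simple mapping from the user's visual noise-level guess to the median window used by the Improved filter. A sketch of that mapping (our own helper, derived from Table 1, not the thesis code):

```cpp
// Median window side length for a given noise-level guess (1..5), per the
// window sizes reported in Table 1. Level 5 uses a 9x9 window applied
// twice ("double 9x9"); this helper reports only the side length.
int windowForNoiseLevel(int level) {
    switch (level) {
    case 1: return 3;   // 3x3
    case 2: return 5;   // 5x5
    case 3: return 7;   // 7x7
    case 4: return 9;   // 9x9
    case 5: return 9;   // 9x9, applied twice per Table 1
    default: return 3;  // conservative fallback
    }
}
```

The larger windows trade extra computation for the ability to find uncorrupted neighbors when the noise density is high, which is exactly the effect visible in the Improved column of Table 1.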
Looking at Figure 5.1, we can clearly observe the difference in quality between applying our Improved Median filter and the Standard one. The image in b is an extreme case scenario in which 70% of the data is corrupted. The Standard Median filter clearly cannot handle such a case because of the small window size used in the smoothing process. When a small window is used to suppress such a high noise percentage, it is very likely that a noisy pixel will be replaced with another corrupted one, as clearly shown in c. On the other hand, our Improved Median filter shows a better capability of suppressing impulse noise even in extreme cases (such as an image corrupted with 70% impulse noise). As also shown in Figure 5.2, when the Improved Median filter is applied the dB decreases only from 33 to 23 while the noise increases from 10% to 70%. In contrast, when the Standard Median filter is used, the dB drops from 33 to about 10; this shows how quality is impacted when the noise percentage increases while a 3x3 window size is kept in use.

5.1.2 Visualizing the Quality of our Improved Eight Direction Prewitt Algorithm

In the previous section, 5.1.1, we demonstrated the improvement in quality achieved by applying the Improved Median filter. This Improved Median filter, as explained in the design section (Chapter 4), is part of our augmented design of the improved eight direction Prewitt edge detection algorithm. Figure 5.3 is presented to indicate the quality of our improved eight direction Prewitt edge detection algorithm. From Figure 5.3 we can visualize the effect of applying our improved edge detection algorithm to images impacted with various types of noise [1]. From the same figure, we can also compare the quality of our proposed edge detection
algorithm with both the bidirectional Prewitt and Canny (the industry standard). The first column of Figure 5.3 contains the noisy image that we want to process. Column 2 has the output produced by tuning the supported parameters.

Figure 5.3: Output of Canny (industry standard) and bidirectional Prewitt vs. our proposed work.
From Figure 5.3 we can see the capability of the proposed algorithm to suppress Salt & Pepper (impulse) noise even when the image is impacted with a very high noise percentage. Furthermore, the algorithm is able to suppress different types of noise with less impact on the real edges than the bidirectional Prewitt implemented in Matlab. In fact, our proposed algorithm outperforms the bidirectional Prewitt algorithm (as implemented in Matlab) in all cases. Thus, our answer to the first question is yes: our Prewitt edge detection algorithm is indeed an improved version of Prewitt, more capable of working on images impacted with various types of noise. Since these improvements add complexity to the original bidirectional Prewitt algorithm, in Chapter 4 we briefly explained our idea of building an efficient parallel implementation on top of an efficient sequential one to overcome the added complexity. Therefore, next we cover the efficiency of our algorithm in terms of execution time and compare it with other edge detection applications in use. In addition, in Figure 4.6 we showed the result of applying the original eight direction Prewitt to an image impacted with impulse noise. Now we can clearly visualize the difference between the two cases (smoothed with our improved filter and unsmoothed), as shown below.
Figure 5.4: Demonstrating the noise sensitivity of Prewitt edge detection on an image impacted with impulse noise. (a) Original Living Room image. (b) Eight direction Prewitt applied to the image in a. (c) Living Room image impacted with 3% Salt and Pepper noise. (d) Eight direction Prewitt applied to the image in c. (e) Image in c smoothed with our Improved Median filter (noise level 1), then the eight direction Prewitt applied to the result. (f) The basic global iterative thresholding mechanism applied to the image in e.

From the above figure we can clearly observe the importance of our smoothing technique, which strengthens the original eight direction Prewitt when applied to the noisy image in c. The result in e (smoothed with our improved filter) is almost identical to the image in b, where we applied the eight direction Prewitt to the noiseless image. Finally, the image in f helps visualize the result of the basic global iterative thresholding mechanism that is employed to produce a binary image.

5.2 The Efficiency of our Proposed Implementations

Having shown the performance of our augmented design in terms of quality, in this section we cover the performance of our implementations in terms of execution time. In fact, our two main goals in this study (enhancing the quality of Prewitt while performing rapidly) conflict: enhancing the quality implies additional complexity that slows the application down. For this reason, in the first subsection we show how efficient our sequential implementation is compared to some well-known edge detection algorithms already in use. Then, in the second subsection, we present the efficiency of our parallel version of the algorithm, which is based on our efficient sequential one.
5.2.1 The Efficiency of the Sequential Eight Direction Prewitt Edge Detection Implementation

In order to provide a fair comparison of the efficiency of our algorithms and implementation choices, we compare the sequential execution times of our proposed method with the existing implementations of the bidirectional Prewitt and Canny algorithms already in use in Matlab; the results are shown in Table 2. In this table, the results of all test cases, run sequentially on the same platform, are presented. As mentioned in the previous section (Chapter 4), our algorithms are implemented using C/C++ as the base language with OpenCV.

Table 2: Sequential run of the presented algorithm vs. the bidirectional Prewitt and Canny edge detections implemented within Matlab.
Platform: Intel Core i7-3720QM CPU @ 2.60 GHz, 16 GB RAM, 64-bit Windows 7. L1 D: 4 x 32 KB, 8-way set associative; L1 I: 4 x 32 KB, 8-way set associative; L2: 4 x 256 KB, 8-way set associative; L3: 6 MB, 12-way set associative.

Image size              Proposed: 8 directions   Proposed: smoothing (level 1)      Matlab bidirectional   Matlab Canny
                        Prewitt by itself        + 8 directions Prewitt             Prewitt                edge detection
                                                 + iterative thresholding
15315 x 11624 (169 MB)  2.456 sec                3.51 sec                           8.719067 sec           55.23517 sec
2250 x 2250 (14.4 MB)   0.063 sec                0.171 sec                          0.8991 sec             2.411595 sec
1536 x 2048 (5.91 MB)   0.047 sec                0.109 sec                          0.263233 sec           1.494402 sec
512 x 512 (768 KB)      0.0145 sec               0.016 sec                          0.137101 sec           0.334044 sec
256 x 256 (192 KB)      0.001 sec                0.0025 sec                         0.026171 sec           0.2432 sec

As shown in Table 2, our sequential implementation outperforms the bidirectional Prewitt and Canny edge detections. It is important to note that the execution times reported in the table for our implementation include smoothing (with different levels), localization, and the iterative thresholding technique.
Examining Table 2 (running on the Intel i7), processing an image of dimension 512 x 512 takes 16 ms, resulting in 62.5 fps (frames per second) with level 1 smoothing [43 ms (23.2 fps) when smoothed with level 5]. Obviously, a higher noise percentage requires more computation for good
suppression. However, we are able to accelerate the previous runs using 4 processors, on the same platform, to 6 ms at 166 fps for level 1 smoothing [and 11 ms at 90.9 fps for level 5]. Also, relatively small images (256 x 256) run about 10x faster than the bidirectional Prewitt and 100x faster than the Canny edge detection implemented in Matlab.

5.2.2 The Efficiency of Our Parallel Eight Direction Prewitt Edge Detection Implementation

Having shown how efficiently the algorithm works on images of small dimensions, we present the execution times of our parallel implementation working on larger images, using two different multicore platforms, as shown in Table 3 and Table 4 respectively [1]. The tables below (Tables 3 and 4) present the results of four different scenarios (cases).

Table 3: Execution time for the proposed algorithm running on 12-core compute nodes.
Case 1: eight directions Prewitt; Case 2: smoothing (level 1) & eight directions Prewitt; Case 3: smoothing (level 1) & eight directions Prewitt & thresholding; Case 4: smoothing (level 5) & eight directions Prewitt & thresholding.

Image of size 29649 x 22008          |  Image of size 15315 x 11624
        N cores  T_N (s)  T_1 (s)  Speedup  |  N cores  T_N (s)  T_1 (s)  Speedup
Case 1    12     2.890    22.850    7.91    |    12     0.803    6.021     7.50
Case 2    12     3.145    23.798    7.57    |    12     0.843    6.534     7.75
Case 3    12     4.521    33.532    7.42    |    12     1.381    8.854     6.41
Case 4    12     13.015   143.52    11.03   |    12     3.724    39.45     10.59

Image of size 2250 x 2250            |  Image of size 1536 x 2048
Case 1    12     0.028    0.180     6.43    |     8     0.024    0.108     4.50
Case 2    12     0.032    0.189     5.91    |     8     0.026    0.114     4.38
Case 3    12     0.067    0.491     7.33    |    12     0.051    0.305     5.98
Case 4    12     0.147    1.589     10.81   |    12     0.130    1.016     7.82

Image of size 2048 x 2048            |  Image of size 2704 x 4064
Case 1     8     0.031    0.147     4.74    |     8     0.069    0.374     5.42
Case 2     8     0.031    0.156     5.03    |     8     0.075    0.403     5.39
Case 3    12     0.051    0.372     7.29    |    12     0.160    1.297     8.11
Case 4    12     0.132    1.201     9.10    |    12     0.348    3.959     11.38

12-core compute node (shared memory multiprocessor): AMD Opteron processors @ 2.2 GHz, 6 x 64 KB L1 instruction cache per processor, 6 x 64 KB L1 data cache per processor, 6 x 512 KB L2 per processor, 6 MB L3 per processor, 24 GB RAM, 64-bit Linux version 2.6.18.
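The speedup column in Tables 3 and 4 is the standard ratio of sequential time to best parallel time. A trivial sketch of the check (our helper, not thesis code), using the first row of Table 3 as an example:

```cpp
// Speedup on N cores: sequential time T_1 divided by parallel time T_N.
double speedup(double t_seq, double t_par) {
    return t_seq / t_par;
}
// Example from Table 3 (29649 x 22008, Case 1): speedup(22.850, 2.890)
// gives roughly 7.91, matching the reported value.
```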
Table 4: Execution time for the proposed algorithm running on 64-core compute nodes.
Case 1: eight directions Prewitt; Case 2: smoothing (level 1) & eight directions Prewitt; Case 3: smoothing (level 1) & eight directions Prewitt & thresholding; Case 4: smoothing (level 5) & eight directions Prewitt & thresholding.

Image of size 29649 x 22008          |  Image of size 15315 x 11624
        N cores  T_N (s)  T_1 (s)  Speedup  |  N cores  T_N (s)  T_1 (s)  Speedup
Case 1    28     1.35     19.07     14.13   |    28     0.39     5.22      13.38
Case 2    32     1.35     19.89     14.73   |    24     0.42     5.46      13.00
Case 3    32     1.86     28.14     15.13   |    32     0.65     7.64      11.75
Case 4    44     4.35     115.56    26.57   |    40     1.32     31.41     23.80

Image of size 2250 x 2250            |  Image of size 1536 x 2048
Case 1    12     0.027    0.155     5.74    |    12     0.019    0.096     5.05
Case 2    12     0.029    0.163     5.62    |    12     0.023    0.102     4.43
Case 3    16     0.051    0.460     9.02    |    16     0.040    0.284     7.10
Case 4    24     0.094    1.461     15.54   |    24     0.073    0.979     13.41

Image of size 2048 x 2048            |  Image of size 2704 x 4064
Case 1    12     0.023    0.126     5.48    |    16     0.040    0.324     8.10
Case 2    12     0.025    0.139     5.56    |    16     0.043    0.342     7.95
Case 3    12     0.043    0.336     7.81    |    24     0.117    1.135     9.70
Case 4    24     0.092    1.067     11.60   |    32     0.184    3.364     18.28

64-core compute node (shared memory multiprocessor): AMD Bulldozer processors @ 2.2 GHz, 1 x 64 KB L1 instruction cache per module (which contains two execution cores), 2 x 16 KB L1 data cache per module, 1 x 2 MB L2 per module, 16 MB L3 shared by four modules (located on the same chip), 128 GB RAM, 64-bit Linux version 2.6.18.

The two multicore platforms used for our experiments are a 12-core AMD node with Opteron modules (Table 3) and a 64-core AMD node with Bulldozer modules (Table 4). The organizations of the two processors are quite different, as briefly shown in Sections 3.1.1 and 3.1.2. The Opteron node consists of two chips, each with 6 cores. Each Opteron core has its own private first-level instruction and data caches of 64 KB each, and a private unified L2 cache of 512 KB.
The six cores on each chip share 6 MB of L3 cache. The 64-core AMD Bulldozer node consists of eight Bulldozer modules on one chip (16 cores per chip). A Bulldozer module consists of two integer execution cores that share a first-level instruction cache of 64 KB and a floating-point unit. Each core has a private first-level data cache of 16 KB. The two cores within one Bulldozer module share a unified L2 cache (2 MB). Every four modules share 16 MB of L3 cache.


The L1 data cache in the Opteron (64 KB per core) is four times the size of the Bulldozer L1 data cache (16 KB per core). The idea of partitioning the image into a number of equal-sized tiles helps enhance the locality of data. The main difference between the Opteron and the Bulldozer is that more resources are shared within the Bulldozer module. For instance, when we process relatively large images (29649 x 22008) using 32 cores running on the Bulldozer, the best results are achieved when the parallel tasks are distributed such that each task executes on every other core. This means that 32 Bulldozer modules, one active core per module, will be working, compared to the case in which only 16 Bulldozer modules, two active cores per module, are used. Using more Bulldozer modules not only provides larger L3 cache, a total of 128 MB on the four chips, but also allows each core to use the entire shared L2 cache as a dedicated second-level cache and to have exclusive use of the single floating-point unit that is otherwise shared by the two integer cores. Although communication is costly, especially when cores do not share cache space, i.e., when one core must access cache memory located on a different chip, better data locality (when larger caches are used) enhances the overall performance of the application. This statement does not apply to all applications, but in our design we aimed to divide the input image into independent tiles in order to decrease the amount of communication required and, at the same time, increase the data locality gained from using the larger cache space. The results in Table 3, collected from runs on the Opteron, and Table 4, collected from runs on the Bulldozer, reveal that our parallel algorithm performs very well. Now, let us look at the different cases in which these results are collected. In Case 1 images are processed by applying the proposed eight


direction Prewitt edge detection algorithm without smoothing and thresholding mechanisms. In Case 2, level 1 smoothing is used prior to the eight-direction Prewitt, while in Case 3 the proposed thresholding is added to Case 2. Finally, in Case 4 the images are processed with level 5 smoothing, the eight-direction Prewitt, and the proposed thresholding mechanism. The number of cores used in each case is given in the N cores column of Tables 3 and 4, while the Best execution time T_N column shows the time taken in seconds by the best parallel run using N cores. The column Sequential T_1 contains the execution time of the above cases on a single core. The next column, Speedup, is calculated as T_1/T_N, which represents the number of times the parallel run is faster than the sequential one. In order to understand how each case performs on either of the two platforms, we compare the cases by testing various image sizes. Also, for each image size we run all four cases on the two different platforms, Bulldozer and Opteron.

5.3 Answers to the Research Questions

In the introduction chapter, we specified the main research questions we aim to answer. Based on the results explained in Chapter 5, we answer those questions in this section.

5.3.1 Is it Possible to Come Up with an Improved Version of the Prewitt Edge Detection Algorithm?

The answer to this question is shown in both Figure 4.4 and Figure 5.3 and explained briefly in Chapters 4 and 5. Figure 4.4 shows the improvement in quality of applying


bidirectional Prewitt vs. eight-direction Prewitt, while in Figure 5.3 we demonstrate the improved quality of our proposed Prewitt, which is indeed proven to be an improved version of the Prewitt edge detection algorithm. The whole improvement of our proposed Prewitt is based on sub-improvements that are grouped together to make the final version of the algorithm. These sub-improvements are as follows:

a. First, detecting edges in eight directions instead of two; the difference is shown in Figures 4.3 and 4.4. However, the eight-direction Prewitt, as we have shown in 4.1.3 and Figure 4.6, fails to work on images impacted with impulse and other types of noise.

b. In order to decrease the noise sensitivity of Prewitt, we have augmented our improved design with a smoothing mechanism we call the improved Median filter. Its capability is demonstrated in Table 1 and Figures 5.1 and 5.2 and briefly covered in 4.2.1: the improved Median filter suppresses impulse noise and other types of noise using the idea of a flexible window size.

c. The Prewitt algorithm does not incorporate a thresholding mechanism to produce binary images. A binary image, with one level representing the object and the other representing the background, is desirable in many object detection applications. For this reason, we have added an iterative thresholding functionality (sub-improvement), known as basic global thresholding, to our improved Prewitt edge detection; it is briefly covered in 4.2.2. The output of this thresholding can be seen in column six of Figure 5.3.


5.3.2 How to Overcome the Additional Complexity Added to the Original Bidirectional Prewitt Algorithm?

As explained in the answer to the previous question, our improved version of the Prewitt edge detection augments the original with additional functionalities which all work together to produce a better quality detection algorithm. These additional functionalities add complexity to the original bidirectional algorithm. Accordingly, we have produced efficient implementations of the algorithm to overcome the added cost, as follows:

a. In our application design we allow flexibility in running the application so as to use functionalities only when needed, for the sake of reducing operations. For instance, as shown in Figure 4.13, the user can enable or disable both smoothing and thresholding, in addition to the noise level guess. All this flexibility in the design saves operations that might not be needed in some scenarios.

b. The design of our efficient sequential implementation is comprehensively covered in 4.3.1. Our sequential implementation, based on C/C++ with the OpenCV library, outperforms both the bidirectional Prewitt and Canny (the industry standard) as already implemented in Matlab. The comparison of execution times (running various image sizes) between our improved Prewitt edge detection and the bidirectional Prewitt and Canny is shown in 5.1.1, Table 1. Indeed, our algorithm in its general case (smoothing (level 1) & eight-direction Prewitt & iterative thresholding) is between 2.4 and 10.47 times faster than the original bidirectional Prewitt and 13.7 to 97.28 times faster than Canny. This is very impressive due to the fact


that our algorithm is much more complex, as shown in 4.1.1, than the original bidirectional Prewitt.

c. Beyond the efficient sequential implementation, an additional improvement has been added to our improved Prewitt edge detection algorithm: a parallel version based on our efficient sequential one, as presented in 4.3.2 and 4.11. Our parallel version, designed mainly to run on shared memory multicore architectures, accelerates our efficient sequential version of the algorithm. The execution times are shown in 5.2.2, Tables 3 and 4. For instance (Table 4, Case 4), we are able to process an image of size 29649 x 22008 26.57 times faster than the efficient sequential run, using 44 processors on the 64-core node. We are also able to achieve a very high speedup on images of much smaller size: 11.38 times faster than sequential using the 12-core node, which is very close to the ideal case of 12x.

5.3.3 How Efficient is the Algorithm Compared to the Original Prewitt and the Well-known Canny Edge Detections?

Having covered our main improvements of the Prewitt algorithm, the answer to this question combines the previous two answers (5.3.1 and 5.3.2). The improvements concern not only the quality of the algorithm but also the execution time, as shown in Figure 5.3 and Tables 3 and 4, respectively. Our implementations outperform Prewitt and Canny in all cases, working on various image sizes.


5.3.4 Will the Improved Version of this Algorithm Perform Well Unconditionally?

This is a very general claim that requires solid proof. Let us first define the word unconditionally: it means to perform well in the following scenarios:

a. Working on different operating systems. Our implementation, attached in the Appendix, works properly on both Windows and Linux.

b. Performing well on different platforms. One of our motivations in the design was not to depend on only one platform. Instead, we have run our algorithm on three different platforms (Intel, Opteron and Bulldozer), as explained in Chapters 3 and 5.

c. The flexibility of processing a region of interest instead of the whole image allows for operations reduction (avoiding useless operations). Processing an ROI is a very common practice, especially in biomedical image processing. Moreover, allowing the user to specify the way of partitioning, as presented in Figure 4.9, gives more flexibility when running on different types of architectures.

d. We have shown that we are able to process various image sizes at rapid speed, from very small images (256 x 256) to extremely large images (29649 x 22008), using multicore shared memory, as presented in Figure 4.1.

e. Finally, our improved algorithm is a robust version of the eight-direction Prewitt that can perform rapidly on various images impacted with different types of noise.

5.3.5 How Suitable is the Improved Version of this Algorithm for Real-time Edge Detection Applications?

As we explained earlier in 2.4.1, real-time image processing algorithms (according to different studies) require the ability to process 30 to 100


frames per second. This processing capability (frames per second) decides whether an implementation is suitable for real-time applications or not. Next, we show, based on our results, that the presented application is suitable for real-time edge detection applications: it processes small and medium image sizes in real time while, at the same time, performing very well (compared to other image processing applications) on extremely large images.

a. As presented earlier in this chapter (5.2.1), we are able to process 62.5 fps (frames per second), each frame of size 512 x 512, with level 1 smoothing. Obviously, suppressing a higher noise percentage requires more computation; thus, in the extreme case (noise level 5) we processed 23.2 fps sequentially. However, we are able to accelerate the previous runs using 4 processors, running on the same platform, to 6 ms at 166 fps for level 1 smoothing [and 11 ms at 90.9 fps for level 5]. This is very suitable for real-time edge detection applications. Processing an image of size 256 x 256 is extremely fast, about 400 frames per second running sequentially with level 1 smoothing. This run shows a rapid speed that is about 100 times faster than processing the same image with the Canny edge detection implemented in Matlab.

b. Nevertheless, to visualize the ability to process in real time, we have processed a video, consisting of 901 frames each of size 1280 x 720, in real time with our sequential implementation and without pipelining the images. Our parallel implementation, as shown in Tables 3 and 4, is an acceleration of our efficient sequential run. Images of size 1280 x 720 can be considered medium-sized images. On the other hand, for extremely large images, loading the image by itself can


take a couple of seconds. Therefore, we cannot assume there is a real-time application that can process, for example, an image of size 30000 x 20000 in real time. Yet, our presented algorithm shows a good speedup processing such an image compared to the efficient sequential run.


6. Conclusion

Sequential and parallel edge detection applications based on the eight-direction Prewitt edge detection algorithm are designed and implemented to work efficiently on different multicore platforms. Different functionalities are added to the original Prewitt, such as smoothing and a global thresholding mechanism. In order to suppress noise more efficiently, an improved Median filter that enables the application to work effectively on noisy images is added. This method not only strengthens the original algorithm by allowing it to work on noisy images more effectively but also lets it compete with the industry standard detection algorithm, Canny. Our algorithm, when run sequentially with all added functionality and complexity included, outperforms the default runs of both Prewitt and Canny as implemented in Matlab. Our parallel implementation of the algorithm uses C/C++ as the base language with two open source libraries, OpenCV and OpenMP. Different experiments show improved performance gained from processing different image sizes, especially when the complexity of the problem increases, which allows for more parallelism. A variety of tuning mechanisms have been added throughout the design to allow flexible work distribution and enhance the overall performance. The sequential application is tested on Intel platforms, while the parallel implementation is tested on two new shared memory MIMD multicore platforms, namely Opteron and Bulldozer. Finally, these implementations can effectively be used within image processing applications that rely on fast and accurate edge detection.




APPENDIX

/*
An Improved Parallel Eight Direction Prewitt Edge Detection application (IPEDPED)
Written by Mohammed Mohammed, Dec 25 2012 ... to run on Linux
Updated Nov 20 2012 ... more work and implementation
        Feb 11 2013 ... more work and finalization
        July 2 2013

In this code we have come up with a new design for the Prewitt edge detection
algorithm that is capable of running sequentially or in parallel on shared-memory
multicore architectures running the Linux operating system. Users are given
control over the operations applied to the input image. The main functionalities
added to this algorithm are smoothing, localizing, and iterative thresholding.

This program has different implementations that were tested during the
development phase in order to choose the most suitable one. Partitioning the
image can be done in one of three main ways: dividing into rows, columns or
blocks. We have tested the idea of having two levels of parallelism, yet our
final decision (according to different experimental results and hundreds of
runs) was to keep the parallelism at the highest level, which is proven to
provide the best results.

If the user wishes to save images or sub-images, we need a renaming mechanism
that works dynamically to rename each image, whose number depends on the user
input. Therefore, we have built a function to take care of casting from
integers to string and const char*. The function getSubImage then specifies the
region of interest for each of the processors so that each processor copies the
data it needs from the original image. Sub-images (tiles) are divided among the
processors located on the same node running on the Hydra.

"I am using the previous program I implemented, which is DivideImageDifferentWay,
and also use DivideImageAmongProcessor2."

In this program we are able to divide the image into sub-images (partitioning
into columns, rows, or blocks) and then apply our implemented 8-direction
Prewitt edge detection. Within the same implementation the user can choose to
run the 8-direction detection either in parallel or sequentially. If run in
parallel, sub-blocks are processed by processors, each of which would be master
of other processors (if two levels of parallelism are employed). Yet, we focus
on one level of parallelism, where each processor copies its data from the
original image to its local memory, works on it, then saves (merges) it back
to its final destination. Merging data to its final destination requires some
synchronization, which is enforced by OpenCV to guarantee data integrity.
Dividing the work among the processors is done in parallel.

In this program we need to enter 8 arguments, as follows:

cout << "Arguments: <1> <2> <3> <4> <5> <6> <7> <8>" << endl;
cout << "<1> : imageName" << endl;
cout << "<2> : choose type of partition 1: divide into rows, 2: into blocks, 3: into columns" << endl;
cout << "<3> : number of sub images" << endl;
cout << "<4> : number of threads" << endl;
cout << "<5> : do you want to run 8D in parallel? 1: Yes, 0: No" << endl;
cout << "<6> : 1 or missing: does not save sub images" << endl;
cout << "<7> : Level of noise 0 means ignore the smoothing, 1: low level of noise, 5: very noisy" << endl;
cout << "<8> : 0 No thresholding, 1 apply Iterative Thresholding" << endl;
isOK = false;

------------------------------- Set Input -------------------------------
<1> The name of the image (supporting different image extensions such as
    .tif, .tiff, .jpeg, .png, .bmp, etc.).
<2> The type of partitioning: 1: divide into rows, 2: divide into blocks,
    3: divide into columns.
<3> The number of sub-images (in the block case we will have n x n sub-images).
<4> The number of threads (cores).
   <5> Whether to run the 8-direction Prewitt in parallel (if yes, two levels of
       parallelism); the default is one level
       1: Yes, run it as a nested parallel region
       0: No, do not run it in parallel (this is the default)
   <6> Whether to save the sub-images; the default is not to save
       1: save sub-images
       0: do not save sub-images
   <7> Enable or disable the smoothing; if smoothing is enabled, the level of
       noise is specified as follows:
       1: low level of noise
       5: very high noise percentage
   <8> Enable or disable the iterative thresholding
       1: enable thresholding
       0: disable thresholding

   -------------------------------------------------------------------------
   Compiling the Source Code
   -------------------------------------------------------------------------
   Compile the code and link it against the OpenMP and OpenCV libraries to run
   in a Linux environment:

   g++ -fopenmp anImprovedParallelEightDirectionPrewittEdgeDetection.cpp
       -I/opt/opencv-2.2/include -L/opt/opencv-2.2/lib
       -lopencv_calib3d -lopencv_contrib -lopencv_core -lopencv_features2d
       -lopencv_flann -lopencv_gpu -lopencv_highgui -lopencv_imgproc
       -lopencv_legacy -lopencv_ml -lopencv_objdetect -lopencv_video -o IPEDPED

   The name of the application is IPEDPED. To run the same source code on
   Windows, only the first three headers below have to be replaced. */

// The header names were lost in extraction; these are the headers this code needs:
#include <opencv/cv.h>
#include <opencv/highgui.h>
#include <omp.h>
#include <iostream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <cmath>

using namespace std;

// ---------- generate a string from int + String ----------//
string gen(string a, int x)
{
    string temp = a;
    std::ostringstream ostr;                   // output string stream
    ostr << x;                                 // use the string stream just like cout
    std::string theNumberString = ostr.str();  // the str() function of the stream
    theNumberString = theNumberString + temp;
    return theNumberString;
}

// ---------- generate a string from String + int ----------//
string genBack(string a, int x)
{
    string temp = a;
    std::ostringstream ostr;                   // output string stream
    ostr << x;
    std::string theNumberString = ostr.str();
    theNumberString = temp + theNumberString;
    return theNumberString;
}

// --------------------- generate a const char* from a string -------------------//
const char* genC(const string& a)   // pass by reference so the returned pointer stays valid
{
    const char* ccpExample = a.c_str();
    return ccpExample;
}

// -------------- copy a sub-image out of the original one ---------------//
void getSubImg(IplImage* img, IplImage* subImg, CvRect roiRect)
{
    cvSetImageROI(img, roiRect);
    subImg = cvCreateImage(cvGetSize(img), img->depth, img->nChannels);
    cvCopy(img, subImg, NULL);
    cvResetImageROI(img);
}

// ------------------ divide the main image into sub-images ---------------//
IplImage* retSubImg(IplImage* img, CvRect roiRect)
{
    cvSetImageROI(img, roiRect);
    IplImage* subImg = cvCreateImage(cvGetSize(img), img->depth, img->nChannels);
    cvCopy(img, subImg, NULL);
    cvResetImageROI(img);
    return subImg;
}

// ------------ merge the sub-images back into the original one -----------------//
void mergeSubImg(IplImage* img, IplImage* subImg, CvRect roiRect)
{
    cvSetImageROI(img, roiRect);
    cvCopy(subImg, img, NULL);
    cvResetImageROI(img);
}

// ----------------- draw a box, for test purposes ----------------------------//
void draw_box(IplImage* img, CvRect rect)
{
    cvRectangle(img, cvPoint(rect.x, rect.y),
                cvPoint(rect.x + rect.width, rect.y + rect.height),
                cvScalar(0x10, 0x00, 0xff));
}

// ------------------ basic global iterative thresholding ---------------//
IplImage* IterativeThresholding(IplImage* src)
{
    IplImage* dst1  = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst2  = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* mask1 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* mask2 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    int MaxIteration = 0;   // to count the number of iterations
    cvCopy(src, dst1);
    cvCopy(src, dst2);
    CvScalar scAvgOld = cvAvg(src);
    CvScalar scAvgNew = cvAvg(dst1);
    CvScalar AvgG;
    CvScalar AvgLE;
    double AvgOld = scAvgOld.val[0];
    double AvgNew = scAvgNew.val[0];
    double M1;
    double M2;
    double NewThreshold = 0;
    double OldThreshold = AvgNew;   // start from the global mean
    double temp;
    bool isOkay = false;
    do {
        MaxIteration = MaxIteration + 1;
        cvCmpS(dst1, OldThreshold, mask1, CV_CMP_GT);   // pixels above the threshold
        cvCmpS(dst1, OldThreshold, mask2, CV_CMP_LE);   // pixels at or below it
        AvgG  = cvAvg(src, mask1);
        AvgLE = cvAvg(src, mask2);
        M1 = AvgG.val[0];
        M2 = AvgLE.val[0];
        NewThreshold = 0.5 * (M1 + M2);
        // show the intermediate thresholding values
        cout << "Old Threshold " << OldThreshold << endl;
        cout << "New Threshold " << NewThreshold << endl;
        cout << "Count is " << MaxIteration << endl;
        // check the condition in order to stop the loop
        temp = fabs(OldThreshold - NewThreshold);
        if (temp <= 0) {
            isOkay = true;
        }
        else
            OldThreshold = NewThreshold;
    } while (isOkay == false);
    cout << "count is " << MaxIteration << endl;
    cout << "FinalThreshold " << NewThreshold << endl
         << "Old Threshold " << OldThreshold << endl;
    cvThreshold(src, dst1, NewThreshold, 256, CV_THRESH_BINARY);
    /// cvThreshold(src, dst1, NewThreshold, 255, CV_THRESH_TRUNC);
    /// cvThreshold(src, dst2, 100, 256, CV_THRESH_OTSU);
    cvSaveImage("Iterative3.jpg", dst1);
    return (dst1);
    // de-allocate memory (note: unreachable after the return above)
    cvReleaseImage(&dst1);
    cvReleaseImage(&dst2);
    cvReleaseImage(&mask1);
    cvReleaseImage(&mask2);
}

/* Parallel 8 directions, if we are willing to use two levels of parallelism */
IplImage* EightDirections_Parlelle(IplImage* src, int numOfPro)
{
    IplImage* dst1 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst2 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst3 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst4 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst5 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst6 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst7 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst8 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    // ---------- accessories of the filters to convolve with ----------//
    double delta = 0;
    CvPoint anchor = cvPoint(1, 1);
    // The minus signs were lost in extraction; the masks are restored here as
    // the standard 8-direction Prewitt kernels, one per 45-degree step.
    double c90[]  = {  1,  1,  1,   0,  0,  0,  -1, -1, -1 };   // 90 degrees
    double c180[] = {  1,  0, -1,   1,  0, -1,   1,  0, -1 };   // 180 degrees
    double c270[] = { -1, -1, -1,   0,  0,  0,   1,  1,  1 };   // 270 degrees
    double c0[]   = { -1,  0,  1,  -1,  0,  1,  -1,  0,  1 };   // 0 degrees
    double c135[] = {  1,  1,  0,   1,  0, -1,   0, -1, -1 };   // 135 degrees
    double c225[] = {  0, -1, -1,   1,  0, -1,   1,  1,  0 };   // 225 degrees
    double c315[] = { -1, -1,  0,  -1,  0,  1,   0,  1,  1 };   // 315 degrees
    double c45[]  = {  0,  1,  1,  -1,  0,  1,  -1, -1,  0 };   // 45 degrees
    // allocate the headers for the kernel matrices
    CvMat* M0   = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M45  = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M90  = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M135 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M180 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M225 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M270 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M315 = cvCreateMat(3, 3, CV_64FC1);
    omp_set_nested(numOfPro);
    omp_set_dynamic(numOfPro);
    // allow parallel sections
#pragma omp parallel num_threads(numOfPro)
    {
#pragma omp sections
        {
#pragma omp section
            {
                cvInitMatHeader(M90, 3, 3, CV_64FC1, c90);
                /// printf("done by %i\n", omp_get_thread_num());  // for testing purposes
            }
#pragma omp section
            {
                cvInitMatHeader(M180, 3, 3, CV_64FC1, c180);
            }
#pragma omp section
            {
                cvInitMatHeader(M270, 3, 3, CV_64FC1, c270);
            }
#pragma omp section
            {
                cvInitMatHeader(M0, 3, 3, CV_64FC1, c0);
            }
#pragma omp section
            {
                cvInitMatHeader(M135, 3, 3, CV_64FC1, c135);
            }
#pragma omp section
            {
                cvInitMatHeader(M225, 3, 3, CV_64FC1, c225);
            }
#pragma omp section
            {
                cvInitMatHeader(M315, 3, 3, CV_64FC1, c315);
            }
#pragma omp section
            {
                cvInitMatHeader(M45, 3, 3, CV_64FC1, c45);
            }
        }
#pragma omp barrier
        // ----- first step of merging --------//
#pragma omp sections
        {
#pragma omp section
            {
                cvFilter2D(src, dst1, M90, anchor);
                cvFilter2D(src, dst2, M180, anchor);
                cvMax(dst1, dst2, dst2);
                // cvSaveImage can save the intermediate images here for testing
            }
#pragma omp section
            {
                cvFilter2D(src, dst3, M270, anchor);
                cvFilter2D(src, dst4, M0, anchor);
                cvMax(dst3, dst4, dst4);
            }
#pragma omp section
            {
                cvFilter2D(src, dst5, M135, anchor);
                cvFilter2D(src, dst6, M225, anchor);
                cvMax(dst5, dst6, dst6);
            }
#pragma omp section
            {
                cvFilter2D(src, dst7, M315, anchor);
                cvFilter2D(src, dst8, M45, anchor);
                cvMax(dst7, dst8, dst8);
            }
        }
        /* We need a barrier here to wait until all the other sections have
           completed their job. */
#pragma omp barrier
#pragma omp parallel sections
        {
            // second step of merging
#pragma omp section
            {
                cvMax(dst2, dst4, dst4);
            }
#pragma omp section
            {
                cvMax(dst6, dst8, dst8);
            }
        }
    }
    // --------- third step of merging ----------//
    cvMax(dst4, dst8, dst8);
    // cvSaveImage("ALLDIRECTIONS.jpg", dst8);
    return (dst8);   // return the final result
    // clean-up (note: unreachable after the return above)
    cvReleaseImage(&src);
    cvReleaseImage(&dst1);
    cvReleaseImage(&dst2);
    cvReleaseImage(&dst3);
    cvReleaseImage(&dst4);
    cvReleaseImage(&dst5);
    cvReleaseImage(&dst6);
    cvReleaseImage(&dst7);
    cvReleaseImage(&dst8);
    // one kernel for each 45-degree angle
    cvReleaseMat(&M0);
    cvReleaseMat(&M45);
    cvReleaseMat(&M90);
    cvReleaseMat(&M135);
    cvReleaseMat(&M180);
    cvReleaseMat(&M225);
    cvReleaseMat(&M270);
    cvReleaseMat(&M315);
}

/* -----------------------------------------------------------------//
// Smooth, then parallel 8 directions
// -----------------------------------------------------------------*/
IplImage* Smooth_EightDirections_Parlelle(IplImage* src1, int numOfPro, int noiseLevel, int toThreshold)
{
    // smooth the received image with a median filter sized to the noise level
    IplImage* src = cvCreateImage(cvGetSize(src1), src1->depth, src1->nChannels);
    switch (noiseLevel) {
    case 1:
        cvSmooth(src1, src, CV_MEDIAN, 3);
        break;
    case 2:
        cvSmooth(src1, src, CV_MEDIAN, 5);
        break;
    case 3:
        cvSmooth(src1, src, CV_MEDIAN, 7);
        break;
    case 4:
        cvSmooth(src1, src, CV_MEDIAN, 9);
        break;
    case 5:
        // very noisy: apply the 9x9 median filter twice
        cvSmooth(src1, src, CV_MEDIAN, 9);
        cvSmooth(src, src1, CV_MEDIAN, 9);
        cvCopy(src1, src);
        break;
    }
    IplImage* dst1 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst2 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst3 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst4 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst5 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst6 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst7 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst8 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    // ------- accessories of the filters ------//
    double delta = 0;
    CvPoint anchor = cvPoint(1, 1);
    // masks restored as the standard 8-direction Prewitt kernels (the minus
    // signs were lost in extraction)
    double c90[]  = {  1,  1,  1,   0,  0,  0,  -1, -1, -1 };   // 90 degrees
    double c180[] = {  1,  0, -1,   1,  0, -1,   1,  0, -1 };   // 180 degrees
    double c270[] = { -1, -1, -1,   0,  0,  0,   1,  1,  1 };   // 270 degrees
    double c0[]   = { -1,  0,  1,  -1,  0,  1,  -1,  0,  1 };   // 0 degrees
    double c135[] = {  1,  1,  0,   1,  0, -1,   0, -1, -1 };   // 135 degrees
    double c225[] = {  0, -1, -1,   1,  0, -1,   1,  1,  0 };   // 225 degrees
    double c315[] = { -1, -1,  0,  -1,  0,  1,   0,  1,  1 };   // 315 degrees
    double c45[]  = {  0,  1,  1,  -1,  0,  1,  -1, -1,  0 };   // 45 degrees
    // allocate the headers for the kernel matrices
    CvMat* M0   = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M45  = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M90  = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M135 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M180 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M225 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M270 = cvCreateMat(3, 3, CV_64FC1);
    CvMat* M315 = cvCreateMat(3, 3, CV_64FC1);
    omp_set_nested(numOfPro);
    omp_set_dynamic(numOfPro);
#pragma omp parallel num_threads(numOfPro)
    {
#pragma omp sections
        {
#pragma omp section
            {
                cvInitMatHeader(M90, 3, 3, CV_64FC1, c90);
            }
#pragma omp section
            {
                cvInitMatHeader(M180, 3, 3, CV_64FC1, c180);
            }
#pragma omp section
            {
                cvInitMatHeader(M270, 3, 3, CV_64FC1, c270);
            }
#pragma omp section
            {
                cvInitMatHeader(M0, 3, 3, CV_64FC1, c0);
            }
#pragma omp section
            {
                cvInitMatHeader(M135, 3, 3, CV_64FC1, c135);
            }
#pragma omp section
            {
                cvInitMatHeader(M225, 3, 3, CV_64FC1, c225);
            }
#pragma omp section
            {
                cvInitMatHeader(M315, 3, 3, CV_64FC1, c315);
            }
#pragma omp section
            {
                cvInitMatHeader(M45, 3, 3, CV_64FC1, c45);
            }
        }
#pragma omp barrier
        // ----- first step of merging --------//
#pragma omp sections
        {
#pragma omp section
            {
                cvFilter2D(src, dst1, M90, anchor);
                cvFilter2D(src, dst2, M180, anchor);
                cvMax(dst1, dst2, dst2);
            }
#pragma omp section
            {
                cvFilter2D(src, dst3, M270, anchor);
                cvFilter2D(src, dst4, M0, anchor);
                cvMax(dst3, dst4, dst4);
            }
#pragma omp section
            {
                // this section was dropped in extraction; restored to match the
                // non-smoothing version above
                cvFilter2D(src, dst5, M135, anchor);
                cvFilter2D(src, dst6, M225, anchor);
                cvMax(dst5, dst6, dst6);
            }
#pragma omp section
            {
                cvFilter2D(src, dst7, M315, anchor);
                cvFilter2D(src, dst8, M45, anchor);
                cvMax(dst7, dst8, dst8);
            }
        }
        /* We need a barrier here to wait until all the other sections have
           completed their job. */
#pragma omp barrier
#pragma omp parallel sections
        {
            // second step of merging
#pragma omp section
            {
                cvMax(dst2, dst4, dst4);
            }
#pragma omp section
            {
                cvMax(dst6, dst8, dst8);
            }
        }
    }
    // --------- third step of merging ----------//
    cvMax(dst4, dst8, dst8);
    // cvSaveImage("ALLDIRECTIONS.jpg", dst8);
    if (toThreshold == 1) {
        dst2 = IterativeThresholding(dst8);
        return (dst2);
    }
    else {
        return (dst8);
    }
    // clean-up (note: unreachable after the returns above)
    cvReleaseImage(&src);
    cvReleaseImage(&src1);
    cvReleaseImage(&dst1);
    cvReleaseImage(&dst2);
    cvReleaseImage(&dst3);
    cvReleaseImage(&dst4);
    cvReleaseImage(&dst5);
    cvReleaseImage(&dst6);
    cvReleaseImage(&dst7);
    cvReleaseImage(&dst8);
    // one kernel for each 45-degree angle
    cvReleaseMat(&M0);
    cvReleaseMat(&M45);
    cvReleaseMat(&M90);
    cvReleaseMat(&M135);
    cvReleaseMat(&M180);
    cvReleaseMat(&M225);
    cvReleaseMat(&M270);
    cvReleaseMat(&M315);
}

// --------------- apply 8-direction edge detection ---------------//
IplImage* EightDirections(IplImage* src, int toSave)
{
    toSave = 1;   // do not allow storing the sub-images for now
    IplImage* dst1 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst2 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst3 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst4 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst5 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst6 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst7 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    IplImage* dst8 = cvCreateImage(cvGetSize(src), src->depth, src->nChannels);
    // ------- accessories of the filters ------//
    double delta = 0;
    CvPoint anchor = cvPoint(1, 1);
    // masks restored as the standard 8-direction Prewitt kernels (the minus
    // signs were lost in extraction)
    double c90[]  = {  1,  1,  1,   0,  0,  0,  -1, -1, -1 };   // 90 degrees
    double c180[] = {  1,  0, -1,   1,  0, -1,   1,  0, -1 };   // 180 degrees
    double c270[] = { -1, -1, -1,   0,  0,  0,   1,  1,  1 };   // 270 degrees
    double c0[]   = { -1,  0,  1,  -1,  0,  1,  -1,  0,  1 };   // 0 degrees
    double c135[] = {  1,  1,  0,   1,  0, -1,   0, -1, -1 };   // 135 degrees
    double c225[] = {  0, -1, -1,   1,  0, -1,   1,  1,  0 };   // 225 degrees
    double c315[] = { -1, -1,  0,  -1,  0,  1,   0,  1,  1 };   // 315 degrees
    double c45[]  = {  0,  1,  1,  -1,  0,  1,  -1, -1,  0 };   // 45 degrees
    // ----------------------------------------//
    // allocate and initialize the kernel matrices
    CvMat* M90 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M90, 3, 3, CV_64FC1, c90);
    CvMat* M180 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M180, 3, 3, CV_64FC1, c180);
    CvMat* M270 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M270, 3, 3, CV_64FC1, c270);
    CvMat* M0 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M0, 3, 3, CV_64FC1, c0);
    CvMat* M135 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M135, 3, 3, CV_64FC1, c135);
    CvMat* M225 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M225, 3, 3, CV_64FC1, c225);
    CvMat* M315 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M315, 3, 3, CV_64FC1, c315);
    CvMat* M45 = cvCreateMat(3, 3, CV_64FC1);
    cvInitMatHeader(M45, 3, 3, CV_64FC1, c45);
    // 0 -> save all the intermediate combinations
    if (toSave == 0) {
        // ----- first step of merging --------//
        cvFilter2D(src, dst1, M90, anchor);
        cvFilter2D(src, dst2, M180, anchor);
        cvMax(dst1, dst2, dst2);
        cvSaveImage("90&180.jpg", dst2);
        cvFilter2D(src, dst3, M270, anchor);
        cvFilter2D(src, dst4, M0, anchor);
        cvMax(dst3, dst4, dst4);
        cvSaveImage("270&0.jpg", dst4);
        cvFilter2D(src, dst5, M135, anchor);
        cvFilter2D(src, dst6, M225, anchor);
        cvMax(dst5, dst6, dst6);
        cvSaveImage("135&225.jpg", dst6);
        cvFilter2D(src, dst7, M315, anchor);
        cvFilter2D(src, dst8, M45, anchor);
        // ---------- second step of merging ---------//
        cvMax(dst7, dst8, dst8);
        cvSaveImage("315&45.jpg", dst8);
        cvMax(dst2, dst4, dst4);
        cvSaveImage("90&180&270&0.jpg", dst4);
        cvMax(dst6, dst8, dst8);
        cvSaveImage("315&45&135&225.jpg", dst8);
        // --------- third step of merging ----------//
        cvMax(dst4, dst8, dst8);
        cvSaveImage("ALLDIRECTIONS.jpg", dst8);
    }
    // 1 -> save only the all-directions result
    else if (toSave == 1) {
        // ----- first step of merging --------//
        cvFilter2D(src, dst1, M90, anchor);
        cvFilter2D(src, dst2, M180, anchor);
        cvMax(dst1, dst2, dst2);
        cvFilter2D(src, dst3, M270, anchor);
        cvFilter2D(src, dst4, M0, anchor);
        cvMax(dst3, dst4, dst4);
        cvFilter2D(src, dst5, M135, anchor);
        cvFilter2D(src, dst6, M225, anchor);
        cvMax(dst5, dst6, dst6);
        cvFilter2D(src, dst7, M315, anchor);
        cvFilter2D(src, dst8, M45, anchor);
        // ---------- second step of merging ---------//
        cvMax(dst7, dst8, dst8);
        cvMax(dst2, dst4, dst4);
        cvMax(dst6, dst8, dst8);
        // --------- third step of merging ----------//
        cvMax(dst4, dst8, dst8);
        // cvSaveImage("ALLDIRECTIONS.jpg", dst8);
    }
    return (dst8);
    // clean-up (note: unreachable after the return above)
    cvReleaseImage(&src);
    cvReleaseImage(&dst1);
    cvReleaseImage(&dst2);
    cvReleaseImage(&dst3);
    cvReleaseImage(&dst4);
    cvReleaseImage(&dst5);
    cvReleaseImage(&dst6);
    cvReleaseImage(&dst7);
    cvReleaseImage(&dst8);
    // one kernel for each 45-degree angle
    cvReleaseMat(&M0);
    cvReleaseMat(&M45);
    cvReleaseMat(&M90);
    cvReleaseMat(&M135);
    cvReleaseMat(&M180);
    cvReleaseMat(&M225);
    cvReleaseMat(&M270);
    cvReleaseMat(&M315);
}

// ----------------------------- divide the image in parallel (no copying) ----------------------//
IplImage* retSubImgNS(IplImage* img, CvRect roiRect)
{
    // build an image header that points directly into the original image data
    IplImage* subImg = cvCreateImageHeader(cvSize(roiRect.width, roiRect.height),
                                           img->depth, img->nChannels);
    subImg->origin    = img->origin;
    subImg->widthStep = img->widthStep;
    subImg->imageData = img->imageData + roiRect.y * img->widthStep
                                       + roiRect.x * img->nChannels;
    return subImg;
}

// ---------------------- process sub-images: rows -------------------------//
void SubImgRow8DMerge(IplImage* img1, int n, int isParallel, int toSave, int subPro, int noiseLevel, int toThreshold)
{
    CvPoint p1;   // define objects of type point; their values can depend on
    CvPoint p2;   // the number of processors that are used
    string iName = "";
    string imName = "";
    string pImage = "";
    string nName = "";
    string wName = "";
    const char* fName;
    const char* cName;
    int workLoad = img1->height / n;   /* each processor takes a block of the
                                          image of dimensions workLoad x width */
    p1.x = 0;
    p2.x = img1->width - 1;
    int x;
    int y;
    int i;
    /// omp_set_num_threads(numOfThread);
#pragma omp parallel shared(img1,workLoad,n) firstprivate(imName,nName,fName,wName,cName,x,y,i)
    {
#pragma omp for ordered schedule(dynamic)
        for (i = 0; i < n; i++) {
            x = 0;
            y = i * workLoad;
            imName = genBack("img", i);
            nName = gen(".tif", i + 1);   /* generate image names 1.tif, 2.tif, ... */
            fName = genC(nName);          /* convert the string names to const char* */
            string wName = gen(".jpg", i);
            cName = genC(wName);
            // draw_box(img1, cvRect(x, y, img1->width - 1, img1->height / n));
            /* The row partitions are disjoint, so no two processors set and
               reset the same data section while extracting their tiles. */
            IplImage* imName = retSubImgNS(img1, cvRect(x, y, img1->width, img1->height / n));
            if (isParallel == 0) {
                imName = EightDirections(imName, 1);
            }
            else {
                if (noiseLevel > 0) {
                    // smooth the image with the median filter tuned to the noise level
                    imName = Smooth_EightDirections_Parlelle(imName, subPro, noiseLevel, toThreshold);
                }
                else
                    imName = EightDirections_Parlelle(imName, subPro);
            }
            // we can check here whether the program is working or not
            if (toSave == 1) {
                cvSaveImage(cName, imName);   // save the sub-images dynamically
            }
            /* Synchronize the merging so that no more than one processor works
               on the same data section; the merge sets and resets the region of
               interest, which must be done by only a single processor. */
#pragma omp critical
            {
                mergeSubImg(img1, imName, cvRect(x, y, img1->width, img1->height / n));
            }
            cvReleaseImage(&imName);   // release the sub-image
        }
    }
    cvSaveImage("AllDirections.jpg", img1);
}

// ---------------------- process sub-images: blocks -------------------------//
void SubImgBlock8DMerge(IplImage* img1, int n, int isParallel, int toSave, int subPro, int noiseLevel, int toThreshold)
// divide the image into n x n sub-images
{
    CvPoint p1;   // define objects of type point; their values can depend on
    CvPoint p2;   // the number of processors that are used
    string iName = "";
    string imName = "";
    string pImage = "";
    string nName;
    string wName = "";
    const char* fName;
    const char* cName;
    int workLoad  = img1->width / n;
    int workLoad2 = img1->height / n;
    /* each processor takes a block of the image of dimensions
       workLoad x workLoad2 */
    p1.y = 0;
    p2.y = img1->height - 1;
    int x;
    int y;
    int c = 0;
    int i;
    int j;
    /// omp_set_num_threads(8);
#pragma omp parallel shared(img1,workLoad,n) firstprivate(imName,nName,fName,wName,cName,x,y,i,j,c)
    {
#pragma omp for ordered schedule(dynamic)
        for (i = 0; i < n; i++) {
            for (j = 0; j < n; j++) {
                x = j * workLoad;
                y = i * workLoad2;
                imName = genBack("img", c);
                nName = gen(".tif", c + 1);   /* generate image names 1.tif, 2.tif, ... */
                fName = genC(nName);          /* convert the string names to const char* */
                string wName = gen(".jpg", c);   /* returns 0.jpg, 1.jpg, 2.jpg, ... */
                cName = genC(wName);             /* const char* form, as required
                                                    by some of the functions */
                // load the sub-image
                IplImage* imName;
                // draw_box(img1, cvRect(x, y, img1->width - 1, img1->height / n));
                /* Synchronize the data partitioning so that no more than one
                   processor works on the same data section. */
#pragma omp critical
                {
                    c = c + 1;
                }
                imName = retSubImgNS(img1, cvRect(x, y, img1->width / n, img1->height / n));
                if (isParallel == 0) {
                    imName = EightDirections(imName, 1);
                }
                else {
                    if (noiseLevel > 0) {
                        // smooth the image with the median filter tuned to the noise level
                        imName = Smooth_EightDirections_Parlelle(imName, subPro, noiseLevel, toThreshold);
                    }
                    else
                        imName = EightDirections_Parlelle(imName, subPro);
                }
                if (toSave == 1) {
                    cvSaveImage(cName, imName);   // save the sub-images dynamically
                }
#pragma omp critical
                {
                    mergeSubImg(img1, imName, cvRect(x, y, img1->width / n, img1->height / n));
                }
                cvReleaseImage(&imName);   // release the sub-image
            }
        }
    }
    cvSaveImage("Alldirection.jpg", img1);
    // cvReleaseImage(&img1);
}

// ---------------------- process sub-images: columns -------------------//
void SubImgCol8DMerge(IplImage* img1, int n, int isParallel, int toSave, int subPro, int noiseLevel, int toThreshold)
{
    CvPoint p1;   // define objects of type point; their values can depend on
    CvPoint p2;   // the number of processors that are used
    string iName = "";
    string imName = "";
    string pImage = "";
    string nName;
    string wName = "";
    const char* fName;
    const char* cName;
    /* int workLoad = img1->height / n;  // row version: each processor takes a
       block of dimensions workLoad x width */
    int workLoad = img1->width / n;
    p1.y = 0;
    p2.y = img1->height - 1;
    int i;
    int x;
    int y;
#pragma omp parallel shared(img1,workLoad,n) firstprivate(imName,nName,fName,wName,cName,x,y,i)
    {
#pragma omp for ordered schedule(dynamic)
        for (i = 0; i < n; i++) {
            x = i * workLoad;
            y = 0;
            imName = genBack("img", i);
            nName = gen(".tif", i + 1);   /* generate image names 1.tif, 2.tif, ... */
            fName = genC(nName);          /* convert the string names to const char* */
            string wName = gen(".jpg", i);   /* returns 0.jpg, 1.jpg, 2.jpg, ... */
            cName = genC(wName);             /* const char* form, as required by
                                                some of the functions */
            // draw_box(img1, cvRect(x, y, img1->width - 1, img1->height / n));
            IplImage* imName;
            imName = retSubImgNS(img1, cvRect(x, y, img1->width / n, img1->height));
            if (isParallel == 0) {
                imName = EightDirections(imName, 1);
            }
            else {
                if (noiseLevel > 0) {
                    // smooth the image with the median filter tuned to the noise level
                    imName = Smooth_EightDirections_Parlelle(imName, subPro, noiseLevel, toThreshold);
                }
                else
                    imName = EightDirections_Parlelle(imName, subPro);
            }
            if (toSave == 1) {
                cvSaveImage(cName, imName);   // save the sub-images dynamically
            }
#pragma omp critical
            {
                mergeSubImg(img1, imName, cvRect(x, y, img1->width / n, img1->height));
            }
            cvReleaseImage(&imName);   // release the sub-image
        }
    }
    cvSaveImage("AllDirections.jpg", img1);
}

// ----------------------------------------------------------------------
// Get the user input and print the usage when arguments are missing
// ----------------------------------------------------------------------
bool GetUserInput(int argc)
{
    bool isOK = true;
    if (argc < 10) {   // program name plus the nine values read in main
        cout << "Arguments: <1> <2> <3> <4> <5> <6> <7> <8>" << endl;
        cout << "<1> : imageName" << endl;
        cout << "<2> : choose type of partition 1: divide into rows, 2: into blocks, 3: into columns" << endl;
        cout << "<3> : number of sub images" << endl;
        cout << "<4> : number of threads" << endl;
        cout << "<5> : do you want to run 8D in parallel? 1: Yes, 0: No" << endl;
        cout << "<6> : 1 or missing: does not save sub images" << endl;
        cout << "<7> : Level of noise 0 means ignore the smoothing, 1: low level of noise, 5: very noisy" << endl;
        cout << "<8> : 0 No thresholding, 1 apply Iterative Thresholding" << endl;
        isOK = false;
    }
    return isOK;
}

int main(int argc, char** argv)
{
    if (GetUserInput(argc) == false) return 1;
    IplImage* img1 = cvLoadImage(argv[1], CV_LOAD_IMAGE_GRAYSCALE);
    int choose = atoi(argv[2]);
    int toSave = 0;
    int noiseLevel = 0;
    int toThreshold = 0;
    double runtime = 0;
    int n = atoi(argv[3]);
    int numOfThread = atoi(argv[4]);
    int isParallel = atoi(argv[5]);
    toSave = atoi(argv[6]);
    int subPro = atoi(argv[7]);
    noiseLevel = atoi(argv[8]);
    toThreshold = atoi(argv[9]);
    /// omp_set_num_threads(numOfThread);
    switch (choose) {
    case 1:
        printf("Split into Rows,\t%i subimages,\t%i threads,\t%i subThreads,\t%i Noise Level\n",
               n, numOfThread, subPro, noiseLevel);
        runtime = omp_get_wtime();
        SubImgRow8DMerge(img1, n, isParallel, toSave, subPro, noiseLevel, toThreshold);
        runtime = omp_get_wtime() - runtime;
        break;
    case 2:
        printf("Split into Blocks,\t%i subimages,\t%i threads,\t%i subThreads,\t%i Noise Level\n",
               n, numOfThread, subPro, noiseLevel);
        runtime = omp_get_wtime();
        SubImgBlock8DMerge(img1, n, isParallel, toSave, subPro, noiseLevel, toThreshold);
        runtime = omp_get_wtime() - runtime;
        break;
    case 3:
        printf("Split into Columns,\t%i subimages,\t%i threads,\t%i subThreads,\t%i Noise Level\n",
               n, numOfThread, subPro, noiseLevel);
        runtime = omp_get_wtime();
        SubImgCol8DMerge(img1, n, isParallel, toSave, subPro, noiseLevel, toThreshold);
        runtime = omp_get_wtime() - runtime;
        break;
    }
    // printf("%i subimages,\t%i threads,\t%i subThreads\n", n, numOfThread, subPro);
    cout << runtime << " seconds" << endl;
    cvReleaseImage(&img1);
    return 0;
}
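The global iterative thresholding used by IterativeThresholding above can be illustrated on a plain pixel array, without OpenCV: start from the global mean, split the pixels into the group above and the group at or below the current threshold, and move the threshold to the midpoint of the two group means until it stops changing. The following is a minimal sketch of that scheme; the function name iterativeThreshold and the plain-array interface are ours for illustration, not part of the program above.

```cpp
#include <cmath>
#include <cstddef>

// Basic global iterative thresholding on a grayscale pixel array.
// Mirrors the cvCmpS/cvAvg loop above: split at the current threshold,
// average the two groups, and take the midpoint of the two means as the
// new threshold until it converges.
double iterativeThreshold(const unsigned char* px, std::size_t n) {
    double t = 0.0;
    for (std::size_t i = 0; i < n; ++i) t += px[i];
    t /= static_cast<double>(n);             // initial guess: global mean
    for (;;) {
        double sumHi = 0.0, sumLo = 0.0;
        std::size_t cntHi = 0, cntLo = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (px[i] > t) { sumHi += px[i]; ++cntHi; }   // like CV_CMP_GT
            else           { sumLo += px[i]; ++cntLo; }   // like CV_CMP_LE
        }
        double m1 = cntHi ? sumHi / cntHi : t;   // mean of the bright group
        double m2 = cntLo ? sumLo / cntLo : t;   // mean of the dark group
        double tNew = 0.5 * (m1 + m2);
        if (std::fabs(tNew - t) == 0.0) return tNew;   // converged, as in the loop above
        t = tNew;
    }
}
```

For a strongly bimodal input (for example, dark pixels near 10 and bright pixels near 200) the initial mean already lies between the two modes, so the loop settles quickly on the midpoint of the two group means; the program above then binarizes at that value with cvThreshold and CV_THRESH_BINARY.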