## Citation

- Permanent Link: http://digital.auraria.edu/AA00002627/00001

## Material Information

- Title: Motion estimation techniques and their application to digital image compression
- Creator: Schramke, Richard W.
- Publication Date: 1993
- Language: English
- Physical Description: ix, 91 leaves : illustrations ; 29 cm

## Subjects

- Subjects / Keywords: Image processing -- Digital techniques (lcsh); Image processing -- Digital techniques (fast)
- Genre: bibliography (marcgt); theses (marcgt); non-fiction (marcgt)

## Notes

- Bibliography: Includes bibliographical references.
- General Note: Submitted in partial fulfillment of the requirements for the degree, Master of Science, Department of Electrical Engineering, Department of Computer Science and Engineering.
- Statement of Responsibility: by Richard W. Schramke.

## Record Information

- Source Institution: University of Colorado Denver
- Holding Location: Auraria Library
- Rights Management: All applicable rights reserved by the source institution and holding location.
- Resource Identifier: 28863611 (OCLC); ocm28863611
- Classification: LD1190.E54 1993m .S383 (lcc)

## Full Text

MOTION ESTIMATION TECHNIQUES
AND THEIR APPLICATION TO DIGITAL IMAGE COMPRESSION

by Richard W. Schramke
B.S., Pennsylvania State University, 1984

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Master of Science, Department of Electrical Engineering and Computer Science

This thesis for the Master of Science degree by Richard Wayne Schramke has been approved for the Department of Electrical Engineering and Computer Science by Dr. Jay Rothman. Date:

Schramke, Richard Wayne (M.S., Electrical Engineering)
Motion Estimation Techniques and their Application to Digital Image Compression
Thesis directed by Dr. Joe Thomas

In the last ten years, considerable research has been done in the area of image sequence processing. Much of this research has been prompted by the desire for advanced commercial applications such as videoteleconferencing, high-definition digital television and multimedia systems.

This thesis investigates the theory behind image sequence processing and two types of techniques used in motion estimation for image compression applications: correlation-based and gradient-based techniques. A description of various, specific implementations for each type of technique is then given. Before investigating motion estimation techniques, however, a variety of traditional digital image compression techniques are described in order to demonstrate the need for motion estimation in advanced compression applications.

The form and content of this abstract are approved. I recommend its publication.

Signed:

DEDICATION

This thesis is dedicated to my wife, Corine, who supported me and/or left me alone (whichever was appropriate at the time) throughout its development and my graduate coursework, even though she would rather have been the one to pursue a Master's degree.
This thesis is also dedicated to Robyn and Erik, my "big" kids, who are old enough (12 and 10 years old, respectively) to realize what time they gave up with me for it, without complaining, yet without completely realizing its significance (although neither can imagine writing a report longer than 20 pages); and to Diane, who involuntarily shared the first nine months of her life with it, but also made it take twice as long to complete, thanks to her distracting cuteness.

ACKNOWLEDGMENTS

I'd like to acknowledge the following people, without whom, either directly or indirectly, this thesis would not have been possible: Vance McCollough, who provided a year's worth of guidance and stimulating discussion, and helped to keep this thesis from becoming my "lifetime achievement"; Joe Thomas, who stepped in to advise me on this thesis when I needed it most; my father and mother, Dick and Jean Schramke, who provided me with the education and inquisitiveness to learn, and, from an early age, encouraged me to do my best and to do what I enjoyed, and provided me what I needed to do that; David Keller, my 9th grade math teacher, who gave me my first chance to "play" with computers, yet always reminded me to keep my priorities straight; and Tom Vito, my "mentor", who taught me "real" engineering and originally sparked my interest in the topic of this thesis.

CONTENTS

1. INTRODUCTION
   1.1 Image Compression Applications
   1.2 Thesis Objectives
   1.3 Outline
2. DIGITAL IMAGE COMPRESSION
   2.1 Technique Classifications
   2.2 General Compression Techniques
   2.3 Statistical Compression Techniques
   2.4 Spatial-Domain Compression Techniques
   2.5 Transform-Domain Compression Techniques
   2.6 Interframe Compression Techniques
   2.7 Compression Technique Standards
3. MOTION ESTIMATION THEORY
   3.1 Image Sequence Processing
   3.2 Optical Flow Theory
   3.3 Motion Estimation Techniques
4. CONCLUSIONS
5. FURTHER STUDY
6. BIBLIOGRAPHY

TABLES

2.7-1 Run-Length Component Value Category Definitions
2.7-2 Sample Huffman Coding Table for AC Component Symbols
3.3-1 Blockmatching Techniques Computational Complexity
3.3-2 Typical Hierarchical Blockmatching Parameters

FIGURES

2.2-1 5-Bit Uniform Quantizer Example
2.3-1 Vector Quantization Example Histogram
2.3-2 Codebook Example #1
2.3-3 Codebook Example #2
2.3-4 Codebook Example #3
2.4-1 Typical DPCM Encoder
2.4-2 Pixel Relationships
2.7-1 JPEG Encoder
2.7-2 JPEG Decoder
2.7-3 DCT Coefficient Arrangement
2.7-4 "Zig-Zag" Run-Length Encoding
2.7-5 Run-Length Encoding Symbol Construction
2.7-6 H.261 Interframe Encoder
2.7-7 H.261 Interframe Decoder
2.7-8 MPEG Encoder Block Diagram
2.7-9 MPEG Decoder Block Diagram
2.7-10 MPEG Prediction Scheme
3.2-1 Aperture Problem Example
3.3-1 Motion Estimation Correlation Function
3.3-2 Motion Vector Example
3.3-3 Offset Sampling
3.3-4 log D Search Technique Example
3.3-5 Cross Search Technique Example
3.3-6 One-at-a-Time Search Technique Example
3.3-7 Conjugate Direction Search Technique Example
3.3-8 Hierarchical Blockmatching, Level 3 Example
3.3-9 Hierarchical Blockmatching, Level 2 Example
3.3-10 Hierarchical Blockmatching, Level 1 Example
3.3-11 Limb and Murphy Displacement Estimation
3.3-12 Method of Steepest Descent Gradient Estimation
3.3-13 Walker and Rao Gradient Calculations

1. INTRODUCTION

The intent of image compression is to minimize the amount of data required to represent an image [1,2]. There are many ways to perform this operation, each with strengths, weaknesses, and practical limitations, and each based on the particular application it is to be used in. In this chapter we investigate some of the fundamental image compression applications and narrow the focus of this thesis to one particular area of image compression interest: motion estimation.
1.1 Image Compression Applications

There are two general categories into which image compression applications can be divided: 1) those for the transmission of images, and 2) those for the storage of images. Fundamentally, however, the theory and associated algorithms used to implement the compression schemes for both are the same.

The first of these application types, the transmission of images, encompasses a wide variety of applications, including:

- communications, where the upcoming communication standard, known as the Integrated Services Digital Network (ISDN), is intended to allow not only conventional audio traffic, but also videoteleconferencing [3-7];
- broadcasting, where the resolution of standard television signals will not only be increased in High-Definition Television (HDTV), but digital standards are also being proposed at the same resolution as HDTV yet within the bandwidth limitations of the current standard television signals [3-5, 7-11];
- remote sensing, which not only includes weather and land resource management satellites, such as LANDSAT [1,3], but also the video signals which are sent back from the Space Shuttle [12];
- military, where there is a desire to use remotely-piloted vehicles (RPVs) in place of human pilots [1,13]; and
- business, where digital graphics and FAX machines have become commonplace [3,5].
The second of these application types, the storage of images, is an ever-expanding area of interest which includes:

- medical diagnosis, where digitized images are used routinely in such technologies as computed radiography, computed tomography (CT) scans, and magnetic resonance imaging (MRI) [3,13,14];
- multimedia, which is one of the fastest growing applications, includes such technologies as Compact Disc-Interactive (CDI) and Digital Video-Interactive (DVI), and is primarily targeted for advertising and training applications [3,7,15]; and
- photojournalism, which is probably the newest application, and represents a change from traditional film-based photography to digital data-based photography, with such products as digital cameras and photo Compact Discs (CDs) [3,15].

1.2 Thesis Objectives

The purpose of this thesis is to: 1) summarize the fundamental theories behind current motion estimation techniques, 2) investigate the role motion estimation plays in advanced digital image compression applications, and 3) describe a variety of general and specific motion estimation implementations.

1.3 Outline

In Chapter 2, historical digital image compression techniques are described to understand what limitations are imposed by these techniques. Three current image compression standards are described, two of which are meant for motion image compression applications. This chapter primarily establishes what role motion estimation plays in more advanced compression applications.

In Chapter 3, the fundamental theories behind motion estimation techniques are explored. This investigation begins with a description of general image sequence processing and then goes into optical flow theory, upon which the majority of motion estimation techniques are based. The chapter continues by describing the two primary types of techniques used in motion estimation: correlation-based and gradient-based.
It concludes with a description of a variety of specific implementations of each type of motion estimation technique.

2. DIGITAL IMAGE COMPRESSION

In order to understand what role motion estimation plays in digital image compression, we must first understand some of the fundamental image compression techniques used. This chapter reviews some of the common and more advanced algorithms currently used in digital image compression. Then, specific examples of still-image and motion-image compression standards are described.

2.1 Technique Classifications

In essence, we can classify image compression techniques in at least three different ways. The first of these ways is based on how well we can reconstruct the original image from the compressed version; this describes the technique as being either lossless or lossy. The second way is based on what type of data redundancy is being exploited by the technique: spatial, transform or temporal redundancy. And, finally, the third way is based on what data is used by the technique: intraframe or interframe data [13, 16, 17].

2.1.1 LOSSLESS VS. LOSSY (STATISTICAL) COMPRESSION

A lossless technique, also known as a bit-preserving or reversible technique [14], is one for which, upon decompressing the data, the original and decompressed data are exactly the same. Of course, this assumes that the process through which the compressed data is passed, be it a storage or transmission medium, does not introduce errors. A lossy data compression technique, also known as an irreversible technique [14], on the other hand, does not reproduce the original data exactly upon decompression, regardless of whether the data was sent through a noiseless channel or not. Techniques which are based primarily on whether they are lossless or lossy are generally statistical compression techniques, and these techniques need not be limited to just image compression applications.

2.1.2 SPATIAL,
TRANSFORM AND TEMPORAL REDUNDANCY

All image compression techniques make use of one of three forms of data redundancy in order to achieve data reduction gains. These three forms, or domains, of redundancy are: 1) spatial, 2) transform, and 3) temporal.

Spatial redundancy, sometimes also referred to as the data-domain [17], makes use of the fact that the intensities of pixels, particularly those surrounding the one being described, are related in some way. This redundancy is somewhat intuitive since we can observe that neighboring pixels normally have similar amplitudes. Of course, this is not necessarily true when we observe a portion of an image near the edge of an object.

Transform redundancy in images is generally not something intuitively obvious. There are a number of ways, however, to transform image data into a domain [17] which may exhibit a high degree of localized redundancy. For instance, a solid portion of the image will have a large content of low-frequency energy, while a highly-textured region will have a large content of high-frequency energy. In general, then, the transform domain of one small region will be very similar to that of its neighboring region.

And, finally, the temporal redundancy of a sequence of related images is quite possibly the most intuitive redundancy. From one frame to the next, in most cases, there is very little change in pixel values. If this were not the case, there would be a jerkiness to the sequence which, to a human observer, would be extremely annoying. It is this form of redundancy, in time, which gives us the greatest compression gain.

2.1.3 INTRAFRAME VS. INTERFRAME COMPRESSION

If data from a single, independent image is the only data used in the compression process, then the technique is known as an intraframe technique. However, if data from a sequence of related images is used, the technique is known as an interframe technique.
At first, this classification may seem to be the same as saying that spatial-domain and transform-domain techniques are the same as intraframe, and temporal-domain techniques are the same as interframe techniques. Although both of these statements are true, this classification can not be limited to only these cases. In general, all interframe techniques not only make use of temporal redundancy, but they may also use data in the spatial- or transform-domain to determine and encode this redundancy [5, 17].

2.2 General Compression Techniques

Although there seems to be quite a difference in how digital signals are processed compared to their analog equivalents, in essence the only real difference is that a digital signal is one which is discrete in both time and amplitude. In general, we can choose the interval at which these two discrete components of a digital image are sampled to effectively compress our data. In this section, we will investigate what varying the sample interval does to our representation of the image, and its effect on the quantity of data, as simple data compression techniques.

2.2.1 QUANTIZATION

Digitization changes a continuous, analog signal into discrete, digital form, creating digital values representing the intensity of a particular location on the image. This location is commonly referred to as a pixel or pel (picture element) [14]. Representing any signal, whether digital or analog, by a set of discrete levels is known
Whenever a continuous signal is digitized, some amount of information is always lost, since a finite-sized number can not represent the infinite number of intensity levels associated with a continuous signal. The more bits which are used to represent the continuous pixel intensities, the greater is the systems ability to accurately represent that intensity in discrete form [2]. For instance, if only a single bit is used, then the intensity represented by that bit may only be one of two values, normally white or black. However, if 8 bits are used, the intensity may now be represented by 256 values, which, for a monochrome image, is normally gray levels ranging from white to black. We can see that the more bits which are used the more accurate the representation can be. But, we can also see that an infinite number of intensities can never be represented by a discrete value, since an infinite number of bits would be required! The dynamic range of a discrete signal is another way in which we may describe the capability of a system to represent the continous pixel intensity. By definition it is the difference in amplitude between the smallest and largest intensity which a digital word may discriminate, normally expressed in dB (decibels). Each bit represents 6 dB of dynamic range, and, therefore, our 8-bit quantization represents a dynamic range of 48 dB [19]. Normally, a uniform quantizer is used to digitize continuous signals, since these quantizers are generally easy to implement When a signal is uniformally quantized, this means that the discrete amplitude levels generated by the quantizer are equally 8 spaced between the minimum and maximum values [1]. An example transfer function for a uniform quantizer is shown in Figure 2.2-1. However, nonuniform quantizers, or companders, may also be used. A nonuniform quantizer may be implemented by performing a nonlinear transformation on the sampled signal [1]. 
In general, this is done in order to perform rudimentary data compression. While not actually eliminating data, as one would normally consider data compression, it maximizes the use of the limited number of quantization levels. The form of this nonlinear transform is primarily based on the probability of a level being contained in the image [16].

[Figure 2.2-1: 5-Bit Uniform Quantizer Example — staircase transfer function of discrete output versus continuous input]

However, this transform is normally determined in such a manner as to reduce the mean squared error of the quantized signal [20]. Such a transform is called a Lloyd-Max Quantizer when the following error, E, is minimized:

E = ∫ (x − û(x))² p_u(x) dx = E[(u − û)²]    (2.2-1)

where u is the continuous pixel intensity with probability density function p_u(x), û is the discrete variable representing u, and E[·] is the expected value of the random variable. It can be shown [2] that the solution to this equation, which minimizes E, is a function of only the total number of quantization levels and the probability of each distinct value. For a Lloyd-Max Quantizer it can also be shown [2] that: 1) the output is an unbiased estimate of the input values, and 2) the quantization error is orthogonal to the quantizer input. However, as we can see from Equation 2.2-1, this quantizer is only optimal for a particular image, and is not optimal for any other image which does not have the same histogram.

Based on these results, and the amount of tolerable distortion, a limited amount of image compression can be done by carefully choosing the quantizer transform and size, such that an optimum quantizer is used for the desired number of bits. For instance, most commercial, monochrome frame grabbers today are designed for 8-bit PCM quantization.
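A uniform quantizer of the kind shown in Figure 2.2-1 can be sketched in a few lines. This fragment is illustrative rather than code from the thesis; the [0, 1] signal range and the 5-bit depth are arbitrary assumptions for the example.

```python
def uniform_quantize(x, bits, lo=0.0, hi=1.0):
    """Map a continuous sample x in [lo, hi] to one of 2**bits equally spaced levels."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    # Clamp x into range, then index the level containing it.
    idx = min(int((min(max(x, lo), hi) - lo) / step), levels - 1)
    # Reconstruct at the level midpoint, which bounds the error by step/2.
    return idx, lo + (idx + 0.5) * step

idx, value = uniform_quantize(0.37, bits=5)  # level 11, reconstructed as 0.359375
```

With 5 bits the step is 1/32, so no reconstructed sample is ever more than 1/64 of the full range away from the original.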
However, for a uniform quantizer it is known [2] that only 6 bits are required by the human visual system; and, furthermore, if the optimum quantizer is used for each particular image, only 5 bits are required. Therefore, we can effectively compress our image by 8:5, or a factor of 1.6, by designing an optimum quantizer around visual quantization requirements. Although not mathematically lossless, such a quantizer would be considered visually lossless, since a human observer would not be able to detect the losses.

2.2.2 SAMPLING RATE

Just as the dynamic range of the image data may be chosen to trade off storage for image quality, so may the rate at which the image is sampled. It is obvious that the lower the rate at which the image is sampled, the smaller the amount of data that will be generated. The lower bound on the sampling rate, however, is limited by the quality of the image desired. Nyquist's sampling theorem states that the lowest sampling rate allowed must be at least twice the highest frequency of the signal that we wish to represent. Translated for image sampling, this means that the sampling rate is directly related to how sharp edges in the image may be represented.

Another way to look at the effect the sampling rate has on the image quality is to consider what physical limitations it places on the image representation. For instance, if an image is first sampled at frequency f_s, then each pixel has a size of 1/f_s × 1/f_s. If the sampling rate is cut in half, then each pixel now has a size of 2/f_s × 2/f_s. We can see that each pixel sampled at half the rate would then represent twice the horizontal and vertical size of the pixels sampled at f_s; it also means that, where there were once 4 distinct pixel values, there is now only one. In effect, the values of the original 4 pixels are averaged to form the new, single pixel value.
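The 2:1 subsampling just described, in which each 2×2 block of pixels is replaced by its average, can be sketched as follows (an illustrative fragment, not code from the thesis; the image is assumed to have even dimensions):

```python
def subsample_2x2(image):
    """Halve the sampling rate in each direction by averaging 2x2 pixel blocks."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(0, h, 2):
        row = []
        for c in range(0, w, 2):
            # Average the four original pixels covered by the new, larger pixel.
            block = [image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1]]
            row.append(sum(block) / 4.0)
        out.append(row)
    return out

# A checkerboard of white (255) and black (0) collapses to uniform gray (127.5),
# while a constant region is unchanged.
checker = [[255, 0], [0, 255]]
```

Running `subsample_2x2(checker)` yields a single mid-gray pixel, which is exactly the blurring effect discussed below.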
For areas of constant intensity, such as a blank wall, we can see that this averaging would have absolutely no effect on the image quality. However, if the image has lots of detail in it, where adjacent pixel intensities can vary greatly, we can intuitively see that this will have a substantial effect on image quality. Suppose that our original image was simply a checkerboard, where adjacent pixels alternated between white and black, in both the horizontal and vertical directions. If we sampled at half the rate, we would be averaging two white and two black pixel values to generate a composite gray pixel. Instead of having a checkerboard pattern, we would then have an image which would just be constant gray throughout. Subsampling the image, therefore, creates an intensity blurring effect. If, then, the desire is to compress data through a subsampling technique, we must determine what level of blurring is acceptable, primarily based on general, a priori knowledge of our images, and choose our minimum sampling rate accordingly.

2.2.3 FRAME RATE

Standard television stations in the United States broadcast 60 interlaced fields per second, which produces a frame rate of 30 frames per second. This frame rate was chosen for a number of reasons, primarily because it exceeds the rate at which flicker is perceived by the human brain, and also due to the frequency of AC power in the United States. However, what if 15 frames per second were determined to be sufficient? Just cutting the frame rate in half would also cut our data rate in half, producing an effective compression ratio of 2:1. This is the simplest way in which varying the frame rate may contribute to compression gain [1].

2.3 Statistical Compression Techniques

2.3.1 LOSSLESS COMPRESSION LIMIT

Theoretically, there is a maximum compression ratio possible for any particular sampled data set, beyond which lossless data compression is not possible.
This limitation is based on Shannon's fundamental work in the area of information theory, and is known as the source data entropy [18,21-23]. According to this theory, the minimum average number of bits, H, by which any particular sample in the data set can be represented is:

H = −Σ_k p_k log2(p_k)    (2.3-1)

where p_k is the probability that the particular value, k, occurs in the data set, and H is the entropy, or average number of bits used to represent any possible value in the data set. For a typical monochrome image, the maximum lossless compression ratio is rarely better than 3:1 or 4:1 [24].

It is this fundamental law of information theory that all data compression techniques must abide by if they are to be considered lossless. However, depending on what amount of distortion is acceptable, a lossy data compression technique can achieve significant compression with very little perceivable, or at least objectionable, distortion. Many of the image compression techniques used are lossy but may be considered to be visually lossless [14].

2.3.2 VARIABLE LENGTH CODING

The lossless data compression technique which is the most optimal from an entropy perspective is Huffman encoding [25]. This technique, which relies on statistical knowledge of the data to be compressed, creates variable-length codes from the fixed-length, quantized values. Observing Equation 2.3-1, it is apparent that if the variable code length for each value k is chosen to be

l_k = −log2(p_k)    (2.3-2)

then the average code length will exactly equal the calculated entropy. However, code lengths must be integer values, and the length l_k, calculated in this manner, is generally not an integer. Therefore, we must choose the code length to be the smallest integer value greater than that given in Equation 2.3-2, or

l_k = ⌈−log2(p_k)⌉    (2.3-3)

which will always give us an average length greater than the entropy.
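Equations 2.3-1 and 2.3-3 are easy to check numerically. The sketch below (illustrative, not from the thesis) computes the entropy of a small data set and the corresponding Shannon-Fano code lengths; the sample data is an arbitrary choice whose probabilities happen to be exact powers of two, so the average code length meets the entropy bound exactly.

```python
import math
from collections import Counter

def entropy_bits(data):
    """H = -sum_k p_k log2(p_k): the lossless lower bound in bits per sample (Eq. 2.3-1)."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

def shannon_fano_lengths(data):
    """Integer code lengths l_k = ceil(-log2(p_k)) from Eq. 2.3-3."""
    n = len(data)
    return {k: math.ceil(-math.log2(c / n)) for k, c in Counter(data).items()}

# Probabilities 1/2, 1/4, 1/8, 1/8: entropy is exactly 1.75 bits/sample.
data = [0, 0, 0, 0, 1, 1, 2, 3]
```

For data whose probabilities are not powers of two, the ceiling in Equation 2.3-3 forces the average length strictly above the entropy.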
This approach to choosing the variable code lengths is known as Shannon-Fano coding [21,26], but does not, in fact, produce an optimal code. This is due to the fact that the code lengths must be integer values. The encoding technique which is optimal, however, is the Huffman code [25]. This code is produced with the intent of assigning the smallest code lengths to the most probable data values, and the largest code lengths to the least probable values. To increase the compression efficiency, pairs, triples, etc., of data may be encoded using the Huffman technique. The more data values which are included in a particular code, the closer the average code length will be to the theoretical entropy limit.

There are some inherent problems with using variable length codes, however. One requirement is that, since the decoding process must distinguish all codes uniquely, one code must not be a prefix to any other code [25]. The way in which Huffman codes are designed guarantees that this requirement is met, and that there is no ambiguity between codes. Another requirement, in order to maintain the calculated average data compression for a variable length code, is that the required statistics, in particular the probability that each data value occurs, be known prior to coding, and that the decoding process have a codebook which maps the variable-length codes to their corresponding data values.

In general, the exact statistical nature of the data being compressed is not known. However, if common statistical information is available about normal data of the type being compressed, such as the probability of a particular character occurring for text processing applications, the code can be built based on this more general statistical data. As long as the data being compressed does, in fact, behave like the normal data, the compression obtained should be close to optimal.
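A minimal Huffman construction (an illustrative sketch, not the thesis's implementation) makes both the length assignment and the prefix-free property concrete. It repeatedly merges the two least probable subtrees, prepending one distinguishing bit to each:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix-free Huffman code: shortest codes for the most probable values."""
    # Each heap entry is (count, tiebreak index, {symbol: partial code}).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        # Prepend a bit distinguishing the two merged subtrees.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (c1 + c2, i, merged))
        i += 1
    return heap[0][2]
```

For the string "aaaabbc", for example, the most probable symbol gets a 1-bit code and the two rarer symbols get 2-bit codes, and no codeword is a prefix of another.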
Unless the exact nature of the data is known, however, the actual compression ratio will be lower than the optimal rate for which the code was designed.

2.3.3 VECTOR QUANTIZATION

Vector quantization is a lossy coding technique which maps all image intensity values to a smaller subset of intensity values, or vectors, the collection of which is known as a codebook. The codebook is primarily chosen to produce minimal distortion to the images being compressed, based on some known fidelity criteria. Preferably, relatively few vectors will be chosen for the codebook to represent a relatively large number of intensity values, while maintaining a desired reconstruction quality [27, 28].

As an example, suppose we have 16 possible intensity values, represented by a 4-bit number, and wish to create a codebook of only 4 vectors, represented by a 2-bit number. This would give us an effective compression ratio of 2:1. There are many ways to choose a codebook, few of which may minimize distortion. Assume that a histogram for our example data is shown in Figure 2.3-1. A simple way to create the codebook might be to ignore the lower two bits of our intensity value, and only transmit the upper two bits. This codebook selection requires no knowledge of the statistical nature of our data. Our codebook, mapping, and relative distortion would appear as illustrated in Figure 2.3-2. (For simplicity's sake, we are using the absolute difference as our distortion measure for these examples.) If we change the mapping slightly, to more closely reflect this data set, we see that the distortion changes, as shown in Figure 2.3-3. Now, let us choose what appears to be a better codebook, based solely on our histogram, as shown in Figure 2.3-4. We see that the distortion is significantly reduced, by a factor of 6.3:1 from Example #1 to Example #3. There are many techniques which may be used in the selection of a vector codebook.
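The distortion totals of the codebook comparison above can be reproduced with a short script. This is an illustrative sketch built from the example's histogram, not code from the thesis:

```python
# Histogram from the worked example: quantity of each intensity value 0..15.
quantity = [0, 1, 2, 4, 7, 30, 40, 30, 7, 4, 2, 1, 0, 0, 0, 0]

def total_distortion(reconstruct):
    """Sum of |value - reconstructed value|, weighted by how often each value occurs."""
    return sum(q * abs(v - reconstruct(v)) for v, q in enumerate(quantity))

# Example #1: keep only the upper two bits, i.e. map v to 4 * (v // 4).
d1 = total_distortion(lambda v: 4 * (v // 4))        # -> 228
# Example #3: codebook {4, 5, 6, 7}, chosen to sit where the histogram mass is.
d3 = total_distortion(lambda v: min(max(v, 4), 7))   # -> 36
```

The totals, 228 and 36, match the worked figures, and their ratio is the 6.3:1 improvement quoted above.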
It is beyond the scope of this thesis, however, to discuss any particular techniques. A good discussion of different types of vector quantization techniques may be found in [14].

[Figure 2.3-1: Vector Quantization Example Histogram — quantity plotted against intensity value]

| Intensity Value | Codebook Value | Reconst. Value | Absolute Diff. | Quantity | Distortion |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 | 1 | 1 |
| 2 | 0 | 0 | 2 | 2 | 4 |
| 3 | 0 | 0 | 3 | 4 | 12 |
| 4 | 1 | 4 | 0 | 7 | 0 |
| 5 | 1 | 4 | 1 | 30 | 30 |
| 6 | 1 | 4 | 2 | 40 | 80 |
| 7 | 1 | 4 | 3 | 30 | 90 |
| 8 | 2 | 8 | 0 | 7 | 0 |
| 9 | 2 | 8 | 1 | 4 | 4 |
| 10 | 2 | 8 | 2 | 2 | 4 |
| 11 | 2 | 8 | 3 | 1 | 3 |
| 12 | 3 | 12 | 0 | 0 | 0 |
| 13 | 3 | 12 | 1 | 0 | 0 |
| 14 | 3 | 12 | 2 | 0 | 0 |
| 15 | 3 | 12 | 3 | 0 | 0 |

Total Distortion: 228

Figure 2.3-2 Codebook Example #1

| Intensity Value | Codebook Value | Reconst. Value | Absolute Diff. | Quantity | Distortion |
|---|---|---|---|---|---|
| 0 | 0 | 3 | 3 | 0 | 0 |
| 1 | 0 | 3 | 2 | 1 | 2 |
| 2 | 0 | 3 | 1 | 2 | 2 |
| 3 | 0 | 3 | 0 | 4 | 0 |
| 4 | 1 | 6 | 2 | 7 | 14 |
| 5 | 1 | 6 | 1 | 30 | 30 |
| 6 | 1 | 6 | 0 | 40 | 0 |
| 7 | 1 | 6 | 1 | 30 | 30 |
| 8 | 2 | 8 | 0 | 7 | 0 |
| 9 | 2 | 8 | 1 | 4 | 4 |
| 10 | 2 | 8 | 2 | 2 | 4 |
| 11 | 2 | 8 | 3 | 1 | 3 |
| 12 | 3 | 12 | 0 | 0 | 0 |
| 13 | 3 | 12 | 1 | 0 | 0 |
| 14 | 3 | 12 | 2 | 0 | 0 |
| 15 | 3 | 12 | 3 | 0 | 0 |

Total Distortion: 89

Figure 2.3-3 Codebook Example #2

| Intensity Value | Codebook Value | Reconst. Value | Absolute Diff. | Quantity | Distortion |
|---|---|---|---|---|---|
| 0 | 0 | 4 | 4 | 0 | 0 |
| 1 | 0 | 4 | 3 | 1 | 3 |
| 2 | 0 | 4 | 2 | 2 | 4 |
| 3 | 0 | 4 | 1 | 4 | 4 |
| 4 | 0 | 4 | 0 | 7 | 0 |
| 5 | 1 | 5 | 0 | 30 | 0 |
| 6 | 2 | 6 | 0 | 40 | 0 |
| 7 | 3 | 7 | 0 | 30 | 0 |
| 8 | 3 | 7 | 1 | 7 | 7 |
| 9 | 3 | 7 | 2 | 4 | 8 |
| 10 | 3 | 7 | 3 | 2 | 6 |
| 11 | 3 | 7 | 4 | 1 | 4 |
| 12 | 3 | 7 | 5 | 0 | 0 |
| 13 | 3 | 7 | 6 | 0 | 0 |
| 14 | 3 | 7 | 7 | 0 | 0 |
| 15 | 3 | 7 | 8 | 0 | 0 |

Total Distortion: 36

Figure 2.3-4 Codebook Example #3

2.4 Spatial-Domain Compression Techniques

Beyond the general and statistical data compression methods discussed thus far, any further compression of the data should make use of one or more known characteristics of the data being compressed. Spatial-domain compression techniques make use of the fact that neighboring pixels tend to be highly correlated in intensity [1, 13, 29-31].

2.4.1 RUN-LENGTH CODING

When consecutive pixels have equal intensity values, data compression may be achieved by representing these values in the form of a run-length code. A run-length code consists of two parts: an intensity value and a run-length.
The intensity value simply indicates the common value of the pixels, and the run-length specifies how many pixels in a row have this value. Obviously, the more pixels which have this intensity value, the greater the compression ratio will be when using run-length coding techniques.

This technique, however, is normally associated with the compression of bi-level, or black-and-white, images, where each intensity value may be represented by a single bit. One such application is the transmission of FAXes, where each pixel is specified as either black or white. When combined with Huffman coding techniques, optimized to represent typical run-lengths for the associated application, run-length coding of bi-level images can produce significant compression gains [32].

The majority of real-world, multi-level images, though, where intensity levels are represented by more than one bit, do not normally have significant runs of pixels with equal values. Even if, to the human eye, the intensity values appear to be equal, most values will, in fact, deviate about an average value by some number of intensity levels. In this case, then, not only will there be very little compression achieved, but quite often the compressed image will have more data than the original.

One multi-level image application which does use run-length coding, and achieves significant compression gains, is the area of computer-generated graphics. When a computer is used to generate images, such as briefing charts and graphical diagrams, where there are only a relatively small number of colors, run-length coding tends to perform well. Another set of applications where run-length coding performs well, though not in the spatial domain, is some of the current digital image compression standards, such as those discussed in Section 2.7.
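As an illustration, a minimal run-length encoder for a single scan line might look like the following sketch; the function name and sample data are hypothetical, and no particular FAX or graphics standard is implied.

```python
def run_length_encode(pixels):
    """Collapse a scan line into (value, run-length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]

# A bi-level line with long runs compresses well...
coded = run_length_encode([0] * 10 + [1] * 3 + [0] * 7)
# coded == [(0, 10), (1, 3), (0, 7)]

# ...but a noisy multi-level line can expand instead, as noted above.
noisy = run_length_encode([7, 8, 7, 9])
# noisy == [(7, 1), (8, 1), (7, 1), (9, 1)]
```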
2.4.2 DPCM

Differential PCM (DPCM) data compression, in the spatial domain, makes use of the fact that the intensity of a pixel, although not exactly the same, is normally closely related to the intensity of the pixels around it. In DPCM, therefore, the difference of the current pixel from one or more neighboring pixels is transmitted rather than its actual intensity value [1]. Figure 2.4-1 shows a general block diagram for the implementation of a DPCM encoder, where: u(k) is our current pixel, û(k) is our prediction of u(k), e(k) is the error between our predicted and actual pixel values, and ẽ(k) is our quantized error, which is transmitted.

There are two characteristics of the DPCM encoder which should be noted. First, the predicted value for the current pixel, û(k), is some function of previous pixels, or, more accurately, of previous prediction values and errors, û(k) and ẽ(k). This means that the encoder must have some memory, or delay capability. And, second, the transmitted value is the prediction error, following a quantization function. This quantization function effectively determines the number of bits used to represent the pixel value, but it also affects the prediction error. The choice of both of these functions, prediction and quantization, greatly influences the effectiveness of the DPCM encoder.

For the simplest case of a DPCM encoder, assume that our prediction for the current pixel is equal to the intensity of its leftmost neighbor, A, as shown in Figure 2.4-2 [13]. In this case, our quantized error, ẽ(k), would be equal to our actual error, e(k), and our prediction, û(k), would be the intensity of the previous pixel, u(k-1).
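This simplest case (previous-pixel predictor, no quantization, so ẽ(k) = e(k)) can be sketched as follows; the function names and sample line are our own, not from the thesis.

```python
def dpcm_encode(pixels):
    """Transmit e(k) = u(k) - u(k-1), with u(-1) taken as 0."""
    prev, errors = 0, []
    for u in pixels:
        errors.append(u - prev)
        prev = u            # with no quantizer, the decoder tracks exactly
    return errors

def dpcm_decode(errors):
    """Rebuild u(k) by accumulating the transmitted differences."""
    prev, pixels = 0, []
    for e in errors:
        prev += e
        pixels.append(prev)
    return pixels

line = [100, 102, 101, 101, 105]
dpcm_encode(line)                          # [100, 2, -1, 0, 4]
dpcm_decode(dpcm_encode(line)) == line     # True: lossless without quantization
```

Note how the transmitted differences are small in magnitude compared to the raw intensities, which is what a subsequent variable-length code exploits.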
Therefore, we can see that our transmitted value in this case would be:

    e(k) = u(k) - u(k-1)                                (2.4-1)

It is obvious, however, that, to some extent, the more pixels we use in our predictor, the better our prediction should be; and, since we are sending the prediction error, the more compact our transmitted values will be. The number of pixels used in the prediction is considered to be the order of the predictor. Some of the more commonly used predictors are [14]:

    û(k) = 0.97A
    û(k) = 0.50A + 0.50C
    û(k) = 0.90A - 0.81B + 0.90C
    û(k) = 0.75A - 0.50B + 0.75C
    û(k) = A - B + C                                    (2.4-2)

[Figure 2.4-2: Pixel Relationships (the current pixel and its previously scanned neighbors A, B, and C on the current and previous lines)]

The efficiency of any particular DPCM scheme is directly related to the selection of both the predictor and quantizer functions. These two functions are necessarily inter-related and must, therefore, be mutually chosen. In most cases a Lloyd-Max quantizer is the optimal choice for a given predictor [33].

2.4.3 ADPCM

While the majority of DPCM predictors and quantizers are globally applied across an image, Adaptive DPCM (ADPCM) attempts to choose a more optimal predictor and/or quantizer based on localized image statistics. For ADPCM, an effective scheme must be employed to determine which DPCM coder would be the best to use for a particular segment of image data. Along with prediction errors, then, information must be transmitted which identifies the DPCM coding scheme being used for each particular portion of the image data. While, in most cases, the prediction errors are significantly reduced in magnitude in this manner, the overhead data associated with coder identification must also be accounted for when determining the effective compression ratios obtained.

2.4.4 BIT-PLANE CODING

Bit-plane coding, like DPCM, assumes that neighboring pixels are closely related in intensity.
However, unlike DPCM, which takes advantage of the fact that the intensity difference between two neighboring pixels is small, bit-plane coding assumes that the significant bits of neighboring pixels are highly correlated. For instance, for 8-bit intensity values, bit-plane coding takes advantage of the fact that the most significant bit, bit 7, will normally be the same for adjoining pixels. Likewise, there will be some correspondence in the other bit planes which will allow the data to be compressed [14]. In general, the more significant the bit plane, the higher the correspondence between pixel values will be. This is particularly true when there are areas of highly uniform intensity.

However, this is not always the case. For example, take the condition, for 8-bit values, where the uniform intensity is either 127 (most significant bit is 0, all others are 1) or 128 (most significant bit is 1, all others are 0). From a DPCM perspective this deviation is not large at all, but in bit-plane coding it means that every bit plane experiences a change when going from one of these values to the other. A similar problem arises when coding near the edge of an object, where the background and object may have a high degree of contrast. To compensate for the former of these effects, Gray codes may be used in place of the typical, linear binary quantization values. Gray codes are designed such that, for all intensities which differ by only one quantization level, the codes differ in only one bit plane as well; likewise, for differences of two quantization levels, the codes differ in at most two bit planes, etc. Using Gray codes, therefore, makes bit-plane coding more of a linear function of the image intensities.

If we encode each of the bit planes independently, it is intuitive to use run-length coding to represent each plane. This is particularly true for the more significant bits, which will tend to have rather long runs of ones and zeros.
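The 127/128 boundary discussed above can be checked directly. The mapping v XOR (v >> 1) is the standard binary-reflected Gray code; the helper name below is our own.

```python
def to_gray(v):
    """Standard binary-reflected Gray code of v."""
    return v ^ (v >> 1)

# In plain binary, stepping from 127 to 128 flips every one of the 8 planes...
changed_binary = bin(127 ^ 128).count("1")                    # 8

# ...but in Gray code the same one-level step flips exactly one plane.
changed_gray = bin(to_gray(127) ^ to_gray(128)).count("1")    # 1
```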
However, depending on our application and intensity characteristics, we could also encode only the more significant bit planes using run-length coding, while encoding some number of the less significant bit planes using DPCM or some other technique. The particular implementation used to encode the bit planes is not as important as the fact that we make use of some statistical nature of the bit planes to accomplish data reduction.

One useful feature of bit-plane coding is that the image can be progressively transmitted or stored, essentially varying the amplitude resolution over time. This is particularly important for applications which require human observation, but where communication bandwidth, or storage availability, is limited. If the most significant bit planes are transmitted first, a coarse image, in amplitude, will appear on the user's monitor. As the transmission progresses, and more bit planes are sent, more amplitude detail will be added to the image, thereby removing intensity coarseness. Although not significant technically, this technique has been found to be somewhat more acceptable to users, in some applications, than equivalent scan transmissions, where full resolution is sent for each pixel, in order, and the image is built along scan lines. This method of transmission is sometimes also referred to as hierarchical coding.

2.4.5 SUBBAND CODING

Although the final spatial-domain technique we investigate may not seem like one, in concept, it is. Like bit-plane coding, subband coding techniques decompose an image into smaller, and more granular, subimages. But, unlike bit-plane coding, which decomposes the image as a function of intensity, subband coding creates subimages by decomposing the image as a function of frequency. At first, subband coding may seem to be a transform-domain technique, due to the fact that frequency is the primary component exploited. However, the implementation of this technique, in fact, makes it a spatial-domain technique.
Subband coding decomposes the image into subimages by applying finite-impulse response (FIR) filters at appropriate frequencies, not by transforming the image into the frequency (or transform) domain.

Subband coding generates a subimage for each frequency band into which we filter the original image. If, for instance, we use two-dimensional quadrature mirror filters, which allow alias-free reconstruction, we will get four subimages, each containing a quarter of the information from the original. Each subimage will show strong components in areas of the original image which fall within the response curve of the particular filter used. For example, if a high-pass filter is used, edges and highly-textured areas will have strong components, while, if a low-pass filter is used, smooth areas will have strong components.

Each of these subimages will normally exhibit better statistical characteristics than the aggregate image. One of the previously discussed procedures, particularly DPCM coding, may then be used to encode each of the subimages. And, just like bit-plane coding, subband coding allows for effective progressive transmission of the image to take place. For instance, the subimages may be transmitted from the lowest-frequency components to the highest. Then, as each subimage is received, the image will improve from being a blurry image, with no high-frequency components, to a sharp image, containing all of the original image frequency components [34].

2.5 Transform-Domain Compression Techniques

When transform-domain techniques are used, the spatial image data is transformed into some other domain. The primary reason that a transformation is done is to decorrelate the spatial image data into orthogonal components. This transformation, then, produces components for which each coefficient maximizes the amount of information, or energy, which it represents.
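The subband analysis and synthesis described above can be sketched in one dimension with the simplest quadrature mirror pair, the two-tap Haar filters. This is our own illustration of the filter-bank idea, not the specific two-dimensional filters cited in [34]; the 2-D case applies such a split along rows and then columns, yielding the four subimages.

```python
def analyze(x):
    """Split an even-length signal into half-rate low and high bands."""
    low  = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Alias-free perfect reconstruction from the two subbands."""
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x

x = [10.0, 12.0, 11.0, 9.0, 40.0, 42.0]
low, high = analyze(x)          # low: smooth trend; high: edges and texture
synthesize(low, high) == x      # True: the split by itself loses nothing
```

Compression comes from coding each band separately (e.g., with DPCM), not from the split itself, which is exactly invertible.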
In image compression applications, then, this means that each such component will contain more information, per bit, than the equivalent data did in the spatial domain. For instance, if the spatial image is decomposed into two equal-sized components, and the major component represents 95% of the spatial-image information, then an effective compression ratio of almost 2:1 is directly obtained. In addition, based on the particular transform domain into which the image is decomposed, if certain components of the domain are known to contain little or no significant data, then these components can be greatly, or completely, suppressed with little degradation to the original image. For example, most natural images, in the frequency domain, contain little or no energy in the high-frequency components. Knowing this, therefore, these components can be quantized to very few bits, relative to the lower-frequency components, or completely ignored, with little degradation to the reconstructed image quality.

Two other characteristics of transforms, besides decorrelation, are important when it comes to using them for image compression applications. The first of these is that the transform should ideally decorrelate the spatial image independent of the actual image data. In other words, the optimum transform should not depend, in any way, on the particular statistics of the spatial image. And the other characteristic, which makes a transform truly practical, is that there must be some efficient, or fast, implementation for it [35].
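The energy-packing argument above can be made concrete with a toy sketch (our own example, using a simple average/difference split rather than any particular transform): for smooth, highly correlated data, nearly all of the energy lands in one component, so the other can be suppressed entirely at little cost.

```python
def energy(coeffs):
    return sum(v * v for v in coeffs)

x = [100, 101, 102, 103, 104, 105, 106, 107]       # smooth, correlated samples
low  = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]

# Nearly all of the energy sits in the low component...
share = energy(low) / (energy(low) + energy(high))   # > 0.999

# ...so discarding the high component (a 2:1 reduction) costs little:
approx = [v for l in low for v in (l, l)]            # reconstruct with high = 0
max_err = max(abs(a - b) for a, b in zip(x, approx)) # 0.5 intensity levels
```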
The following sections briefly describe a few of the more common transform domains into which images are sometimes converted for compression applications.

2.5.2 KARHUNEN-LOEVE TRANSFORM

The Karhunen-Loeve Transform (KLT) is the theoretically optimal transform, from an energy-packing perspective [2, 36]; this means that a minimal set of coefficients may be found which represents more information, per number of coefficients, than any other possible transformation. The KLT is sometimes also referred to as a Hotelling, principal component, or eigenvector transform.

The KLT is essentially found by calculating the covariance matrix of an image. It can be shown [17] that the basis vectors are then given by the eigenvector equation:

    R φ(k) = λ(k) φ(k),  k = 0, 1, ..., N-1            (2.5-1)

where: N is the order of the transform, R is the image covariance matrix, φ(k) is the k-th basis vector, and λ(k) is its associated eigenvalue.

However, even though the KLT completely fulfills the decorrelation criterion desired, it fails both the independence and implementation criteria. Due to the fact that the optimal KLT is a function of the image covariance matrix, it is totally dependent upon the image statistics; in other words, a different, optimal KLT exists for each image to be compressed. And, secondly, no fast implementation for this transform has been found [2]. Due to these last two failures, then, the KLT is not used for general image compression.
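For a 2x2 covariance matrix the eigenvector equation (2.5-1) can be solved in closed form. The sketch below (our own, with hypothetical names) makes the statistics-dependence concrete: change R and a different basis results.

```python
import math

def klt_basis_2x2(r00, r01, r11):
    """Eigenpairs of the covariance matrix [[r00, r01], [r01, r11]],
    largest eigenvalue (most energy) first. Assumes r01 != 0."""
    mean = (r00 + r11) / 2
    disc = math.hypot((r00 - r11) / 2, r01)
    pairs = []
    for lam in (mean + disc, mean - disc):
        v0, v1 = r01, lam - r00          # from (R - lam*I) v = 0
        n = math.hypot(v0, v1)
        pairs.append((lam, (v0 / n, v1 / n)))
    return pairs

# Strongly correlated neighboring pixels: most energy on the "average" axis.
klt_basis_2x2(1.0, 0.9, 1.0)
# eigenvalues 1.9 and 0.1; first basis vector approx. (0.707, 0.707)
```

Coding only the first coefficient here would retain 1.9/2.0, or 95%, of the energy, illustrating the near-2:1 gain described in Section 2.5.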