Citation
Optimized spatial query processing on solid state drives

Material Information

Title:
Optimized spatial query processing on solid state drives
Creator:
Mahmoudi, Nafiseh
Place of Publication:
Denver, CO
Publisher:
University of Colorado Denver
Publication Date:
Language:
English

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Department of Electrical Engineering, CU Denver
Degree Disciplines:
Electrical engineering
Committee Chair:
Banaei-Kashani, Farnoush
Committee Members:
Ra, Ilkyeun
He, Liang

Notes

Abstract:
The advancement in design of modern storage units developed based on non-volatile memory (NVM) technology, most notably solid-state drives (SSDs), has revolutionized storage technology, and the storage products developed based on this new technology are quickly replacing rotation-based hard disk drives (HDDs) in most computing platforms, from mobile devices to data centers. In particular, SSDs offer superior I/O characteristics as compared with HDDs, such as much lower latency (hence, much higher I/O rates), higher throughput from random I/O, less power consumption, less heat production, and most importantly, the ability to retrieve many stored data blocks in parallel. In this paper, we propose a family of "SSD-aware" spatial query processing algorithms for efficient processing of spatial range and spatial join queries on massive datasets, leveraging the parallel I/O feature of SSDs. Our analytical and empirical evaluation of the proposed query processing solutions shows up to 35 percent improvement over the existing state-of-the-art spatial query processing solutions that fail to benefit from the parallel I/O feature of SSDs. Solid State Drives (SSDs) have emerged as a new type of secondary storage medium with both improved characteristics (e.g., access time) and new characteristics (e.g., parallel data retrieval) as compared to HDDs. In this project, we focus on developing SSD-aware data management solutions that are both adapted to the improved features of SSDs and able to leverage the new features to achieve higher performance. For instance, we are developing SSD-aware index structures that can benefit from the parallel retrieval capabilities of SSDs for high-performance implementation of spatial operators (e.g., spatial range and join queries) on massive spatial datasets.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
Copyright Nafiseh Mahmoudi. Permission granted to University of Colorado Denver to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.

Full Text


OPTIMIZED SPATIAL QUERY PROCESSING ON SOLID STATE DRIVE
by
NAFISEH MAHMOUDI
B.Sc., University of Allameh Rafie, Iran, 2012

A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Master of Science
Electrical Engineering Program
2017


© 2017
NAFISEH MAHMOUDI
ALL RIGHTS RESERVED


This thesis for the Master of Science degree by
Nafiseh Mahmoudi
has been approved for the
Electrical Engineering Program
by
Dr. Farnoush Banaei-Kashani, Chair
Dr. Ilkyeun Ra
Dr. Liang He

Date: December 16, 2017


Mahmoudi, Nafiseh (M.S., Electrical Engineering Program)
Optimized Spatial Query Processing on Solid State Drive
Thesis directed by Assistant Professor Farnoush Banaei-Kashani

ABSTRACT

The advancement in design of modern storage units developed based on non-volatile memory (NVM) technology, most notably solid-state drives (SSDs), has revolutionized storage technology, and the storage products developed based on this new technology are quickly replacing rotation-based hard disk drives (HDDs) in most computing platforms, from mobile devices to data centers. In particular, SSDs offer superior I/O characteristics as compared with HDDs, such as much lower latency (hence, much higher I/O rates), higher throughput from random I/O, less power consumption, less heat production, and most importantly, the ability to retrieve many stored data blocks in parallel. In this paper, we propose a family of "SSD-aware" spatial query processing algorithms for efficient processing of spatial range and spatial join queries on massive datasets, leveraging the parallel I/O feature of SSDs. Our analytical and empirical evaluation of the proposed query processing solutions shows up to 35 percent improvement over the existing state-of-the-art spatial query processing solutions that fail to benefit from the parallel I/O feature of SSDs.

Solid State Drives (SSDs) have emerged as a new type of secondary storage medium with both improved characteristics (e.g., access time) and new characteristics (e.g., parallel data retrieval) as compared to HDDs. In this project, we focus on developing SSD-aware data management solutions that are both adapted to the improved features of SSDs and able to leverage the new features to achieve higher performance. For instance, we are developing SSD-aware index structures that can benefit from the parallel retrieval capabilities of SSDs for high-performance implementation of spatial operators (e.g., spatial range and join queries) on massive spatial datasets.

The form and content of this abstract are approved. I recommend its publication.

Approved: Farnoush Banaei-Kashani


DEDICATION

I dedicated my dissertation to my parents, Fatemeh and Hossein, who have always loved me unconditionally and whose good examples have taught me to work hard for the things that I aspire to achieve.


ACKNOWLEDGEMENTS

I would like to express the deepest appreciation and gratitude to my advisor, Dr. Farnoush Banaei-Kashani, for his guidance and immense knowledge. His persistent help was paramount in my research; without it, this dissertation would not have been possible. I would also like to thank my friends and the CSE BDLab members for standing by my side at truly challenging times. Last but not least, I would like to thank my family for their continuous support in this journey.


TABLE OF CONTENTS

CHAPTER
I. INTRODUCTION
II. BACKGROUND
2.1 NAND Flash Architecture
2.2 Flash Translation Layer
2.3 Advanced Commands and Their Restrictions
2.4 Read/Write Process of a Request on SSD
2.5 Page Conflict
III. RELATED WORK SURVEY
3.1 Exploration of SSD Structure
3.2 Empirical Evaluation of Query Processing on SSD
3.3 Spatial Query
3.4 Parallel Spatial Query
IV. PROBLEM DEFINITION AND FORMALIZATION
4.1 Problem Definition
4.2 Overview
4.2.1 Indexing
4.2.2 Hierarchical Traversal of Batch Range Queries
4.2.3 Hierarchical Traversal of Join Query
4.2.4 SSD-Aware Spatial Query
4.3 Proposed Solution
4.3.1 Page Conflict Graph
4.3.2 Steps of Scheduling and Packing
4.3.3 Packet-Aware Query Processing
4.4 Optimization
4.4.1 Hybrid Batch of Range Query
4.4.2 Hybrid Memory and Packet-Aware Join Query Processing
V. EXPERIMENTS AND RESULTS
5.1 Setup and Configuration
5.2 Workloads and Performance Metrics
5.3 Results and Analysis
5.3.1 Comparison of Packing and Baseline
5.3.2 SSD Configuration
5.3.3 Spatial Join Query
5.3.4 Spatial Range Query
5.3.5 Comparison of Disk and SSD
VI. CONCLUSION AND FUTURE WORK
REFERENCES


LIST OF TABLES

TABLE
1. Total and Partial Conflicts
2. Experimental SSD Configuration Parameters
3. Experimental Workload


LIST OF FIGURES

FIGURE
1. Internal Structure of SSD
2. Allocation Schemes
3. Range Query
4. Join Query
5. Mapping LPA to PPA
6. Total and Partial Conflict Edges
7. Total Conflict Edges
8. Spatial Join Comparison
9. Range Query Comparison
10. 24 Page Allocation Schemes
11. Different Number of Channels
12. Different Number of Chips
13. Different Number of Dies
14. Different Number of Planes
15. Uniform-Uniform Distribution
16. Uniform-Skew Distribution
17. Cluster Distribution
18. Real Dataset
19. Comparison Between Different Distributions
20. Real Dataset Range Query
21. Disk vs. SSD


CHAPTER I
INTRODUCTION

A Spatial Database Management System (SDBMS) is a database system that stores spatial data and manages them to process various types of spatial queries. SDBMSs are utilized in many application areas such as computer graphics, image processing, Geographic Information Systems (GIS), data warehousing, transportation planning, location-based services, and Computer-Aided Design (CAD). For over a decade, spatial databases have been an active area of research in both academia and industry.

Spatial query processing encounters challenges as the data size increases and access to data expands. Spatial data objects are more complex than basic data types, as spatial data contain points, lines, polygons, and volumes. Therefore, execution of a spatial query is more complicated than for one-dimensional data, as it comprises the validation of several conditions in multiple dimensions. Consequently, spatial query processing requires intensive spatial computation and disk I/O access. In other words, spatial query processing can be computationally expensive without proper utilization of inherent hardware capabilities and data organization.

Several researchers have studied various methods to enhance the performance of spatial queries. The performance of SDBMSs in handling spatial queries depends on the spatial indexing techniques and the underlying hardware. Indexing techniques have been adapted to spatial data in order to improve retrieval efficiency and boost the performance of spatial queries. In addition to data structure modification, hardware-specific approaches that allow parallel algorithms have been studied to optimize the speed of spatial query processing. For example, a parallel R-tree is designed for shared-disk environments in [1].


Traditional research on spatial query optimization assumed the data resides on disk. With the advent of Solid State Drives (SSDs), it is critical to explore new avenues of research to optimally utilize their capabilities in the management of spatial queries. The recent trend of switching from disk to SSD brings whole new functionality, as the way data is stored on an SSD is very different from a disk: on SSD, data must follow a specific structure to be stored.

Spatial query execution time grows linearly with the number of spatial objects. The internal parallelism property of SSDs has been disregarded in the literature, whereas it can be efficiently utilized to optimize spatial query operations. However, internal parallelism is far beyond the scope of basic SSD operations. In this thesis, we use this unique merit of SSDs to optimize spatial query operations. We focus on two of the most popular spatial query types, range and join queries, and propose a model to optimize their execution on SSD. We introduce two strategies to scale up spatial query processing operations over SSD: (I) packing of spatial data to access them in parallel, and (II) a new hybrid algorithm to optimally perform spatial queries on packed data by combining the BFS and DFS algorithms. For the proposed approach we use a static scheme for physical page allocation on SSD. Based on our experimental results, our proposed methods achieve a significant improvement (35%) in the performance of spatial database systems.

The rest of the thesis is organized as follows. Chapter II presents the background, and in Chapter III the related work is reviewed. Chapter IV covers the problem definition and proposed solution. Experiments are presented in Chapter V. Chapter VI summarizes conclusions and future work.


CHAPTER II
BACKGROUND

This chapter presents a brief overview of important aspects of the SSD's internal structure, which is a fundamental concept in developing our research. A flash-based SSD is a non-volatile memory which consists of four components, as displayed in Figure 1 and explained as follows. I/O requests are transferred to the SSD through an interface connection (e.g., SATA, SAS, or PCIe). The host interface is the connection between the host system and the SSD. The SSD controller is responsible for issuing I/O requests from/to the host interface controller to/from the flash memory controller, which is connected to the NAND flash memory packages through multiple channels. The embedded RAM buffer stores memory address translation information and data. The following sections explain the hardware and firmware structures of the SSD in detail.

Figure 1. Internal Structure of SSD

2.1 NAND Flash Architecture

As shown in the right side of Figure 1, each channel of the flash memory controller is shared with multiple NAND flashes. NAND flashes consist of two or more dies that contain multiple


planes. Each plane includes thousands of blocks. A block is a group of a fixed number of pages (e.g., 32 to 256 pages) and can be erased individually. The size of a page is between 512 B and 16 KB; respectively, the size of a block is between 16 KB and 4 MB. The smallest unit of read/write in NAND flash is a page, which is composed of memory cells.

Based on the special internal architecture of NAND flash, it is able to process I/O requests in parallel using its internal resources. There are four levels of parallelism available in SSDs. In channel-level parallelism, each channel can be accessed simultaneously and independently. NAND flashes can operate independently in flash-package parallelism, whereas in die-level parallelism, dies share the same I/O bus. Die-level parallelism can be exploited by using the interleave command (explained in Section 2.3). In plane-level parallelism, the smallest unit that can be operated in parallel is the plane; as described in the Advanced Commands section, the multi-plane command can access multiple planes simultaneously.

We can generate 24 allocation schemes from the four abovementioned parallelism levels, and data access varies across these 24 page allocations. However, they all have a common feature: the set of pages in a single plane is always fixed and remains unchanged across allocation schemes.
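The count of 24 allocation schemes follows directly from the fact that a scheme is a priority ordering of the four parallelism levels, so there are 4! = 24 of them. A small illustrative sketch (Python, not code from the thesis) enumerates them:

```python
from itertools import permutations

# The four parallelism levels of an SSD; a page-allocation scheme is a
# priority ordering of these levels, giving 4! = 24 possible schemes.
LEVELS = ("channel", "package", "die", "plane")

schemes = list(permutations(LEVELS))
print(len(schemes))  # 24
# e.g. ('channel', 'package', 'die', 'plane') stripes consecutive pages
# across channels first, then packages, then dies, then planes.
```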


Figure 2. Allocation Schemes

2.2 Flash Translation Layer

The Flash Translation Layer (FTL) is the core software of the SSD controller. Since the inner structure of NAND flash is different from an HDD, the FTL hides the inner structure of NAND flash from the host. It maps the logical page address (LPA) used by the host to the physical page address (PPA) understandable by NAND flash. The PPA includes the following information, in this exact order: channel, package, die, plane, block, and page addresses. The allocation scheme specifies how a PPA is assigned to an LPA. Examples of allocation schemes are shown in Figure 2. Allocation schemes are classified into two categories: dynamic and static. In dynamic allocation, an LPA is allocated to a free PPA from the entire


SSD after verifying the availability of channels, packages, dies, and planes, in addition to the priority order of parallelism. However, in static allocation, an LPA is assigned to a pre-determined PPA. The PPA in static allocation is determined by a modulo operation.

2.3 Advanced Commands and Their Restrictions

Read, write (or program), and erase are the three basic operations in NAND flash. In this research, our primary focus is on the read operation. In addition to the basic operations, some advanced commands such as copy-back, multi-plane, and interleave are implemented to enhance the parallelism feature of SSDs; the multi-plane and interleave commands are used in this research as well. The following presents an overview of the multi-plane command and its essential enhancement to the parallelism feature.

The multi-plane advanced command executes plane-level parallelism for the same operation (read/write) in all planes within the same die. To execute a multi-plane read or write, the pages must have the same package, die, block, and page addresses. If two pages are in two different planes and their page/block/die numbers are not the same, they cannot be retrieved in parallel. For example, in the WPCD mode of Figure 2, pages 1 and 19 are not accessible in parallel, but pages 1 and 3 can be read in parallel by using the multi-plane advanced command. The execution time of the multi-plane advanced command for reading all pages it can access is equal to the reading time of one page with a basic command.

Because of the internal structure of a flash package, several dies share the same multiplexed interface. Therefore, it is not possible to send read requests to all of them simultaneously. So, the interleave command is used to retrieve several pages in different dies of the same flash package.
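To make the static, modulo-based LPA-to-PPA mapping concrete, the following sketch (Python; the resource counts and the name `static_allocate` are illustrative assumptions, not the thesis' implementation) stripes logical pages across the parallelism levels in a chosen priority order:

```python
def static_allocate(lpa, order=("channel", "package", "die", "plane"),
                    counts=(4, 2, 2, 2), pages_per_block=64):
    """Statically map a logical page address (LPA) to a physical page
    address (PPA) by modulo arithmetic.  `order` is the priority order
    in which the parallel resources are striped; `counts` gives the
    (hypothetical) number of units at each level."""
    ppa, rem = {}, lpa
    for level, n in zip(order, counts):
        ppa[level] = rem % n   # position within this parallelism level
        rem //= n
    ppa["page"] = rem % pages_per_block
    ppa["block"] = rem // pages_per_block
    return ppa

# With a channel-first order, consecutive LPAs land on consecutive channels:
print(static_allocate(0)["channel"], static_allocate(1)["channel"])  # 0 1
```

Striping consecutive LPAs across channels first is what lets a sequential read exploit channel-level parallelism; permuting `order` yields the other allocation schemes of Figure 2.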


2.4 Read/Write Process of a Request on SSD

The host system sends an I/O request through the host interface connection to the SSD. Each request is divided into multiple sub-requests in the form of LPAs assigned by the host system. Then, the host interface issues the request to the SSD controller, where the FTL converts the LPAs to PPAs understandable by NAND flash. As mentioned before, a PPA includes the channel, package, die, plane, block, and page addresses. The flash memory controller sends sub-requests to multiple channels in parallel. A series of sub-requests pipelines across different flash packages within each channel. Then, the requests interleave across multiple dies within the same package, where each die includes multiple planes. Finally, the requests are sent to different planes using the multi-plane command and reach their assigned blocks and pages.

2.5 Page Conflict

There are some restrictions and conflicts on parallel reading of pages on SSD. These conflicts can be classified in the following categories.

1. If a set of requests is sent to the same channel, this causes I/O bus contention. However, interleaving can be used inside the channel.

2. If a set of requests needs to access the same package, there will be a conflict, but they can potentially use die interleaving.

3. If a set of requests requires access to the same die in different planes, they can use the multi-plane command. However, as mentioned in Section 2.3, one restriction on executing the multi-plane command is that all pages must have the same die/package/channel number; in other words, the pages' addresses should be the same.


4. Pages in the same plane cannot be fetched in parallel. Therefore, there will always be a conflict in reading pages within the same plane. As shown in Figure 2, pages 1, 17, and 33, which are located in one plane, cannot be read in parallel.

A page conflict is defined as the condition in which any of the abovementioned situations happens on a resource of the SSD. We group page conflicts into partial conflicts and total conflicts. Partial conflicts comprise categories 1 and 2, which have the potential to exploit parallelism inside the SSD by using interleaving within a channel and die interleaving. In category 3, if the requests can pass the restriction and use the multi-plane command, they belong to partial conflict; otherwise they are considered a total conflict. Category 4 is a total conflict, meaning there is always a conflict. All the conflicts are shown in Table 1, where 0 means there is no conflict and 1 means there is a conflict. For example, resource number 4 (inside a package) has a partial conflict but can use die interleaving, so there is no total conflict inside a package.

Table 1. Total and Partial Conflicts

  Resource             Partial   Total
  1. Among channels       0        0
  2. Channel              1        0
  3. Among packages       1        0
  4. Package              1        0
  5. Among dies           1        0
  6. Die                  1        0/1
  7. Among planes         1        0/1
  8. Planes               1        1
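The four categories and Table 1 can be condensed into a small classifier. This is an illustrative sketch (Python; the PPA dictionary layout and the function name are assumptions for illustration, not the thesis' code):

```python
def classify_conflict(p1, p2):
    """Classify the conflict between two page reads as 'none',
    'partial' (resolvable by pipelining, interleaving, or a
    multi-plane read), or 'total' (cannot be parallelized)."""
    if p1["channel"] != p2["channel"]:
        return "none"        # independent channels: no conflict
    if p1["package"] != p2["package"]:
        return "partial"     # pipelining across packages in a channel
    if p1["die"] != p2["die"]:
        return "partial"     # die interleaving within a package
    if p1["plane"] != p2["plane"]:
        # multi-plane read requires identical block and page addresses
        if p1["block"] == p2["block"] and p1["page"] == p2["page"]:
            return "partial"
        return "total"
    return "total"           # same plane: always a total conflict
```

Note how category 3 splits: two pages in different planes of the same die are only a partial conflict when the multi-plane address restriction of Section 2.3 is satisfied.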


CHAPTER III
RELATED WORK SURVEY

We review the related work in four categories: SSD and internal parallelism, query processing over SSD, spatial query, and parallel spatial query.

3.1 Exploration of SSD Structure

In recent years, the popularity of flash-memory-based storage has increased because of its significant improvement over hard disks in agile processing capabilities. Accordingly, a great deal of research has been conducted on flash-based SSDs. Some studies considered different aspects of SSDs [2]-[4]. For instance, the architecture of SSDs was studied in [2], [5], [6]. Agrawal et al. proposed a range of design choices for SSDs and analyzed the performance of different SSD configurations and parameters, such as page size, in [5]. The I/O performances of SSDs are hierarchically dependent. Different types of SSD simulators are available in [7]-[9].

One important feature of the SSD structure that facilitates its superior processing capability is its internal parallelism. A detailed description of the internal parallelism of SSDs has been studied in [10], [11]. Chen et al. demonstrated the improvement of I/O performance by exploiting the internal parallelism of SSDs in [11]. A key step in fully benefiting from the parallelism of SSDs is to use allocation schemes. It was shown that page allocation schemes have an essential role in exploiting the SSD's parallelism feature [7], [12]. The performance of all possible page allocation schemes was evaluated and compared in [12].


Simultaneously accessing multiple resources in an SSD may result in a conflict among I/O requests. In order to leverage random read performance in SSDs, one study proposed an I/O request scheduling scheme to avoid resource contention [13]. Since the SSD controller needs to access the physical page addresses of requests, which are hidden within the FTL, they implemented some hardware adjustments.

3.2 Empirical Evaluation of Query Processing on SSD

Data structures and algorithms that leverage fast random reads to speed up query processing are presented in [14], [15]. Tsirogiannis et al. [15] propose FlashScan and FlashJoin, flash-aware scan and join implementations for business intelligence and other data analysis queries over a column-based page layout. Their results show that significant performance and energy gains can be achieved by pushing query processing components inside the Smart SSD. In another work, Ghodsnia et al. [16] exploited the parallel I/O characteristics of SSDs and proposed parallel-I/O-aware query optimization.

3.3 Spatial Query

A summary of several spatial query techniques is presented in [17]. These techniques include methods that are used in internal memory (e.g., nested-loop join, plane sweep, and Z-order) and methods that are performed in external memory. External-memory methods are classified in three categories: spatial join algorithms where both datasets are indexed, algorithms over non-indexed datasets, and spatial join methods where only one of the datasets is indexed. Different spatial index structures are used; however, one of the most popular is the R-tree-based spatial index [18].


3.4 Parallel Spatial Query

If the number of spatial datasets increases, parallel algorithms can be used to reduce the execution time of a spatial join. Several methods have been proposed for parallelizing spatial joins [19]-[28]. One study uses a shared-virtual-memory architecture to implement a parallel spatial join algorithm [1]. Also, some studies have investigated the shared-nothing environment in implementing parallel spatial joins [19], [22]. A GPU-based spatial join was described in [27] using an R-tree-based spatial join and was compared with CPU parallelism. A spatial join implementation for non-indexed polygonal data over GPU was performed in [23]. Some work has been done in parallelizing spatial joins with MapReduce, such as on clusters [20], and multi-way spatial joins [25].


CHAPTER IV
PROBLEM DEFINITION AND FORMALIZATION

4.1 Problem Definition

This research focuses on the optimization of spatial queries, particularly the range query and the join query. An overview of these spatial queries and their underlying algorithms is presented as follows.

1. Range Query: Let O = {o1, o2, ..., on} be a set of spatial objects consisting of data assigned to points in Euclidean space. A range query q = [(xl, yl), (xu, yu)] is defined by specifying the coordinates of its lower-left corner (xl, yl) and upper-right corner (xu, yu). A spatial object o = (x, y) is contained in the range query if and only if xl <= x <= xu and yl <= y <= yu. A database system usually receives multiple range queries; in order to improve the overall performance, we group a set of range queries in batches, i.e., Q = {q1, q2, ..., qm} is defined as a batch of spatial range queries.

2. Join Query: Given two datasets R and S in Euclidean space, the spatial join query returns all pairs of objects (r, s), r in R and s in S, that satisfy a spatial predicate, where r and s are points.

As noted in the definitions of range and join queries, the process of executing a spatial query involves the validation of some conditions on observations in one or two datasets. Depending on the size of the dataset(s), the retrieval of data points and validation of conditions may take a very long time. In order to measure the efficiency of the query execution process, we compare the


throughput and execution time of queries. Our optimization model maximizes throughput by minimizing the execution time of spatial queries. We define our objective function as follows.

Definition 1. For a given set of spatial queries, schedule the queries so as to minimize the total query response time, by using the special features of the SSD.

In order to understand our proposed solution to the optimization model, we provide an overview of the R-tree allocation and its access pattern on SSD.

4.2 Overview

This section summarizes the optimal indexing approach that facilitates better reading of data. It also provides a detailed explanation of how the nodes of the indexed dataset(s) are mapped to the SSD's pages.

4.2.1 Indexing

The first step in performing an efficient spatial query is to index the dataset(s) and provide synchronized traversal of the indices. There are several techniques to perform join and range queries on spatial datasets [17]. In this study we chose the approach that is based on indexing the dataset(s) with an R-tree [18] to perform spatial queries. Figures 3 and 4 show samples of spatial data that are partitioned into multiple hierarchical MBRs for spatial indexing. Hierarchical traversal in this research includes Breadth-First Search (BFS) [28] and Depth-First Search (DFS) [1], as described below. In the following, the hierarchical traversals of the range and join spatial queries are explained in detail, with an example for each approach. Furthermore, the process of mapping pages on SSD is explained.
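The range-containment and join predicates defined above translate directly into code. A minimal sketch (Python; points as (x, y) tuples, and all names illustrative rather than from the thesis):

```python
def in_range(obj, query):
    """True iff point obj lies inside the range query given by its
    lower-left and upper-right corners."""
    (xl, yl), (xu, yu) = query
    x, y = obj
    return xl <= x <= xu and yl <= y <= yu

def batch_range(objects, batch):
    """Evaluate a batch Q of range queries in one pass over the data."""
    return {i: [o for o in objects if in_range(o, q)]
            for i, q in enumerate(batch)}

def spatial_join(r_set, s_set, predicate):
    """All pairs (r, s) with r in R and s in S satisfying the predicate."""
    return [(r, s) for r in r_set for s in s_set if predicate(r, s)]

pts = [(1, 1), (4, 5), (9, 9)]
print(batch_range(pts, [((0, 0), (5, 5))]))  # {0: [(1, 1), (4, 5)]}
```

The nested-loop `spatial_join` is only the naive baseline implied by the definition; the thesis' contribution is avoiding exactly this exhaustive scan via R-tree traversal and SSD-aware packing.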


Figure 3. Range Query

4.2.2 Hierarchical Traversal of Batch Range Queries

As shown in Figure 3, a batch of range queries is executed on the indexed dataset. The range queries spread into accessing the R-tree nodes across different subtree paths. In the BFS approach, we traverse the tree level by level. In each level, nodes that meet the query condition are fetched into a set named S. When descending the tree to the next level, the sub-nodes of the nodes that are elements of S are fetched and stored in S, and the previous nodes are deleted. As we progress to lower levels, the number of pages in S increases dramatically. When we reach a leaf, all populated nodes are given as the output.

In the DFS approach, we examine each node and its sub-nodes against the query condition before moving to the neighboring nodes at the same level. Any node and sub-nodes that meet the query condition are retrieved and stored in set S. The parent node is removed from S and the process continues until we reach a leaf. Then, the process starts over from the neighbor of the originally selected node.

According to Figure 3, a batch of range queries is given. One range query in the batch overlaps node R1, which includes several sub-nodes, and also overlaps some of those sub-nodes.


The same coverage criterion is met between the queries and the child nodes at the next level.

Algorithm 1. BFS Traversal Range Query
1. Input: R-tree root R, batch of range queries Q
2. S ← {R}
3. While S contains non-leaf nodes do:
4. For each node n in S do:
5. For each child c of n do:
6. If c overlaps some query in Q, add c to S
7. Remove n from S
8. Return S

Based on the BFS approach of Algorithm 1, in the first step the parent nodes are checked for overlap with the query batch, and the overlapping ones are added to set S. In the next steps, the child nodes of all nodes in S are checked against the overlap condition; parent nodes already in S are replaced with their children that meet the condition. The process continues down to the leaves and, at the end, all nodes included in S are returned.

Algorithm 2. DFS Traversal Range Query
1. Input: R-tree node n, batch of range queries Q, set S
2. If n overlaps some query in Q:
3. If n is a leaf, add n to S
4. Else, for each child c of n do:
5. DFS(c, Q, S)
6. Return S
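A recursive sketch of the DFS variant (Algorithm 2), under the same illustrative assumptions: nodes are `(mbr, children)` tuples and `overlaps` is the usual rectangle-intersection test; neither name comes from the thesis.

```python
def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def dfs_range(node, queries, out=None):
    """node is a (mbr, children) tuple; children == [] marks a leaf."""
    out = [] if out is None else out
    mbr, children = node
    if any(overlaps(mbr, q) for q in queries):
        if not children:               # leaf: part of the answer set S
            out.append(mbr)
        else:                          # descend before visiting siblings
            for child in children:
                dfs_range(child, queries, out)
    return out
```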


DFS starts from node 1 and, since it overlaps the batch of range queries, explores its child node 3; it then moves forward with the children of node 3. The overlapping nodes are visited recursively and added to set S; each time a parent node in S has been read, it is removed and replaced with its qualifying children. As outlined in both approaches, there is a set S that includes all pages that need to be retrieved at each step.

4.2.3 Hierarchical Traversal of Join Query

In this section, we explain the traversal of the trees for the join query. In the BFS approach, at each level the two trees are evaluated simultaneously, and if a pair of nodes meets the join condition (e.g., intersection), it is assigned to set S. At the next level, we only evaluate the children of the pairs that intersected at the upper level; pairs that meet the condition are added to S, and the parent pairs are deleted. We follow the same logic until we arrive at the leaf level, where all intersecting objects are added to S [28].

Figure 4. Join Query

For example, as shown in Figure 4, we start from the roots of both trees and traverse them synchronously: we check the root entries of tree A against those of tree B and store the intersecting pairs in S. In the next step, we only check the entries
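The synchronized level-by-level join traversal can be sketched as below. This is again an illustrative Python sketch with assumed `(mbr, children)` node tuples, and it assumes both trees have the same height.

```python
def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def bfs_join(tree_a, tree_b):
    """S holds candidate pairs; qualifying child pairs replace their parents."""
    S = [(tree_a, tree_b)] if overlaps(tree_a[0], tree_b[0]) else []
    while S and S[0][0][1]:            # descend until the leaf level
        S = [(ca, cb)
             for a, b in S
             for ca in a[1]            # children of the A-side node
             for cb in b[1]            # children of the B-side node
             if overlaps(ca[0], cb[0])]
    return [(a[0], b[0]) for a, b in S]
```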


of the MBRs that had overlap in the previous level. Therefore, we check the intersections between the child nodes of the overlapping pairs. The overlapping child nodes replace their parent nodes in set S. The synchronized tree traversal and node joining continue level by level down the R-tree until we reach the leaf entries of both trees, at which point all pairs in S are produced as the output.

Algorithm 3. BFS Traversal Join Query
1. Input: roots RA and RB of the two R-trees
2. S ← {(RA, RB)}
3. While S contains non-leaf pairs do:
4. For each pair (a, b) in S do:
5. For each child ca of a and each child cb of b do:
6. If ca intersects cb, add (ca, cb) to S
7. Remove (a, b) from S
8. Return S

DFS is very similar to the approach presented for range queries, but in this case we check the two trees simultaneously. We start with a node in tree A and check all of its children against the nodes of tree B for intersection. Only if there is an intersection between parent nodes do we examine the children; on acceptance, we include them in S and remove the parents. After all nodes are checked, all the objects that meet the condition are assigned to set S [1].

In Figure 4, the DFS approach starts by checking an MBR of tree A against the MBRs of tree B and their children, then moves to the next MBR of tree A. Whenever a pair has no further intersection, the traversal backtracks and the next candidate pair is checked; intersecting pairs are added to set S. We continue the same process
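The depth-first join can likewise be sketched recursively, under the same assumed tuple representation and equal tree heights.

```python
def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def dfs_join(a, b, out=None):
    """Descend into a pair of subtrees only when the parent MBRs intersect."""
    out = [] if out is None else out
    if overlaps(a[0], b[0]):
        if not a[1] and not b[1]:      # leaf pair: a join result
            out.append((a[0], b[0]))
        else:
            for ca in a[1]:
                for cb in b[1]:
                    dfs_join(ca, cb, out)
    return out
```

Pruning whole subtree pairs as soon as the parent MBRs fail to intersect is what keeps the traversal from examining every leaf combination.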


on the most recently selected node until we reach the leaves or there are no more nodes to check under the selected node. The same process is repeated for all other nodes and their children. Each node in S is replaced with its children if they meet the join condition. After reaching the leaves, all nodes in S are recorded as the result.

Algorithm 4. DFS Traversal Join Query
1. Input: nodes a of tree A and b of tree B, set S
2. If a intersects b:
3. If a and b are leaves, add (a, b) to S
4. Else, for each child ca of a and each child cb of b do:
5. DFS(ca, cb, S)
6. Return S

As outlined in all the above approaches, there is a set S that includes all pages that need to be fetched. Our objective is to minimize the retrieval time of those pages using the parallelism feature of SSD. The preceding sections covered spatial query processing without considering the SSD; in the following, SSD-aware spatial query processing is explained.

4.2.4 SSD-Aware Spatial Query

Each node of the R-tree(s) is identified by an LPA (logical page address) in the host system and maps to a page on the SSD. When a request is sent from the host system to the SSD and arrives at the FTL, the LPA is converted to a PPA (physical page address). This mapping can follow one of 24 static allocation schemes or a dynamic allocation. For example, in Figure 5, an LPA maps to PPA 24. An array representation of a tree typically allocates the nodes in sequential physical order, but the SSD allocation scheme


distributes the R-tree nodes across different resources. This enables parallel access to the nodes.

Figure 5. Mapping LPA to PPA

As a benchmark, we read page requests from the SSD without any specific scheduling and call this the baseline model. In the baseline method there is no I/O scheduling, and page read requests are issued FIFO. In this model, some page read requests may still be fetched in parallel, since they may happen not to contend for the same resource. Thus, even the baseline model can shorten page retrieval using the parallelism of SSD, in contrast with disk.
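One common static allocation, channel-first striping, can be illustrated as follows. The decomposition order and the function name are assumptions for this example; the thesis considers all 24 priority orders of channel/chip/die/plane.

```python
def map_lpa(lpa, channels=8, chips=4, dies=2, planes=2):
    """Channel-first static striping: consecutive LPAs land on
    different channels, so they can be fetched in parallel."""
    channel = lpa % channels
    chip    = (lpa // channels) % chips
    die     = (lpa // (channels * chips)) % dies
    plane   = (lpa // (channels * chips * dies)) % planes
    page    = lpa // (channels * chips * dies * planes)
    return channel, chip, die, plane, page
```

Under this scheme, consecutive R-tree node pages map to distinct channels, which is exactly what makes parallel node retrieval possible.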


However, spatial database management systems receive many queries at a time. Since a huge number of nodes must be read from the SSD, hot spots and resource contention can occur, and the requested nodes may conflict with each other; consequently, we cannot read the maximum possible number of pages in parallel. To improve on the baseline model, we reorganize the page requests to increase the number of pages that can be read in parallel and thereby minimize page conflicts. Based on this proposition, we formalize our original problem as a packing of page read requests.

Definition 2: For a given set of page read requests S of the R-tree(s), partition S into the minimum number of PACKETs.

Definition 3: A packet is a set of page requests that can be accessed in parallel while not requesting the same SSD resource.

For example, if we assume the number of channels is 4, each channel contains 2 chips, each chip is composed of 2 dies, and there are 2 planes per die, the maximum number of pages in a packet will be 4 × 2 × 2 × 2 = 32. This packing method is independent of the internal features of the SSD. The next section explains how the packing of the pages in a set is performed.
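The packet capacity is simply the product of the parallel units; a short sketch (the function name is illustrative):

```python
def packet_capacity(channels, chips_per_channel, dies_per_chip, planes_per_die):
    # one page can be serviced concurrently per plane, so the number of
    # planes in the whole SSD bounds the packet size
    return channels * chips_per_channel * dies_per_chip * planes_per_die

print(packet_capacity(4, 2, 2, 2))  # the example configuration: 32 pages
print(packet_capacity(8, 4, 2, 2))  # the simulated SSD used later: 128 pages
```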


4.3.1 Page Conflict Graph

In our problem, some pairs of pages are in conflict because of limitations on their designated physical addresses. Such conflicts can be embodied in a graph: each page is represented by a vertex, and each conflict by an edge connecting two vertices, i.e., two conflicting pages are directly connected by an edge. In other words, the graph data structure is built on top of the page conflicts. Let G = (V, E) be a page conflict graph, where V is the vertex set and E is the edge set connecting conflicting vertices that cannot be read in parallel. The process of selecting non-conflicting pages is then the same as coloring neighboring vertices with different colors in the Graph Coloring problem, with one additional constraint: the maximum number of pages (vertices) sharing the same color is limited to the packet capacity. A coloring of this graph is a labeling of the pages with colors such that no two conflicting pages sharing an edge have the same color.

In the literature, the Graph Coloring problem is known to be NP-hard [29], [30], and no known algorithm can color the vertices optimally within reasonable time. Conventional approaches for near-optimal graph coloring within bounded time rely on heuristics; therefore, we propose a heuristic approach to solve our problem within an attainable time frame. Our objective is to create an optimized, conflict-free page schedule. From the graph coloring point of view, the pages act as vertices and edges are drawn between pages that have conflicts; selecting conflict-free groups of pages then corresponds to properly coloring the vertices with the minimum number of colors. Thus, the objective


could potentially be restated as finding a minimum proper coloring of the conflict graph. The colors that are used are stored in a used-color array, and the chromatic number is the total number of colors in that array.

Algorithm 5. Coloring graph
1. Input: page conflict graph G
2. Step 1: Compute the degree sequence of the input
3. Step 2: Assign color1 to the vertex v_i of G having the highest degree
4. Step 3: Assign color1 to all the non-connected vertices of v_i and store color1 in the used-color array
5. Step 4: Assign a new, previously unused color to the next uncolored vertex having the next highest degree
6. Step 5: Assign the same new color to all non-connected uncolored vertices of the newly colored vertex
7. Step 6: Repeat steps 4 and 5 until all vertices are colored
8. Step 7: n = chromatic number of the colored graph = total colors in the used-color array

The heuristic that solves the counterpart graph coloring problem, representing the original page selection problem, consists of three stages. The first stage is to color the vertices with the minimum number of colors, following the steps of Algorithm 5. The second stage selects same-color pages to be included in one packet of limited capacity: the colors are sorted in descending order by the number of pages carrying them (for example, color #1 has the highest number of pages). Starting from the top color in the list, the maximum number of pages that can be grouped in a packet is fetched as one packet; the process continues until no pages with this color remain, and the same steps are repeated for each remaining color in the sorted list. The fetching and packing step is repeated iteratively until there are no more pages to retrieve.
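Algorithm 5 together with the second (packing) stage can be sketched as follows; the graph representation and helper names are assumptions made for illustration.

```python
from collections import defaultdict

def color_graph(vertices, edges):
    """Welsh-Powell-style greedy coloring (Algorithm 5): color the
    highest-degree uncolored vertex, give the same color to every
    non-adjacent uncolored vertex, then open a new color."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(vertices, key=lambda v: len(adj[v]), reverse=True)
    color, current = {}, 0
    for v in order:
        if v in color:
            continue
        colored_now = {v}              # vertices carrying the current color
        color[v] = current
        for u in order:                # extend the color class greedily
            if u not in color and not (adj[u] & colored_now):
                color[u] = current
                colored_now.add(u)
        current += 1
    return color

def pack_by_color(color, capacity):
    """Stage 2: group same-colored pages into packets of limited size,
    serving the most populous colors first."""
    groups = defaultdict(list)
    for page, c in color.items():
        groups[c].append(page)
    packets = []
    for c in sorted(groups, key=lambda c: len(groups[c]), reverse=True):
        pages = groups[c]
        for i in range(0, len(pages), capacity):
            packets.append(pages[i:i + capacity])
    return packets
```

Because each color class is conflict-free by construction, splitting a class into capacity-sized chunks yields valid packets directly.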


Figure 6. Total and Partial Conflict Edges

In the third stage, we remove all edges that represent partial conflicts and then color the nodes of the resulting graph. The next step is clustering the graph, fetching pages with the same color, and assigning them to a packet. This step is also iteratively repeated until there are no more pages that can be fetched. As depicted in Figure 7, when we remove the partial conflicts, the number of clusters increases. From each cluster, pages that have the same color are fetched and assigned to a packet; two conflicting pages cannot be assigned to the same packet. By removing the partial conflicts, fewer colors are required to color the graph.


Figure 7. Total Conflict Edges

A general heuristic for problems over graph data structures is an exhaustive search of the entire graph, which adds considerable overhead for very large graphs. To manage the complexity of the solution, we propose a heuristic algorithm that generalizes the graph coloring algorithm to solve the page selection problem.

4.3.2 Steps of Scheduling and Packing

In this research, the combination of the multi-plane and interleave advanced commands is used to increase the utilization of the parallel units of the SSD. The scheduling and packing processes are outlined in the following steps. For a given set S, we cluster the pages that cannot be read in parallel. There are two conditions to check when validating a conflict between pages:


Algorithm 6. Page conflict
1. Input: set S of page read requests
2. Max size of packet = number of parallel units
3. While S is not empty do:
4. Start a new packet i
5. For each page p in S do:
6. If the PPA of p has the same channel#, package#, die#, and plane# as a page already in packet i, OR
7. If the PPA of p has the same channel#, package#, and die# as a page already in packet i but a different page#: reject p
8. Else add p to packet i; Size(i) = Size(i) + 1; remove p from S

Condition 1. The PPA is checked to evaluate whether the pages have the same plane, die, package, and channel addresses but different page addresses.

Condition 2. The PPA is evaluated against the multi-plane restriction: whether the pages have the same die, package, and channel addresses but different page numbers.

Pages that conflict under either of the above conditions are grouped in the same cluster. We then sort the clusters in descending order by size. From each cluster, one page is added to the first packet so that the packet can be read in parallel. The process continues until all non-conflicting pages are added; when a packet is full, or no further page can be added to it, the packet is sent for processing. The pages that do not meet the conditions to join the previous packet are added to the next packet, again validated against Conditions 1 and 2. All pages are assigned to packets according to these steps until no page remains to be included in a packet. The next request is then processed by iterating through the same steps.
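Conditions 1 and 2 reduce to a small predicate over the PPA fields. A sketch, assuming a PPA is a `(channel, package, die, plane, page)` tuple (the tuple layout is an assumption of this example):

```python
def conflict(p, q):
    """True if two physical page addresses cannot be read in the same
    parallel batch (packet)."""
    same_die = p[:3] == q[:3]          # same channel, package, and die
    if same_die and p[3] == q[3]:      # Condition 1: same plane of one die
        return True
    if same_die and p[4] != q[4]:      # Condition 2: a multi-plane command
        return True                    # needs identical page offsets
    return False
```

Two pages on different planes of the same die are conflict-free only when their page offsets match, which reflects the multi-plane command restriction.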


4.3.3 Packet-Aware Query Processing

Using the packet concept in BFS, at each level of the tree we can pack the page requests so that each node and its children are grouped into packets for parallel, conflict-free processing. Similarly, in the DFS approach, we group the nodes of set S into packets.

4.4 Optimization

4.4.1 Hybrid Batch of Range Query

The set S can be read with either DFS or BFS. In the hybrid model, we manage the conflicts of SSD page accesses for the pages selected to be read: whenever a page is selected by DFS or BFS, it is first validated for conflicts with the other pages being read from the SSD in parallel. For example, if the nodes along a subtree path can be read in parallel from the SSD, we proceed with the DFS algorithm; if instead the nodes of a level can be read in parallel, we process their retrieval with the BFS algorithm.

4.4.2 Hybrid Memory- and Packet-Aware Join Query Processing

In the hybrid method, we include the valid nodes from the same level in S until we reach the memory limit. Then we continue selecting nodes within the same level from one tree, while the other tree is traversed through its children. In the meantime, we pack the nodes into packets.


CHAPTER VI
EXPERIMENTS AND RESULTS

5.1 Setup and Configuration

In this study, we use SSDSim [7], an event-driven simulator, for evaluation and experimental purposes. SSDSim provides an accurate and detailed implementation of the different layers of an SSD. It also supports the execution of advanced commands such as multi-plane and interleaving, which are essential for the modeling in this study. We simulate an SSD with 8 channels, where each channel consists of 4 packages, with 2 dies per package and 2 planes per die, and each plane contains 2048 blocks. The methods were implemented in the C# programming language on a personal computer with a 2.50 GHz Intel Core i7 CPU and 8 GB of memory. The experiment specifications are summarized in Table 2.

Table 2. Experimental SSD Configuration Parameters
Parameter | Value
Number of channels | 8
Number of packages/chips per channel | 4
Number of dies per chip/package | 2
Number of planes per die | 2
Number of blocks per plane | 2048
Number of pages per block | 64


5.2 Workloads and Performance Metrics

We used both synthetic and real-world workloads in our experiments. Table 3 describes each workload.

Table 3. Experimental Workloads
Dataset | Points | Distribution
Uniform | 500K, 1, 2, 3, 5, 7, 10, 20 M | Uniform
Skew | 500K, 1, 2, 3, 5, 7, 10, 20 M | Skew
Cluster | 500K, 1, 2, 3, 5, 7, 10, 20 M | Cluster
Real | 200K, 1M | Uniform, Cluster

In our experiments we measured the total response time and analyzed the performance of our scheduling algorithm on all 24 allocation schemes.

5.3 Results and Analysis

In the following, we present performance measurements of the different approaches and compare the results of the baseline and packing methods for join and range queries.

5.3.1 Comparison of Packing and Baseline

We ran our experiments on datasets of various sizes. The comparison between BFS, DFS, and the hybrid method for the packing and baseline approaches is shown in Figure 8. BFS performs better on small datasets; however, as the dataset size increases, DFS outperforms BFS. The hybrid version performs best among all these methods. The


performance of the packing approach is 40% better than the baseline approach.

Figure 8. Spatial Join Comparison

Figure 9. Range Query Comparison

For range queries, with 1M points and the number of range queries increasing from 10 to 500, the BFS approach performs better than DFS, as shown in Figure 9. The hybrid


approach improves the performance by around 5%. The packing method decreases the execution time by 35% on average. According to Figures 8 and 9, the packing approach improves the performance of spatial join and range queries by 40% and 35% on average, respectively. Comparing all methods, the hybrid approach performs best. Henceforward, we compare the hybrid packing approach against the BFS baseline method, and we select a dataset size of 5M points to show the results of the spatial join query.

5.3.2 SSD Configuration

Figure 10. 24 Page Allocation Schemes

In the baseline method without packing, die- and plane-level allocation schemes perform better than channel-level allocation schemes in most cases. The reason is that die and plane allocation schemes benefit from advanced commands, which combine multiple requests together and therefore minimize the execution time.


In channel-level allocation schemes, there can be a conflict between an upcoming request and a request that has already been sent to read pages in the planes, so extra time is needed to avoid resource conflicts; in other words, the channels stay busy until the reads from previous requests complete. With packing, thanks to the scheduling of the requests, the I/O requests fully cover all the internal resources, and the execution time of all 24 page allocation schemes is approximately the same.

Figure 11. Different Number of Channels


Figure 12. Different Number of Chips

Figure 13. Different Number of Dies


Figure 14. Different Number of Planes

As we increase the number of channels, chips, dies, and planes, the execution time decreases. Increasing the number of channels has the highest impact among the resources, because I/O requests are sent independently through each channel. Increasing the number of planes and dies also improves performance, although those I/O requests have to be sent by interleaving through a channel.

5.3.3 Spatial Join Query

As shown in the following figures, the execution time for joining uniform and skewed data is shorter than for the other combinations. In the uniform-uniform case, where both datasets are generated from a uniform distribution and spread over the whole space, more time is needed to search the entire space in both datasets and join them. The execution time for joining two cluster-distributed datasets is the highest: in clustered datasets, the density of data varies across the space, so the time to locate the clusters of data and evaluate the query condition increases.


Figure 15. Uniform-Uniform Distribution

Figure 16. Uniform-Skew Distribution


Figure 17. Cluster Distribution

Comparing the different data distributions, joining a uniform dataset with a skewed dataset requires less time than joining two uniform datasets, whereas processing clustered data requires the most time.

Figure 18. Real Dataset Join Queries


5.3.4 Spatial Range Query

According to Figure 19, the response time for a uniform dataset with cluster-distributed range queries is less than for a uniform dataset with uniformly distributed range queries: when the range queries are clustered, less time is needed to search and evaluate them. The highest execution time occurs when the dataset has a skewed distribution (e.g., a gamma distribution) and the range queries are distributed uniformly, because the density of range queries over the dataset increases and, accordingly, so does the execution time. Our results on the real dataset for range queries show that the BFS approach performs better than DFS, and the hybrid performs best among all approaches. Overall, the packing approach performs 33% better than the baseline approach.

Figure 19. Comparison Between Different Distributions


Figure 20. Real Dataset Range Queries

5.3.5 Comparison of Disk and SSD

Figure 21 shows the results of the spatial join query on disk and SSD. Flash SSD typically performs 3.5 times better than disk regardless of the packing approach; with packing, the SSD outperforms disk with a 6-times-shorter processing time.

Figure 21. Disk vs. SSD


CHAPTER VII
CONCLUSION AND FUTURE WORK

This research presented a new I/O scheduling and packing model. The scheduling-packing approach helps minimize resource conflicts on SSD by maximizing the usage of the SSD's parallel units, and it uses the physical addresses of the requests to pack them. In this work we evaluated spatial range and spatial join queries when reading the nodes from the SSD with the scheduling-packing approach. Our experiments show a 40% improvement for the spatial join query and a 35% improvement for spatial range queries compared with the baseline approach. In future work, we will leverage other features such as heat generation and lifetime, optimize the performance of other types of spatial queries such as KNN, and further enhance the reading rate and minimize the execution time.


REFERENCES

[1] T. Brinkhoff, H.-P. Kriegel, and B. Seeger, in Proceedings of the Twelfth International Conference on Data Engineering, 1996, pp. 258–265.
[2] in Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, New York, NY, USA, 2009, pp. 181–192.
[3] D. Agrawal, D. Ganesan, R. Sitaraman, et al., "Lazy-Adaptive Tree: An Optimized Index Structure for Flash Devices," Proc. VLDB Endow., vol. 2, no. 1, pp. 361–372, Aug. 2009.
[4] S.-W. Lee, B. Moon, C. Park, J.-M. Kim, and S.-W. Kim, "A Case for Flash Memory SSD in Enterprise Database Applications," in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2008, pp. 1075–1086.
[5] T. Wobber, J. D. Davis, M. Manasse, N. Agrawal, and V. Prabhakaran, "Design Tradeoffs for SSD Performance," pp. 57–70.
[6] "...State Disks (SSDs) As a Function...," in Proceedings of the 36th Annual International Symposium on Computer Architecture, New York, NY, USA, 2009, pp. 279–289.
[7] "Performance Impact and Interplay of SSD Parallelism Through Advanced Commands, Allocation Strategy and Data Granularity," in Proceedings of the International Conference on Supercomputing, New York, NY, USA, 2011, pp. 96–107.
[8] "CPS-SIM: Configurable and Accurate Clock Precision Solid State Drive Simulator," in Proceedings of the 2009 ACM Symposium on Applied Computing, New York, NY, USA, 2009, pp. 318–325.


[9] "...Flash-Based Solid-State Drives," in 2009 First International Conference on Advances in System Simulation, 2009, pp. 125–131.
[10] "...flash memory based solid state drives in high-...," in 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011, pp. 266–277.
[11] "...Based Solid-...," ACM Trans. Storage, vol. 12, no. 3, pp. 13:1–13:39, May 2016.
[12] "...Methods in Emerging Non-Volatile...," IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 3, pp. 746–759, Mar. 2017.
[13] in Proceedings of the 39th Annual International Symposium on Computer Architecture, Washington, DC, USA, 2012, pp. 404–415.
[14] J. Do et al., "...on Smart SSDs: Opportunities and Challenges," in Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2013, pp. 1221–1230.
[15] "...Processing Techniques for Solid State...," in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2009, pp. 59–72.
[16] in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 2014, pp. 349–360.
[17] ACM Trans. Database Syst., vol. 32, no. 1, Mar. 2007.


[18] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, 1984, pp. 47–57.
[19] "...R-tree spatial join for a shared-nothing...," in 1999 International Symposium on Database Applications in Non-Traditional Environments, 1999, pp. 423–430.
[20] in 2009 IEEE International Conference on Cluster Computing and Workshops, 2009, pp. 1–8.
[21] "...trees on the...," in 17th Asia and South Pacific Design Automation Conference, 2012, pp. 353–358.
[22] "...Efficient Parallel Spatial Join Processing Method in a Shared-Nothing Database..."
[23] "...End-to-end Spatial Join System over Large Polygonal Datasets on GPGPU Platform," in Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, New York, NY, USA, 2016, pp. 18:1–18:10.
[24] "...Using R-...," in Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, New York, NY, USA, 2013, pp. 23–31.
[25] H. Gupta, B. Chawda, S. Negi, T. A. Faruquie, L. V. Subramaniam, and M. Mohania, "Processing Multi-way Spatial Joins on Map-Reduce," in Proceedings of the 16th International Conference on Extending Database Technology, New York, NY, USA, 2013, pp. 113–124.
[26] in 2009 Eighth International Conference on Grid and Cooperative Computing, 2009, pp. 287–292.


[27] King Mongkut's University of Technology North Bangkok International Journal of Applied Science and Technology.
[28] Y.-W. Huang, N. Jing, and E. A. Rundensteiner, "Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations," in VLDB, 1997, vol. 97, pp. 25–29.
[29] M. R. Garey and D. S. Johnson, "The Complexity of Near-Optimal Graph Coloring," J. ACM, vol. 23, no. 1, pp. 43–49, Jan. 1976.
[30] D. P. Dailey, "Uniqueness of colorability and colorability of planar 4-regular graphs are NP-complete," Discrete Math., vol. 30, no. 3, pp. 289–293, Jan. 1980.