Generating mobile agents for complex system stimulation

Material Information

Generating mobile agents for complex system stimulation
Aldhlaki, Ahmed ( author )
Place of Publication:
Denver, Colo.
University of Colorado Denver
Publication Date:
Physical Description:
1 electronic file (107 pages) : ;


Subjects / Keywords:
Transportation problems (Programming) ( lcsh )
Engineering mathematics ( lcsh )
Engineering mathematics ( fast )
Transportation problems (Programming) ( fast )
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )


In order to simulate the Green Routing Investigation and Design (GRID) algorithm, we propose a Generating Mobile Agent for Complex Systems (GMACS) simulation that helps to generate Agents for Complex Systems (ACS), i.e., cars, trains, and airplanes in a transportation network by using different patterns and principles. In the design of the simulation, we must control the distribution of ACS for a map by using different approaches and parameters. We consider the essential features of providing a model that represents a real transportation network situation. Moreover, ACS do not move from one place to another randomly. Therefore, we consider the people pattern from United States Department Of Transportation (USDOT), the principle of ACS from Santa Fe Institute (SFI) and Brinkoff, BerlinMOD and driver behavior for Uber and Lyft to understand people's behavior. Our framework consists of four components which are the conceptual, the physical, the mathematical, and the computation models. The conceptual model demonstrates the basic concept of distribution of ACS on a map. The physical model represents analysis of the data that we need, i.e., the Open Street Map (OSM). The mathematical model illustrates the method of distribution ACS on a map.
Includes bibliographical references.
System Details:
System requirements: Adobe Reader.
Statement of Responsibility:
by Ahmed Aldhlaki.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
on10005 ( NOTIS )
1000536393 ( OCLC )
LD1193.E52 2017m A54 ( lcc )


This item has the following downloads:

Full Text
AHMED ALDHLAKI B.S., University of Al-Mustansiriyah, 2008
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment Of the requirements for the degree of Master of Science Computer Science Program


This thesis for the Master of Science degree by Ahmed Aldhlaki Has been approved for the Computer Science Program
Tom Altaian, Chair Boris Stilman Famoush Banaei-Kashani
Date July 29. 2017

Aldhlaki, Ahmed (M.S., Computer Science)
Generating Mobile Agents for Complex Systems (GMACS)
Thesis directed by Professor Tom Altman
In order to simulate the Green Routing Investigation and Design (GRID) algorithm, we propose a Generating Mobile Agent for Complex Systems (GMACS) simulation that helps to generate Agents for Complex Systems (ACS), i.e., cars, trains, and airplanes in a transportation network by using different patterns and principles. In the design of the simulation, we must control the distribution of ACS for a map by using different approaches and parameters. We consider the essential features of providing a model that represents a real transportation network situation. Moreover, ACS do not move from one place to another randomly. Therefore, we consider the people pattern from United States Department Of Transportation (USDOT), the principle of ACS from Santa Fe Institute (SFI) and Brinkoff, BerlinMOD mid driver behavior for Uber and Lyft to understand peoples behavior. Our framework consists of four components which are the conceptual, the physical, the mathematical, and the computation models. The conceptual model demonstrates the basic concept of distribution of ACS on a map. The physical model represents analysis of the data that we need, i.e., the Open Street Map (OSM). The mathematical model illustrates the method of distribution ACS on a map.
The form and content of this abstract are approved. I recommend its publication.
Approved: Tom Altman

I dedicate this work to my grandmother who passed away last year, my mother, my father, my wife, and my family that have supported me throughout my life mid motivated me to challenge the difficulties of life and become the person who I am.

I would like to say to my sponsor, die Higher Committee of Education Development in Iraq, thank you for giving me this wonderful opportunity to finish my masters degree in the United States. I want to thank the GRID group (Ray, Matt, Dardan, Omar) for their kind support and advice. Special thanks to my advisor Professor Tom Altman for his invaluable advice and guidance through this thesis work. I want to thank Professor Boris Stilman for agreeing to be on my committee. Finally, I want to thank Professor Famoush Banaei-Kashani for his help and advice throughout this project.

1. Introduction..................................................................1
1.2 Complex System............................................................2
1.3 Spatial Dataset and the Agent Behavior...................................3
1.3.1. Spatial dataset......................................................3
1.3.2. Patterns of an Agent.................................................4
1.4 Thesis Outline............................................................6
2. Theoretical background.......................................................8
2.1 Brinkoff..................................................................8
2.1.1 Behavior ofACSC.......................................................9
2.1.2 The starting node of ACSC (moving object)............................ 12
2.1.3 The destination of a moving object.................................. 14
2.2 BerlinMOD............................................................... 15
2.3 MNTG: An Extensible Web-based Traffic Generator......................... 17
2.4 OPORTO: A Realistic Scenario Generator for Moving Objects................21
2.5 CENTRE: Synthetic Generation of Cellular Network Positioning Data........22
2.6 Review Different generator for ACS.......................................24
3. GRID Algorithm Simulation...................................................25
3.1 Modeling and the Simulation (M&S)........................................26
3.2 Physical model...........................................................28
3.2.1 Nodes in OSM.........................................................29
3.2.2 Ways in OSM..........................................................31
3.2.3 Relations in OSM.....................................................32
3.2.4 Tags of OSM..........................................................33
3.3 Conceptual model.........................................................33
3.3.1 Mode of transportation...............................................33
3.3.2 Principle and Patterns of (ACS)......................................34
3.3.3. Repetitive patterns and achieving the goals of ACSC.................35

3.3.4 Surrounding Environment for Agent in complex system................36
3.3.5 Distribution of Train..............................................44
3.3.6 Distribution of Airplanes..........................................46
3.4 Mathematical model.....................................................47
3.5 Computational model....................................................50
4. Implementation.............................................................51
4.1 Open Street Map data...................................................51
4.2 Implementation of different simulation approaches......................53
4.2.1 Scenario 1: Simple random selection................................53
4.2.2. Scenario 2: Simple random selection with a distance measure........54
4.2.3 Scenario 3: Using K-Means algorithm to define regions..............55
4.2.4 Scenario 4: Using K-Means to generate source to destination locations ... 57
4.2.5 Scenario 5: Defining downtown and the around area..................58
4.3 Application developed..................................................59
4.4 Graphical User Interface for GMACS.....................................60
5. Experiments................................................................66
6. Conclusions and Future Research............................................71
6.1 Conclusions............................................................71
6.2 Future work............................................................72
Appendix A:...................................................................78
Appendix B....................................................................82
Appendix C....................................................................89

Figure 2.1: Behavior of moving object [4].....................................9
Figure 2.2: Networks of motorways in Northwest Germany [4].................. 10
Figure 2.3: Demonstration and an example of the data-space oriented approach [4]. 13
Figure 2.4: Example for RB and NB [4]........................................14
Figure 2.5: Typical Distribution of Single Trip Duration [5]................ 17
Figure 2.6: Minnesota Traffic Generator (MNTG) website for selecting Denver [6] 19
Figure 2.7: MNTG System Overview [6].........................................20
Figure 2.7: The OPORTO visualization tools [12]..............................22
Figure 2.8: General Architecture for CENTRE [13].............................23
Figure 3.1: Outline information..............................................27
Figure 3.2: A taxonomy of simulation [21]....................................27
Figure 3.3: New taxonomy of simulation represents merging Figure 3.1 with Figure 3.2.... 28
Figure 3.4: The fundamental element in Open Street Map [22]..................29
Figure 3.5: The playground in the Pulaski Park in Denver [40]................30
Figure 3.6: Node reference for Way [40]......................................32
Figure 3.8: Annual Person-Trips by Purpose and Day of Week 2009 [24].........34
Figure 3.9: Average Annual Person Trip, Person Miles, and Trip Length per
Household by purpose: 2009 [24].............................................36
Figure 3.10: Mode Replacement [25]...........................................37
Figure 3.11: Travel attributes [25]..........................................38
Figure 3.12: Origin Destination [25].......................................38
3.13: Quest promotion from Uber,

Figure 3.14: Select area to drive from Uber App
3.15: Uber Boost mid Prime time..................................................41
3.16: Prime time and opt for Lyft................................................42
Figure 3.17 Calculation of the time for train in station B.......................46
Figure 3.18 Calculation of the time for train in station C.......................46
Figure 4.1 Database diagram to store the information............................52
Figure 4.2 Scenario 1: Simple random selection..................................54
Figure 4.3 Scenario 2: Simple random selection with a distance measure..........55
Figure 4.4 Scenario 3: Using K-Me mis algorithm to define regions...............57
Figure 4.5 Inter source-destination location....................................58
Figure 4.6 Scenario 5: Defining downtown mid the around area....................59
Figure 4.7 Graphical User Interface for GMACS...................................61
Figure 4.8 Source-Destination Locations.........................................62
Figure 4.9 Home-to-work scenario................................................64
Figure 5.3 Execution time of generating various number of moving agents using the
"random with distance" scenario..................................................67
Figure 5.4 Execution time of centroid generation for various number of centroids and
epsilon values...................................................................68
Figure 5.5 Execution time when using mixed moving object source-destination location generation..............................................................70

Generating Mobile Agent for Complex System
Agent for Complex system in general such as Car, Train, Airplane, moving objecf
Agent for Complex System such as Car Agent for Complex System such as Train Agent for Complex System such as Airplane Green Routing Investigation and Design Santa Fe Institute Open Street Map
Topologically Integrated Geographic Encoding mid Referencing
An Extensible Web-based Traffic Generator
Synthetic Generation of Cellular Network Positioning Data
A Realistic Scenario Generator for Moving Objects
Generate Spatio Temporal Data
Generator of Time Evolving Regional Data
Simulation of Urban Mobility
Modeling mid Simulation
The Network-Based approach
The Region-Based approach
United States Department of Transportation
Automatic Train Control information
Private Aviation
Standard Instrument Departure Standard Terminal Arrival Route

1. Introduction
Every day, global warming mid climate change have taken over the news all around the world. CO2 emissions by many human activities is aggregating the change in climate. We are in a situation that motivates us to find new ways of decreasing these emissions even on a small scale. The main factors of global warming are generation of electricity, transportation, and industry. Transportation represents a variety of sources such as vehicles on highways, air travel, marine transportation, and rail. For the transportation system, the U.S. possesses 253 million transportation vehicles on the road [2] which produce CO2 emissions in the transportation system. We will focus on the transportation system because it represents the second largest source of CO2 emission. It accounts for about 31% of total U.S. CO2 emission in the 2014 [1].
In the world of transportation systems, there are many routing algorithms which
offer methods to get the best route in short time such as Dijkstra [36] and A* [37]. Each of
these algorithms provides different methodology to find the shortest path for a user.
Shortest path and shortest arrival time are two aspects also considered by many routing
applications like Google Maps. Shortest paths in an application, which is user-centric,
usually helps in traffic jams mid gas emissions. Imagine if trucks and cars with big engines
were told to go through a city rather than via an outer road to go from point A to point B.
We would have more traffic jams and more gas emissions. An ideal application would be
system-centric, which means the distribution of the drivers on the road would be such that
the transportation network will not face traffic jams, electric vehicles and vehicles with

small engines can go through the city to go to their destination, and big engine vehicles can choose other routes (routes that do not go across the city) so that the eco-friendliness is maintained. The idea of implementing a smart routing application that considers eco-friendliness to distribute drivers on a transportation network motivated us to start a project called Green Routing Investigation and Design (GRID). The final GRID project will require a list of agents (drivers source and destination locations defined by latitude mid longitude) generated in as realistic a way as possible to test the application. In an Agent Complex System (ACS), agents do not change their location in a completely random fashion, but they are affected by many factors around them. The behavior of ACS is guided by some specific conditions such as going to work on weekdays, going to amusement places during the weekend, going to school on a rainy day, etc. Thus, we need to employ various scenarios in generating agents on the road such that we reflect the real-world transportation system. Before doing this, we should understand what a complex system is mid how the environment affects the behavior of the generated agents.
1.2 Complex System
We would like to start with the group of complex systems [3], discussed at Santa Fe Institute (SFI) where researchers apply the following definition: "Complexity refers to the condition of the universe which is integrated mid yet too rich and varied for us to understand in simple common ways. We can understand many parts of the universe in these ways, but the larger more intricately related phenomena can only be understood by principles and patterns.". This concept is substantial for us because the environment where we live can be described as a complex system. In this complex system, there exist many

basic interactions of agents, i.e., cars in the transportation network, people in a marketplace, a pack of wolves, the stock market, etc. Let us consider the transportation system as having fundamental parts called agents or active objects such as cars, airplanes, and trains. We will name them as Agents of Complex Systems (ACS). If we want to generate ACS, we must discuss common primary principles of a complex system.
An agent is a basic concept in a complex system that represents autonomous entities which can interact in order to handle their particular task. One important concept of agents is that they must be adaptive. These agents must reflect real life situations. When different phenomena occur in the environment, these agents must adapt to these changes. Usually in a complex system, we can identify patterns that reflect interaction between agents. By understanding these three concepts of ACS, we will have enough knowledge about the behavior of an agent in a complex system to determine its source mid destination path.
1.3 Spatial Dataset and the Agent Behavior
We discussed in Section 1.2 the principles of an Agent for Complex System. Also, we will review in Section 2.1.1 the nine statements from the Brinkoff [4] method to better understand the ideas and principles of a complex system. We must know how we can extract patterns of agents in ACS. Furthermore, we should get a suitable spatial dataset from a map.
1.3.1. Spatial dataset
Agent for Complex System such as Car (ACSC) will be moving on a road network. The road network data represents a spatial dataset. We can get a spatial dataset from a map

such as Open Street Map (OSM) or Topologically Integrated Geographic Encoding mid Referencing (TIGER). There is a free editable map of the world, (OSM). The OSM illustrates physical features on the ground. Moreover, we can download an XML file that has the information about physical position on the ground. This XML file has three main elements:
Node it demonstrates a point on the earths surface. This point has a pair of coordinates (latitude and longitude), mid an id number.
Wav it represents rivers, roads, buildings, or forest. The way consists of group of nodes which is ordered.
Relation occurs between two or more data elements such as nodes, way.
We will discuss more details of the elements and their tags in Chapter 3.
1.3.2. Patterns of an Agent
In general, we consider agents such as cars, trains, and airplanes. In an Agent for Complex System we need to know about the patterns of ACS in order to understand the system. These patterns happen due to different situations. For example, people go to work from Monday to Friday. This causes most of the traffic jams to happen from 8:00 am to 9:00 am and 4:00 pm to 5:00 pm. We call this the goingto-work situation, a phenomenon that affects the agent distribution on a transportation network. Other elements, like demography and economy define different patterns. For example, it is most likely that more trips start from a big neighborhood rather than a small one in a city, and if an area is known as a poor area, there will be few trips starting from that area. Moreover, we can have find

patterns which are dependent on the trips purpose, household characteristics, and transportation mode [24].
1.4 Motivation
In order to simulate a GRID algorithm, we must know the essential features of the provided model representative of real-world network situation with an extensible workload to ensure the model is useful. Real world ACS which are cars, trucks, trains mid airplanes do not change their location in a completely random fashion, but they are affected by the surrounding environment. There would be some goals that guide their behavior. Going from home to work is a goal that affects ACS. Therefore, generating the population of ACS in a completely random way will not reflect the principle of representativeness of a real-world transportation system. In order to make GRID reflect a real-life transportation network, we have to find out ways of generating ACS-cars, trains and airplanes in the transportation network with the defined source to destination points.

1.5 Thesis Outline
The thesis is arranged as follows.
Chapter 2 demonstrates the literature review in which we present the theoretical background about ACS of cars, ships, and mobile devices. We also briefly describe Brinkoff, BerlinMOD, and An Extensible Web-based Traffic Generator (MNTG) for which known simulators exist for ACS. In addition, we review ACS of fishing ships such as A Realistic Scenario Generator for Moving Object (OPORTO). Finally, we present Synthetic Generation of Cellular Network Positioning Data (CENTRE) of mobile devices and some different other generators of ACS.
Chapter 3 presents the modeling mid simulation of an ACS. The modeling and simulation (M&S) consist of the physical model, conceptual model, mathematical model and computational model. Each one of these explains different concepts of GMACS simulation. The physical model illustrates elements and tags of OSM. Furthermore, the conceptual model demonstrates concepts and patterns of ACS. Finally, the mathematical model and computational model show math calculation and what we need to generate the GMACS.
Chapter 4 considers the implementation. We start to explain how we parsed OSM and which data are useful to parse. Also, we provide a database diagram to store the information. We implement different simulation approaches. We start with Scenario 1 which does not represent a real-life situation because we do not consider the distance between source and destination of ACSC. Scenario 5 is the most realistic simulation, which

can determine the busy area by selecting downtown to generate source or destination of ACSC. Finally, we show application development mid generation of an agent.
Chapter 5 presents the simulation experiments. GMACS simulation provides a specific format for the GRID algorithm. Moreover, the simulation results of experiments are discussed mid presented by using different distances mid numbers of ACS.
Chapter 6 illustrates the conclusion of the work which is presented in this thesis and future work.

2. Theoretical background
Several types of generators are studied. The first generator of ACS1 is a car. Brinkoff, BerlinMOD, and MNTG [4-6] consider Agent of Complex System with a Car (ACSC) by combing the network base with user defined data in order to reflect a real-life situation. The second generator is OPORTO [7]. OPORTO generates data for fishing ships instead of cars. The OPORTO data generator tries to avoid storm areas and the fact that fishing ships will go toward attractive shoals for fish. Finally, CENTRE [8] is a data generator of mobile device location data. This type of data generator is useful for traffic flow and helps people travel efficiently. This chapter will also demonstrate a summary introduction to create a suitable dataset of different generators of moving object, e.g., see [14-20].
2.1 Brinkoff
Thomas Brinkoff [4] demonstrates that the spatiotemporal database system should have the definition of a suitable dataset to simulate the typical behavior of ACSC. Moreover, a given network is followed by ACSC. The author points out that previous approaches, which generate spatiotemporal data do not assess network-based data of ACSC. Brinkoff uses a network-based generator for ACSC that combines real data with user-defined parameters to get the resulting dataset. Brinkoff s generator has a feature which visualizes the data and the road net. Furthermore, the research paper discusses and presents paramount properties of network-based ACSC, the maximum speed mid the
1 ACS refers to a moving object in these research papers i.e., Brinkoff, BerlinMOD, etc...

maximum capacity of connection, the impact for other AC SCs on the route and the speed of an object, the appropriate determination for the source and destination of an agent; time-scheduled traffic; and the influence of external events. We will review the behavior of ACSC, and determine the starting and destination nodes.
2.1.1 Behavior of ACSC
Brinkhoff [4] demonstrates the behavior of ACSC in nine statements which consider vehicles and their motion. Their motion and behavior is affected by numerous factors which are: the network, the traffic, external impacts, the moving object, and time-scheduled traffic as shown in Figure 2.1 [4], Each of these factors of the behavior for ACSC follows some parameters or statements as shown below in Figure 2.1 [4]
Statement 5: The path of moving objects may change if the speed on a connection is modified.
Statement 6: The number of moving objects is a time-dependent function.
Statement 1: Moving real-world objects very often follow a network.
Statement 2: Most moving objects use a fast path to their destination.
Statement 4: The number of moving objects will influence the speed of the objects if a threshold is exceeded.
function, which is independent of the network and of the objects moving on
the networks____________________________
Statement 8: Moving objects belong to a class. This class restricts the maximum speed of the object.
Statement 9: Time-scheduled traffic may influence the speed and the paths of moving objects.
Statement 3: Networks consist of classified connections, which have impact on the speed of spatiotemporal objects.
^The traffic
Figure 2.1: Behavior of moving object [4]

First, the network factor has three statements. Statement One mentions that ACSCs usually are guided by network. This network may represent road traffic, railway traffic, air traffic, or animal migration as shown in Figure 2.2 [4], Statement Two refers to that fact that ACSC tries to find the shortest path to its destination so they do not follow an arbitrary path in the network. This is not the chaotic behavior of real-world objects [7], Statement Three considers classified connections in the networks. Brinkoff found different connections in the network such as motorways, federal roads, secondary roads, side road and so on. These different connections of networks influence the motion of objects.
Figure 2.2: Networks of motorways in Northwest Germany [4]
The traffic factor illustrates two statements. Statement Four considers that the speed for ACSC is also influenced by another ACSC, i.e., traffic jam. If several vehicles use a street exceeding the capacity of that street, the situation leads to a decrease in the maximum

and average speeds of the vehicles on the street. Statement 5 is related to Statement Four. If the maximum speed of a street decreases, ACSC tries to find another path which is faster than the original path.
The external impacts are included in Statements Six mid Seven. Statement Six says that "the number of moving objects is a time dependent function". The time is important if we want to consider weekday or weekend traffic. On weekdays, we have rush hour at specific times. On the other hand, the times of rush hours are different on weekends. Moreover, the author says in Statement Seven, "the speed for ACSC is influenced by a spatiotemporal function, which is independent of the network and for ACSC on the network". The author refers to weather conditions such as rain, snow, sun, etc.
The ACSC factor for Statement Eight explains the statement, "moving objects belong to a class. This class restricts the maximum speed of the object". For instance, a small car has a different speed from a truck on the same street. Finally, Statement Nine addresses time-scheduled traffic by stating that "time-scheduled traffic may influence the speed and the paths of moving objects". For example, the coordinated takeoffs and landings of airplanes at an airport are time scheduled traffic.
These nine statements consider the behavior for ACSC as shown in Figure 2.1, but they do not present all possible behaviors for ACSC. These nine statements illustrate vivid descriptions about the behaviors of agents in a complex system that can help us to understand the behavior for ACSC.

2.1.2 The starting node of ACSC (moving object)
It is paramount to locate the starting position of an agent before an agent begins to move. The starting point represents a node in the network. We will call this point the starting node. The author illustrates three different methods to compute the start nodes as shown. The data-space oriented approach (DSO)
The DSOs idea is to calculate a position (x, y) by using a two-dimensional
distribution function [9] to determine the nearest node of the network which is the starting node for ACSC, as shown in Figure 2.3. A [4]. In the DSO approach, the author assumes a uniform distribution. The approach is classified as sparse or dense network. The sparse network generates more starting points from nodes. When we have a dense network, we have a concentration of starting nodes in an empty region on the map as shown in Figure 2.3.B [4],

using th nearest node as starting point
Figure 2.3.A: Starting point
Figure 2.3.B: Concentration of Starting nodes on empty network
Figure 2.3: Demonstration and an example of the data-space oriented approach [4] The region-based approach (RB)
The RB approach optimizes the data-space oriented approach by representing a value of the probability for each region (or more specific cell) of a network in order to determine the starting nodes on the map. The suitable measure for this probability may be represented by the density of population in each region. The RB uses the idea of the DSO approach to compute a position (x, y). The RB uses the concepts, dense and sparse, from the DSO approach. For instance, a downtown area, indicted by the circle in Figure 2.4.A [4], has a nine-time higher probability to get a starting point than in the neighboring region as shown in Figure 2.4 [4]. The RB approach has a feature which increases the probability of a starting point of the region while the corresponding region probability decreases.

Figure 2.4. A: The region-based approach Figure 2.4. B: The network base approach
Figure 2.4: Example for RB and NB [4] The network-based approach (NB)
This approach selects a starting node and another node that has the same probability. Therefore, the starting nodes of distribution reflect the density of the network as shown in Figure 2.4.B.
2.1.3 The destination of a moving object
We will get a unsatisfactory result if we choose the destination node to be the same as the starting point. We should consider the most important features to determine the destination node as explained below:
Length of the route: If we compute the destination without considering the starting node, we will have uncontrolled distribution for the length of the route.

Starting time: This is another factor which is determined by the user.

Therefore, it is important to compute the destination node by using another function.
2.2 BerlinMOD
Brinkoff is not flexible enough to monitor trajectories which are long-term histories for AC SC. For instance, Brinkoff generates dataset of the trip-based approach. The trip-based approach means ACSC is generated at a source node and takes a route to the destination. When the ACSC reaches the destination, it will disappear. This type of approach is not suitable for researchers. Researchers prefer to observe trajectories. BerlinMOD [5] overcomes this problem in Brinkoff s generator. Additionally, there are some features and advantages to using BerlinMOD as shown:
User-interface that views dataset on road network.
Generates dataset for any network, if it has a specific format.
Generates the trip-based approaches, and the trajectory- based approach.
Has more parameters than Brinkoff.
BerlinMOD uses SECONDO DBMS to simulate the scenario of ACSC for a random period of time on the street network for Berlin. The generator illustrates how to get most natural movement patterns and generates a noisy dataset. The BerlinMOD models labor trips and other trips on an evening or a weekend. The author assumes a person works on weekdays mid travels between the Home Node and the Work Node. The person leaves Home Node at 8:00 am to go to Work Node, and return at 4:00 pm to the Home Node. The time of a trip on Berlins street network is less than two hours as shown in Figure 2.5 [5],

so the person will stay at home until 8:00 pm, and the spare time of 4:00 hours stmts. On the weekend, the person only has two 5-hour blocks of additional time per day, for the first one, it stmts at 9:00 am, and at 7:00 pm for the second. The person will have a probability of 0.4 for mi additional trip. Therefore, this generator uses Home Node, Work Node, and Neighborhood to model a labor trip. Each Home Node represents a vehicle that the resident has there. It simulates the behavior of ACSC by choosing two methods as shown in the previous section for the Brinkoff method. The first one is the network-based approach which uses a uniform distribution to choose nodes from a set of nodes. The region-based approach is the second approach for Brinkoff. This approach depends on weight. It divides the map into multi-regions. Each region uses a probabilistic function to choose a node by utilizing a uniform distribution. Every object has a Work Node, which refers to a working place for the vehicle owner. The Work Node chooses randomly by using the network-based approach or region-based approach through all Berlins nodes.

Distribution of Trip Durations
Figure 2.5: Typical Distribution of Single Trip Duration [5]
2.3 MNTG: An Extensible Web-based Traffic Generator
In this authors point of view, Brinkoff and BerlinMOD together are the best for a spatiotemporal data generator, but they lack some features. The authors mention two important drawbacks of Brinkoff and BerlinMOD in the paper discussing MNTG [6],
First is the time. Sometimes, the user must exert an effort to configure and install the generator. For instance, BerlinMOD requires the user to complete the following steps to run the generator: for the installation of SECONDO [10] ACSC database, the user should have knowledge about the script commands that are used to install SECONDO, and the

user needs to understand a comprehensive list of configuration parameters for each traffic
Second, it is difficult to choose arbitrary spatial regions to generate datasets of traffic by using the Brinkoff and BerlinMOD generator. Brinkoff uses the city of Oldenburg to generate the dataset. On other hand, BerlinMOD utilizes the city of Berlin. Therefore, it is tedious for the user to generate datasets of different cities. For instance, the user has to understand the elements of the OpenStreetMap [11] to get the road network information of the city in which the user is interested. Moreover, the user needs to write a program to extract and obtain the road network of the correct city from Open Street Map. The next step for the user is to modify the obtained format of the road network of a city to match the required circumstances of Brinkoff or BerlinMOD.
The Minnesota Traffic Generator (MNTG) is an extensible web-based road network. MNTG resolves the issues of Brinkoff and BerlinMOD, so it is not a new traffic generator, but an enhancement of these tools with useful features. The first feature is that it is easy to use. MNTG has a web service with map interface. In the back end, the MNTG handles the configuring and running of the traffic generators, which are Brinkoff and BerlinMOD. Therefore, the user can save time by skipping the installation or configuration of them on their local machine. The Brinkoff and BerlinMOD generator requires configuring and installs different things as mentioned previously. Choosing a spatial region randomly is not a trivial task when using Brinkoff mid BerlinMOD, but MNTG has the ability to choose any arbitrary city area in the world. The user can use the Minnesota Web-

based Traffic Generator website to navigate the map and choose the city by drawing a
rectangular area as shown in Figure 2.6 [6].
O Cl ii
Figure 2.6: Minnesota Traffic Generator (MNTG) website for selecting Denver [6]
MNTG will extract the network information for the selected area, after the user submits the request to generate the traffic by using one of the traffic models as shown in Figure 2.6 [4]. Finally, MNTG possesses its own server machine. Therefore, computing resources and processing time is handled by MNTG in three steps. First, it gets a traffic request from the user. Next, a multi-core and multi-threaded program will process the request internally. Finally, it will notify the user by email when the requested data is ready.

MNTG has three fundamental components. First, the Road Network Converter is responsible for extracting the information of an area from OpenStreetMap or Tiger in order to match the format of MNTG. Second, the Traffic Processor, receives traffic generation requests scheduled and executed by Traffic Processor. The Road Network Converter and Traffic Processor are both backend (of the code) as shown in Figure 2.7 [6], Finally, the system frontend consists of a web interface to submit the request of traffic generation from the user, an email notifier to inform the user the work is done, and a collection of tools to visualize and download the traffic data for user as shown in Figure 2.7 [6],
Traffic generation request
Download/visuaHze traffic results
Traffic Data Users
E c o 3 w c o c/>
I Web Interface i Email Notifier Download/ Visualization Tools
Status Notification
-o Road Network
s I r) Converter
(0 ^ Traffic Requests
Traffic Processor
| New Generator J
Traffic Results
Road Network Data Sources
0 0 0
New Data Source OpenStreetMaps US Tiger Files t
Traffic Generator Developers
Figure 2.7: MNTG System Overview [6]

2.4 OPORTO: A Realistic Scenario Generator for Moving Objects
OPORTO [12] considers a scenario with fishing ships moving to attractive shoals for fish instead of vehicles and traffic. The generator tries to avoid storm areas, mid will send fishing ships to attractive shoals for fish, which are also plankton areas. The historical observation in the process of the location is taken into account by the OPORTO generator. The ships behaviors rely on awareness of the areas evolution of interest for ships such as the areas extension of attraction, mid the location ships repulsion. The plankton and storm zones are divided into areas with a constant center with changing shape. Both ships and shoals of fish are modeled as moving areas. Therefore, the spatiotemporal data generates in the form of two-dimensional movement. The programs output represents observation of a massive number of objects through long periods of time (trajectories). The OPORTO generator has a visualization tool for a real data set as shown in Figure 2.7 [12].

Figure 2.7: The OFORTO visualization tools [12]
2.5 CENTRE: Synthetic Generation of Cellular Network Positioning Data
CENTRE [13] generates data of mobile device location. For privacy concerns, telecom companies do not make this data available. On the other hand, this type of rich data is able to help us in many applications, such as traffic flow information, by using a data mining algorithm. Use of these data also helps people travel efficiently, aids decision making related to sustainable mobility and security management of public administrations and assists the mobile user to optimize bandwidth and power allocation in the network.

CENTRE represents experimental and validation data mining algorithms on mobile telephone data. It consists of fundamental components as shown in Figure 2.8 [13],
Real World
/ _l \
{Parameter 1
S*. /
Trajectories Reconstruction
Trajectory Generator
LOG Generator LOGs
LOG Production
Figure 2.8: General Architecture for CENTRE [13]
This system generates data in a two-step process. First, the synthetic trajectories are built according to a combination of the user-defined statistical distribution and driving movement behavior. This first step is completed by using an existing generator GSTD [14] (This generator will be discussed in the next section). The next step is that the synthetic trajectories transform into LOGs Generator, the detection action performed by the antennas network. The log generation creates a loss of information, if we compare it to the original position for points. Finally, the generator transforms the suitable format data for analysis by the data mining algorithm.

2.6 Review Different generator for ACS
The GSTD [ 14] generator assumes ACS are able to move in two-dimensional space. The GSTD generator is able to generate data for rectangular regions, mid gives permission to the users in order to control the lifetime for each ACS. The authors improved the GSTD by designing the G-TERD.
The G-TERD [15] illustrates new advantages for the traffic generator. The new advantages allow users to generate arbitrary objects with the setting of parameters that can control rotation through time, color, and object speed. Micro simulators [17-18] and SUMO [16] rely on short-term observations. These generators represent human behavior that is observed for short, discrete periods of time. On the other hand, ST-ACTS [19] depends on long term observations. This generator demonstrates human behavior for several consecutive days. Moreover, the research paper MWGen [20] considers multienvironments for an ACS, for instance, indoors and outdoors implies that the ACS is simultaneously indoors and outdoors generates a scenario such as a person moving from their home building, who then walks to the car, mid drives the car to a parking lot near to their work, walks inside the building to go to their office, and reverses this path to go home after work.

3. GRID Algorithm Generator
Simulation of the GRID algorithm is important to provide a model that represents a real-world scenario with an extensible workload to ensure the model is useful. Simulation of ACSC (i.e., cars in our case) is not an easy task. We must be careful in generating source to destination locations of ACSC, so the distribution of ACSC emulates reality as close as possible.
Suppose that we have a map that we need to populate with ACSC. One naive
solution is to randomly pick a node as the source node and another one as the destination
node, and generate a source-destination location that ACSC will take. This solution does
not provide a good simulation of reality. Consider the case when the randomly selected
source node represents a home, and the destination node represents a coffee shop very close
to the source node. Usually in reality, no one would drive such short distances. Providing
a distance measure, and selecting a destination node with distance d from the source node,
such that d > dr where dr is the minimum allowed distance for a node to be selected, can
solve the above-mentioned problem. Other approaches can be used to generate source-
destination locations for all ACSC. These approaches employ the k-means algorithm
[38-39], which helps to identify the areas where most people live. Normally, where most
of the people are living, there is more likelihood for a source or destination node to be
selected. We can leverage the use of k-means in many other ways. One of the other ways
is to define a downtown area and the surrounding areas. Then we can randomly pick points
outside downtown that go to downtown. This is a good simulation of reality, since many
offices and working places are downtown, and many living places are located outside

downtown. Moreover, we will demonstrate how we can generate Agent of Complex System such as Train (ACST) and Agent of Complex System such as Airplane (ACSA).
3.1 Modeling and the Simulation (M&S)
Dr. M. Williams [21] defined that "M&S is the use of models, including simulation, to develop and analyze data as a basis for decision making system". Moreover, the group of complex systems at the Santa Fe Institute (SFI) says that [3] "Complexity refers to the condition of the universe which is integrated and yet too rich and varied for us to understand in simple common ... ways. We can understand many parts of the universe in these ways, but the larger more intricately related phenomena can only be understood by principles and patterns not in details.". Furthermore, in the research paper "Oporto" [12] the authors mention that "Real-world objects do not change their location, or their shape in a completely random fashion. Instead, the surrounding environment affects them, and their behavior is guided by some goals". Therefore, a generator producing completely random data will not fulfil the principle of representativeness. According to Dr. Williamss definition, the group of complex systems at the Santa Fe Institute (SFI), and the research paper Oporto, we can get all of this information to build the simulation by using the outline as shown in the Figure 3.1. We will explain each step in the subsequent section.

Figure 3.1: Outline information
Dr. Williams [14] demonstrates a taxonomy of simulation by supplying concepts, data, and equation as shown in Figure 3. 2 [21],
Figure 3.2: A taxonomy of simulation [21]
By merging Figure 3.1 with Figure 3.2, we will get a new Figure, which is Figure 3.3 as shown.

Figure 3.3: New taxonomy of simulation represents merging Figure 3.1 with Figure 3.2
After we decide on the final model of simulation, we will start to explain how we build die simulation according to Figure 3.3.
3.2 Physical model
The physical model recaps the physical representation for the system. We analyze the systems physical aspect such as tags are in OSM file (XML) to know which information have to parse from die XML file. We can download the XML file of any city from the OpenStreetMap website. The OpenStreetMap XML file contains diree fundamental elements [22], which represent the conceptual data and die relation model of the physical world as shown in Figure 3.4 [22],

timestajr?>="2016-04-07T02 : 58 : 33Z"
user*"Your Village Maps

3 user*"Your Village Maps" uid*"227972"*


Avenue Limited"/>

Figure 3.4: The fundamental element in Open Street Map [22]
3.2.1 Nodes in OSM
The nodes represent a point on the map that is defined by latitude and longitude. We have two types of nodes in the XML file. The first type of node just contains attributes without tags. These attributes demonstrate valuable information about the node and the user who modified or added this node such as the nodes tD number, which is a unique number, longitude, latitude, user ID number, the person who modified or added this node,

the time stamp, etc. The second type of node has attributes with tags. Each tag has two attributes K, which represents key, and V, which represents value. The tags K and V will illustrate extra information about the node. For instance, K=" lei sure", andV="playground" as shown in Figure 3.5 [40], whi ch represents the playground in Pulaski Park in Denver.
Figure 3.5: The playground in the Pulaski Parkin Denver [40]

3.2.2 Ways in OSM
The way can represent rivers, roads, and boundaries of areas, such as buildings or forests. The way consists of attributes, tags, and node references. Way attributes contain id, timestamp, and user id, but they do not have the longitude and latitude. The way elements contain node references (nd ref). The node reference refers to the nodes ID number that has the longitude and latitude as shown in Figure 3.6 number 1 and 2". If you put this longitude and latitude in google maps, you will see the building as shown in Figure 3.6 "number 3 and 4". The ways tag is more enhanced than a nodes tag in order to give us more information about each node reference.

Figure 3.6: Node reference for Way [40]
3.2.3 Relations in OSM
Wikipedia [23] states that "a relation is a multi-purpose data structure that documents a relationship between two or more data elements (nodes, ways, and/or other relation) 11. Therefore, the relation is able to explain various meanings which are defined by its tags. The relation represents the relations members which is an ordered list of ways, nodes, or other relations as shown in Figure 3.4.

3.2.4 Tags of OSM
We need to have knowledge about the tags in an OSM file in order to parse the data we need to generate the ACS. An XML file for downtown Denver was downloaded in order to know the types of keys and values inside each tag. It was discovered that the XML file of Denver has a lot of different keys as shown in Table 3.1 in appendix A. The XML file of Pueblo, Colorado was also downloaded. Pueblo has new keys mid values in the tags that you cannot find in Denver. A code was built to parse the K (keys) and V (values) of the XML files of different cities. We found new keys and values in the tags that you cannot find in Denver mid Pueblo. Thus, we parsed all keys and values and stored them in a MySQL table in order to retrieve them when the user needs to generate the ACS. The user cmi choose any key mid value from the table to generate the dataset as shown in Table
3.1 in Appendix A.
3.3 Conceptual model
According to the simulation of GMACS, the conceptual model considers principle and pattern, which has some important goals, and the surrounding environments of ACS. We should analyze and understand each one of these components of the conceptual model in order to reflect a real-life scenario. Thus, we will demonstrate mode of transportation.
3.3.1 Mode of transportation
There are different transportation types, i.e., bike, car, airplane, train, light rail etc. USDOT demonstrates annual person trip by purpose and Day of Week as shown in Figure 3.8. The mode consists of personal vehicle, transit, walk, and other. On the other hand, the

purpose presents family/personal errands, social/recreational, to/from work, school/church, work-related business, and other as shown in Figure 3.8 [24], Thus, we will consider ACSC, ACST, ACS A. Firstly, we will start with the principle and pattern of ACS.
60.000 r 50,000
£ 40,000
£ 30,000
Z 20,000 10,000 0
Trip purpose
I Family/
personal errands
To/from work School/church Other
Sunday Monday Tuesday Wednesday Thursday Friday Saturday
Figure 3.8: Annual Person-Trips by Purpose and Day of Week 2009 [24]
3.3.2 Principle and Patterns of (ACS)
In the simulations of GMACS, ACS represents cars, airplanes, or trains, but we have to know the general principle and pattern of people-who use the cars, airplanes, and trains to travel from one place to another-in order to generate car, airplane, or train. There are some principles that we need to consider about the people before we generate the ACS. ACS do not change their behavior randomly. They have repetitive patterns or some goals to achieve. We will discuss them in Section 3.3.2. Surrounding environments of ACS are

affected by new technologies such as Uber and Lyft. We will explain more about Uber and Lyft in the next section.
3.3.3. Repetitive patterns and achieving the goals of ACSC
Brinkoff [4] illustrates the behavior of ACS in nine statements. These nine statements present the general ideas of some purposes or goals. Therefore, we should find research about common purposes or goals of ACS to know what we need to parse from the XML file to generate the source and destination of ACS. According to the United States Department of Transportation (USDOT) [24], Passenger Travel Facts and Figures for 2016 illustrates average annual person trip, which is ACS, by purpose and Day of Week. USDOT demonstrates the common purposes or goals for ACS to travel such as personal errands, family, recreational social activities, and work-to-work related purposes. Figure 3.9 [24] shows the Average Annual Person Trip and Trip Length per Household by Purpose. USDOT refers to the behavior of ACS on weekdays and weekends. On weekdays, the average number of trips is greater than on weekends. This fact is presented in Figure 3.9 [24]. The busiest day is Friday, because ACS make extra social and recreational trips along with work and school trips. On Sundays, numbers of work trips decrease, but there are additional church, social, and recreational trips. On Sunday, the least travel occurred. These common purposes or goals of ACS gives knowledge about what we have to parse from OpenStreetMap(OSM). These goals are the most common of ACS, in general.

3,466 Person-trips per household
33,004 Person-miles traveled per household
Shopping / \
(4.620/14.0%/ Family/ \ , / personal
\ / errands
N. / (5.134/15.6%)
Social and recreational (9.989/30.3%)
r Other
Social and
Average trip length in miles
Social and recreational All trip purposes Family/posuMl a iamb Shopping Schooi/church
50 60
Figure 3.9: Average Annual Person Trip, Person Miles, and Trip Length per
Household by purpose: 2009 [24]
3.3.4 Surrounding Environment for Agent in complex system.
Surrounding environments of ACSC are impacted by emerging technologies recent example of which are Uber and Lyft. Alejandro Henio [25] demonstrates that Uber completed its first billion rides in six years, but Uber finished the next billion in six months. Lyft has a rate of rides in the US of seventeen million rides per month, but Alejandro Henio reveals that 'Uber and Lyft add Traffic, Reduce Efficiency on Denver and Boulder

Roads" [25], Uber and Lyft companies do not provide trip data for cities. Therefore, Henio suggest that the best way to collect data is to be a driver for both companies in order to understand how people are riding Uber and Lyft and what modes of transportation they were changing. He did a survey with 311 riders while he was working with Uber and Lyft. In his survey, Henio asked the rider this question "Q5. For this trip, how would you have traveled if Lyft/Uber was not an option" [25], 34 percent of rider says in the survey "they would have either taken transit, hiked, or walked instead of using the car service" [25] as shown in Figure 3.10 [25],
Q5. For this trip, how would you have traveled if Lyft/Uber wasn't an option?
Carpool (drive)
3 2% Other, 1.6%
Car rental, 4.2%
Get a ride, 4.5%
Other ridesourcing
Carpool (ride)
6.1% ________________________________________
Drive alone 19.0%
Bike or Walk 11.9%
Wouldn't have traveled, 12.2%
Figure 3.10: Mode Replacement [25]
Furthermore, according to his research, mileage is greater because a driver has to drive an average of two miles to the location to pick up the passenger, then drive the rider

an additional these miles. Therefore, the rider accounts for five miles on the car see the
Figure 3.11 [25].
From last Drop-off to End location
Figure 3.11: Travel attributes [25]
Also, we received information about patterns of ACSC from those 311 riders who use Uber and Lyft as shown in Figure 3.12 [25]. Thus, according to this thesis, we should also consider the scenarios when big companies like Uber and Lyft affect the distribution of ACSC in the transportation network. In the next section, we will elaborate the Uber and Lyft effects in detail.
x DESTINATION ORIGIN Home Work School Shopping Errands Going Out/ Social Airport Hotel/ Airbnb Family/ Friend Other Totals
Home 2 36 16 7 34 18 0 4 12 129
Work 21 8 1 1 1 2 6 0 1 41
School 5 0 0 3 0 0 0 2 0 10
S hopping Enauds 11 1 0 3 1 0 0 0 0 16
Going Out/Social 30 1 0 3 10 0 3 3 1 51
Airport 3 0 0 0 0 0 9 0 0 5
HoteLAiibub 0 2 0 0 7 4 0 0 4 17
Family /Erie nd 10 1 0 0 1 1 3 1 2 19
Other 8 3 0 2 2 1 3 1 3 23
Totals 90 52 17 19 56 26 17 11 23 311
Figure 3.12: Origin Destination [25]
38 Uber and Lyft App
Uber tries to control the behavior of ACSC in a transportation network by encouraging its drivers to work specific hours and in specific regions during weekdays or weekends by giving them monetary awards such as Quest, prime time, or boost. For instance, in the morning of a weekday Uber gives the award Quest, which is $25, if the driver finishes 5 trips in Denver between 6:00 am to 9:00 am as shown in Figure 3.13.A.
Drive 5 trips, make $25 extra
Mon, Apr 17, 6 AM 9 AM
Trips completed:
5 trips
0 trips
1 trip left (o)
for $50 39/40
40 trips $50 50 trips $90 70 trips $150
20 hours 4 minutes left
Figure 3.13A: Quest promotion from Uber
Figure 3.13B: Quest for one week from
3.13: Quest promotion from Uber
There is another type of Quest that Uber offers to the driver. The time for this Quest is one week. It is in a specific area, in our example Denver. If the driver finishes 40 trips,

a driver will get $50 extra. If the driver can make 50 trips, a driver will get $90 extra. If the driver can continue to make 70 trips, he/she will get $150 extra as shown in Figure 3.13 B.
These trips must be in Denver to count toward the Quest, so the Uber app has a feature to select the area to work. For example, if the promotion is just in Denver county, you can use the button select areas to drive, and choose Denver as shown in Figure 3.14.
Figure 3.14: Select area to drive from Uber App
On weekday nights, Uber gives the driver Boost, which is determined by a red line around a specific area as shown in Figure 3.15A This Boost will increase the cost to the rider by double or more. On the weekend, there are prime times, which have a different shape on the map and different prices as shown in Figure 3.15 B. Uber effects a real-life scenario by encouraging these drivers to work in a specific location.

I DETAILS A V O Offline =
Boost: 1.2x 1.8x Fares, Guaranteed
Sat, May 20, 12 AM 1 AM
uberX, uberXL, EATS, POOL, uberListen
LMhH Y tI It
Figure 3.15.A: Uber Boost
Figure 3.15.B: Uber Prime time
3.15: Uber Boost and Prime time
Lvft motivates its drivers by giving them three types of promotions. The first is bonus. Each bonus has specific times and areas. Lyft will give the driver $3 extra, for every ride he/she gets in this specific time and area. The second promotion is opt. The opt is determined by specific time and area as shown in Figure 3.16.A. This promotion will guarantee $20 or more for each hour. For instance, if the guarantee hours of this promotion

are from 6:00 am to 9:00 am, the driver gets $40 during these hours. Lyft will give the
driver $20 per hour for $60 total for these hours guaranteed.
= lyft
=4i tM h
. 1 ; \ :
7 i
. - A ; [ Q ~1 \
Guarantee zone Mon, May 22
5am-9am $24
4pm-6pm $22
Tue, May 23
6am-9am $22
4pm-6pm $22
75-300% MORE
vada Berkley


(265) JgjJ
W fSf


Bow Mar
Englewood ^
Earnings Referrals
Figure 3.16 A: Lyft opt
Figure 3.16 B: Prime time for Lyft
3.16: Prime time and opt for Lyft
Also, Lyft has prime-time which is a multi-rectangular shape on the map, see Figure 3.16.B. These rectangular shapes on the map increase the price of the ride, if the rider requests the driver from this rectangular area.
Uber and Lyft use different motivations to encourage drivers to work on specific days and hours and in specific areas, but both of them have the same promotions, hours,

and area. For instance, Quest in Uber and bonus in Lyft give their driver extra money, if they work specific areas and hours. Boost in Uber mid guarantee hours in Lyft usually refer to peak hours for the same areas and times such as downtown during the weekend. Uber and Lyft have prime time. Uber and Lyft use the promotion to motivate the driver to work in same days, hours, and area.
We found some interesting information about traffic jams in Denver from my work with Uber and Lyft mid my survey with Uber and Lyft drivers. We did a verbal survey with Uber and Lyft drivers at Denver International Airport for Uber and Lyft. There are three types of scenarios that make traffic jams. Firstly, when Uber and Lyft have prime time in specific areas and time for trips, the distance of these trips is short-less than 5 miles. Therefore, we will have traffic jams. For instance, on St. Patricks Day, there are a lot of riders in downtown Denver, so there is prime time. Most drivers for Lyft, Uber and taxi companies work in downtown. Most of these trips are less than 5 miles, so the traffic jam will appear. On other hand, if there are a lot of riders at Denver International Airport, we will not see any traffic jam. The reason for that, is that the distance of trips is longer than 8 miles. Second, the same destination for all drivers makes a traffic jam. If all drivers try to go to the same destination at the same time, the traffic jam will happen. For instance, if there is a game at Coors Field in downtown Denver. Most of the cars try to reach the Coors Field as their destination. Finally, when a game or an event is over in a specific place such as the Pepsi Center, Red Rocks Amphitheater, or the Convention Center, we will see too many riders requesting Uber, Lyft, or taxis. Thus, we will have too many source points in this area.

According to all of the information above, we have to consider the impact of Uber and Lyft because they try to control the behavior of the ACSC. We can apply the behavior of Uber and Lyft drivers in our simulation in three scenarios. First, we can have one cluster that contains downtown information to build source mid destination inside this cluster. This reflects short distance trips that make traffic jams. Secondly, we can determine a cluster that has all sources of ACSC in order to give real-life situation for when an event or game finishes. Finally, the cluster has the destinations of ACSC when riders try to go to a specific place.
3.3.5 Distribution of Train
The distribution of trains is not the same as the distribution of ACSC. The railway network requests numerous detailed status data, i.e., positions for signals, gradients, track lengths, stopping points, and turnout. We need to define the train speed in order to think about other restrictions and speeds. This information forces us to study the train protection system known as Automatic Train Control (ATC) [26], which ensures safe operation for trains. The infrastructure of railways consists of nodes, which represent stations, mid links, links represent connections from one station to another. Train routes inside stations, from entry to exit signals model the paths of trains that can pass and utilize a station. If the information about the train route dependencies is available, it will not set another train route, i.e., when a certain train route is set, it is not allowed to set another train route. Depending on the purpose of GMACS, we will consider these restrictions and parameters to generate Agent in Complex System for Train (ACST) as following.

The train has a timetable. Sometimes, we have to delay the timetable of a train. This causes us to put two parameters for each train, time and position (longitude, latitude). These parameters help us know information about each train in order to avoid generating two trains in the same position and time to simulate a real-life situation.
Generating of AC ST will be in the station "node", because if the route of train A is set, in most cases, train B is not allowed to use the route of train A. Therefore, we will generate more than one train in a station.
If we want to generate two trains, one of them in station A, which is "Train A", and another one in station B which is "Train B", we will increase the time depending on this equation as shown Figure 3.17. The equation is "time = distance/speed". We will get the time for a train to go from station A to station B by getting the distance and speed from train A, see the Figure in 3.17. This is the scenario if the train goes from station A to B. Since the time for each train is important to avoid an accident, we have to consider that station A is the center of nodes to generate the next train behind it.

Figure 3.17 Calculation of the time for train in station B If we want to generate a train behind station A, we have to subtract the time as
shown in Figure 3.18. The time will increase when a train moves from one station to
Figure 3.18 Calculation of the time for train in station C 3.3.6 Distribution of Airplanes
Before we start to generate airplanes, there are some important features for generating Agent of Complex System such as Airplanes (ACS A). An airplanes pattern differs from ACSC and ACST. At least one day in advance, a pilot has to predefine flight plans which are routes and times of the airplane before its departure from the airport. SID

and STAR [27] are two terms for departure and arrival. SID stands for Standard Instrument Departure. It is departure procedure. STAR stands for Standard Terminal Arrival Route. Weather conditions also affect the airplane. Conditions may cancel departure or delay arrival time of an airplane. Furthermore, there are two types of civil aviation, commercial aviation mid private aviation. In the GMACS simulation, we will consider private aviation. Private aviation [28] represents private airplanes which can be used to reach meetings, business, etc. We consider two scenarios for the generation of private aviation (PA). First, one has to generate source and destination with different timetables for departure and arrival of PA in the same airport. Most of the time, civil aviation departs from and returns to the same airport. In the second scenario, PA go from Airport A to Airport B with a timetable for departure from Airport A and arrival to Airport B.
3.4 Mathematical model
Building a simulator to simulate driver distribution on the road requires some mathematical background for modeling. First of all, to simply select a source and destination location is not difficult. You can randomly select two points from the dataset and you are done. But a question arises, whether our simulator resembles reality or not. To make our simulator as realistic as possible, we need to identify some driver distribution scenarios and model them mathematically. The modeling of these scenarios is largely discussed in the following sections. In this section, we will describe the foundations behind every scenario modeled.

First, we will talk about a single node. A node in our dataset is identified by its ID. Other attributes are visible, version, change set, timestamp, user, uid, lat and Ion. In our case, we use only id, lat and lot, since the other attributes are not important for our version of the application. An instance of a node looks like:
vis ible="true"
time stamp-'2009-11 9T23:41:34Z"
To select a node from a dataset of the nodes, we simply run a random function. Usually, we select two nodes to represent source and destination nodes. To measure the distance between two nodes, we use a distance equation. To explain this equation, consider the two points: S and D where S is the source and D is the destination. Let Siat, Diat and Sion, Dion present the latitude and longitude points respectively. The equation consists of the following steps:

1. Convert to radians: S
&iat anc^ &lon
2. Find d-iai- &lat anc^ dIon ~ SIon ^lon
3. Calculate res = sin x sin (-^p) + co s(Siat) x sin(D]at) x sin (-p1) x
4. Calculate distance = 2xatan2(^res,y/l res^xearthradius This calculation, returns the distance between two given nodes on a map.
To simulate some real-world scenarios for driver distribution on the road, we will use the k-means algorithm. K-means is a clustering algorithm that is used to cluster unsupervised data. By using k-means, we can select the regions on the map where most people live. We can also use this algorithm to select the places where most of the coffee shops are located, most of the parks are located, etc.
Before running the algorithm, we will set some parameters. With c we will denote the number of centroids, where each centroid represents the mean of all the points inside a cluster. With k we denote the number of iterations the algorithm will make until the final result. Now we randomly pick c centroids from the dataset. To ensure that the centroids are not close to each other, we can use a distance function such that the distance between cn and cv c2,..., cn is greater than a predefined minimum distance d.
After we define the primary centroids, we scan through our dataset. For every node in the dataset, we assign as part of the cluster, defined by the centroid, where the dist ance of this node and that centroid is the lowest from the other centroids. Doing so, we complete the first iteration. After this we compute the mean of all the nodes inside a cluster, so that we find a centroid in the center of the cluster. After this step, we scan again through the

dataset, and perform the same operations until we complete all k operations. The final result represents the clusters of nodes in a map. Now these regions present a group of buildings, coffee shops, schools or other objects concentrated in specific regions. We can use these regions to generate start and destination points of ACS.
The parameter k in our case can be eliminated. Instead we can use a parameter e to stop the execution of the program. The program will stop executing if the distance between all the old centroids mid new centroids is not greater than e.
3.5 Computational model
For the implementation of this project, I have used the Java programming language and MySQL relational database management system. The spatial data is downloaded from Open Street Map. To visualize the data, I have used JMapViewer 2.0 which is a Java library for visualizing OSM data.

4. Implementation
4.1 Open Street Map data
The data used in this project is downloaded from OSM in XML format. The XML format is not very suitable for performing queries. For this reason, the application provides a parser that reads XML data, extracts all the useful data, mid stores it in a MySQL database. The types of data we store in the database are:
Node: represents an object in the map like home, office, bar etc.
Way: represents a road or building in the map.
Node tag: represents the tags related to a node.
Way tag: represents the tags related to a way.
Tags: all OSM tags.
Reference: presents all the nodes in a way node.
Next, we present the database diagram used to store this information:

Figure 4.1 Database diagram to store the information Converting data from XML to MySQL made it easier to query and retrieve
information from the database. When using XML, for every query you need to perform,
you need to fetch all of the data in memory and write your own code to retrieve the required
information. MySQL in this case provides a wide variety of data manipulation. Creating
tables, inserting, deleting, and updating records can all be handled by MySQL. Also as the
amount of data increase, XML loses efficiency. In this case, MySQL again shows better
performance than XML.

4.2 Implementation of different generator approaches
Developing a simulator Generating for Mobile Agent in Complex System (GMACS) in a transportation network is not an easy task. The first problem that arises is that we do not have prior data of cars distributed on the road. If we had the information about cars traveling through a transportation network for an interval of a week, we could use this information to build an intelligent simulator by using machine learning algorithms. In our case, the generation of cars is done by considering different scenarios. These scenarios approach reality in different ways. Some of them tend to be more realistic than others. In the following section, we will discuss our simulation approaches, starting from one that does not emulate reality very well to one that emulates reality best.
4.2.1 Scenario 1: Simple random selection
Consider that we have a map of a specific area downloaded from Open Street Map. This map consists of nodes, where each node represents a map obj ect like house, apartment, school, coffee shop, bar, restaurant, park, etc. The idea behind this scenario is to select source and destination locations randomly. This scenario emulates reality of vehicle distribution at an unsatisfactory level. This is because this scenario faces many problems. First of all, the two selected random nodes could be very close to one another (let us say 30 ft.). For this short distance, you can rarely find someone who takes his car to drive from the source to the destination location. In the case where the two generated points are sufficiently far from one another, we consider the scenario to be good.

Pepsi Center
Coors Field O
6th Ave
Downtown Denver Partnership
^ Hyatt Regency Denver at Colorado.
E 23rd Ave E 21 Bt Ave
E 18th Ave
Ogden Theatre
E Alameda Ave

lo (f)
a 3
s- i s = § £2 £2
City Park Golf Cours Denver
E 13th Ave 12th Ave
7H 0> £
I 3" £2 s E 4th Ave
W 3rd Ave 3 £2 %
*9 Wist Ave E,1 st Ave
E 7th Ave Pkw E6th Av
Figure 4.2: Scenario 1: Simple random selection 4.2.2. Scenario 2: Simple random selection with a distance measure
In this scenario, we will try to solve the problem of short distance source-destination node generation. The main idea is to define a distance d which represents the radius of a circle with a center at the source node. The destination location will be selected outside of this circle. Figure 4.3 provides a visualization of this method.

Pepsi Center
Coors Field O
Downtown Denver Q Partnership

E 23rd Ave E 21 Bt Ave
City Park Golf Cours Denver
^ Hyatt Regency Denver at Colorado.
E 18lh Ave
Ogden Theatre
*9 s ra O i 1
<2 \ i i CAPITOL HILL iS <2
CO f. % ** 7. o u
5 V5 c n
E 13th Ave 12th Ave
El 7th Av
" E 7th Ave Pkv>
6th Ave ./[6th Ave ] ^ J ^ 6lh Ave E 6 th Av

0 I L- fSs, E 4 th Ave
/e W 3rd Ave g: %./J CHERRY CREE
9 W c! Ave ^
E Alameda '
; co o
§ w, 5 O j~ 1
<2 5' g 2 = § i2 M BELCARO
<2 is s? I
Figure 4.3 Scenario 2: Simple random selection with a distance measure
With this scenario, we are emulating the reality of source to destination location generation better than the first scenario. As we progress through the scenarios, we more accurately represent reality.
4.2.3 Scenario 3: Using K-Means algorithm to define regions
Assume we have a map downloaded from Open Street Map. U sually in a map, there exist some areas where most peoples homes are concentrated. Often, these areas are outside of downtown and in the periphery. We need a way to identify these because most cars starting points would be there. We can cluster the map based on different

perspectives. For example, we can generate clusters where most people live, where most of the bars are located, where most of the parks are located, etc. Assume that in our case, we are interested in clustering the map based on the places where most of the people live. To do so, we will select from the database those nodes with tags related to a building (home, apartment, rental, etc). After this, we will randomly select k nodes as centroids. To escape the possibility of selecting two nodes close to each other, we have used a distance d. Every centroid must be at least d meters from all the others.
After selecting these centroids, we scan through our dataset of nodes mid every node is assigned to the closest centroid. After we go through all the nodes, we re-calculate the centroids until the distance between the new generated centroid and the old centroid is less than a predefined e. After we generate all the clusters, we can randomly select a source node from one cluster and the destination node from another cluster and so generate a source-destination path. Figure 4.4 provides a visualization.

Figure 4.4: Scenario 3: Using K-Means algorithm to define regions 4.2.4 Scenario 4: Using K-Means to generate source to destination locations
After the nodes are assigned to clusters in a map, we can use different approaches to generate source to destination locations. We can use intra source-destination location generation. This approach selects one random node from cluster A, and the destination point is also taken from cluster A. We can define a measure d such that we escape the problem identified in Scenario 1.
Another approach is to generate an inter source-destination location. We select a source node from cluster A and the destination location from cluster B. We can also use a combination of these two approaches.

Figure 4.5: Inter source-destination location 4.2.5 Scenario 5: Defining downtown and the around area
Our approach to this simulator was scenario-based. This approach simulates one pattern that we have identified in the transportation network. From the research completed, most of the offices and work places are in the downtown area. At the same time, fewer living places are focused in the downtown area. Most of the living places are concentrated outside of downtown. During the weekdays, most people travel to work from their homes. They travel from the periphery to downtown. On the weekend, people mostly travel from their homes to recreational places like coffee shops, night clubs, etc, that are mostly focused in the downtown area. Thus, we can see a pattern of cars moving from outside of downtown to downtown. To simulate this, we can simply click on the downtown area in our

application, and define a radius r that builds a circle covering the downtown area. Now, we can generate source locations from the periphery and destination locations in downtown. We can also define specific tags for source destinations like home and apartment and select recreational locations like coffee shops and night clubs for the destination location in downtown. Figure 4.5 provides a visualization.
Pepsi Center
Coors Field O LODO &

.Downtown Denver
Hyatt Regency Denver at Colorado..
E 23rd Ave E 21 st Ave
Ogden Theatre
City Park Golf Cours Denver
^ Bluet
o o
< E 13th Ave I %
2 <3 E
£ 12th Ave J £
X CP £
I S' £ s E 4th Ave
W 3rd Ave 3 \
Wist Ave E 1st Ave
BAKER E Alameda Ave
§. 8ihAve
E 7th Ave Pkvi E 6th A v
s-1 s = § $3 p I5 1 5 § % BELCARO
H2 E X I
Figure 4.6: Scenario 5: Defining downtown and the around area
4.3 Application developed
In the application that we developed, the user needs to first provide the data from OSM. These data are automatically parsed and stored in a MySQL database. Let us say in our case that the user wants to run Scenario 2. When the application runs, it opens the main window. One of the main ideas we kept in mind while developing this simulator was to

offer the user a very flexible solution to generate cars on the transportation network. We have the Key on which are provided a number of keys which serve as a general instance of many values. For example, the amenity key holds values like bank, bar, bbq, bench, etc. While selecting these values, the user can click in the area where source tags or destination tags are, such that the values previously selected appear in that location.
The two other important forms here are time and weather condition. Time mid weather condition affect the distribution of drivers in the transportation network in different ways. For example, on the weekdays, at 8:00 a.m and 4:00 p.m most of the drivers are traveling from home to work or the other way around. So, Scenario 5 is more suitable in this case.
Weather conditions also play a crucial role in the driver distribution in a transportation network. In our implementation, if the weather is sunny, all of the cars defined by the user are generated. If the weather is rainy, more people are likely to start their cars and drive to their preferred destination. The user can define a percentage of how many more drivers he wants to generate on top of the predefined number he previously set. If the weather is snowy, more people are likely to go to their destination by using public transportation rather than by their own cars. So, the user can define a percentage of cars to be inactive from the predefined number of cars he set before.
4.4 Graphical User Interface for GMACS
We will describe a user scenario of generating a number of ACS in a transportation network. We will not go into detail explaining how the user downloads the data from OSM

and how the script is run to convert the data from XML to an SQL table. We consider that the data are already downloaded and inserted into the database. Now, we will break down the scenario:
1. The user runs the application, and the window in the Figure 4.7 appears. The user has the possibility to choose one of the five starting scenarios. These scenarios are described in the grey boxes.
Figure 4.7: Graphical User Interface for GMACS 2. Let us say that the user has selected scenario 4. The base scenario here is called
inter & intra random generation of ACSC using a k-means approach and distance
as a measure. The chosen points in the map, will be clustered in k clusters, and after
that the source-destination locations will be generated. After clicking the button
showing Scenario 4, the window in Figure 4.8 will appear on the screen.

|A| Data generator v1.0.0
Distance: Nr. mov. objects: Nr. centroids:
Source Destination Src. Tag Des.Tag Hour Weather


Figure 4.8 Source-Destination Locations
This is the main window where the user has the flexibility to build his/her own choice of ACSC scenario. First, let us separate the above windows. On the right side, we have the map that we will use to generate the moving objects into. We will use all the map-tags consisting of K (keys), V (values) attributes, which represent every specified object in the map like house, school, bus station, etc. These tags will be used to build the second step of the simulation. In the left part, we have a number of forms. Starting from the top-left, we have distance, which defines at least how a source node must be from a destination
node. Providing a lower value for the distance, makes the simulation deviate a little from reality. This is because the probability that somebody will drive his car to go from point A to B where dist(A, B) = 10 is very low.

The next form is Number of ACSC. Here, the user defines how many ACSC they want to generate in the transportation network. This number may change depending on the user-defined specific time and weather. Since in this scenario we use the k-means algorithm to cluster the selected items from the map, we need to define the number of centroids. We have the possibility in the code to set a distance specific to centroids, to help define the clusters as well as possible. For example, setting this distance to 4000m, the algorithm will choose k centroids such that each of them is 4000m far away from all the other centroids.
The next form is weather. Here, the user can select one of the weather conditions provided: sunny, rainy, or snowy. Each of these conditions will affect the number of ACSC. For example, when the weather is rainy, the number for ACSC in the transportation network is increased.
The next form is hour. The time of the day also affects the number for ACSC in the transportation network. For instance, more ACSC will appear during the working hours rather than other parts of the day.
The key and value forms are the most important forms here. The user can build his/her own scenario by playing with the values in these forms. Values under the key form, stand as generic items that consist of many other specific values under the value form. For instance, selecting building from the key form, the value form will be populated with values that are related to building like apartment, bridge, house, commercial, school, etc. As we click these values, they will be shown in the source tags text area. In order to define the destination tags, we simply click the destination tags text area again select the key and

values that we want. In this way, we generate the scenario of a simulation. For example, to generate some ACSC that go from their homes to their jobs as source tags we can select apartment, house, renting, or studio, and for the destinations, we can select office, school, restaurant, coffee shop and other related tags.
When the user presses the run button on the bottom of the page, the algorithm starts running and the resulting table is populated with the data generated. The source and destination nodes can be visualized in the map by clicking the eye button, and the data can be saved in a text file by clicking the save button. For the data stored in the text file, we can choose the structure we want.
Figure 4.9 provides an example of a home-to-work scenario of 15 ACSC, where
the distance is 1000, number of centroids 3, the weather is sunny and the time shows 08:00.
L&j Data generator v 1.0.0
Distance: Nr. mov. objects: Nr. centroids: Weather Hour
1000 15 FII M M Sunny S M 1 J 1 II
Key: Wmt Source tags:
ootway - apartments M apartments: house: residential hotel
addnnterpoiation bakehouse
addr.postcode bam
aeroway t >.7
amenity bunker
area cabin
barrier carport
boundary cathedral Destination tags:
bridge chapel lofflce. commercial, roof, university, school
building church
busway tovtc
craft commercial
cutting construction
cycleway cowshed
electrified a detached
East 17th M*ou* Partway
hAwue 1}rJ Awoue + -
uwa? ,
Source Destination Src. Tag Des. Tag Hour Weather
17606B224 3750134298 house loftice 08:00 sunny -
176068374 3750134302 apartments [commercial 08:00 sunny -
176 ^750143856 apartments office 08:00 sunny
176105089 3751127520 apartments ischool 08:00 Isunny
176109187 2626499031 apartmunts school 0800 tunny
<: .. .vj 2393490722 2026498150 house untunffii residential poof 08:00 isunny 08.00 Isunny
2393490744 12626405436 residential [office 08:00 tunny
2930578147 1129129095 house office 03 00 Jsunn,
1 354093113 apartments commercial 08:00 tunny .
3750131989 |176091093 apartments university 0800 Isunny
Era 7 9. WhAvWuf IS eJJL

UU T 1 Avenue)1 #
6 8
rat Alameda Avenue E*K Alameda *
# *
i tmetti
Expovtion Avenue
h 1 1 k

Fast Tmneme Avenue
Figure 4.9 Home-to-work scenario

As we can see in the map, the green dots show the starting node for each ACSC, whereas the blue dots show the destination nodes.
In the same way, we can we can select other base scenarios, continue to build it as we want, mid generate ACSC. One interesting case is when we want to generate trains mid airplanes as moving objects. Usually in a location, there are few tags that represent bus stations or small jet airports. In the case where the map does not contain any train stations, the simulator will not consider generating any trains. This truly simulates reality, because where there are no train stations or rails, you cannot have trains. On the other hand, for airplanes, there are two specific cases: small airplanes and commercial jets. For a small airplane, we simply generate a trajectory of latitude mid longitude nodes that the airplane passes through. This trajectory can be of any shape. In the case of commercial jets, they usually have a linear trajectory. Given that the map we consider most is of the size of a city, most of the time the trajectory remains linear. This is the case that we can handle.

5. Experiments
The purpose of this thesis was an inner portion of a bigger application called GRID (Green Routing Investigation mid Design). We were responsible for providing a way to generate ACSC in a transportation network, so that the distribution would be as realistic as possible. Moreover, the output of the GMACS simulation will be the input for the GRID algorithm, so the format consists of the node ID, longitude, and latitude. In this section, we will describe some of the experiments regarding the scenarios described above. All the experiments were performed on a Lenovo Yoga 710, CPU i7-6500, 2.50Ghz, 16GB of RAM, 256GB SSD. For the experiments, we used Denvers downtown map. The dataset representing this region consists of 18,129 nodes with 16,931 tags. The experiments can vary, depending on the map that we provide, the number of ACSC we need to generate, the number of tags that are available in the Open Street Map for that specific area, the selected tags we use to build the scenario, mid the approach we use to generate the ACSC, i.e., randomly with distance considered, k-means and its variations.
In the first experiment, we created a scenario of cars traveling from their living places to their work places. The tags we considered as source tags are apartments, hotel, house, houseboat, and the tags we considered as destination tags are university, commercial, hospital, garage, office, retail, roof, school, service,
In total, there are 2120 rows retrieved from the database with the specified source tags and 2921 rows with the specified destination tags. Clearly, as the user provides more tags, the number of rows will increase. Using this scenario, we tested the execution time

of different mentioned approaches by providing different parameters of distance and number of cars. The following graph shows these results:
100 500 1000 1500 2000 5000 10000
4500 3.094 5.673 9.025 11.695 15.15 32.767 64.661
^4000 2.755 3.094 3.653 4.111 4.552 7.721 12.811
^2000 2.625 2.576 2.594 2.58 2.649 2.555 2.562
^1000 2.825 2.597 2.61 2.578 2.619 2.625 2.553
Figure 5.1 Execution time of generating various number of moving agents using the
"random with distance" scenario
From Figure 5.1, we can see that we generated 100, 500, 1000, 1500, 2000, 5000 and 10000 ACSC considering distances 1000, 2000, 4000, and 4500 meters. Just looking at the graph we can infer some interesting facts about the map. For example, the execution time to generate ACSC for the distance 1000 and 2000 meters is almost the same. This happens because the map consists of many nodes which are farther away from each other than 2000 meters. The source and destination nodes, when selected randomly, will usually pass the criteria of distance. For the distance equal to 4000m, we can see an increase in the execution time. This increase happens because there are fewer points where the distance
between them is more than 4000 meters. The possibility of randomly selecting these points

is low. This effect is most prominent where the distance is 4500m. Few nodes in the dataset qualify for that distance condition. For a source-destination, ACSC to be generated under this distance condition, there is a small possibility to randomly be selected. Thus, the execution time is much higher.
In the second experiment, we will measure the execution time of generating ACSC when using the k-means approach. Before doing that, we will test the execution time of the centroid generation. The way we stop the execution of the program is by defining an as a distance measure. If the distance between the kth generated centroids and (k l)th generated centroid is less than or equal to e, then we stop the execution of the application. Now, we will test the execution time of centroid generation based on the different values, and different number of centroids.
3 2.594 2.593 2.578 2.558
5 2.625 2.515 2.496 2.487
10 3.012 2.547 2.531 2.516
Epsilon value
Figure 5.2 Execution time of centroid generation for various number of centroids and
epsilon values

As expected, the execution time decreased as the epsilon value is increased. In our case, this decrease in execution time can best be seen when generating 10 centroids. In other cases, the difference in execution time cannot easily be detected by the graph, but the numeric values help us understand that.
From this graph and how the k-means algorithm is implemented, we can infer that the execution time of the centroid generator is dependent on the first step when the centroids are randomly generated, the epsilon value we choose, the number of centroids we need to generate, the distance between first centroids generated, and the number of tuples in the data set (in our case 5041).
In the third experiment, we measured the execution time of generating ACSC when using k-means approach. Using k-means, we can cluster the regions where most of the tags are found, depending on the tags that the user specifies. For example, in the above specified tags, the algorithm will classify the regions where living places and working places are located. We can test this method in three ways. We can generate inter, intra, and mixed moving object source-destination locations. Between these three approaches there is not much difference in execution time. For this reason, we present only the execution time when using the mixed (i.e., inter and intra cluster moving object source-destination generation) approach.

5000 20000 50000 100000
Number of cars generated
50 250 500
Figure 5.3 Execution time when using mixed moving object source-destination location
Figure 5.3, the values 50, 250 and 500 present the value. As we can see from the graph, increasing the value causes the algorithm to run much faster. The execution time in this case, is mostly dependent on the size of the input (i.e., number of ACSC).

6. Conclusions and Future Research
6.1 Conclusions
Reality scenarios are very complex to simulate, especially when we lack data, we
need to analyze and find some approaches to approximate the simulation. Think of the case
where we need to build a generator that generates the distribution of ACSC (vehicles) in a
transportation network. There are numerous surrounding effects that directly or indirectly
affect the behavior for ACSC. As an example, the weather conditions are a factor that
largely affects the number of agents in a transportation network. Due to the lack of data, it
is very difficult to exactly define a function where given a type of weather, i.e., sunny,
rainy, or snowy as an input, it can directly show the number of ACSC as output. Having
data that provide this and much more information, we would have the chance to extract
object movement patterns, identify behaviors when objects face different surrounding
environments, identify the locations that most people prefer to go on weekdays/weekends,
identify the reasons behind traffic jams, and more. With all this being said, the thesis
proposes a solution to GMACS generator that generates ACSC in a transportation network.
This generator is a portion under an umbrella project called Green Routing Investigation
and Design. The generator is flexible, easy to use, and easily understood. The generator
reads the map data provided by the user and downloaded from OSM. The user then has the
flexibility to build a scenario by choosing one of the approaches provided: random
generation for ACSC providing distance as measure; inter, intra or mixed random
generation for ACSC locations using spatial k-means; choosing a specific point as a hot
spot where all the ACSC are leading to; etc. The beauty of this simulator is that the

aforementioned approaches represent just the first step toward generation for ACSC locations. On top of them, the user is allowed to add the specific locations they want to be involved in the simulation, the specific time, the weather condition, the number of ACSC, the distance, etc. As a result, the user is provided with the data generated from the generator. These data consist of various attributes, e.g., source location, destination location, source tag, destination tag, time, and weather. Furthermore, the generator provides a visualization of all ACSC source mid destination locations. To come up with this generator, we used the information taken from the Santa Fe Institute on how to build an Agent Complex System, information from Brinkoff on identifying the behaviors of ACSC, data from USDOT that provides the locations people visited during a one year interval, average person trips, trip lengths per household, and locations visited during weekends and weekdays. Moreover, we also analyzed the affect that new transportation technologies like Uber mid Lyft have on the transportation network. We did these analyses by reading papers regarding this topic and going out as a driver and surveying the passengers. Doing all this, helped us complete a generator that generates the generation of ACSC in a transportation network as realistically as possible.
6.2 Future work
Nowadays, smart cities are the main topic in many news magazines mid portals. A smart city cannot be smart if people living there only use GPS applications to guide their path from point A to point B. GPS applications only consider the shortest arrival time and shortest path to generate the best route. A smart city cannot be smart if we are not interested in reducing the negative impact of gas emissions released by many vehicles. Having these

two previously mentioned cases in mind, we plan to improve the vehicle GPS systems so they consider environmental issues in order to provide better routes considering eco-friendliness. The generator presented in this thesis, is ready to generate ACS source mid destination locations as realistically as possible. The data generated can be used by the GRID application to see how the eco-friendliness impacts the ACS. This thesis is developed using some scenario-based approaches. Thus, the impact of weather or time on the number of agents is statically implemented. In the future, we plan to have real datasets of ACSC trips (cm's, airplanes, trains), so we can better analyze the patterns and implement a smart simulator by using machine learning and data science. Other than this, having a dataset of weather conditions, road conditions, and time, together with the trips of the ACS, we can even better define functions that tell us the impact of these phenomena in the distribution of agents.

[1] United States Environmental Protection Agency EPA. (2014). Overview of Greenhouse Gases [Online]. Available: #carb on-dioxide.
[2] Jerry Hirsch. (2014). 253 million cars and trucks on U.S. roads; average age is 11.4
years [Online]. Available:
[3] ETH Zurich .Agent and Complex System, Journal Of Object Technology, vol. 1, (2), pp 36^15, 2002.
[4] T. Brinkhoff, "A Framework for Generating Network-Based Moving Objects," Geoinformatica, vol. 6, (2), pp. 153-180, 2002.
[5] C. Duntgen, T. Behr and R. H. Guting, "BerlinMOD: a benchmark for moving object databases," The VLDB Journal, vol. 18, (6), pp. 1335, 2009.
[6] M. F. Mokbel et al, "A demonstration of MNTG A web-based road network traffic generator," in 2014, DOE 10.1109/ICDE.2014.6816752.
[7] J. Saglio mid J. Moreira, "Oporto: A Realistic Scenario Generator for Moving Objects," Geoinformatica, vol. 5, (1), pp. 71-93, 2001.
[8] F. Giannotti et al, "Synthetic generation of cellular network positioning data," in 2005,
. DOE 10.1145/1097064.1097068.
[9] O. Gunther et al, "Benchmarking spatial joins a la carte," International Journal of Geographical Information Science, vol. 13, (7), pp. 639-655, 1999.

[10] R. H. Gutin, T. Behr, mid C. Duntgen, SECONDO: A platform for moving object database research and for publishing and integrating research implementation, IEEE Data Engineering Bulletin, vol. 33, (2), pp. 56-63, 2010.
[11] OpenStreetMap, [Online] Available:
[12] R. J. Beckman, K. A. Baggerly and M. D. McKay, "Creating synthetic baseline populations," Transportation Research Part A, vol. 30, (6), pp. 415-429, 1996.
[13] F. Giannotti et al, "Synthetic generation of cellular network positioning data," in 2005,
. DOI: 10.1145/1097064.1097068.
[14] M. Song et al, "A stochastic viewpoint on the generation of spatiotemporal datasets," in Anonymous Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 1225-1234.
[15] T. Tzouramanis, M. Vassilakopoulos mid Y. Manolopoulos, "On the Generation of Time-Evolving Regional Data," Geoinformatica, vol. 6, (3), pp. 207-231, 2002.
[16] M. Behrisch et al, Simulation of Urban Mobility: First International Conference, SUMO 2013, Berlin, Germany, may 15-17, 2013. Revised Selected Papers. 20148594.
[17] SMARTEST. Final Report for Publication. Technical report, European Commission Transport RTD Programme of the 4th Framework Programme, 1999, Project Reference: RO-97- SC.1059.
[18] SMARTEST Project Web Site, (1999).
[19] G. Gidofalvi and T. Pedersen, "STACTS: A spatio-temporal activity simulator," in 2006,. DOI: 10.1145/1183471.1183498.
[20] J. Xu and R. H. Guting, "MWGen: A mini world generator," in 2012, DOI: 10.1109/MDM.2012.39.

[21] Dr.Williams. CSCI4800/5800. Class Lecture, Topic: special Topics : Simulation University of Colorado at Denver, Colorado, Denver, July. 2016.
[22] Open street map website, OpenStreetMap, [Online]. Available: https://
[23] Wikipedia website, OpenStreetMap elements, [Online]. Available:
[24] United states Departments of Transportation, Passenger Travel Facts mid Figures
2016, [Online]. Available: https://
[25] Alejandro Henao, Impacts of Rides ore ing Lyft and Uber on Transportation including VMT, Mode Replacement, Parking, and Travel Behavior, Ph.D. dissertation, Dept. Civil Engineering, University of Colorado, Co, Jan. 2017.
[26] HANS SIPILA, Simulation of rail traffic, thesis in Transport science, School of Architecture mid the Built Environment, Department of Transport Science, 2012.
[27] Wikipedia, Flight plan, [Online] Available
[28] Wikipedia, Civil aviation, 2009, [Online] Available
[29] R. A. DeMilli and A. J. Offutt, "Constraint-based automatic test data generation," IEEE Transactions on Software Engineering, vol. 17, (9), pp. 900-910, 1991.

[30] E. Bonabeau, "Agent-based modeling: methods and techniques for simulating human systems," Proceedings of the National Academy of Sciences of the United States of America, vol. 99 Suppl 3, (Supplement 3), pp. 7280-7287, 2002.
[31] C. M. Macal and M. J. North, "Tutorial on agent-based modeling and simulation," in 2005,. DOI: 10.1109/WSC.2005.1574234.
[32] A. Torreno, E. Onaindia and O. Sapena, "FMAP: Distributed cooperative multi-agent planning," Applied Intelligence, vol. 41, (2), pp. 606-626, 2014;2015;.
[33] S. Almeida, J. Queijo mid L. M. Correia, "Spatial and temporal traffic distribution models for GSM," in 1999,. DOI: 10.1109/VETECF. 1999.797068.
[34] D. Lee et al, "Spatial modeling of the traffic density in cellular networks," IEEE Wireless Communications, vol. 21, (1), pp. 80-88, 2014.
[36] J. B. Orlin et al, "A faster algorithm for the single source shortest path problem with few distinct positive lengths," Journal of Discrete Algorithms, vol. 8, (2), pp. 189-198, 2010.
[37] K. Talan, G. Amnote, Shortest path finding using a star algorithm and minimum weight node first principle, International Journal, vol. 3, (2), pp 1258-1262, 2015.
[3 8] W ikipedia website, K-means clustering, [Online]. Available:
[39] J. Wu, Advances in K-Means Clustering: A Data Mining Thinking. 2012.
[40] Google map, Pulaski Park in Denver, [Online]. Available: http s: //www. go o gl e. com/map s

Appendix A
Table 3.1 Several types of keys and values in the tags for Denver and Pueblo
Several types of keys and values inside the tags
No Tag Lat, Ion Meaning of Tag
1. lat="38.2250033" lon="-104.5449727" Street
2. lat="38.2458362" lon="-104.5685844 Parking lot near to main street
3. lat="38.2655599" lon="-104.6287281" traffic_signals in the high way
4. lat="38.2272601" lon="-104.6400195" turning^circle
5. lat="38.2416182" lon="-104.5353802" M otorway-j n ctio n
6. cnode id="177897271" visible="true" ve rsio n="4" ch a ngeset=" 12424855" timestamp="2012-07-22T04:07:31Z" user="GPS_dr" uid=" 117055" lat="38.2979056" lon="-104.5971061"/> lat="38.2979056" lon="-104.5971061" A point on the street
7. lat="38.2908852" lon="-104.6076120" Exit
8. lat="38.2878254" lon="-104.6837018" Tower
9. lat="38.2138228" lon="-104.5710729" Power pole
10. lat="38.2398263" lon="-104.6025856" Power transformer
11. lat="38.3431213" lon="-104.6192202" Railway
12. lat="38.2675027" lon="-104.6513644" Reservoir
13. lat="38.2443699" lon="-104.5993167" 1 do not know
14. Import from tiger
15. end ref = "2507875301"/> end ref ="2507875277"/> Waterway
16. end ref=" 1818429796"/> end ref=" 1818429789"/> B rid age
17. lat="38.2966690" lon="-104.6469195" Radio
18. lat="38.2694470" lon="-104.6038629" Wrong info.
19. lat="38.2804832" lon="-104.6141399" Helipad
20. lat="38.2875024" lon="-104.6233078" Peak
21. lat="38.2720512" lon="-104.6253096" BNSF Pueblo Yard Office
lat="39.6730989" lon="-104.8887745" tree
lat="39.7642216" lon="-104.9598525" Bus stop for RTD
lat="39.6723823" lon="-104.8095932" Cycle barrier

lat="39.7421555" lon="-104.9808689" Custom Creations
lat="39.7265531" lon="-105.0004922" Center Electric Service

Leisure Tags
No Tag Lat, Ion Meaning of Tag
28. lat="38.2613917" lon="-104.6213635" Wrong location Lesisure "park"
29. lat="38.2682503" lon="-104.6108631" Leisure "Sport"
30. lat="38.2834061" lon="-104.6071319" Leisure "swimming_pool"
31. Baseball
32. Leisure"tennis"
33. Leisure "playground"
Amenity Tags |
No Tag Lat, Ion Meaning of Tag
34. lat="38.2877802" lon="-104.6205299" Cemetery
35. lat="38.3037548" lon="-104.6156449" Restaurant
36. lat="38.3045209" lon="-104.6157575" fast_food
37. lat="38.2418781" lon="-104.6469051" Post office
38. lat="38.2242751" lon="-104.6440137" Fire station Nol
39. lat="38.2388920" lon="-104.5844182" School
40. lat="38.2355040" lon="-104.6450239" Pla ce_of_wors h i p
41. lat="38.2573311" lon="-104.6065842" Bar
42. lat="38.2657156" lon="-104.6112225" Theatre
43. lat="38.2823364" lon="-104.6110825" Parking
44. lat="38.2707507" lon="-104.5965308" Fuel
45. lat="38.2720752" lon="-104.5693144" Car_wash
46. lat="38.2884413" lon="-104.5960748" Bank
47. lat="38.2888623" lon="-104.5969787" Pharmacy
48. lat="38.2977083" lon="-104.6141554" ATM
49. lat="38.2691737" lon="-104.6078707" Pub
50. lat="38.2626492" lon="-104.5794467" Recycling
51. shelter
52. lat="39.7976852" lon="-105.0801482" Cinema
53. lat="39.7969577" lon="-105.0785442" cafe
54. lat="39.7263685" lon="-105.0006153"
55. 79

56. lat="39.7377251" lon="-104.9980582" vending^machine
Building Tags
No Tag Lat, Ion Meaning of Tag
57. lat="38.2818377" lon="-104.6125303" Hospital
58. Apartment building
59. end ref ="3263937484"/> Railway station
60. end ref ="3818237097"/> Building "school"
61. lat="39.7459743" lon="-104.9857960" school
62. end ref = "3291533202"/> House
63. end ref = "3408252714"/> Apartment building
Tourism Tags |
No Tag Lat, Ion Meaning of Tag
64. lat="38.2673160" lon="-104.6077827" Hotel
65. lat="38.2672964" lon="-104.6111869" Museum
66. lat="38.2979630" lon="-104.6129591" Motel
67. lat="38.2654392" lon="-104.6126386" Inf or.
68. end ref = "4105062924"/> Attraction
69. lat="39.7500780" lon="-104.9494099" zoo
70. lat="39.6166150" lon="-105.0310967" picnic_site
71. lat="39.7467949" lon="-104.9954972" artwork
No Tag Lat, Ion Meaning of Tag
72. lat="38.2710855" lon="-104.6036521" Shop for car
73. lat="38.3025149" lon="-104.6058402" Shop "target"
74. lat="38.2849654" lon="-104.5725975" Convenience
75. lat="38.3015834" lon="-104.6093117" Shop "Sears"
76. lat="38.3036637" lon="-104.6089823" Shop "Bath Stamp; Body Works"
77. lat="38.3035026" lon="-104.6082533" Shop "Victoria's Secret
78. lat="38.3032415" lon="-104.6090106" Shop "sports"
79. lat="38.3028718" lon="-104.6083659" Shop "clothes"
80. end ref = "4375692860"/> Pueblo Mall
81. lat="39.7318185" lon="-104.9988417" iComputer Denver Repair
82. lat="39.7453919" lon="-104.9859282" Anaconda Printing
83. lat="39.7419565" lon="-104.9808930" art
84. lat="39.7263986" lon="-105.0006158" sports
85. lat="39.7694111" lon="-105.0207898" alcohol

| | |
Office tags
No Tag Lat, Ion Meaning of Tag
86. lat="39.7459857" lon="-104.9846299" lawyer
87. lat="39.7460310" lon="-104.9846299" financial
88. lat="39.7459351" lon="-104.9846299" company
89. lat="39.7424571" lon="-104.9808649" Estate agent
90. lat="39.7424195" lon="-104.9807663" therapist
91. lat="39.7263419" lon="-105.0006149" The Public Works

Appendix B
B. 1 City Database
0. Filter objects
Si Tables node nodetag osm_tags reference way waytag
Stored Procedures
] insertNode ] insertNodeTag ] insertReference ] insertWay ] insertWayTag
s electWay
r-__l:___ V
Figure B. 1 Tables and stored procedures for city database
B.2 Store procedures
----gettingthe valuesfrom specific key
CREATE PROCEDURE 'getValuesFromSpecificKey'

-----Insert node
P_ID_Node CHAR(ll),
INSERT INTO Node (ID_Node, Lat, Lon) VALUES (P_ID_Node, P_Lat, P_Lon);
-----Insert node tag
P_Node_FK CHAR(ll),
INSERT INTO NodeTag (ID_Node_FK, K, V) VALUES (P_Node_FK, P_K, P_V);
----Insert reference
CREATE PROCEDURE 'insertReference'
PJD_Way_FK CHAR(ll), P JD_Node_FK CHAR(ll)
INSERT INTO reference (ID_Way_FK, ID_Node_FK) VALUES (P_ID_Way_FK, P_ID_Node_FK);

-----Insert way
PJD_Way CHAR(ll)
-----Insert way tag
INSERT INTO waytag (ID_Way_FK, K, V) VALUES (P_ID_Way_FK, P_K, P_V);
----Select all living and working place
CREATE PROCEDURE 'selectAIILivingAndWorkingPlaces'O BEGIN
SELECT n.lD_Node, n.Lat, n.Lon FROM Node n INNER JOIN Reference r ON n.lD_Node = r.lD_Node_FK INNERJOIN Way w ON w.lD_Way = r.lD_Way_FK INNERJOIN WayTagwt ON w.lD_Way = wt.lD_Way_Fk
WHERE (wt.V = 'apartments' OR wt.V = 'house' OR wt.V = 'residential') UNION ALL
SELECT n.lD_Node, n.Lat, n.Lon FROM Node n INNERJOIN Reference r

ON n.lD_Node = r.lD_Node_FK INNERJOIN Way w ON w.lD_Way = r.lD_Way_FK INNERJOIN WayTagwt ON w.lD_Way = wt.lD_Way_Fk
WFIERE (wt.V = 'office' OR wt.V = 'retail' OR wt.V = 'school'
OR wt.V = 'roof' OR wt.V = 'yes' OR wt. K = 'shop' OR wt. K = 'amenity');
-----Select all living places
SELECT n.lD_Node, n.Lat, n.Lon FROM Node n INNERJOIN Reference r ON n.lD_Node = r.lD_Node_FK INNERJOIN Way w ON w.lD_Way = r.lD_Way_FK INNERJOIN WayTagwt ON w.lD_Way = wt.lD_Way_Fk
WHERE (wt.V = 'apartments' OR wt.V = 'house' OR wt.V = 'residential');
-----Select all work places
SELECT n.lD_Node, n.Lat, n.Lon FROM Node n INNERJOIN Reference r ON n.lD_Node = r.lD_Node_FK INNERJOIN Way w ON w.lD_Way = r.lD_Way_FK INNERJOIN WayTagwt ON w.lD_Way = wt.lD_Way_Fk
WHERE (wt.V = 'office' OR wt.V = 'retail' OR wt.V = 'school'
OR wt.V = 'roof' OR wt.V = 'yes' OR wt. K = 'shop' OR wt. K = 'amenity');
----Select ID by Longitude and latitude

SELECT ID_Node FROM node WHERE Lat = P_Lat AND Lon = P_Lon;
-----Select key from OSM tags
-----Select node
-----Select node by ID
SELECT Lat, Lon FROM Node WHERE ID_Node = P_Node_ID;
-----Select node based on way tag values
CREATE PROCEDURE 'selectNodesBasedOnWayTagValues'
SELECT n.lD_Node, n.Lat, n.Lon FROM Node n INNER JOIN Reference r ON n.lD_Node = r.lD_Node_FK INNER JOIN Way w ON r.lD_Way_FK = w.lD_Way

INNER JOIN WayTag wt ON w.lD_Way = wt.lD_Way_FK WHERE V = P_V;
-----Select node tag
----Select reference
SELECT ID_Way_FK, ID_Node_FK FROM reference;
----Select reference by way ID
CREATE PROCEDURE 'selectReferencesByWaylD'
SELECT ID_Node, Lat, Lon FROM Node INNER JOIN Reference ON Node.lD_Node = Reference. ID_Node_FK WHERE Reference.ID_Way_FK = P_ID_Way_FK;
-----Select value by key in way tag
CREATE PROCEDURE 'selectValueByKeylnWayTag'
P_Value VARCHAR(512)

-----Select way
-----Select way by key
P_Key VARCHAR(512)
-----Select way tag
SELECT ID_Way_FK, K, V FROM waytag;

Appendix C
C.l some java code
----Database connection
20 21 22
26 27
package DatabaseConnection;
import java.sql.*;
public class DatabaseConnector
public Connection con;
public DatabaseConnector()
con = DriverManager. getConnection { url: jdbc :mysql;//localhost: 3306/ci
catch(Exception ex)
public Connection getConnection() { return con; } public Connection getConO { return con; }

Full Text