Final report of ITS Center project: Adaptive signal decision support system.
UVA Center for Transportation Studies
A Research Project Report
A U.S. DOT University Transportation Center
DATA MINING TOOLS FOR THE SUPPORT OF TRAFFIC SIGNAL TIMING PLAN DEVELOPMENT IN ARTERIAL NETWORKS
Principal Investigators:
William T. Scherer, Brian L. Smith
Graduate Assistant:
Trisha Ann Hauser
Center for Transportation Studies
University of Virginia
Thornton Hall
351 McCormick Road, P.O. Box 400742
Charlottesville, VA 22904-4742
804.924.6362
May 2001
UVA-CE-ITS-02-6
The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the information presented herein. This document is disseminated under the sponsorship of the Department of Transportation, University Transportation Centers Program, in the interest of information exchange. The U.S. Government assumes no liability for the contents or use thereof.
Abstract
Intelligent transportation systems (ITS) include large numbers of traffic sensors that collect enormous quantities of data. The data provided by ITS is necessary for advanced forms of control; however, basic forms of control, primarily time-of-day (TOD) control, which is prevalent in the United States, do not directly rely on the data. As a result, sensor data is typically unused and discarded in this country. Yet the sensor data can provide abundant information to aid in the development of improved TOD signal timing plans, supplying historical data for automatic plan development and TOD interval identification. Data mining tools are needed to extract this information from the data, and they would in turn allow the timing plan development and monitoring process to be automated, replacing the time-consuming, intuition-based practice currently in use. This project describes research investigating the application of data mining tools, including statistical clustering techniques, to aid in the development of traffic signal timing plans. Specifically, a case study was conducted to show that hierarchical cluster analysis can automatically identify, from the data, the temporal interval break points that support the design of a TOD signal control system. The cluster analysis approach used a high-resolution system state definition that takes full advantage of the extensive set of sensors deployed in a traffic signal system. Timing plans were developed from the clustering results, providing improved TOD intervals and peak volumes; these plans were then tested through simulation and internal cluster validation, which showed that using data mining tools for plan development is beneficial.
The results of this research indicate that advanced data mining techniques have high potential to provide automated support for traffic engineers in signal control system design, development, and operations, covering the entire plan development process, which is currently based on hand-counted volumes and single-intersection TOD intervals.
Disclaimer
Abstract
Table of Contents
LIST OF FIGURES
LIST OF TABLES
Chapter 1. INTRODUCTION
1.1 Traffic Signal Systems and ITS
1.2 Data Mining Tools
1.3 Existing Plan Procedures
1.4 The Need for Improved Control
1.5 Forms of Advanced Signal Control
1.6 Data and Data Collection
1.7 Data Screening Tests
1.8 Project Scope
1.9 Project Statement
Chapter 2. BACKGROUND
2.1 Signal Timing Plans
2.2 Phase Movements
2.3 Local Detection Control
2.4 TOD Plan Methodology and Issues
2.5 Proposed State Definition
2.6 RELATED RESEARCH
2.6.1 Data Mining as an Emerging Field
2.6.2 Cluster Analysis Applications
Chapter 3. PROBLEM FORMULATION
3.1 Cluster Tools and Algorithms
3.2 Introduction of Research Case Studies
3.3 Hierarchical Clustering
3.4 Cluster Methodologies
3.5 Suggested Cluster Methodology
3.6 Interpreting “Bad” Clusters
3.7 Euclidean Dissimilarity Measure
3.8 Determination of the Optimal Number of Clusters
3.8.1 Cubic Clustering Criterion (CCC)
3.8.2 Pseudo F and t² Statistics
3.8.3 Recent Cluster Stopping Rule Studies
3.9 Cluster Analysis Input Data
3.10 Cluster Validation
3.10.1 Internal Cluster Validation
3.10.2 Secondary Cluster Validation – CART
3.10.3 External Cluster Validation – Simulation
3.11 Timing Plan Development and Simulation (Synchro/SimTraffic)
3.11.1 SimTraffic Outputs & Measures of Effectiveness
3.12 Chapter Summary
Chapter 4. Proposed Procedure
4.1 Tools
4.2 Proposed Procedure Flow Chart
4.3 Data Collection
4.4 SAS Procedure for Cluster Analysis
4.5 Determination of TOD Intervals
4.6 Synchro Timing Plan Development
4.7 Validation of Timing Plans with SimTraffic
4.7.1 Preparing 15-minute data tables for simulation
4.7.2 Preparing SimTraffic Parameters
4.8 Development of Classification Rule using CART (Future Research)
4.9 Chapter Summary
Chapter 5. RESULTS AND ANALYSIS
5.1 Introduction
5.2 Sensitivity Analyses – Cluster Input Variables
5.2.1 Standardized Input Variable Cluster Analyses
5.2.2 Un-Standardized Input Variable Cluster Analyses
5.2.3 Weighted Cluster Input Variables
5.3 Sensitivity Analyses – Minimum Number of Observations Per Cluster
5.4 Sensitivity Analyses – Number of Clusters
5.5 Single Intersection – Baron Cameron & Reston Parkway Case Study
5.6 Three-Intersection Corridor Case Study Results
5.6.1 Three-Intersection Case Study Assumptions
5.6.2 Evaluation of Simulations
5.6.3 Number of Simulation Runs
5.6.4 Improvements with New Plans
5.6.5 Improvements with New Time-of-Day Intervals
5.6.6 Time Periods where New TOD Intervals show Significant Improvements
5.6.7 Volumes from Old Timing Plans vs. New Timing Plans
5.6.8 Gains of New Plan versus Current Plans
5.6.9 Putting It All Together
5.6.10 Plan Performance Over 24-Hour Period
5.6.11 Emissions of Timing Plans
5.6.12 Average Emissions for an "Average" Passenger Car
5.6.13 Three-Intersection Corridor Conclusions
Chapter 6. Conclusions: Evaluation & Applicability
6.1 Research Contributions
6.2 Usability of Procedure
6.3 Simulation as Realistic Representation
6.4 Future Research
6.4.1 Cluster Methodology Analysis
6.4.2 Transition Effects on Corridor Performance
6.4.3 Reduced State Space
6.4.4 Historical Data Period
6.4.5 Weighting of Cluster Input Variables
6.4.6 Simulation Tool
6.4.7 Simulation Outputs (MOP’s)
6.4.8 Verification of Detector Data with the SmartTravelVan
6.4.9 Classification as a tool for plan maintenance
6.4.10 Investigation of replication criteria
6.4.11 Hand-pick increased number of TOD Intervals
6.5 Research Discoveries
References
Appendix A – 3-Intersection Corridor CPCC Matrices for 7 Clusters
Cluster 1
Cluster 2
Cluster 3
Cluster 4
Cluster 5
Cluster 6
Cluster 7
Figure 1. TOD Interval Identification
Figure 2. Existing TOD Intervals vs. Cluster Intervals
Figure 3. Verification of Feasible Volumes test 1
Figure 4. Verification for Feasible Volumes test 2
Figure 5. Verification for Feasible Volumes test 3
Figure 6. Verification for Feasible Volumes test 4
Figure 7. Reston Corridor Layout
Figure 8. Phase Diagram
Figure 9. NB Volume vs. TOD at one intersection
Figure 10. SB Volume vs. TOD at one intersection
Figure 11. Reston - Baron Cameron Intersection Layout
Figure 12. Reston - Sunset Hills, Bluemont, New Dominion Intersections Layout
Figure 13. Cluster vs. TOD Results for K-nearest Neighbors Method
Figure 14. Cluster vs. TOD Results for Single Linkage Method
Figure 15. Cluster vs. TOD Results for Centroid Method
Figure 16. Cluster vs. TOD Results for Ward's Method
Figure 17. Centroid Cluster Centroids and Standard Deviations
Figure 18. Observation dissimilarity demonstration
Figure 19. TOD Intervals with Large Data Set
Figure 20. Volume Distribution with Normal Curve at 7:15
Figure 21. Volume Distribution Compared to Normal Distribution at 7:15
Figure 22. NB Volume vs. TOD Plot with Confidence Intervals
Figure 23. Natural Raw Grouping Tendencies
Figure 24. TOD Intervals for Full, 3-Intersection Data Set
Figure 25. TOD Intervals for subset of 3-Intersection Data Set
Figure 26. Distance between Clusters for 3-Intersection Data Set
Figure 27. Cluster Isolation and Compactness with Distance Measures
Figure 28. Volume Means for 3-Intersection Clusters
Figure 29. Occupancy Means for 3-Intersection Clusters
Figure 30. Cluster Mean Volumes vs. Occupancies for 3-Intersection Case
Figure 31. Variable Distribution within Clusters
Figure 32. 2-D Projection of Clustered Variables
Figure 33. SimTraffic Outputs for 3-Intersection Case Study
Figure 34. SimTraffic Performance Report
Figure 35. SimTraffic Queuing Report
Figure 36. SimTraffic Actuated Signals, Observed Splits Report
Figure 37. Proposed Procedure Flow Chart
Figure 38. Clustering with Standardized Volume & Occupancy
Figure 39. Cluster Analysis with Standardized Volumes
Figure 40. Clustering with Standardized Volume & Occupancy < 26
Figure 41. Cluster Centroids and Standard Deviations
Figure 42. Cluster with Un-Standardized Volumes and Occupancies
Figure 43. Cluster with Un-Standardized Volumes
Figure 44. Cluster with Un-Standardized Volume and Occupancy < 26
Figure 45. TOD Intervals with Standardized and Weighted Volumes and Occupancies
Figure 46. TOD Intervals with Minimum of 4 Observations Per Cluster
Figure 47. TOD Intervals with a Minimum of 2 Observations Per Cluster
Figure 48. TOD Intervals from Unconstrained Number of Observations Per Cluster
Figure 49. TOD Intervals from Minimum of 4 Observations Per Cluster
Figure 50. Optimal Number of Clusters (7 Clusters)
Figure 51. Optimal Number of Clusters (6 Clusters)
Figure 52. Optimal Number of Clusters (5 Clusters)
Figure 53. TOD Intervals at Baron Cameron & Reston
Figure 54. Simulation Outputs from Single Intersection
Figure 55. TOD Intervals for 3-Intersection Corridor
Figure 56. SimTraffic Outputs for Three Intersections
Figure 57. MOP Gains of New Plan over Old Plans for Old and New TOD's
Figure 58. Percent Gains of New TOD's over Old TOD's for Old Plans & New Plans
Figure 59. Periods of Significant Gains from New TOD's versus Old TOD's
Figure 60. Timing Plan Volumes versus Actual Volumes at Sunset Hills
Figure 61. Timing Plan Volumes versus Actual Volumes at Bluemont
Figure 62. Timing Plan Volumes versus Actual Volumes at Bluemont
Figure 63. Percent Gains for New Plans over Original Plans
Figure 64. MOP's at 3-Intersection Corridor based on Per Vehicle Per Year
Figure 65. Yearly Gains of New Plans over Original Plans
Figure 66. Delay/Vehicle over 24-Hour Period at 3-Intersections
Figure 67. Emissions for 3-Intersection Corridor Plans
Figure 68. Emissions Saved for 3-Intersections Corridor over Current Plan
Figure 69. Emissions (g/mile/veh) for EPA vs. Plan Averages
Table 1. Descriptive Statistics for 7:15 Volume Distribution
Table 2. Graph Symbol Representations for TOD’s from Figure 23
Table 3. TOD Classifications for 3-Intersection Corridor
Table 4. CART - Cluster Validation at Baron Cameron
Table 5. Cluster Validation at Three-Intersections
Table 6. SAS Cluster Procedure (Code Example)
Table 7. SAS Tree Procedure Cluster Code
Table 8. SAS Mean Procedure Cluster Code
Table 9. TOD Classification for Volume & Occupancy Cluster
Table 10. TOD Classification for Volume Cluster
Table 11. TOD classification for V, O < 26 Cluster
Table 12. TOD Classifications for Un-Standardized Vol & Occ Clusters
Table 13. TOD Classification for Un-Standardized Volume Clusters
Table 14. TOD Classification for Un-Standardized Volume and Occupancy < 26
Table 15. SAS Stopping Rule Outputs
Table 16. TOD Interval Classification for Baron Cameron
Table 17. TOD Interval Classifications for 3-Intersection Corridor
Table 18. F-tests across 4 scenarios
Table 19. t-test Results for Number of Simulation Runs
Table 20. t-test Results for New Plans vs. Old Plans Evaluated at Old TOD Intervals
Table 21. t-test Results for New Plans vs. Old Plans Evaluated at New TOD Intervals
Table 22. t-test Results for New TOD vs. Old TOD Intervals Evaluated by Old Plans
Table 23. t-test Results for New TOD vs. Old TOD Intervals Evaluated by New Plans
Table 24. t-test Results for Old TOD & Old Plan vs. New TOD & New Plan
Table 25. PM - Post PM, t-test Results for New vs. Old TOD Intervals
Table 26. Post PM - Evening, t-test Results for New vs. Old TOD Intervals
Table 27. Evening - Pre-Off - Off, t-test Results for New vs. Old TOD Intervals
Table 28. EPA Emissions for an "Average" Passenger Car vs. Plan Emissions
Chapter 1. INTRODUCTION
It has been argued that traffic signal systems represent the first widespread deployment of intelligent transportation systems (ITS). Modern signal control systems are highly complex, relying on sensors, advanced communications networks, and sophisticated firmware and software. Advanced forms of signal control, such as second- and third-generation control, depend on the sensor data supplied by ITS. However, basic forms of control such as time-of-day (TOD) do not rely on sensor data for operation. These basic forms of control are in fact the most widely used methods of traffic signal control in this country, owing to limited Department of Transportation funding and the difficulty of maintaining the sensors needed to support advanced control. Signal control systems nevertheless collect enormous quantities of traffic flow data in an attempt to provide information that supports and improves signal timing operations. Because of limited storage resources, the lack of available analysis tools, and the fact that sensor data is not needed to run TOD signal control, the vast majority of signal control systems in the United States do not archive detector data for any appreciable period of time. This is unfortunate, since the sensor data could be used not only for advanced forms of control but also for TOD, the most common method of signal control in this country. Thus, there is a need for analysis tools that demonstrate the value of this data and justify the design of systems with increased storage capabilities.
Tools used to analyze and extract information from large sets of data are generally classified as “data mining” tools. This project describes research that devises a procedure for developing, implementing, and monitoring traffic signal timing plans using available data mining tools. The premise of the research is that the data collected by signal control systems can be used to improve system design and operations for current methods of traffic control. The data mining tool that serves as the foundation of the proposed procedure for signal plan development is hierarchical cluster analysis. A second data mining tool, classification, is also recommended for monitoring plan effectiveness; however, this project does not explore the use of classification in the maintenance of timing plans in depth. This project provides background on signal timing plan development, with consideration of system state definitions, and details a proposed procedure for improving traffic control through hierarchical cluster analysis, illustrated by a case study of a corridor in Northern Virginia. The case study shows that the sensor data provided by ITS holds valuable information about traffic behavior: it can be used to generate TOD intervals for transitioning between timing plans automatically, and to provide appropriate volume data for plan development within those automatically generated intervals. The proposed procedure allows the entire signal timing plan process to be automated, which will save time for traffic engineers and improve travel conditions for commuters.
A number of optimization tools exist to assist traffic engineers in developing timing plans for a particular set of operating conditions. However, few tools exist to help the engineer determine appropriate TOD intervals, or to monitor an existing TOD system to determine whether conditions have changed enough to require a new set of plans and/or TOD intervals, and no tools exist to accomplish these tasks automatically. The premise of this research is that statistical clustering and classification analyses, applied in a data mining setting, have high potential to address these needs and allow for automated procedures, while using the information stored in the data for improved signal plan development and maintenance.
Clearly, the current practice of using single-day, hand-counted volumes to define the state for time-of-day (TOD) plan development may be inadequate. Because considerably more information is available electronically for defining the state of the system, this research uses a more complete state definition, based on a more refined form of the data available from the system detectors, to identify TOD intervals and develop more appropriate timing plans.
The typical approach used to identify intervals for TOD systems is to plot aggregate traffic volumes over the course of a day, and then use judgment in the identification of significant changes in traffic volume at the critical intersection that indicate a need for a different timing plan. It is important to note that the volumes used to identify TOD intervals are bi-directional aggregate volume values from the critical intersection. An example of this approach is illustrated in Figure 1, which depicts a daily aggregate volume plot at an intersection in Northern Virginia based on historical data. The vertical lines in the graph show the times that the traffic engineers chose to transition between plans, the TOD intervals. These intervals rely heavily on the traffic conditions that exist at the critical intersection. The critical intersection is the signalized intersection in the corridor servicing the largest traffic demand. Along this particular corridor, there exists an AM-peak plan that operates from 06:00 – 08:30, a mid-day plan that operates from 08:30 – 15:00, a PM-peak plan that operates from 15:00 – 19:00, and an off-peak plan for the remainder of the day.

Figure 1. TOD Interval Identification
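Once the intervals are chosen, a TOD system reduces to a fixed lookup from clock time to plan. The Python sketch below encodes the corridor schedule described above (AM-peak 06:00 – 08:30, mid-day 08:30 – 15:00, PM-peak 15:00 – 19:00, off-peak otherwise). The names TOD_SCHEDULE and plan_for are hypothetical, and the start-inclusive, end-exclusive boundary convention is an assumption, since the report does not state how boundary times are handled.

```python
from datetime import time

# Hypothetical encoding of the corridor's existing TOD schedule (Figure 1).
# Each entry is (start, end, plan); start is inclusive and end is exclusive.
TOD_SCHEDULE = [
    (time(6, 0), time(8, 30), "AM-peak"),
    (time(8, 30), time(15, 0), "mid-day"),
    (time(15, 0), time(19, 0), "PM-peak"),
]
DEFAULT_PLAN = "off-peak"  # runs for the remainder of the day


def plan_for(t):
    """Return the name of the timing plan scheduled at clock time t."""
    for start, end, plan in TOD_SCHEDULE:
        if start <= t < end:
            return plan
    return DEFAULT_PLAN


print(plan_for(time(7, 15)))  # AM-peak
print(plan_for(time(22, 0)))  # off-peak
```

Because the schedule is fixed, the quality of a TOD system depends entirely on how well these break points match the actual traffic pattern, which is what the clustering approach addresses.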
While this approach is intuitive, it raises a number of concerns. First, aggregating only the volume from traffic sensors (which typically measure volume, speed, and occupancy) across different directions (and often lanes) into a single aggregate volume measurement discards considerable information about the characteristics of the traffic conditions. Second, because timing plans are developed for corridors rather than single intersections, this loss of data resolution becomes even more significant. Finally, visually selecting TOD intervals can be quite difficult for inexperienced engineers, who ultimately spend much time developing and tweaking the plans and TOD intervals. These problems illustrate the need for automated data mining tools that take advantage of the large quantities of data collected by ITS. Cluster analysis addresses these issues by using a more robust state definition, based on historical data at all intersections and all movements, to develop plans and TOD intervals. Figure 2 compares the TOD intervals developed by a clustering procedure with those developed manually as described above. As will become clear, the clustering TOD intervals are more robust because they are based on detector data rather than on the one-day hand counts used in current practice. The clustering algorithm detects subtle traffic trends that occur over the 24-hour period; these cluster-based TOD intervals are investigated in detail in Chapter 5.

Figure 2. Existing TOD Intervals vs. Cluster Intervals
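The clustering step can be illustrated with a small, self-contained Python sketch. It is a hedged illustration rather than the report's implementation: average linkage and Euclidean distance are assumptions (the choice of clustering method is treated in Chapter 3), and the function names are hypothetical. Each input point is assumed to be the standardized state vector for one 15-minute interval of the day, in time order, and a candidate TOD break point is taken wherever consecutive intervals fall into different clusters.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_labels(points, k):
    """Agglomerative (bottom-up) hierarchical clustering with average linkage.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise distance until k clusters remain.
    Returns one integer label per input point.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    dist = [[euclidean(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for (a, ca), (b, cb) in combinations(enumerate(clusters), 2):
            d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a in place
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def break_points(labels, minutes_per_interval=15):
    """Clock times (HH:MM) at which consecutive intervals change cluster."""
    times = []
    for i in range(1, len(labels)):
        if labels[i] != labels[i - 1]:
            minutes = i * minutes_per_interval
            times.append(f"{minutes // 60:02d}:{minutes % 60:02d}")
    return times


# Toy example: one-dimensional "states" for eight consecutive 15-minute intervals.
toy_states = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [0.1], [0.0]]
toy_labels = average_linkage_labels(toy_states, k=2)
print(break_points(toy_labels))  # ['00:45', '01:30']
```

Clusters need not be contiguous in time (in the toy example the first and last intervals share a cluster, much as early-morning and late-night off-peak periods might), so each change of label, rather than each cluster, marks a candidate TOD break point.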
It is also becoming increasingly vital to provide efficient and up-to-date signal control because of the decreasing availability of land for road expansion (6). With rapid growth in population, industry and “suburbia,” traffic is becoming increasingly congested in many expanding areas, and this growth brings a need for more roads. However, land is becoming less available and zoning laws stricter, making it extremely difficult to build new roads. New roads are also extremely expensive to build, so traffic engineers are relied upon heavily to provide efficient traffic control where new roads may be badly needed but nearly impossible to construct. Thus, TOD signal procedures, which have changed little over the past decades, need to be improved with a more reliable means of developing meaningful plans and monitoring those plans automatically.
Intelligent Transportation Systems (ITS) research tends to focus on advanced signal control, such as second and third generation control, fully adaptive traffic signal systems and even smart highways. Realistically, however, these systems are not ready for implementation in the United States, and less advanced systems such as time-of-day (TOD) control are employed instead, owing to factors such as a lack of funding for transportation infrastructure (5). This research looks at improving and refining current TOD signal timing plan procedures, which tend to be overlooked as an area of research. Existing literature focuses on the improvements that can be made by implementing advanced control methods with the resources provided by ITS. Until the more advanced methods become feasible in this country, little interest appears to be taken in using ITS resources for less advanced signal development practices. This is reflected by the fact that no existing literature was found on the use of ITS data for enhancing TOD signal development and implementation. The Transportation Research Circular (6) discusses research initiatives for advanced technology in traffic signal control systems, motivated by the need for improvement in this area. These needs stem from the steady increase in traffic congestion, in some areas reaching crisis proportions, and the decreasing availability of land for highway and road expansion (6).
There are four categories of traffic signal control:
· First Generation
· 1.5 Generation
· Second Generation
· Third Generation
TOD signal control falls into the first generation category, which consists of selecting a timing plan from a library of stored plans developed off-line using a tool such as Synchro (5). 1.5 Generation is identical to first generation except that it can automatically add plans to the library. Second and third generation control are advanced forms of control that implement traffic signal plans in real time based on existing traffic conditions. Third generation differs from second generation in that cycle lengths and splits can vary, whereas second generation uses fixed cycle lengths and splits (5). The U.S. is one of the few advanced countries in the world where adaptive control is not installed. This is because of the increased cost of the surveillance needed to monitor and maintain the large number of detectors required to support this type of control. Adaptive control reduces the need for timing plan updates, and it handles incidents, holidays and special events more efficiently (5). These advanced forms of control can use information about downstream traffic to update plan parameters at upstream signals.
Minneapolis is one of the few cities in this country testing a second-generation adaptive signal control system. The project is described in the paper Addition of Adaptive Control to the Minneapolis Signal System: Issues and Solutions (8). The project aims to serve as a representative model for medium-sized, centrally controlled North American systems by assessing the costs, problems and potential gains of adding such a system. It recognizes that extensive detection inputs, beyond those installed for existing signal methods, are needed to support advanced forms of control. It also addresses the many other issues raised by advanced control, including determining the operational status of the system, verifying that the requirements of the new system are being met, and identifying the considerations involved in adding a new system to an operating one (8). Many challenges remain before advanced forms of control are fully understood and affordably supported in this country.
Data is collected at signalized intersections in Northern Virginia by single inductive loop detectors. These detectors are embedded in the roadway and produce a magnetic field; the metal of a car passing over a detector disturbs the field, which allows the detector to register the vehicle. The single inductive loop detectors, referred to as system detectors, are recommended by the Traffic Control Systems Handbook to be placed at least 61 – 76 meters upstream of an intersection’s stop bar (1). The Northern Virginia system detectors are typically placed approximately 100 meters upstream of intersection stop bars. The placement of system detectors is a key consideration because lane discipline deteriorates near the intersection, especially during periods of spillback, and lane-changing maneuvers upstream can produce significant errors in volume and occupancy readings. A system detector should never be placed where standing queues from the downstream intersection typically extend, yet it should be close enough to the intersection to distinguish vehicles using turning lanes from those in the through lanes. “Single loop” refers to the fact that only one loop (rather than a pair) is placed in each monitored lane upstream of the intersection, which removes the capability of measuring speed directly. Volumes and occupancies are measured directly, and speed is calculated internally from estimates of vehicle length and detector length. Thus, speed was not used in this research.
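The internal speed calculation mentioned above typically relies on the standard single-loop relation: speed equals flow times an effective vehicle length (assumed vehicle length plus detector length) divided by occupancy. The sketch below uses illustrative lengths of 5.5 m and 1.8 m; these defaults and the function name are hypothetical, not values taken from the Northern Virginia controllers.

```python
def estimated_speed_kmh(volume_vph, occupancy_pct,
                        vehicle_length_m=5.5, detector_length_m=1.8):
    """Estimate mean speed (km/h) from single-loop volume and occupancy.

    speed = flow * effective length / occupancy, where the effective length is
    an ASSUMED vehicle length plus the detector length.
    """
    if occupancy_pct <= 0:
        raise ValueError("occupancy must be positive to estimate speed")
    effective_length_km = (vehicle_length_m + detector_length_m) / 1000.0
    return volume_vph * effective_length_km / (occupancy_pct / 100.0)


print(round(estimated_speed_kmh(1200, 10), 1))                        # 87.6
print(round(estimated_speed_kmh(1200, 10, vehicle_length_m=4.5), 1))  # 75.6
```

The second call shows why such estimates are fragile: shortening the assumed vehicle by one meter changes the estimated speed by roughly 14%, with no change in the measured data.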
Volume is defined as the number of vehicles that pass over the system detector in a given time period. It is simply a count of vehicles, generally expressed in vehicles per hour (VPH) or, in the case of the Northern Virginia data, in vehicles per 15 minutes (VP15m). According to the Highway Capacity Manual, a typical ideal saturation flow rate is 1900 vehicles per hour per lane. Occupancy is defined as the percentage of time a vehicle occupies a detector. Once occupancy reaches 25%, the roadway can typically be classified as saturated. Saturation occurs when the volume to capacity ratio (V/C) is near or greater than one. Occupancies greater than 25% lose meaning because they can fluctuate between 25% and 100% for varying volumes, with no particular correlation other than that volumes are typically at least 600 VPH at this point.
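The unit conversion and the saturation check described above are simple arithmetic; the sketch below makes them explicit. Treating the 1900 figure as a per-lane hourly rate, the lane count, and a V/C threshold of 1.0 for "near or greater than one" are assumptions for illustration, and the function names are hypothetical.

```python
CAPACITY_VPH_PER_LANE = 1900   # 1900 VPH figure cited in the text, treated as per lane (assumption)
SATURATED_OCCUPANCY_PCT = 25   # occupancy at which a roadway is typically saturated


def vp15m_to_vph(count_15min):
    """Convert a 15-minute count (VP15m) to an hourly flow rate (VPH)."""
    return count_15min * 4


def volume_to_capacity(volume_vph, lanes=1):
    """V/C ratio for an approach with the given number of lanes."""
    return volume_vph / (lanes * CAPACITY_VPH_PER_LANE)


def looks_saturated(volume_vph, occupancy_pct, lanes=1, vc_threshold=1.0):
    """Flag saturation from either occupancy or V/C (threshold is an assumption)."""
    return (occupancy_pct >= SATURATED_OCCUPANCY_PCT
            or volume_to_capacity(volume_vph, lanes) >= vc_threshold)


vph = vp15m_to_vph(450)                          # 1800 VPH
print(round(volume_to_capacity(vph), 3))         # 0.947
print(looks_saturated(vph, occupancy_pct=12))    # False
print(looks_saturated(vph, occupancy_pct=30))    # True
```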
The data extracted from the database is cleansed before use, because the detectors return a large amount of “bad” data, for reasons such as damaged or dead detectors. Cleansing restricts the analysis to reasonable data, removing many outliers and observations that would otherwise skew the clustering results. The screening rules were determined from typical data relationships in the database, and all of them were applied in this research. The screening rules used to remove bad data are as follows:
Non-zero test:
· Volume AND Occupancy AND Speed ≠ 0
Prescreening test:
· Volume AND Occupancy AND Speed >= 0
· Volume < 3100 AND Occupancy < 100
· Volume >= Occupancy
Feasible Volumes:
· IF Occupancy = 0 OR 1 THEN Volume < 580
· IF 1 < Occupancy <= 15 THEN 1 < Volume < 1400
· IF 15 < Occupancy < 25 THEN 180 < Volume < 2000
· IF Occupancy >= 25 THEN Volume > 500
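The screening rules above translate directly into predicate functions. In this sketch the Observation record and the function names are hypothetical; volumes are assumed to be in VPH and occupancy in percent, as stated in the text that follows. The Non-zero test is read literally (volume, occupancy and speed must each be non-zero), so under this reading the "Occupancy = 0" case of the Feasible Volumes test never arises; the "Occupancy = 0 OR 1" branch is applied to any occupancy up to 1%, assuming occupancies are reported as whole percentages.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    volume: float     # vehicles per hour (VPH)
    occupancy: float  # percent of time the detector is occupied
    speed: float      # controller-estimated speed


def non_zero_test(obs):
    # Literal reading: volume, occupancy and speed must each be non-zero.
    return obs.volume != 0 and obs.occupancy != 0 and obs.speed != 0


def prescreening_test(obs):
    return (obs.volume >= 0 and obs.occupancy >= 0 and obs.speed >= 0
            and obs.volume < 3100 and obs.occupancy < 100
            and obs.volume >= obs.occupancy)


def feasible_volume_test(obs):
    occ, vol = obs.occupancy, obs.volume
    if occ <= 1:             # "Occupancy = 0 OR 1" (whole percentages assumed)
        return vol < 580
    if occ <= 15:            # 1 < Occupancy <= 15
        return 1 < vol < 1400
    if occ < 25:             # 15 < Occupancy < 25
        return 180 < vol < 2000
    return vol > 500         # Occupancy >= 25


def passes_screening(obs):
    return non_zero_test(obs) and prescreening_test(obs) and feasible_volume_test(obs)


records = [
    Observation(volume=800, occupancy=10, speed=45),  # plausible
    Observation(volume=100, occupancy=30, speed=20),  # too few vehicles for 30% occupancy
    Observation(volume=0, occupancy=0, speed=0),      # dead detector
]
clean = [r for r in records if passes_screening(r)]
print(len(clean))  # 1
```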
The methods the screening tests use to scrub data can be grouped into two categories, threshold value tests and traffic flow theory tests. The ‘Non-zero Test’ uses a threshold value test, the ‘Prescreening Test’ uses both threshold value and traffic flow theory tests, and the ‘Feasible Volumes’ test uses only traffic flow theory tests (18). Threshold value tests limit data to within physically reasonable values based on characteristics of volume and occupancy. Traffic flow theory tests restrict data to feasible combinations of volume for a given occupancy.
All rules were established by examining data from 5 arbitrarily selected intersections in Northern Virginia over periods of up to one month. The screening tests were then applied to various intersections to test the procedures. Volumes for the hourly intervals are given in vehicles per hour (VPH), the unit of measurement used in the database. Occupancies are given as the percentage of time vehicles are located over a detector. Speed is not used in the traffic flow theory rules because it is derived from volume, occupancy, an assumed vehicle length and an assumed detector length, which makes it inaccurate. Figure 3, Figure 4, Figure 5 and Figure 6 show how the “Feasible Volumes” tests were derived from the data. The numerical values for the data screening tests were developed from the typical volume and occupancy relationships at the 5 intersections investigated, as shown in the graphs below.

Figure 3. Verification of Feasible Volumes test 1

Figure 4. Verification of Feasible Volumes test 2
This project introduces the use of data mining tools for timing plan development based on system detector data. The proposed procedure is conducted on a subset of a single corridor in the Northern Virginia arterial network: a section of the Reston Corridor consisting of 3 coordinated intersections and 15 system detectors. Figure 7 shows most of the Reston corridor layout, taken from a Synchro file, from which the case study subset is drawn. The timing plan development scheme is based only on Monday through Friday data for the entire 24-hour period. The data is acquired from system detectors (single inductive loops) located in select lanes throughout the corridor. The Virginia Department of Transportation (VDOT) supplies the data, aggregated to 15-minute observations, to the Smart Travel Laboratory (STL) at the University of Virginia, where the volumes and occupancies are archived in an Oracle database and used to conduct this research.



Figure 7. Reston Corridor Layout
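The study-period rules above (Monday through Friday only, 15-minute observations covering the full day) can be sketched as a weekday filter and a time-of-day slot index. Averaging each slot across days is shown only as one plausible way to summarize a detector's data; it is an assumption, not necessarily how the report prepares its cluster inputs, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def is_study_day(ts):
    """Keep Monday (weekday() == 0) through Friday (weekday() == 4)."""
    return ts.weekday() < 5


def slot_of_day(ts, minutes=15):
    """Index of the 15-minute interval within the day: 0 = 00:00, 95 = 23:45."""
    return (ts.hour * 60 + ts.minute) // minutes


def weekday_profile(observations):
    """Average one detector's volumes by time-of-day slot over weekdays only.

    observations: iterable of (timestamp, volume) pairs.
    """
    by_slot = defaultdict(list)
    for ts, volume in observations:
        if is_study_day(ts):
            by_slot[slot_of_day(ts)].append(volume)
    return {slot: mean(values) for slot, values in sorted(by_slot.items())}


obs = [
    (datetime(2000, 3, 8, 8, 0), 100),   # Wednesday
    (datetime(2000, 3, 9, 8, 0), 120),   # Thursday
    (datetime(2000, 3, 11, 8, 0), 500),  # Saturday, excluded
]
print(weekday_profile(obs))  # averages the two weekday 08:00 counts in slot 32
```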
This project contributes to the intelligent transportation systems (ITS) field by applying data mining tools to detector data to aid in the development of signal timing plans and fixed time-of-day (TOD) intervals for traffic signal plan implementation. The thesis is that data mining techniques such as clustering, which are not traditionally used for timing plan development in transportation, can improve the development of signal timing plans and fixed TOD intervals for traffic signal plan implementation. The main objective of this project is to propose a procedure for using detector data to improve plan development by carrying out the following tasks:
· Use of data mining tools (cluster analysis) to extract information from a large database,
· Improve timing plans through use of data extracted from database versus the current method of one-day volume counts,
· Improve TOD intervals using cluster analysis on detector data with refined and expanded state definition, and
· Test clusters and plan performance through simulation and internal cluster validation.
Chapter 2 provides background information on signal timing plans and current methods of traffic control, detailing current methods of timing plan development, and discusses areas of research related to the topics explored in this project. Chapter 3 details the problem formulation for each phase of the research, including the selection of a clustering method, validation of the resulting clusters, timing plan development in Synchro and simulation with SimTraffic for plan evaluation. Chapter 4 presents the major deliverable of this project: the proposed procedure, described in full together with the tools used and with guidelines for following, enhancing and automating it. Chapter 5 discusses the results of the analysis for a single corridor case study, along with a brief analysis of a single intersection in Northern Virginia. Chapter 6 evaluates the proposed procedure and the applicability of this research, with emphasis on the future research needed for a more robust procedure.
The operation of a coordinated signal control system on an arterial corridor (a series of signalized intersections operating under a common traffic signal plan) requires a timing plan for each signal in the corridor. A corridor timing plan consists of four main elements: cycle length, splits, offsets and phase sequences (1). The cycle length is the time required for one complete sequence of signal phases. The split is the percentage of the cycle length allocated to each phase at an intersection, where a phase is the portion of a cycle allocated to any single combination of traffic movements simultaneously receiving the right-of-way (1). Finally, the offset is the component of the signal timing plan that coordinates a series of signalized intersections in a corridor or network. The offset is the time difference (in seconds or as a percentage of the cycle length) between the start of the green indication at one signal and the start of the green indication at the corresponding downstream signal (1).
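Because splits and offsets may be expressed either in seconds or as percentages of the cycle length, converting between the two is routine arithmetic. The sketch below assumes a hypothetical 120-second cycle; the function names are illustrative only.

```python
def split_seconds(cycle_s, split_pct):
    """Green time (s) for a phase given its split as a percent of the cycle."""
    return cycle_s * split_pct / 100.0


def offset_seconds(cycle_s, offset_pct):
    """Convert an offset expressed as a percent of the cycle into seconds."""
    return cycle_s * offset_pct / 100.0


def offset_percent(cycle_s, offset_s):
    """Convert an offset in seconds into a percent of the cycle.

    Offsets repeat every cycle, so 150 s in a 120 s cycle equals 30 s.
    """
    return 100.0 * (offset_s % cycle_s) / cycle_s


print(split_seconds(120, 45))    # 54.0
print(offset_percent(120, 30))   # 25.0
print(offset_percent(120, 150))  # 25.0
print(offset_seconds(120, 25))   # 30.0
```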
Beyond the main elements of a coordinated timing plan discussed above, many other components must be taken into account when developing signal timings. These components are as follows:
· Traffic Volume per lane movement
· Turn type (Protected or Permitted)
· Minimum Initial
· Minimum Split
· Maximum Split
· Total Split
· Yellow Time
· All-Red Time
· Lead/Lag
· Allow Lead/Lag Optimize?
· Vehicle Extension
· Minimum Gap
· Pedestrian Phase
· Walk Time
· Bus Blockages (#/hr)
· Heavy Vehicles (%)
· Growth Factor
· Peak Hour Factor
· Ideal Saturated Flow
· Lane Width
· Grade (%)
· Area Type
· Storage Length (ft)
· Storage Lanes (#)
· Right Turn on Red?
Clearly, the development of traffic signal timing is highly complex, which is why tools like Synchro are used (16). For this research, the Synchro files developed by VDOT were used, so all the timing plan components listed above were already archived. The only alterations made to the Synchro files for this research are the volumes for the modified TOD intervals that the timing plans serve. With the volumes altered, Synchro optimizes the cycle length, splits and offsets to best suit the timing plan inputs.
Opposing movements at an intersection are defined by phases. Phases are numerical values (1,2,3,…8) assigned to through/right-hand-turn movements and left-hand-turn movements. Even phase numbers are always assigned to through/right-hand-turn movements and odd phase numbers are assigned to left-hand-turning movements. Figure 8 shows a sample intersection with phase assignments.

Phase numbers are always grouped together as shown in Figure 8. The only dynamic element of the phase diagram is whether phases 2/5 and 1/6 lie on the east-west direction or the north-south direction. This element is dependent on the direction of the main throughway. Phases 2 and 6 always correspond with the main throughway. If the main throughway lies in the east-west direction, then the phase diagram is as shown in Figure 8. If the main throughway lies in the north-south direction, then the phase diagram in Figure 8 would have to be rotated 90 degrees to the right.
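The numbering convention can be captured in a few lines. The mapping below is copied from the state definition given later in this chapter (phase 1 = NBL, phase 2 = SB, and so on), which corresponds to a north-south main throughway because phases 2 and 6 are the southbound and northbound through movements; it also pairs phases 2/5 (southbound approach) and 1/6 (northbound approach) as described above. The east-west case, obtained by rotating the diagram, is not encoded here, and the names are hypothetical.

```python
# Phase-to-movement mapping copied from the state definition in this report
# (north-south main throughway: phases 2 and 6 are SB and NB through).
NS_MAIN_PHASES = {
    1: "NBL", 2: "SB", 3: "EBL", 4: "WB",
    5: "SBL", 6: "NB", 7: "WBL", 8: "EB",
}


def movement_type(phase):
    """Even phases serve through/right-turn movements; odd phases serve left turns."""
    if phase not in range(1, 9):
        raise ValueError("phases are numbered 1 through 8")
    return "through/right" if phase % 2 == 0 else "left"


def main_throughway(mapping):
    """Phases 2 and 6 always carry the main throughway."""
    return {p: mapping[p] for p in (2, 6)}


print(movement_type(2), movement_type(5))  # through/right left
print(main_throughway(NS_MAIN_PHASES))     # {2: 'SB', 6: 'NB'}
```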
Traffic signals can be actuated, semi-actuated or pre-timed. Actuated signals respond to the traffic conditions sensed by local detectors. Unlike system detectors, local detectors do not collect volume, occupancy and speed data; they serve solely to actuate the traffic signal. Fully actuated control is extremely difficult to implement because of the expense of maintaining enough detectors to support it. Second and third generation control operate on fully actuated corridors, where vehicles trigger a detector to change the green split to their phase movement. Semi-actuation is the form of signal control used in the Northern Virginia arterial system. Under semi-actuation, the main throughway is always given its preset green split, even if no vehicles are detected at the intersection. The side streets, however, receive only the minimum green time allotted to their phase if no vehicles are detected, and the remaining green time that would make up the full side street split is given back to the main throughway. If vehicles are detected on the side streets, the maximum green time is given. An intersection with no local detectors operates under non-actuated control, with fully pre-timed signal parameters.
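The semi-actuated allocation just described can be summarized as a small decision rule. This is a deliberately simplified sketch under stated assumptions: the full side street split is taken to equal its maximum green, the parameter values and function name are hypothetical, and real controllers also time yellow and all-red intervals and extend green vehicle by vehicle.

```python
def semi_actuated_greens(main_split_s, side_min_green_s, side_max_green_s, side_call):
    """Green times (main, side) for one cycle under simplified semi-actuation.

    Clearance intervals, gap-out and per-vehicle extensions are ignored.
    """
    if side_call:
        # Vehicles detected on the side street: side street gets its maximum green.
        return main_split_s, side_max_green_s
    # No side-street demand: serve only the minimum green and give the unused
    # remainder of the side-street split back to the main throughway.
    unused_s = side_max_green_s - side_min_green_s
    return main_split_s + unused_s, side_min_green_s


print(semi_actuated_greens(60, 10, 30, side_call=False))  # (80, 10)
print(semi_actuated_greens(60, 10, 30, side_call=True))   # (60, 30)
```

In both cases the total green time is the same (90 s here), which is what keeps a semi-actuated signal on a fixed cycle length for coordination.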
The most widely used method for timing plan selection and implementation is time-of-day (TOD) control, in which a preset plan is automatically used for a particular time interval (1). TOD requires traffic engineers to develop signal timing plans that are effective for particular time intervals in a day. For example, an AM-peak plan that favors work-bound commuter traffic might be used from 06:00 – 09:30. The AM-peak plan would typically be developed with timing optimization tools such as Synchro, based on a single volume count at the critical intersection. The volume count used for timing plan development in Synchro is taken from the traffic engineers’ hand counts of vehicles during the assumed peak traffic time of the TOD interval, and this single-day count is used to develop a timing plan for the entire corridor. Therefore, the challenge in designing a TOD system lies in identifying the appropriate time intervals for plans and then developing effective corridor plans to operate within each interval. Traffic engineers also face the challenge of monitoring the performance of timing plans over time and keeping them up to date. Because of the time and effort that the current method of TOD plan development requires, plans are generally left in place for many years with no automated form of performance feedback. Electronic data and data mining tools would make automated timing plan development and maintenance readily available. Another weakness of the current means of TOD traffic control is that it cannot account for variance in traffic conditions, and variance over time may go unnoticed until conditions become severe. Figure 9 and Figure 10 show volume vs. time plots in the northbound and southbound directions for an intersection in the Reston corridor, using volume data from March 8, 2000 to September 29, 2000.
Traffic trends remain similar over such a short period of time, but there are erroneous days on which atypical traffic conditions are served by timing plans constructed for “normal” conditions. Automated maintenance tools, made possible by data mining, would allow such erroneous days and changing trends over time to be detected and archived. This would allow theories and rules to be developed from traffic variance over time and around events, preparing for changes before they occur. TOD intervals may also change over time, and data mining tools would allow slight variations in transition times to be detected.

Figure 9. NB Volume vs. TOD at one intersection

Figure 10. SB Volume vs. TOD at one intersection
Time-of-day (TOD) signal control is an example of a form of system control known as state-based control. A “state” is an abstract representation of the condition of that system at some point in time. The defined state serves as a sufficient statistic for the condition of the system, i.e., it contains all possible information regarding current status, propensity to change and information necessary to evaluate the defined indices of performance for the system (2). The concept of state-based control is to use a set of established rules or policies to guide the selection of a control strategy for a system as the system transitions from one state to another.
Clearly, the current practice of using aggregate volumes to define state, as described in the previous section, may be inadequate. Given that considerably more information is available to use in defining the state of the system, this research uses a more complete state definition based on a refined form of data available from the system detectors to identify TOD intervals.
By considering the data collected by the system detectors at as high a resolution as possible, one can expect to better capture the nuances of the system’s dynamic behavior. Therefore, the state definition used for this case study is a vector of volume and occupancy measures for each directional phase movement at each intersection in the corridor. The directional phase movements are identified by their corresponding phase numbers, shown in Figure 8. In addition, to account for the difference in scale between volume and occupancy measures, the values were standardized using a Z-score (Z), which expresses how many standard deviations a value lies from the mean, as defined in the following equation, where X is an observed value, M is the mean of that variable and s is its standard deviation.
$$Z = \frac{X - M}{s}$$
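Applied to each volume and occupancy variable separately, the standardization looks like the sketch below. Whether the report uses the sample or the population standard deviation is not stated in this section; the sample version is an assumption, and the data values and function name are hypothetical. The example shows how volumes and occupancies on very different scales end up on the same unitless scale.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one variable's observations: Z = (X - M) / s."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption); statistics.pstdev is the population version
    if s == 0:
        raise ValueError("constant variable: Z-scores are undefined")
    return [(x - m) / s for x in values]


volumes = [100, 200, 300]     # VP15m at one detector (hypothetical)
occupancies = [5, 10, 15]     # percent at the same detector (hypothetical)
print(z_scores(volumes))      # [-1.0, 0.0, 1.0]
print(z_scores(occupancies))  # [-1.0, 0.0, 1.0]
```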
Chapter 5 investigates alternative input cluster variables in the ‘Sensitivity Studies’ section, in addition to equally weighted, standardized variables. Because volume and occupancy are different traffic measures, with occupancy on a percentage scale of 0 – 100 and volume on a numeric scale of 0 – 1900+, standardization is necessary to transform these values to a uniform, meaningful, unitless scale (13). The possible effects of variable weighting are discussed in more detail in Chapter 5, where un-standardized occupancies and volumes are also considered. For the scope of this research, the detectors and cluster variables were weighted equally; future work, however, should consider weighting cluster variables such as detectors and intersections to reflect their influence and importance in traffic flow through the corridor. The state definition used is as follows, with each variable numbered according to its phase. Not all intersections have system detectors at every phase, so the state definition may vary from intersection to intersection depending on the availability of system detectors.
X(t) = (V1, O1, V2, O2, V3, O3, V4, O4, V5, O5, V6, O6, V7, O7, V8, O8),
Where X(t) = system state at time t
V1 = standardized phase 1 volume at time t (NBL)
O1 = standardized phase 1 occupancy at time t (NBL)
V2 = standardized phase 2 volume at time t (SB)
O2 = standardized phase 2 occupancy at time t (SB)
V3 = standardized phase 3 volume at time t (EBL)
O3 = standardized phase 3 occupancy at time t (EBL)
V4 = standardized phase 4 volume at time t (WB)
O4 = standardized phase 4 occupancy at time t (WB)
V5 = standardized phase 5 volume at time t (SBL)
O5 = standardized phase 5 occupancy at time t (SBL)
V6 = standardized phase 6 volume at time t (NB)
O6 = standardized phase 6 occupancy at time t (NB)
V7 = standardized phase 7 volume at time t (WBL)
O7 = standardized phase 7 occupancy at time t (WBL)
V8 = standardized phase 8 volume at time t (EB)
O8 = standardized phase 8 occupancy at time t (EB)
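Assembling X(t) from the standardized values is straightforward bookkeeping. In this sketch (function names hypothetical), a phase is included only when both its volume and its occupancy are available, and the corridor state is formed by concatenating the intersection vectors in a fixed order, consistent with the definition above of a vector covering each directional phase movement at each intersection in the corridor.

```python
PHASES = range(1, 9)


def intersection_state(z_volume, z_occupancy):
    """Assemble (V1, O1, ..., V8, O8) for one intersection at time t.

    z_volume and z_occupancy map phase number -> standardized value at time t.
    Phases without a system detector are absent from the dicts and are skipped.
    """
    x = []
    for phase in PHASES:
        if phase in z_volume and phase in z_occupancy:
            x.extend([z_volume[phase], z_occupancy[phase]])
    return x


def corridor_state(intersections):
    """Concatenate intersection states, in a fixed corridor order, into X(t)."""
    x = []
    for z_volume, z_occupancy in intersections:
        x.extend(intersection_state(z_volume, z_occupancy))
    return x


# Hypothetical intersection with system detectors on phases 2 and 6 only.
print(intersection_state({2: 0.5, 6: -0.2}, {2: 1.1, 6: 0.3}))  # [0.5, 1.1, -0.2, 0.3]
```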
Data mining tools are not widely used in transportation systems (7). In fact, system detector data collection is a fairly recent advancement that came with the rise of ITS, and it has not yet been used to its full capacity. Traffic may be viewed as unpredictable and uncontrollable, but the archived data now available can show that traffic is in fact predictable to a degree and that control can be improved by using this data. Other DOTs have looked into advanced forms of control, such as traffic responsive and second generation control, that require system detector data, but such data has not been found to be used for TOD signal control (6). Data mining tools are useful for uncovering patterns in data and making classifications, and these capabilities can be highly beneficial in transportation systems. Data mining techniques have been used to similar effect in many other fields and on many types of data sets.
Data mining draws on the disciplines of computer science and statistics and is making progress in extracting information from large databases (20). It is an emerging field that advances data analysis. Because of the competitive nature of today’s business economy, companies have invested heavily in information technology to help manage business performance effectively. Over the last three decades, increasingly large amounts of critical business data have been stored electronically, and this volume is expected to continue to grow considerably (20). Despite this wealth of data, many companies have been unable to fully capitalize on its value, because the information implicit in the data is not easily discernible without data mining tools. Data mining tools allow businesses to leverage their data effectively and obtain insights that can give them a competitive edge, enabling them to discover previously undetected facts present in the data.
Data mining tools can benefit many kinds of users. The finance and insurance industries have long recognized these benefits, but the same principles apply in many other areas, including the retail/marketing, banking, insurance and health care, and transportation sectors (20). The following list summarizes some of the benefits that each of these sectors can achieve (20).
Retail/Marketing
· Identification of buying behavior patterns from customers
· Finding associations among customer demographic characteristics
· Prediction of customers responsive to mailing
Banking
· Detection of patterns of fraudulent credit card use
· Identification of “loyal” customers
· Prediction of customers that are likely to change credit card affiliation
· Determination of credit card spending by customer groups
· Finding hidden correlations between different financial indicators
· Identification of stock trading rules from historical market data
Insurance/Health Care
· Claims analysis – determination of which medical procedures are claimed together
· Prediction of which customers will buy new policies
· Identification of behavior patterns of risky customers
· Identification of fraudulent behavior
Transportation
· Determination of distribution schedules among outlets
· Analysis of loading patterns
· Identification of seasonal and time-of-day traffic trends