The goal of this analysis is to understand how trip origin and destination networks differ among riders who use the different ORCA card types, including the reduced fare programs. Patterns of trip behavior could inform targeted service and stop improvements that benefit the different demographics who ride transit with each card type. Specifically, we are interested in the following questions:

#### 1) How do the trip networks vary between card types?

Understanding and visualizing trip networks can reveal large-scale patterns of ridership across the different ORCA card types. Identifying similarities and differences between the networks could show how important different stops and connections are in the travel patterns of each card type's users. Networks can vary in many ways; for this analysis, we focus on several metrics that are particularly relevant to transit:

A) Degree centrality (based on the number of trip arrivals and departures at each stop), which reflects which stops are used most frequently by riders of each card type. A higher value indicates that a stop is used more often by riders of that card type. This metric is computed at the stop level, for origins and destinations separately as well as for their sum.

C) Network density (the proportion of actual origin-destination trips in the net…

D) Modularity (the strength of division of the network into clusters, in which trips between stops in the same cluster are more frequent than trips between stops in different clusters), which identifies whether there are distinct clusters of stops used by different rider groups. A higher value indicates more distinct clusters in the network. This metric is computed at the whole-network level.
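A minimal networkx sketch of how metrics A, C, and D could be computed for one card type's network, assuming a hypothetical list of `(origin, destination, trips)` counts. Because metric A is defined by trip counts, it corresponds to trip-weighted in/out degree (node strength) rather than `nx.degree_centrality`, which only counts connected neighbors. For modularity, reciprocal trips are summed into an undirected graph and networkx's greedy modularity communities are used as one possible partitioning method; the original analysis may have partitioned the network differently.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Hypothetical origin-destination trip counts for one card type.
od_counts = [
    ("A", "B", 40), ("B", "A", 35), ("A", "C", 25),
    ("D", "E", 50), ("E", "D", 45), ("E", "F", 22),
    ("C", "D", 20),
]

# Directed network: nodes are stops, edge attribute "trips" is the O-D trip count.
G = nx.DiGraph()
G.add_weighted_edges_from(od_counts, weight="trips")

# A) Trip-weighted degree: departures (out), arrivals (in), and their sum.
departures = dict(G.out_degree(weight="trips"))
arrivals = dict(G.in_degree(weight="trips"))
total_degree = {n: departures[n] + arrivals[n] for n in G}

# C) Density: fraction of possible directed stop pairs that have an edge.
density = nx.density(G)

# D) Modularity: collapse to an undirected graph, summing trips in both directions.
U = nx.Graph()
for u, v, w in G.edges(data="trips"):
    if U.has_edge(u, v):
        U[u][v]["trips"] += w
    else:
        U.add_edge(u, v, trips=w)
communities = greedy_modularity_communities(U, weight="trips")
q = modularity(U, communities, weight="trips")

print(total_degree["A"], round(density, 3), round(q, 3))
```

With this toy data, stop A has 65 departures and 35 arrivals (100 in total), and the density is 7/30 because 7 of the 30 possible directed stop pairs have trips.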
#### 2) Are stops that are central in transit ridership networks shared across all card demographics?

This question will help us understand whether there are universally important stops across card demographics that could be improved to benefit all riders. Conversely, it could reveal stops that are particularly important to certain demographics but would appear less important when all riders are considered together, which would help inform targeted improvements that support those demographics.
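One way to operationalize this question, sketched below with hypothetical degree values: take the top-k stops by trip-weighted degree in each card type's network, intersect those sets to find universally central stops, and compare each card type's top-k against a pooled network to flag card-specific stops. Pooling by summing degrees across card types and the cutoff `k` are both assumptions made for illustration.

```python
# Hypothetical trip-weighted degree per stop for each card type network.
degree_by_card = {
    "adult":  {"S1": 900, "S2": 700, "S3": 650, "S4": 120, "S5": 80},
    "youth":  {"S1": 300, "S2": 90, "S6": 260, "S3": 210, "S7": 40},
    "senior": {"S1": 150, "S4": 140, "S3": 100, "S8": 95, "S2": 30},
}

def top_k(degrees, k):
    """Return the set of the k stops with the highest degree."""
    return set(sorted(degrees, key=degrees.get, reverse=True)[:k])

k = 3
top_sets = {card: top_k(d, k) for card, d in degree_by_card.items()}

# Stops central for every card type: candidates for improvements benefiting all riders.
shared = set.intersection(*top_sets.values())

# Pooled "all riders" network, approximated by summing degrees across card types.
pooled = {}
for d in degree_by_card.values():
    for stop, deg in d.items():
        pooled[stop] = pooled.get(stop, 0) + deg
pooled_top = top_k(pooled, k)

# Stops central for one card type that drop out of the pooled top-k.
card_specific = {card: s - pooled_top for card, s in top_sets.items()}

print(sorted(shared), card_specific)
```

Here S1 and S3 are central for every card type, while S6 (youth) and S4 (senior) are central for one group but fall outside the pooled top three.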
#### 3) Is the structure of these networks reflected by the geographic layout of the transportation network?

This question will show whether the trip networks are geographically structured (i.e., whether the most frequented stops tend to lie near the geographic center of the network). If they are not (i.e., the most frequented stops tend to lie on the geographic periphery of the network), that would be evidence of non-geographic drivers of transit patterns, which can be explored in future analyses.
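A simple check, sketched with hypothetical projected coordinates: measure each stop's distance from the geographic center of the network (here assumed to be the unweighted mean of stop coordinates) and compute a Spearman rank correlation with trip-weighted degree. A strongly negative correlation would suggest that the most frequented stops are geographically central; a weak or positive one would point toward non-geographic drivers. Spearman is computed as the Pearson correlation of ranks so that only pandas and numpy are needed.

```python
import numpy as np
import pandas as pd

# Hypothetical stops: projected coordinates in metres and trip-weighted degree.
stops = pd.DataFrame({
    "stop": ["S1", "S2", "S3", "S4", "S5"],
    "x": [0.0, 500.0, -800.0, 2500.0, -3000.0],
    "y": [0.0, 300.0, 600.0, -2000.0, 2800.0],
    "degree": [1200, 800, 650, 90, 40],
})

# Geographic center of the network: unweighted mean of stop coordinates.
cx, cy = stops["x"].mean(), stops["y"].mean()
stops["dist_to_center"] = np.hypot(stops["x"] - cx, stops["y"] - cy)

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rho = stops["dist_to_center"].rank().corr(stops["degree"].rank())
print(round(rho, 3))
```

In this toy example the busiest stops are also the closest to the center, so the ranks are exactly reversed and `rho` is -1.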
## **Data**

For this analysis, we used a subset of ORCA origin-destination trip data from April 2023 in the ORCA next generation database. At the time of analysis, the full updated trip table was not available; the analysis can be rerun for each month as the data becomes available.

Additionally, we incorporated census block data and USGS National Hydrography Dataset data to create regional spatial hexgrid shapefiles, which we used to aggregate stops that are close to each other.

Data was filtered by card type into the following groups: adult, youth, LIFT card (low-income riders), senior, and disability. Each group was analyzed as a separate network for comparison.
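A pandas sketch of this split, assuming a hypothetical `card_type` column with the labels shown (the real trips table may encode card type differently).

```python
import pandas as pd

# Hypothetical trips table; real ORCA column names and labels may differ.
trips = pd.DataFrame({
    "card_type": ["adult", "youth", "lift", "adult", "senior", "disability", "youth"],
    "origin": ["S1", "S2", "S1", "S3", "S4", "S1", "S2"],
    "destination": ["S2", "S1", "S3", "S1", "S1", "S4", "S3"],
})

CARD_TYPES = ["adult", "youth", "lift", "senior", "disability"]

# One DataFrame per card type; each is later analyzed as its own network.
trips_by_card = {
    card: group.reset_index(drop=True)
    for card, group in trips.groupby("card_type")
    if card in CARD_TYPES
}

print({card: len(df) for card, df in trips_by_card.items()})
```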
## **Methods**

The following data cleaning steps were taken to prepare the trip table for network analysis:

1. Duplicated rows were dropped because some trips were erroneously duplicated in the database.
2. The absolute time difference between boarding and destination times was calculated. We used the absolute difference because some trips erroneously had a destination time earlier than the origin time.
3. Trips longer than 3 hours were dropped, because some trips had unreasonably long durations due to an issue with the algorithm that determines the start and stop location of each trip.
4. Trip frequency was calculated for each unique origin-destination pair.
5. Duplicate trips were dropped after trip frequency was calculated.

Each of the issues described in these cleaning steps was reported to the project leads, who maintain the database. These issues will be taken into account and corrected as the project leads prepare the next release of the trips table.
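The five cleaning steps above could look roughly like this in pandas. The column names (`origin`, `destination`, `origin_time`, `destination_time`) and the `clean_trips` helper are hypothetical and not the API of the project's package.

```python
import pandas as pd

def clean_trips(trips: pd.DataFrame, max_duration=pd.Timedelta(hours=3)) -> pd.DataFrame:
    """Hypothetical helper mirroring the five cleaning steps."""
    # 1. Drop rows that were duplicated erroneously in the database.
    df = trips.drop_duplicates()
    # 2. Absolute time difference, since some destination times precede origin times.
    df = df.assign(duration=(df["destination_time"] - df["origin_time"]).abs())
    # 3. Drop trips longer than the maximum plausible duration.
    df = df[df["duration"] <= max_duration]
    # 4. Trip frequency for each unique origin-destination pair.
    df = df.assign(trip_count=df.groupby(["origin", "destination"])["origin"].transform("count"))
    # 5. Drop the now-redundant duplicate O-D rows, keeping one row per pair.
    return (df.drop_duplicates(subset=["origin", "destination"])
              [["origin", "destination", "trip_count"]]
              .reset_index(drop=True))

raw = pd.DataFrame({
    "origin": ["S1", "S1", "S1", "S2", "S1"],
    "destination": ["S2", "S2", "S2", "S3", "S2"],
    "origin_time": pd.to_datetime(["2023-04-03 08:00", "2023-04-03 08:00",
                                   "2023-04-04 09:30", "2023-04-05 07:00",
                                   "2023-04-06 17:00"]),
    "destination_time": pd.to_datetime(["2023-04-03 08:20", "2023-04-03 08:20",
                                        "2023-04-04 09:10", "2023-04-05 12:00",
                                        "2023-04-06 17:25"]),
})
cleaned = clean_trips(raw)
print(cleaned)
```

In the toy data, the exact duplicate is dropped, the trip with a negative duration is kept via the absolute difference, and the 5-hour trip is removed, leaving a single S1 to S2 pair with a frequency of 3.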

To clean and filter the network data, we used the packages sqlalchemy, pandas, numpy, geopandas, and shapely. To calculate network metrics, we used networkx. For network visualization, we used folium. We also developed an open-source package, available in our GitHub repository, with custom functions for each analysis, including the cleaning functions that prepare the data for network analysis.

We ran each network analysis separately for each of the card types: adult, youth, senior, disability, and low-income. We imported the origin-destination trips table for April 2023 from the ORCA Postgres database and loaded it as a geopandas GeoDataFrame. Data was filtered following the steps outlined above. We then assigned each stop to the centroid of a 1/4 mile hexagonal grid cell overlaid on the spatial extent of the stop points, to aggregate the data and improve visibility in the plots. Next, we calculated trip frequency and filtered out duplicates as well as origin-destination combinations with fewer than 20 trips that month, to focus only on the most frequent trips. We then used networkx to create a network for each card type, with nodes representing origin and destination locations and edges representing trips between those locations, and used these networkx objects to calculate network metrics. Finally, we used folium to create interactive maps for each card type, excluding the downtown Seattle area to reduce overplotting of the high-density, high-frequency downtown stops.
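The hexagon aggregation and the 20-trip threshold can be sketched without a GIS stack. This swaps in plain arithmetic hex binning (axial coordinates with cube rounding) for the geopandas/shapely hexgrid overlay used in the analysis; it assumes stop coordinates are already in a projected CRS in metres and treats "1/4 mile" as the hexagon's center-to-vertex size, which is an interpretation rather than something stated above. `hex_centroid` and `hex_od_counts` are hypothetical helpers.

```python
import math
from collections import Counter

QUARTER_MILE_M = 402.336  # assumption: 1/4 mile taken as the hexagon's center-to-vertex size
MIN_TRIPS = 20            # keep O-D pairs with at least 20 trips in the month

def hex_centroid(x, y, size=QUARTER_MILE_M):
    """Snap a projected point (metres) to the center of its pointy-top hexagon."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Cube rounding: round each coordinate, then recompute q or r if it has the
    # largest rounding error (s is not needed to locate the center).
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    # Convert the hexagon's axial coordinates back to its Cartesian center.
    return (round(size * math.sqrt(3) * (rq + rr / 2), 3), round(size * 1.5 * rr, 3))

def hex_od_counts(trips, stop_xy, min_trips=MIN_TRIPS):
    """Count trips between hexagon centers and drop pairs below min_trips."""
    counts = Counter(
        (hex_centroid(*stop_xy[o]), hex_centroid(*stop_xy[d])) for o, d in trips
    )
    return {pair: n for pair, n in counts.items() if n >= min_trips}

# Hypothetical stops in projected metres; S1 and S2 are about 60 m apart.
stop_xy = {"S1": (0.0, 0.0), "S2": (50.0, 30.0), "S3": (2000.0, 0.0)}
trips = [("S1", "S3")] * 25 + [("S2", "S3")] * 5 + [("S3", "S1")] * 10
kept = hex_od_counts(trips, stop_xy)
print(kept)
```

S1 and S2 fall in the same hexagon, so their trips to S3 combine into one pair with 30 trips and pass the threshold, while the 10 return trips are filtered out. The resulting dictionary maps directly onto weighted edges for a networkx graph.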
## **Results**

#### Youth card trip network vs. Adult card trip network
Low-income card trips (n=31334) compared to adult card trips (n=XXXX).
## **Limitations**

Originally, we planned to pursue a multilayer network approach to directly compare the networks of different user groups, but this quickly became overcomplicated due to the size of the dataset.

Instead, analyzing each user type's network separately provided more easily interpretable results and visualizations without overtaxing our computers.

This approach has only been tested with one month of trip data, and even then we ran up against memory and computing limitations. Additionally, we identified several issues with the data, including negative trip times, impossibly long trip times, and trips with the same start and stop location. These will be addressed in new iterations of the database; for now, they were filtered out.