Commit 7dee6e9

committed
changed header formatting
1 parent 370b55e commit 7dee6e9

1 file changed

+21
-29
lines changed

networks.md

Lines changed: 21 additions & 29 deletions
@@ -17,7 +17,7 @@ parent: Analyses
1717

1818
The goal of this analysis is to understand how trip origin and destination networks differ among riders who use the different ORCA card reduced fare programs. Understanding these patterns of trip behavior could inform targeted service and stop improvements that benefit the different demographics that ride transit using each card type. Specifically, we are interested in the following questions:
1919

20-
### 1) How do the trip networks vary between card types?
20+
#### 1) How do the trip networks vary between card types?
2121
Understanding and visualizing trip networks can reveal large-scale patterns of ridership across the different ORCA card types. Identifying similarities and differences between the networks could provide insight into the importance of different stops and connections in the travel patterns of users of each card type. Networks can vary in many ways; for this analysis, we focus on several metrics particularly relevant to transit:
2222

2323
A) Degree centrality (based on the number of trip arrivals and departures that each stop has), which will reflect which stops are most frequently used by users of each card type. A higher value indicates that a stop is frequented more by riders that use the same card type. This metric is computed at the stop level, for both origins and destinations independently as well as the total sum for both.
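As a rough illustration of this metric, the sketch below computes per-stop departures, arrivals, and their sum on a small directed network with networkx. The stop names and trip counts are made up, and weighting the degree by trip count is an assumption of this sketch; the analysis itself may normalize or count connections differently.

```python
import networkx as nx

# Hypothetical origin-destination trip counts: (origin, destination, trips)
od_trips = [
    ("A", "B", 30),
    ("B", "A", 25),
    ("A", "C", 40),
    ("C", "B", 20),
]

# Directed network: an edge points from the origin stop to the destination stop
G = nx.DiGraph()
G.add_weighted_edges_from(od_trips, weight="trips")

# Trip-weighted degree per stop (assumption: weight by number of trips)
departures = dict(G.out_degree(weight="trips"))  # trips starting at each stop
arrivals = dict(G.in_degree(weight="trips"))     # trips ending at each stop
total = dict(G.degree(weight="trips"))           # arrivals + departures

for stop in sorted(G.nodes):
    print(stop, departures[stop], arrivals[stop], total[stop])
```

Ranking stops by `total` within each card type's network would then give a simple list of the most frequently used stops for that group.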
@@ -28,46 +28,40 @@ C) Network density (the proportion of actual origin-destination trips in the net
2828

2929
D) Modularity (the strength of division of the network into clusters that see more frequent trips between stops within the cluster than with stops outside of it), which will identify whether there are distinct clusters of stops used by different rider groups. A higher value indicates that there are more distinct clusters in the network. This metric is computed at the whole network level.
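To make the modularity metric concrete, here is a minimal sketch using networkx's community functions on a toy network of two tightly connected groups of stops joined by one weak link. The stop names, trip counts, and the choice of greedy modularity maximization for finding clusters are all assumptions for illustration, not necessarily the clustering method used in the analysis.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Hypothetical undirected trip network: two triangles of stops with
# heavy internal traffic, connected by a single low-traffic link.
G = nx.Graph()
G.add_weighted_edges_from(
    [
        ("A", "B", 50), ("B", "C", 50), ("A", "C", 50),
        ("D", "E", 50), ("E", "F", 50), ("D", "F", 50),
        ("C", "D", 1),
    ],
    weight="trips",
)

# Detect clusters (assumed method), then score the partition
communities = greedy_modularity_communities(G, weight="trips")
q = modularity(G, communities, weight="trips")

print([sorted(c) for c in communities])
print(round(q, 3))
```

A value of `q` near zero would suggest trips are spread evenly across the network, while larger values suggest riders mostly travel within distinct groups of stops.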
3030

31-
### 2) Are stops that are central in transit ridership networks shared across all card demographics?
31+
#### 2) Are stops that are central in transit ridership networks shared across all card demographics?
3232
This question will allow us to understand whether there are universally important stops across card demographics that could be improved to benefit all riders. Conversely, it could reveal stops that are particularly important to certain demographics but would be considered less important when considering all riders, which would help inform targeted improvements to support those demographics.
3333

34-
### 3) Is the structure of these networks reflected by the geographic layout of the transportation network?
34+
#### 3) Is the structure of these networks reflected by the geographic layout of the transportation network?
3535
This question will generate insight into whether the trip networks are geographically structured (i.e., the most frequented stops tend to be in the center of the network geographically). If this is not the case (i.e., the most frequented stops tend to be in the periphery of the network geographically), it will provide evidence for non-geographic drivers of transit patterns that can be further explored in future analyses.
3636

3737

38-
**Data**
38+
## **Data**
3939

4040
For this analysis, we used a subset of ORCA origin-destination trip data from April 2023 in the ORCA next generation database. At the time of analysis, the full updated trip table was not available. The analysis can be repeated for each month at a later date as the data become available.
4141

4242
Additionally, we incorporated census block data and USGS National Hydrography Dataset data to create regional spatial hexgrid shapefiles to aggregate stops that are close to each other.
4343

4444
Data was filtered by card type into the following groups: adult, youth, lift card (low-income riders), senior, and disability. Each group was analyzed as a separate network for comparison.
4545

46+
47+
## **Methods**
4648
The following data cleaning steps were taken to prepare the trip table for network analysis:
47-
1) Duplicated rows were dropped because some trips were duplicated erroneously in the database.
48-
2) The absolute time difference between boarding and destination was calculated. We used the absolute time difference because some trips erroneously had a destination time that was prior to the origin time.
49-
3) Trips with duration longer than 3 hours were dropped. This is because some trips had unreasonably long trip times due to an issue with the algorithm that determines start and stop location for each trip.
50-
4) Trip frequency for each unique origin-destination trip was calculated.
51-
5) Duplicate trips were dropped after trip frequency was calculated.
49+
1. Duplicated rows were dropped because some trips were duplicated erroneously in the database.
50+
2. The absolute time difference between boarding and destination was calculated. We used the absolute time difference because some trips erroneously had a destination time that was prior to the origin time.
51+
3. Trips with duration longer than 3 hours were dropped. This is because some trips had unreasonably long trip times due to an issue with the algorithm that determines start and stop location for each trip.
52+
4. Trip frequency for each unique origin-destination trip was calculated.
53+
5. Duplicate trips were dropped after trip frequency was calculated.
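The cleaning steps above can be sketched in pandas roughly as follows. The column names (`origin_stop`, `dest_stop`, `board_time`, `alight_time`) and the sample rows are hypothetical, chosen only to exercise each step; the actual trip table schema and the project's own cleaning functions may differ.

```python
import pandas as pd

# Hypothetical trip table; real column names are not given in the source
trips = pd.DataFrame(
    {
        "origin_stop": ["A", "A", "A", "B", "C"],
        "dest_stop": ["B", "B", "B", "C", "A"],
        "board_time": pd.to_datetime(
            ["2023-04-01 08:00", "2023-04-01 08:00", "2023-04-01 09:00",
             "2023-04-01 10:00", "2023-04-01 11:00"]
        ),
        "alight_time": pd.to_datetime(
            ["2023-04-01 08:20", "2023-04-01 08:20", "2023-04-01 08:45",
             "2023-04-01 15:00", "2023-04-01 11:30"]
        ),
    }
)

# 1. Drop rows that are exact duplicates
trips = trips.drop_duplicates()

# 2. Absolute duration, since some destination times precede origin times
trips["duration"] = (trips["alight_time"] - trips["board_time"]).abs()

# 3. Drop trips longer than 3 hours
trips = trips[trips["duration"] <= pd.Timedelta(hours=3)].copy()

# 4. Trip frequency for each unique origin-destination pair
od_cols = ["origin_stop", "dest_stop"]
trips["trip_count"] = trips.groupby(od_cols)["origin_stop"].transform("size")

# 5. Keep one row per origin-destination pair
od_table = trips.drop_duplicates(subset=od_cols)[od_cols + ["trip_count"]]
od_table = od_table.reset_index(drop=True)

print(od_table)
```

In this toy table, the exact duplicate and the five-hour trip are removed, while the trip with a destination time earlier than its origin time is retained because only the absolute duration is checked.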
5254

5355
Each of the issues mentioned above in the cleaning steps was reported to the project leads, who maintain the database. These issues will be taken into account and corrected as the project leads prepare to release the most recent iteration of the trips table in the database.
5456

55-
56-
57-
**Tools**
58-
59-
To clean and filter the network data, we used the packages sqlalchemy, pandas, numpy, geopandas, and shapely. To calculate network metrics, we used networkx. For network visualization, we used folium. We also developed an open-source package available in our GitHub repository with custom functions for each analysis, including the cleaning functions to prepare the data for network analysis.
57+
To clean and filter the network data, we used the packages sqlalchemy, pandas, numpy, geopandas, and shapely. To calculate network metrics, we used networkx. For network visualization, we used folium. We also developed an open-source package available in our GitHub repository with custom functions for each analysis, including the cleaning functions to prepare the data for network analysis.
6058

59+
We ran each network analysis separately for each of the card types: adult, youth, senior, disability, and low-income. We imported the origin-destination trips table for April 2023 from the ORCA postgres database and loaded each table as a geopandas GeoDataFrame. Data was filtered following the cleaning steps outlined above. We then assigned each stop to the centroid of a 1/4 mile hexagonal grid overlaid on the spatial extent of the stop points to aggregate the data and improve visibility in the plots. Then, we calculated trip frequency and filtered out any duplicates, as well as origin-destination trip combinations with fewer than 20 instances that month, to focus only on the most frequent trips. Next, we used networkx to create a network for each card type, with nodes representing origin and destination locations and edges representing trip area. We used the networkx object to calculate network metrics. Finally, we used folium to create interactive maps for each card type, excluding the downtown Seattle area to reduce overplotting of the high-density, high-frequency downtown stops.
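A minimal sketch of the frequency filter and network construction described here is shown below. The hexagon identifiers, column names, and trip counts are hypothetical, and treating the network as directed is an assumption of this sketch; the threshold of 20 trips is taken from the text.

```python
import networkx as nx
import pandas as pd

# Hypothetical aggregated table: one row per origin/destination hexagon pair
od_table = pd.DataFrame(
    {
        "origin_hex": ["h1", "h2", "h2", "h3"],
        "dest_hex": ["h2", "h1", "h3", "h1"],
        "trip_count": [35, 22, 5, 40],
    }
)

MIN_TRIPS = 20  # pairs with fewer instances than this are filtered out

# Keep only the most frequent origin-destination pairs
frequent = od_table[od_table["trip_count"] >= MIN_TRIPS]

# Build a directed network with trip counts stored on the edges
G = nx.from_pandas_edgelist(
    frequent,
    source="origin_hex",
    target="dest_hex",
    edge_attr="trip_count",
    create_using=nx.DiGraph,
)

# Whole-network metric: share of possible directed connections that exist
density = nx.density(G)

print(G.number_of_nodes(), G.number_of_edges(), density)
```

Running the same steps once per card type would yield one network per rider group, which can then be compared on the same set of metrics.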
6160

6261

63-
**Processes**
62+
## **Results**
6463

65-
We imported the origin-destination trips table for April 2023 from the ORCA postgres database and loaded each table as a geopandas GeoDataFrame. Data was filtered following the steps outlined in the above Data section. We ran each network analysis separately for each of the card types: adult, youth, senior, disability, and low-income. We then assigned each stop to the centroid of a 1/4 mile hexagonal grid overlaid on the spatial extent of the stop points to aggregate the data and improve visibility in the plots. Then, we calculated trip frequency and filtered out any duplicates, as well as origin-destination trip combinations with fewer than 20 instances that month, to focus only on the most frequent trips. Next, we used networkx to create a network for each card type, with nodes representing origin and destination locations and edges representing trip area. We used the networkx object to calculate network metrics. Finally, we used folium to create interactive maps for each card type, excluding the downtown Seattle area to reduce overplotting of the high-density, high-frequency downtown stops.
66-
67-
68-
**Results**
69-
70-
# Youth card trip network vs. Adult card trip network
64+
#### Youth card trip network vs. Adult card trip network
7165

7266
<div style="display: flex; justify-content: space-between; gap: 10px;">
7367
<iframe id="youthMap" src="https://uwescience.github.io/DSSG2024_transit_equity/assets/img/youth_net_no_downtown.html" style="width: 48%; height: 600px; border: none;" onload="centerMap('youthMap')"></iframe>
@@ -76,7 +70,7 @@ We imported the origin-destination trips table for April 2023 from the ORCA post
7670

7771
Youth card trips (n=14119) compared to adult card trips (n=XXXX).
7872

79-
# Senior card trip network vs. Adult card trip network
73+
#### Senior card trip network vs. Adult card trip network
8074

8175
<div style="display: flex; justify-content: space-between; gap: 10px;">
8276
<iframe id="seniorMap" src="https://uwescience.github.io/DSSG2024_transit_equity/assets/img/senior_net_no_downtown.html" style="width: 48%; height: 600px; border: none;" onload="centerMap('seniorMap')"></iframe>
@@ -85,7 +79,7 @@ Youth card trips (n=14119) compared to adult card trips (n=XXXX).
8579

8680
Senior card trips (n=12693) compared to adult card trips (n=XXXX).
8781

88-
# Disability card trip network vs. Adult card trip network
82+
#### Disability card trip network vs. Adult card trip network
8983

9084
<div style="display: flex; justify-content: space-between; gap: 10px;">
9185
<iframe id="disabilityMap" src="https://uwescience.github.io/DSSG2024_transit_equity/assets/img/disability_net_no_downtown.html" style="width: 48%; height: 600px; border: none;" onload="centerMap('disabilityMap')"></iframe>
@@ -94,7 +88,7 @@ Senior card trips (n=12693) compared to adult card trips (n=XXXX).
9488

9589
Disability card trips (n=4640) compared to adult card trips (n=XXXX).
9690

97-
# Low-income card trip network vs. Adult card trip network
91+
#### Low-income card trip network vs. Adult card trip network
9892

9993
<div style="display: flex; justify-content: space-between; gap: 10px;">
10094
<iframe id="lowIncomeMap" src="https://uwescience.github.io/DSSG2024_transit_equity/assets/img/lift_net_no_downtown.html" style="width: 48%; height: 600px; border: none;" onload="centerMap('lowIncomeMap')"></iframe>
@@ -103,13 +97,11 @@ Disability card trips (n=4640) compared to adult card trips (n=XXXX).
10397

10498
Low-income card trips (n=31334) compared to adult card trips (n=XXXX).
10599

106-
**Analyses**
107-
108-
Originally, we planned to pursue a multilayer network approach to directly compare the networks of different users, but this quickly became overcomplicated due to the size of the dataset.
109100

110-
Instead, analyzing each user type network discretely provided more easily interpretable results and visualizations without overtaxing our computers.
101+
## **Limitations**
102+
Originally, we planned to pursue a multilayer network approach to directly compare the networks of different users, but this quickly became overcomplicated due to the size of the dataset.
111103

112-
**Limitations**
104+
Instead, analyzing each user type network discretely provided more easily interpretable results and visualizations without overtaxing our computers.
113105

114106
This approach has only been tested with one month of trip data, and even then we ran up against memory and computing limitations in completing the analysis. Additionally, we identified several issues with the data, including negative trip times, impossibly long trip times, and trips that had the same start and stop location. These will be addressed in new iterations of the database, but for now they were simply filtered out.
115107
