This is part of my research assistant work for Professor Megan S. Ryerson at the University of Pennsylvania, who is conducting a comprehensive investigation of disparities in air service (frequency, direct service, and air fares) across airports within a megaregion. Such disparities lead travelers to drive long distances to busy airports with lower fares and better air service instead of using the nearest local airport, a phenomenon known as the “customer leakage” problem.

My work in this project includes:

  • Analyze the DB1B data in R regarding flight frequency, direct service, and air fares.
  • Perform Network Analysis in ArcGIS to measure the distance between each census tract and each selected airport in the study area.
  • Calculate the utility of the selected airports in the study and plot the airport catchment map showing each airport’s market catchment. (This was initially done in ArcGIS but has since been automated in R by my colleague and classmate, Eugene Chong.)
  • Convert the Highest Market Share map into a cartogram based on the population in each census tract to better reflect the catchment volume.
  • Analyze data from ARC (Airlines Reporting Corporation) to calculate, for each zipcode, the approximate percentage share of total passengers using the selected airport in the selected year.
  • Use data from each airport’s annual report and Network Analysis in ArcGIS to make the extra-mile maps, which visualize how many extra miles were actually traveled by all the “leaked” customers in the selected year.

2. Data


  • Airline Origin and Destination Survey (“DB1B”): a 10% sample of all airline tickets sold by reporting carriers, published by the Bureau of Transportation Statistics (“BTS”).
  • Expedia ticket purchase data from ARC (Airlines Reporting Corporation), including the zipcode of each customer’s billing address.
  • Annual domestic enplanements from each airport’s annual report.

3. Method

In the first half of the analysis, each airport’s direct service utility and connecting service utility were calculated for each census tract using parameters including the flight frequency to the selected destination, air fares, the distance between the census tract and the airport, and total flight time (including dwell time for connecting services). The market share of each airport was then computed as:

e^(utility of selected airport)/sum(e^(utility of each airport in the study area))
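
As a rough illustration of the market share formula above, here is a minimal Python sketch. The project itself used ArcGIS and R, so this is not the original implementation; the function name market_shares and the utility values are hypothetical and serve only to show the arithmetic.

```python
import math

def market_shares(utilities):
    """Return each airport's market share from a dict of utilities (logit form)."""
    # Subtracting the largest utility before exponentiating avoids overflow
    # and leaves the resulting shares unchanged.
    max_u = max(utilities.values())
    exp_u = {code: math.exp(u - max_u) for code, u in utilities.items()}
    total = sum(exp_u.values())
    return {code: value / total for code, value in exp_u.items()}

# Hypothetical utilities for a single census tract (illustrative values only).
tract_utilities = {"DFW": 1.2, "DAL": 0.8, "SHV": -0.5}
shares = market_shares(tract_utilities)

for code, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{code}: {share:.1%}")
```

The shares always sum to 1 within a tract, and the airport with the highest utility gets the highest share, which is what the highest market share map displays.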

In the second half of the analysis, I used ARC data to calculate the dominant airport’s percentage share of passengers in each zipcode in the study area. We assumed that the spatial distribution of Expedia ticket purchases is representative of all ticket purchases. The percentage share of airport i in zipcode j is calculated as:

(# of ticket purchases using airport i in zipcode j) / (# of ticket purchases using airport i in the study area)
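
The ratio above can be sketched in Python as follows. This is only an illustration: the record layout (billing zipcode, airport used), the function name zipcode_shares, and the sample records are assumptions, not the actual ARC data format.

```python
from collections import Counter

def zipcode_shares(purchases, airport):
    """Share of an airport's ticket purchases that come from each zipcode."""
    # Count purchases per zipcode, keeping only tickets that used this airport.
    counts = Counter(zipcode for zipcode, used in purchases if used == airport)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {zipcode: n / total for zipcode, n in counts.items()}

# Hypothetical records: (billing zipcode, airport used). Illustrative only.
purchases = [
    ("75201", "DFW"), ("75201", "DFW"), ("75201", "DAL"),
    ("71101", "DFW"), ("71101", "SHV"),
]
dfw_shares = zipcode_shares(purchases, "DFW")
print(dfw_shares)
```

Note that the denominator is the airport's total across the study area, so the shares for one airport sum to 1 over all zipcodes.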

Lastly, I used the total domestic enplanement data to approximate the number of passengers in each zipcode using the selected airport, and how many extra miles were traveled by the customers who “leaked” from their nearest airport.
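
One way to picture this last step is the Python sketch below. It assumes that passengers per zipcode equal the zipcode's share times the airport's total domestic enplanements, and that the extra miles per passenger are the distance to the airport used minus the distance to the nearest airport. That extra-mile definition, the function name, and all input values are assumptions for illustration; the actual distances came from Network Analysis in ArcGIS.

```python
def leaked_extra_miles(shares, enplanements, dist_used, dist_nearest):
    """Estimate passengers per zipcode and the total extra miles they traveled."""
    results = {}
    for zipcode, share in shares.items():
        passengers = share * enplanements
        # Assumption: extra miles per passenger is the distance to the airport
        # used minus the distance to the nearest airport, floored at zero.
        extra_per_passenger = max(dist_used[zipcode] - dist_nearest[zipcode], 0.0)
        results[zipcode] = {
            "passengers": passengers,
            "extra_miles": passengers * extra_per_passenger,
        }
    return results

# Hypothetical inputs (placeholder zipcode labels and made-up numbers).
zip_share = {"zip_A": 0.75, "zip_B": 0.25}      # share of the airport's passengers
total_enplanements = 1000                        # annual domestic enplanements
miles_to_used = {"zip_A": 20.0, "zip_B": 180.0}  # distance to the airport used
miles_to_nearest = {"zip_A": 20.0, "zip_B": 10.0}

estimates = leaked_extra_miles(
    zip_share, total_enplanements, miles_to_used, miles_to_nearest
)
print(estimates)
```

In this example, zip_A is already closest to the airport it uses, so it contributes no extra miles, while all of the extra miles come from zip_B.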

4. Southeast Megaregion

One of the study areas is the Texas-Oklahoma-Louisiana megaregion, where we examined eight airports (AUS, DAL, DFW, HOU, IAH, OKC, SAT, and SHV) and two destinations (SEA and ATL). This page only shows the analysis for flights from the selected airports to ATL.

4.1 Market Catchment

4.1.1 Highest market share

The map below shows which airport had the highest market share for flights to ATL in 2015 in each census tract in the study area. Notably, although SHV is one of the airports in this region, it does not have the highest market share in any tract in the study area.