1. Executive Summary

A successful bike-sharing system is one that allocates its resources, the bikes, efficiently. In other words, there should be neither excess bikes at stations where demand is low nor too few bikes at stations where demand is high. One major operational challenge therefore lies in predicting demand and re-balancing the bikes accordingly.

This project develops a space-time predictive model that anticipates bike share trip demand in New York City across different time periods and places. The aim is to give the bike share system manager a tool for allocating bikes more efficiently.

The model developed in this project performs reasonably well, as measured by a small mean absolute error (MAE). However, it does not generalize perfectly across places or time periods: the errors are greater during rush hours and for stations or areas with higher trip counts. Spatial lags could be added to the model to improve its predictive power, on the assumption that bike share trip counts are not randomly distributed across space but tend to be correlated with the trip counts of nearby stations.
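To make the two quantities mentioned above concrete, the following is a minimal Python sketch of an MAE calculation and of one possible spatial lag, not the project's actual implementation. The function names, the toy coordinates and counts, the choice of k nearest neighbours, and the use of straight-line distance on already-projected coordinates are all assumptions made for illustration.

```python
from math import hypot


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted trip counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def spatial_lag(coords, counts, k=2):
    """For each station, the mean trip count of its k nearest other stations.

    Assumes coords are planar (projected) x/y pairs, so straight-line
    distance is meaningful; raw longitude/latitude would need projecting.
    """
    if k < 1 or len(coords) < 2 or len(coords) != len(counts):
        raise ValueError("need k >= 1 and at least two stations with counts")
    lags = []
    for i, (xi, yi) in enumerate(coords):
        # Distances from station i to every other station, nearest first.
        neighbours = sorted(
            (hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )[:k]
        lags.append(sum(counts[j] for _, j in neighbours) / len(neighbours))
    return lags


# Toy data (hypothetical): four stations and their trip counts in one hour.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
counts = [10, 20, 30, 4]

mae = mean_absolute_error(counts, [12, 18, 33, 4])
lags = spatial_lag(coords, counts, k=2)
print(mae)   # 1.75
print(lags)  # [25.0, 20.0, 15.0, 25.0]
```

In a model, the lag values would enter as an additional predictor alongside the time and station features, so that a station's predicted demand can borrow information from its neighbours.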

2. Data

2.1 Citi Bike trip data

The bike share trips taken in New York City in July 2019 are loaded from the Citi Bike trip data (https://www.citibikenyc.com/system-data), which record the start time and date, stop time and date, duration, start station, end station, and station coordinates of each Citi Bike trip. There were two special events in July: the July 4th holiday, which fell on a Thursday, and the July 13th blackout in west Manhattan, which lasted from 7 PM to midnight on a Saturday.

The plot below shows the bike share trips in July 2019 aggregated by hour, with every Monday marked by a grey line, July 4th by a red line, and the blackout by a blue line. More trips take place on weekdays than on weekends or on the holiday, suggesting that many riders use Citi Bike mainly to commute.
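The hourly aggregation behind such a plot can be sketched as follows. This is an illustrative Python example rather than the code used in the project; the column name `starttime`, the timestamp format with fractional seconds, and the three sample rows are assumptions and should be checked against the downloaded file.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def trips_per_hour(rows, time_column="starttime"):
    """Count trips by the hour in which they started."""
    counts = Counter()
    for row in rows:
        # Keep the first 19 characters to drop any fractional seconds.
        started = datetime.strptime(row[time_column][:19], "%Y-%m-%d %H:%M:%S")
        # Truncate to the top of the hour so trips share one bucket.
        counts[started.replace(minute=0, second=0)] += 1
    return counts


def is_weekday(moment):
    """True for Monday to Friday (weekday() returns 0 to 4)."""
    return moment.weekday() < 5


# Hypothetical sample standing in for the July 2019 CSV file.
sample = io.StringIO(
    "starttime,start station id\n"
    "2019-07-01 08:05:12.1230,72\n"
    "2019-07-01 08:45:00.0000,79\n"
    "2019-07-06 14:10:30.5000,72\n"
)
hourly = trips_per_hour(csv.DictReader(sample))

for hour, n in sorted(hourly.items()):
    label = "weekday" if is_weekday(hour) else "weekend"
    print(hour, n, label)
```

With the real file, the same function could be fed `csv.DictReader(open(path, newline=""))`, and the resulting counts plotted against time to compare weekday and weekend patterns.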