1. Introduction

In 2018, the Chicago Police Department recorded 267,581 incidents of crime (with the exception of murders, where data exists for each victim), which equates to 733 incidents per day on average. There is no doubt that preventing and reducing crime is essential for a city to thrive and for its residents' quality of life, since crime imposes great costs not only on society as a whole but also on individuals who either suffer from it or live in fear of it.

This project aims to build a Predictive Policing model that leverages the environmental features of crimes observed in the past and tests the extent to which those features generalize to places where few crimes were reported but which may be at high risk of crime.

Different types of crime are associated with different levels of selection and reporting bias. For example, the reporting rate of forcible-entry burglaries is much closer to 100% than that of rape, since rape victims might worry about the social stigma attached to sexual assault. In turn, criminals might choose to commit crimes in areas where people are less likely to report them. To test how the prediction model performs for crimes with greater selection bias, this project targets narcotics: drug-related crimes for which reporting is highly biased. For example, people in low-income neighborhoods might be more cautious and suspicious of those who might possess or use drugs, while people in high-end, predominantly white communities might trust their neighbors more and fail to notice the red flags.

Begin by loading the related R packages and creating a mapTheme function to standardize the maps that follow.


#load packages used throughout
library(tidyverse)
library(sf)
library(RSocrata)

#shared ggplot2 theme: no axes or grid, black frame around the map
mapTheme <- function(base_size = 12) {
  theme(
    text = element_text(color = "black"),
    plot.title = element_text(size = 14, colour = "black"),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(colour = "black", fill = NA, size = 2)
  )
}


2. Narcotics Data

Chicago narcotics data for 2018 is downloaded from the Chicago Open Data site using the RSocrata package. The spatial distribution of narcotics incidents is shown below in point form. Incidents are highly concentrated in the west of the city, while the southern part also saw a large number of incidents.

#read in Chicago Boundary data
#(projected to the same CRS as the narcotics points below)
chicagoBoundary <- 
  st_read("./riskPrediction_data/chicagoBoundary.shp") %>%
  st_transform(102271)

#read in narcotics data
narcotics <- 
  read.socrata("https://data.cityofchicago.org/Public-Safety/2018-Crime/iecj-q4j4") %>% 
  filter(Primary.Type == "NARCOTICS") %>%
  mutate(x = gsub("[()]", "", Location)) %>%
  separate(x,into= c("Y","X"), sep=",") %>%
  mutate(X = as.numeric(X),
         Y = as.numeric(Y)) %>% 
  na.omit %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
  st_transform(102271)
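
The coordinate parsing in the pipeline above can be sketched in isolation. The snippet below is a minimal illustration in base R, assuming the Location field holds strings of the form "(latitude, longitude)"; the sample values are hypothetical, and strsplit stands in for the separate step used above.

```r
# Hypothetical sample values in the assumed "(latitude, longitude)" format
location <- c("(41.881832, -87.623177)", "(41.794295, -87.590701)", NA)

# Strip the parentheses, mirroring gsub("[()]", "", Location)
stripped <- gsub("[()]", "", location)

# Split on the comma: the first piece is latitude (Y), the second longitude (X)
parts <- strsplit(stripped, ",")
Y <- as.numeric(sapply(parts, `[`, 1))
X <- as.numeric(sapply(parts, `[`, 2))

# Drop records without coordinates, as na.omit does in the pipeline
coords <- na.omit(data.frame(X = X, Y = Y))
print(coords)
```

Because latitude comes first in the string, the pieces are named Y then X, which is why the points are later built with coords = c("X", "Y"), that is, longitude before latitude.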

#map of points
ggplot() + 
  geom_sf(data = chicagoBoundary) +
  geom_sf(data = narcotics, colour="purple4", size=0.1, show.legend = "point") +
  labs(title = "Narcotics, Chicago - 2018") +
  mapTheme()