Modeling E-Scooter Spatial-Temporal Trip Patterns in Philadelphia: Minneapolis Case Study

Introduction

Micro-mobility is seen as a solution to alleviating the last-mile gridlock, especially shared and dock-less electric scooters. However, legalizing them in Pennsylvania is still at a stand still for two years due to safety concerns. COVID-19 might be a turning point for cities to reconsider e-scooters as a legal transportation method because it enables people to open to the air and keep socially distant at the meantime. This project aims to explore the spatial-temporal e-scooter trip pattern in Minneapolis and build the model to predict the probable spatial pattern of e-scooter in Philadelphia in the future. The project used stepwise regression method with log-transformed dependent variables, outflow trips and inflow trips, to build the models.

Set up

Install these libraries and create themes for graphs and maps

library(ggplot2)
library(dplyr)
library(sf)
library(sp)
library(raster)
library(rgdal)
library(tidycensus)
library(tidyverse)
library(conflicted)
library(viridis)
library(corrplot)
library(lubridate)
library(tigris)
library(car)

theme.graph <- function(base_size = 11, base_family = "Arial", lines_lwd = 0.50, plot_grid = TRUE, axis_font = base_family, title_size = base_size*1.5, legend_size = base_size*0.8,
                        bg_col = "white",title_font = base_family , base_col  = "black", axis_lines = TRUE,
                        minor_grid = ifelse(plot_grid, TRUE, FALSE), vert_grid = ifelse(plot_grid, TRUE, FALSE), ticks_type = "outer", horz_grid = ifelse(plot_grid, TRUE, FALSE), alpha_leg = 0.1, bord_size = 0,
                        legend_bg = "white", strip_bg = "white", grid_thick = 1,
                        grid_type = "solid", ticks_xy  = "xy", grid_cols = c("grey50", "grey70")){
  theme_bw()+
    ggplot2::theme(
      plot.margin = grid::unit(c(1, 1, .5, .7), "cm"),
      text = ggplot2::element_text(family = base_family, size = base_size),
      axis.line =  element_line(size = ifelse(axis_lines, grid::unit(lines_lwd, "mm"),0), color = "black"),
      axis.ticks.length = grid::unit(ifelse(ticks_type == "outer", 0.15, -0.15), "cm"),
      axis.ticks.x =  element_line(size = ifelse(stringr::str_detect(ticks_xy, "x"), grid::unit(lines_lwd, "cm"),0), color = "black"),
      axis.ticks.y =  element_line(size = ifelse(stringr::str_detect(ticks_xy, "y"), grid::unit(lines_lwd, "cm") ,0), color = "black"),
      axis.text.x = ggplot2::element_text(angle=0,hjust= .5, vjust= .5, size = base_size, colour = base_col , family = axis_font,margin=margin(ifelse(ticks_type == "inner", 11, 5),5,10,5,"pt")),
      axis.text.y = ggplot2::element_text(angle = 0,size = base_size, colour = base_col , family = axis_font, margin=margin(5,ifelse(ticks_type == "inner", 11, 5),10,5,"pt")),
      axis.title.y = ggplot2::element_text(angle=90,vjust = .5, margin = margin(r = 15),size =  base_size, colour = base_col , family = axis_font),
      axis.title.x = ggplot2::element_text(size = base_size,colour = base_col ,vjust = -.5, family = axis_font),
      panel.background = ggplot2::element_rect(fill = bg_col),
      plot.background = ggplot2::element_rect(fill = bg_col),
      panel.border = ggplot2::element_rect(colour = "black", fill=NA, size = bord_size),
      panel.grid.major.x = ggplot2::element_line(linetype = grid_type,colour = ifelse(vert_grid, grid_cols[1],bg_col), size = ifelse(vert_grid,0.25 * grid_thick, 0)),
      panel.grid.minor.x = ggplot2::element_line(linetype = grid_type,colour = ifelse(vert_grid, ifelse(minor_grid, grid_cols[2 - (length(grid_cols) == 1)   ],bg_col),bg_col), size = ifelse(vert_grid,0.15* grid_thick, 0)),
      panel.grid.major.y = ggplot2::element_line(linetype = grid_type,colour = ifelse(horz_grid, grid_cols[1],bg_col), size = ifelse(horz_grid,0.25* grid_thick, 0)),
      panel.grid.minor.y = ggplot2::element_line(linetype = grid_type,colour = ifelse(horz_grid, ifelse(minor_grid, grid_cols[2 - (length(grid_cols) == 1)  ],bg_col),bg_col), size = ifelse(horz_grid,0.15* grid_thick, 0)),
      plot.title = ggplot2::element_text(face="bold",vjust = 2, colour = base_col , size = title_size, family = title_font),
      legend.background = ggplot2::element_rect(fill = scales::alpha(legend_bg, alpha_leg)), legend.key = ggplot2::element_blank(),
      legend.text = ggplot2::element_text(size = legend_size, family = base_family),
      legend.title = element_blank(),
      strip.background =  ggplot2::element_rect(colour = strip_bg, fill = strip_bg),
      strip.text.x = ggplot2::element_text(size = base_size + 1),
      strip.text.y = ggplot2::element_text(size = base_size + 1)
    )
}


theme.widegraph<-theme.graph()+
  theme(legend.position = c(.9,.55)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust =.4))

theme.map <-theme(text = element_text(family = "Arial"))+
                  theme(plot.title =element_text(size=11*1.5),
                  plot.subtitle = element_text(size=11),
                  plot.caption = element_text(size = 8),
                  axis.line=element_blank(),
                  axis.text.x=element_blank(),
                  axis.text.y=element_blank(),
                  axis.ticks=element_blank(),
                  axis.title.x=element_blank(),
                  axis.title.y=element_blank(),
                  panel.background=element_blank(),
                  panel.border=element_blank(),
                  panel.grid.major=element_line(colour = 'transparent'),
                  panel.grid.minor=element_blank(),
                  legend.direction = "vertical", 
                  legend.position = "right",
                  plot.margin = margin(1, 1, 1, 1, 'cm'),
                  legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))

The fastest speed for E-Scooter is 15 mph and the mximum trip duration is 2.5h.

trips<-read.csv("2018_Minneapolis_Motorized_Foot_Scooter_Trips.csv")
availability<-read.csv("Scooter_Availability_Minneapolis.csv")
trips$speed<-(trips$TripDistance/1609.344)/(trips$TripDuration/3600)
trips<-subset(trips,TripDuration<=9000 & speed<=15)

Exploratory analysis

Distribution of Trip Duration and Distance

ggplot(data=trips,aes(x=TripDuration/60,y=..y..))+ stat_ecdf()+labs(x="Trip Duration (mins)",y=NULL)+scale_x_continuous(breaks=seq(0, 160, 5)) +ggtitle('Distribution of Trip Durations')+theme.graph()

ggplot(data=trips,aes(x=TripDistance/1609.344,y=..y..))+stat_ecdf()+labs(x="Trip Distance (mi)", y=NULL)+scale_x_continuous(breaks=seq(0, 25, 1))+ggtitle('Distribution of Trip Distance')+theme.graph()

75% of e-scooter trips in Minneapolis are within 2 miles and under 15 mins.

Temporal Pattern: By Time of Day and Days by Week

trips$date<-substr(trips$StartTime,1,10)
trips$date<-as.Date(trips$date,"%Y-%m-%d")
trips$week<-weekdays(trips$date)
trips$time<-substr(trips$StartTime,12,16)
trips$time<-strptime(trips$time, format="%H:%M")
trips$time<-as.POSIXct.POSIXlt(trips$time)


daily_counts <- trips%>%
  mutate(trips$time<-as.Date(trips$time))%>%
  group_by(trips$time)%>%
  summarise(dailyCounts=n())

daily_counts$`trips$time`<-as.character(daily_counts$`trips$time`)
daily_counts$time<-substr(daily_counts$`trips$time`,12,16)
ggplot(data=daily_counts,aes(x=time,y=dailyCounts))+geom_col()+labs(x="Time of Day",y="Total Number of Active Trips per Half an Hour")+ggtitle('Minneapolis E-scooter Usage by Time of  Day ')+theme.widegraph

daily_countsweek <- trips%>%
  mutate(trips$time<-as.Date(trips$time))%>%
  group_by(trips$week,trips$time)%>%
  summarise(dailyCounts=n())

daily_countsweek$`trips$time`<-as.character(daily_countsweek$`trips$time`)
daily_countsweek$time<-substr(daily_countsweek$`trips$time`,12,16)
daily_countsweek$`trips$week` <- factor(daily_countsweek$`trips$week` , levels = c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
ggplot(data=daily_countsweek,aes(x=time,y=dailyCounts))+geom_col()+labs(x="Time of Day",y="Total Number of Active Trips per Half an Hour")+facet_wrap(~daily_countsweek$`trips$week`,nrow=3,ncol=3)+theme(axis.text = element_blank())+theme.widegraph+ theme(axis.text.x=element_text(size=4))

ggplot(data = daily_countsweek, mapping = aes(x =time, y = `trips$week`, fill = dailyCounts))+geom_tile(colour="Black") +scale_fill_viridis_c()+scale_fill_gradient(low = "lightyellow", high = "red")+labs(x="Time of Day",y="Day of Week")+theme.widegraph+theme(legend.position = "bottom")

Peak hours of e-scooter trips in Minneapolis are from Wednesday to Friday during 5pm to 6pm.

Spatial Pattern: Supply and Demand

Spatial Pattern of E-Scooter Supply

Import the shapefile datasets of Minneapolis street centerlines and city boundary; And obtain the census tracts boundary and census data from API.

MPLS_street<-st_read('MPLS_Centerline/MPLS_Centerline.shp')
MPLS_Boundary<-st_read("Minneapolisboundary/Minneapolisboundary.shp")%>%
  st_transform(crs=4326) 
census_api_key("2c3e9f9d2d65abab5f7b81fe418054415d363a43",install=TRUE,overwrite=TRUE)

Minneapoliscensus<-get_acs(geography = "tract", 
                           variables = c("B01003_001", "B19013_001", 
                                         "B02001_002", "B08013_001",
                                         "B08012_001", "B08301_001", 
                                         "B08301_010", "B01002_001"), 
                           year = 2018, 
                           state = "MN", 
                           geometry = TRUE, 
                           county=c("Hennepin"),
                           output = "wide") %>%
  rename(Total_Pop =  B01003_001E,
         Med_Inc = B19013_001E,
         Med_Age = B01002_001E,
         White_Pop = B02001_002E,
         Travel_Time = B08013_001E,
         Num_Commuters = B08012_001E,
         Means_of_Transport = B08301_001E,
         Total_Public_Trans = B08301_010E) %>%
  dplyr::select(Total_Pop, Med_Inc, White_Pop, Travel_Time,
         Means_of_Transport, Total_Public_Trans,
         Med_Age,
         GEOID, geometry) %>%
  mutate(Percent_White = White_Pop / Total_Pop,
         Mean_Commute_Time = Travel_Time / Total_Public_Trans,
         Percent_Taking_Public_Trans = Total_Public_Trans / Means_of_Transport)

Minneapolistract <- 
  Minneapoliscensus%>%
  as.data.frame() %>%
  distinct(GEOID, .keep_all = TRUE) %>%
  dplyr::select(GEOID, geometry) %>% 
  st_sf %>% 
  st_transform(4326)
names(Minneapolistract)<-c("GEOID10","geometry")

Minneapolis <- st_intersection(Minneapolistract,MPLS_Boundary)

Spatial Pattern of E-Scooter Supply by Street Centerlines

availability$time<-substr(availability$PollTime,1,10)
avail<-availability %>%
              group_by(time,ClosestCenterlineID)%>%
              summarise(count=n(),
                        sum_avail=sum(NumberAvailable))

avail2<-avail %>%
              group_by(ClosestCenterlineID)%>%
              summarise(count=n(),
                        avg_avail=mean(sum_avail))
avail2$avg_avail<-round(avail2$avg_avail)
MPLS_street<-left_join(MPLS_street,avail2,by=c('GBSID'='ClosestCenterlineID'))
MPLS_street<-MPLS_street%>%
            dplyr::select(OBJECTID,STREETALL,TYPE,SPEED_LIM,GBSID,count,avg_avail,geometry)
1-(sum(is.na(MPLS_street$avg_avail))/length(MPLS_street$avg_avail))# What percentage of streets in Minneapolis have been provided with e-scooters?

## [1] 0.69854

ggplot() + 
  geom_sf(data = Minneapolis) +
  geom_sf(data = MPLS_street,aes(colour=ntile(avg_avail,4)), size=0.5,show.legend = 'line') +
  scale_colour_viridis('E-Scooter Availability - Quantile Breaks', # Scale label
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'D',  # Viridis has multiple palettes to choose from
                     alpha = 0.6,  # Transparency level
                     labels=as.character(quantile(MPLS_street$avg_avail,
                                                  c(.25,.5,.75,1),na.rm=TRUE)))+
  labs(title= "Daily E-Scooter Availability in Minneapolis ") +
  theme.map

Spatial Pattern of E-Scooter Supply by Census Tracts

census_street<-st_join(Minneapolistract,MPLS_street)

census_street<-census_street%>%
               group_by(GEOID10)%>%
              summarize(Tot_avail=sum(avg_avail,na.rm = TRUE))
              



census_street2<-st_intersection(census_street,MPLS_Boundary)



ggplot() + 
  geom_sf(data=census_street2,aes(fill=Tot_avail))+
  scale_fill_gradient(low='white',high='steelblue')+
  labs(
    title = "Minneapolis E-Scooter Availability by Census Tracts"
  )+
  theme.map

Both two maps show that most of e-scooters are provided in the center and northeast of the city, which are Gateway District, Southeast Como Neighborhood and Dinkytown where University of Minnesota is located.

Spatial Pattern of E-Scooter Demand

summary(trips$StartCenterlineID)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10011   17078   18699   19274   21318   26148

Start<-table(trips$StartCenterlineID)
Startstreet<-as.data.frame(Start)
names(Startstreet)<-c("Street", "outtrips")
Endstreet<-as.data.frame(table(trips$EndCenterlineID))
names(Endstreet)<-c("Street", "intrips")
Trips_data<-merge(Startstreet,Endstreet,by="Street",all=TRUE)
Trips_data[is.na(Trips_data)] <- 0
Trips_data$absolute_trip<-Trips_data$intrips-Trips_data$outtrips
Trips_data$Street<-as.character(Trips_data$Street)
MPLS_street$GBSID<-as.character(MPLS_street$GBSID)

MPLS_street<-left_join(MPLS_street,Trips_data,by=c('GBSID'='Street'))

census_street_demand<-st_join(Minneapolistract,MPLS_street)

census_street_demand<-census_street_demand%>%
  group_by(GEOID10)%>%
  summarize(Abs_trips=sum(absolute_trip,na.rm = TRUE))

census_street_demand<-st_intersection(census_street_demand,MPLS_Boundary)

ggplot() + 
  geom_sf(data=census_street_demand,aes(fill=Abs_trips))+
  scale_fill_viridis('Absolute Trips',
                     option='D',
                     alpha = .4)+
  labs(
    title = "Minneapolis E-Scooter Demands by Census Tracts"
  )+
  geom_sf_label(data=census_street_demand,aes(label=Abs_trips),size=1)+
  theme.map

The absolute trips equal to inflow trips minus outflow trips. Hence, the negative value represent that the census tracts are mainly the origination for citizens, while the positive value represent that the census tracts are mainly the destination. From the map, we could see that Central Minneapolis are an important origin area in Minneapolis, and while the inflow trips are larger than outflow trips in other census tracts, Southeast Como Neighborhood and Dinkytown are the two main destinations in Minneapolis.

Model Building

Explanatory and Dependant Variables

Chosen explanatory variables include: 1. Number of commuters; 2. Number of vehicle in commuting; 3. Median household income; 4. Total population; 5. Population density (per sq mile); 6. Education level; 7. Sex ratio (Male/Female); 8. Median age; 9. Transit stops density

Chosen dependant variables include: 1. Inflow e-scooter trips; 2. Outflow e-scooter trips

ct<-read.csv("census tract.csv")
#commuting
commuting<-read.csv("commuting.csv")
commuting<-commuting[2:300,]
commuting$S0801_C01_001E<-as.character(commuting$S0801_C01_001E)
commuting$S0801_C01_001E<-as.numeric(commuting$S0801_C01_001E)
commuting$S0801_C01_013E<-as.character(commuting$S0801_C01_013E)
commuting$S0801_C01_013E<-as.numeric(commuting$S0801_C01_013E)
  #commuters=total workers-work at home
commuting$commuters<-commuting$S0801_C01_001E-(commuting$S0801_C01_013E*commuting$S0801_C01_001E*0.01)
commuting$commuters<-round(commuting$commuters)
commuting$NAME<-as.character(commuting$NAME)
commuting$NAME<-str_sub(commuting$NAME,14,-23)
commuting$NAME<-str_sub(commuting$NAME,end=-6)
commuting$NAME<-str_sub(commuting$NAME,end=-2)
commuting$NAME<-as.numeric(commuting$NAME)
commuters<-commuting[,c("NAME","commuters")]
ct<-merge(ct,commuters,by.x="NAME10",by.y="NAME",all=FALSE)
social<-read.csv("social explorer data.csv")
social<-social[2:300,]
social$Census.Tract<-as.numeric(as.character(social$Census.Tract))
social$Census.Tract<-social$Census.Tract/100
  #vehicle amount
vehicle<-social[,c("Census.Tract","Aggregate.Number.Of.Vehicles..Car.Truck.Or.Van..Used.In.Commuting")]
names(vehicle)<-c("tract","number of vehicle in commuting")
ct<-merge(ct,vehicle, by.x="NAME10", by.y="tract",all=FALSE)
#household income
income<-social[,c("Census.Tract","Median.Household.Income..In.2018.Inflation.Adjusted.Dollars.")]
names(income)<-c("tract","median household income")
ct<-merge(ct,income, by.x="NAME10", by.y="tract",all=FALSE)
#population and density
pop<-social[,c("Census.Tract","Total.Population","Population.Density..Per.Sq..Mile.")]
names(pop)<-c("tract","pop","pop density(per sq mile)")
ct<-merge(ct,pop, by.x="NAME10", by.y="tract",all=FALSE)
#Education: 1-less than high school,2-high school,3-college,4-bachelor,5-master, 6-professional school,7-doctorate
social$X..Population.25.Years.and.Over..Less.than.High.School<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Less.than.High.School))
social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.<-as.numeric(as.character(social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.))
social$X..Population.25.Years.and.Over..Some.College<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Some.College))
social$X..Population.25.Years.and.Over..Bachelor.s.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Bachelor.s.Degree))
social$X..Population.25.Years.and.Over..Master.s.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Master.s.Degree))
social$X..Population.25.Years.and.Over..Professional.School.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Professional.School.Degree))
social$X..Population.25.Years.and.Over..Doctorate.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Doctorate.Degree))
social$education_level<-(1*social$X..Population.25.Years.and.Over..Less.than.High.School+2*social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.
                         +3*social$X..Population.25.Years.and.Over..Some.College+4*social$X..Population.25.Years.and.Over..Bachelor.s.Degree
                         +5*social$X..Population.25.Years.and.Over..Master.s.Degree+6*social$X..Population.25.Years.and.Over..Professional.School.Degree
                         +7*social$X..Population.25.Years.and.Over..Doctorate.Degree)/100
Education<-social[,c("Census.Tract","education_level")]
names(Education)<-c("tract","education level")
ct<-merge(ct,Education, by.x="NAME10", by.y="tract",all=FALSE)
#Age
medianage<-Minneapoliscensus[,c("GEOID","Med_Age")]
ct<-merge(ct,medianage, by.x="GEOID10", by.y="GEOID",all=FALSE)
#Sex
social$Total.Population..Male<-as.numeric(as.character(social$Total.Population..Male))
social$Total.Population..Female<-as.numeric(as.character(social$Total.Population..Female))
social$sexratio<-social$Total.Population..Male/social$Total.Population..Female
sex<-social[,c("Census.Tract","sexratio")]
names(sex)<-c("tract","sex ratio")
ct<-merge(ct,sex,by.x="NAME10", by.y="tract",all=FALSE)
#Transit Accessibility
Minneapolisstops<-read.csv("stops.csv")

stops <- Minneapolisstops %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
  st_transform(crs=4326)
stops_join <- st_intersection(stops,Minneapolis) %>% 
  dplyr::select(-8)
stops_join$GEOID10<-as.numeric(stops_join$GEOID10)

stop_counts <- stops_join%>%
  mutate(stops_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(stops_join$GEOID10)%>%
  summarise(stopcounts=n())
stop_counts
names(stop_counts)<-c("GEOID","stopcounts","geometry")
ct<-merge(ct,stop_counts,by.x="GEOID10", by.y="GEOID",all=FALSE)
ct$stopdensity<-ct$stopcounts/ct$LAND_SQML

Preliminary Tests

Model<-ct[,c("GEOID10","NAME10","NAMELSAD10","INTPTLAT10","INTPTLON10","Trips__abs","Trips__out","Trips__int","commuters","number of vehicle in commuting","median household income","pop","pop density(per sq mile)","education level", "sex ratio","Med_Age","stopdensity")]
Model$`number of vehicle in commuting`<-as.numeric(as.character(Model$`number of vehicle in commuting`))
Model$`median household income`<-as.numeric(as.character(Model$`median household income`))
Model$pop<-as.numeric(as.character(Model$pop))
Model$`pop density(per sq mile)`<-as.numeric(as.character(Model$`pop density(per sq mile)`))
names(Model)<-c("GEOID","tract","censustract","LAT","LON","trips_abs","trips_out","trips_in","commuters","vehicle_number","Med_inc","pop","pop_density","education","sex_ratio","Med_Age","stopdensity")
library(psych)
Explanatory<-Model[,9:17]

explanatory_cor<-cor(Explanatory)
kappa(explanatory_cor, exact=T)

## [1] 114.7813

res1 <- cor.mtest(Explanatory, conf.level = .95)
corrplot(corr=explanatory_cor,method="color",order="AOE",type="upper",tl.pos = "d",tl.cex=0.4,p.mat = res1$p, sig.level = .01)  
corrplot(corr=explanatory_cor,add=TRUE, type="lower",method="number",order="AOE",diag = FALSE,tl.pos="n", cl.pos="n",p.mat = res1$p, sig.level = .01)

The chart shows that there are significant correlation between commuters, population and number of vehicles. Using Kappa() function in R to calculate the IV matrix condition number k value could also testify the multicollinearity. The k value is 114.8 which is larger than 100 and smaller than 1000, indicating strong multicollinearity between independent variables.

Building Inflow Model

First, try the multiple linear regression of all the chosen explanatory variables and inflow trips.

Model1<-lm(trips_in~commuters+vehicle_number+Med_inc+pop+pop_density
           +education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model1)

## 
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + Med_inc + 
##     pop + pop_density + education + sex_ratio + Med_Age + stopdensity, 
##     data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9166.3 -1076.2   186.8  1349.6 14159.0 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.218e+04  3.074e+03  -3.962 0.000135 ***
## commuters       4.519e+00  1.453e+00   3.110 0.002402 ** 
## vehicle_number -8.190e+00  1.366e+00  -5.994 2.86e-08 ***
## Med_inc         1.108e-02  1.834e-02   0.605 0.546774    
## pop             1.327e+00  4.505e-01   2.945 0.003965 ** 
## pop_density    -2.321e-01  5.965e-02  -3.891 0.000175 ***
## education       2.175e+03  9.355e+02   2.325 0.021973 *  
## sex_ratio       2.208e+03  1.689e+03   1.307 0.194092    
## Med_Age        -1.037e+01  6.222e+01  -0.167 0.867914    
## stopdensity     4.057e+01  4.968e+00   8.165 7.19e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2774 on 106 degrees of freedom
## Multiple R-squared:  0.7527, Adjusted R-squared:  0.7317 
## F-statistic: 35.86 on 9 and 106 DF,  p-value: < 2.2e-16

#med_inc,sex ratio,med_age not significant
step(Model1)

## Start:  AIC=1848.87
## trips_in ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## - Med_Age         1    213908  816042329 1846.9
## - Med_inc         1   2812867  818641287 1847.3
## - sex_ratio       1  13144509  828972929 1848.7
## <none>                         815828420 1848.9
## - education       1  41606912  857435332 1852.6
## - pop             1  66769784  882598204 1856.0
## - commuters       1  74452095  890280515 1857.0
## - pop_density     1 116500777  932329198 1862.3
## - vehicle_number  1 276549481 1092377902 1880.7
## - stopdensity     1 513160796 1328989216 1903.5
## 
## Step:  AIC=1846.9
## trips_in ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## - Med_inc         1   2713416  818755745 1845.3
## - sex_ratio       1  12931082  828973411 1846.7
## <none>                         816042329 1846.9
## - education       1  44837694  860880023 1851.1
## - pop             1  66694264  882736592 1854.0
## - commuters       1  83166641  899208970 1856.2
## - pop_density     1 117653548  933695876 1860.5
## - vehicle_number  1 299217993 1115260321 1881.1
## - stopdensity     1 513079179 1329121507 1901.5
## 
## Step:  AIC=1845.29
## trips_in ~ commuters + vehicle_number + pop + pop_density + education + 
##     sex_ratio + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## - sex_ratio       1  12707718  831463463 1845.1
## <none>                         818755745 1845.3
## - pop             1  88076757  906832502 1855.1
## - commuters       1  97488116  916243861 1856.3
## - pop_density     1 127046476  945802221 1860.0
## - education       1 148884533  967640278 1862.7
## - vehicle_number  1 398134129 1216889874 1889.2
## - stopdensity     1 533658800 1352414545 1901.5
## 
## Step:  AIC=1845.07
## trips_in ~ commuters + vehicle_number + pop + pop_density + education + 
##     stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## <none>                         831463463 1845.1
## - pop             1  77431353  908894816 1853.4
## - commuters       1 121615369  953078831 1858.9
## - pop_density     1 123500523  954963986 1859.1
## - education       1 142489154  973952616 1861.4
## - vehicle_number  1 429618243 1261081706 1891.4
## - stopdensity     1 653359213 1484822676 1910.3

## 
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density + 
##     education + stopdensity, data = Model)
## 
## Coefficients:
##    (Intercept)       commuters  vehicle_number             pop     pop_density  
##     -1.069e+04       4.477e+00      -8.007e+00       1.291e+00      -2.328e-01  
##      education     stopdensity  
##      2.454e+03       4.296e+01

The summary of model 1 shows that median income, sex ratio and median age are not significantly correlated to inflow trips. Hence, using stepwise regression to select fitted variables. The stepwise method excludes median income, sex ratio and median age.

Model1_stepwise<-lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model1_stepwise)

## 
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density + 
##     education + stopdensity, data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9595.9  -898.3     8.9  1183.1 13910.7 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.069e+04  2.033e+03  -5.257 7.35e-07 ***
## commuters       4.477e+00  1.121e+00   3.993 0.000119 ***
## vehicle_number -8.007e+00  1.067e+00  -7.505 1.78e-11 ***
## pop             1.291e+00  4.051e-01   3.186 0.001882 ** 
## pop_density    -2.328e-01  5.785e-02  -4.024 0.000106 ***
## education       2.454e+03  5.678e+02   4.322 3.43e-05 ***
## stopdensity     4.296e+01  4.641e+00   9.255 2.14e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2762 on 109 degrees of freedom
## Multiple R-squared:  0.748,  Adjusted R-squared:  0.7341 
## F-statistic: 53.92 on 6 and 109 DF,  p-value: < 2.2e-16

par(mfrow=c(1,1))
vif(Model1_stepwise)

##      commuters vehicle_number            pop    pop_density      education 
##      12.954674       5.118265       5.716493       1.523300       1.756836 
##    stopdensity 
##       1.209620

Although all the explanatory variables in the stepwise Model 1 are significant, the vif result shows that the number of commuters has strong multicollnearity with other variables. Therefore, exclude the number of commuters in the following model.

Model1_stepwise_renewed<-lm(formula = trips_in ~ vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model1_stepwise_renewed)

## 
## Call:
## lm(formula = trips_in ~ vehicle_number + pop + pop_density + 
##     education + stopdensity, data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9677.3 -1134.1  -110.7  1249.9 14345.3 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.511e+04  1.816e+03  -8.321 2.62e-13 ***
## vehicle_number -5.013e+00  8.088e-01  -6.198 1.02e-08 ***
## pop             2.608e+00  2.506e-01  10.408  < 2e-16 ***
## pop_density    -1.221e-01  5.412e-02  -2.257    0.026 *  
## education       3.323e+03  5.589e+02   5.946 3.31e-08 ***
## stopdensity     4.757e+01  4.791e+00   9.929  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2944 on 110 degrees of freedom
## Multiple R-squared:  0.7111, Adjusted R-squared:  0.698 
## F-statistic: 54.16 on 5 and 110 DF,  p-value: < 2.2e-16

par(mfrow=c(1,1))
vif(Model1_stepwise_renewed)

## vehicle_number            pop    pop_density      education    stopdensity 
##       2.589139       1.925317       1.173816       1.498684       1.134619

layout(matrix(c(1,2,3,4),2,2))
plot(Model1_stepwise_renewed)

After excluding the number of commuters, the vif values are all under 5, indicating poor or no multicollnearity with other variables. However, the regression diagnostics indicate that the model is not fitted for prediction. First, the first and second plots of ‘Residuals vs Fitted’ and ‘Scale-Location’ show that as the fitted values increase, the residuals decrease, indicating that the original data is not linear relationship; Second, the third plot of ‘Normal Q-Q’ shows that the residuals are non-normality; Third, the last plot of ‘Residuals vs Leverage’ indicates that there are many influential points in the model. Based on the reasons, try to replace dependant variable with log-transformed inflow trips.

Model$logtripsin<-log(Model$trips_in)
Model1_log<-lm(logtripsin~commuters+vehicle_number+Med_inc+pop+pop_density
               +education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model1_log)

## 
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
##     pop + pop_density + education + sex_ratio + Med_Age + stopdensity, 
##     data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.19144 -0.58818 -0.08074  0.56144  3.02367 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2.697e-01  1.068e+00   0.253 0.801049    
## commuters       1.164e-03  5.046e-04   2.306 0.023060 *  
## vehicle_number -1.680e-03  4.746e-04  -3.540 0.000596 ***
## Med_inc        -2.570e-05  6.368e-06  -4.036 0.000103 ***
## pop             8.981e-05  1.565e-04   0.574 0.567163    
## pop_density    -3.416e-06  2.072e-05  -0.165 0.869369    
## education       1.893e+00  3.249e-01   5.827 6.14e-08 ***
## sex_ratio       8.655e-01  5.867e-01   1.475 0.143138    
## Med_Age        -3.226e-02  2.161e-02  -1.493 0.138482    
## stopdensity     6.116e-03  1.726e-03   3.544 0.000587 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9635 on 106 degrees of freedom
## Multiple R-squared:  0.7011, Adjusted R-squared:  0.6757 
## F-statistic: 27.63 on 9 and 106 DF,  p-value: < 2.2e-16

#pop,pop_density,sex_ratio,med_age not significant
step(Model1_log)

## Start:  AIC=0.92
## logtripsin ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq     RSS     AIC
## - pop_density     1    0.0252  98.436 -1.0455
## - pop             1    0.3059  98.717 -0.7152
## <none>                         98.411  0.9248
## - sex_ratio       1    2.0202 100.431  1.2820
## - Med_Age         1    2.0687 100.479  1.3379
## - commuters       1    4.9366 103.347  4.6025
## - vehicle_number  1   11.6319 110.043 11.8841
## - stopdensity     1   11.6621 110.073 11.9159
## - Med_inc         1   15.1198 113.530 15.5037
## - education       1   31.5235 129.934 31.1587
## 
## Step:  AIC=-1.05
## logtripsin ~ commuters + vehicle_number + Med_inc + pop + education + 
##     sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq     RSS     AIC
## - pop             1    0.3414  98.777 -2.6439
## <none>                         98.436 -1.0455
## - sex_ratio       1    2.0001 100.436 -0.7121
## - Med_Age         1    2.0467 100.483 -0.6583
## - commuters       1    5.0324 103.468  2.7383
## - stopdensity     1   11.6419 110.078  9.9212
## - vehicle_number  1   11.9314 110.367 10.2259
## - Med_inc         1   15.2824 113.718 13.6955
## - education       1   31.5017 129.938 29.1618
## 
## Step:  AIC=-2.64
## logtripsin ~ commuters + vehicle_number + Med_inc + education + 
##     sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq     RSS    AIC
## <none>                         98.777 -2.644
## - sex_ratio       1     1.729 100.506 -2.631
## - Med_Age         1     1.869 100.646 -2.469
## - stopdensity     1    11.372 110.150  7.997
## - Med_inc         1    15.896 114.674 12.666
## - vehicle_number  1    17.128 115.905 13.905
## - commuters       1    23.430 122.208 20.047
## - education       1    40.413 139.190 35.141

## 
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
##     education + sex_ratio + Med_Age + stopdensity, data = Model)
## 
## Coefficients:
##    (Intercept)       commuters  vehicle_number         Med_inc       education  
##      5.613e-01       1.391e-03      -1.788e-03      -2.409e-05       1.782e+00  
##      sex_ratio         Med_Age     stopdensity  
##      7.775e-01      -3.007e-02       6.000e-03

The summary of log-transformed model shows that total population, population density, sex ratio and median age are not significantly correlated to log-transformed inflow trips. Hence, using stepwise regression to select fitted variables. The stepwise method only excludes total population and population density.

Model1_log_stepwise<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
                    education + sex_ratio + Med_Age + stopdensity, data = Model)
summary(Model1_log_stepwise)

## 
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
##     education + sex_ratio + Med_Age + stopdensity, data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1798 -0.5744 -0.1060  0.5694  2.9992 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.613e-01  8.595e-01   0.653  0.51508    
## commuters       1.391e-03  2.749e-04   5.061 1.72e-06 ***
## vehicle_number -1.788e-03  4.131e-04  -4.327 3.38e-05 ***
## Med_inc        -2.409e-05  5.778e-06  -4.169 6.19e-05 ***
## education       1.782e+00  2.681e-01   6.647 1.26e-09 ***
## sex_ratio       7.775e-01  5.655e-01   1.375  0.17201    
## Med_Age        -3.007e-02  2.103e-02  -1.430  0.15572    
## stopdensity     6.000e-03  1.701e-03   3.526  0.00062 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9563 on 108 degrees of freedom
## Multiple R-squared:    0.7,  Adjusted R-squared:  0.6805 
## F-statistic:    36 on 7 and 108 DF,  p-value: < 2.2e-16

vif(Model1_log_stepwise)

##      commuters vehicle_number        Med_inc      education      sex_ratio 
##       6.494096       6.398672       3.691302       3.266617       1.268209 
##        Med_Age    stopdensity 
##       2.046143       1.355672

Model1_log_stepwise_renewed<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + education + stopdensity, data = Model)
summary(Model1_log_stepwise_renewed)

## 
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
##     education + stopdensity, data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.47359 -0.62628 -0.05064  0.54694  2.49811 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.685e-01  6.003e-01   1.114    0.268    
## commuters       1.559e-03  2.546e-04   6.123 1.45e-08 ***
## vehicle_number -1.964e-03  4.010e-04  -4.897 3.37e-06 ***
## Med_inc        -2.634e-05  5.696e-06  -4.624 1.03e-05 ***
## education       1.687e+00  2.541e-01   6.640 1.23e-09 ***
## stopdensity     6.712e-03  1.605e-03   4.183 5.80e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9634 on 110 degrees of freedom
## Multiple R-squared:  0.6899, Adjusted R-squared:  0.6758 
## F-statistic: 48.95 on 5 and 110 DF,  p-value: < 2.2e-16

vif(Model1_log_stepwise_renewed)

##      commuters vehicle_number        Med_inc      education    stopdensity 
##       5.487730       5.941523       3.536103       2.891618       1.188357

anova(Model1_log_stepwise,Model1_log_stepwise_renewed)

## Analysis of Variance Table
## 
## Model 1: logtripsin ~ commuters + vehicle_number + Med_inc + education + 
##     sex_ratio + Med_Age + stopdensity
## Model 2: logtripsin ~ commuters + vehicle_number + Med_inc + education + 
##     stopdensity
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1    108  98.777                           
## 2    110 102.086 -2   -3.3092 1.8091 0.1687

AIC(Model1_log_stepwise,Model1_log_stepwise_renewed)

##                             df      AIC
## Model1_log_stepwise          9 328.5498
## Model1_log_stepwise_renewed  7 328.3724

Since the sex ratio and median age are not significantly correlated to inflow trips, exclude them from the stepwise log-transformed model to build a new one. The Anova analysis p value is 0.1687, which is larger than 0.05, indicating that excluding sex ratio and median age has no influence on the inflow trips. Besides, the AIC analysis indicates that the latter model is more fitted because it has a smaller AIC of the model without sex ratio and median age.

layout(matrix(c(1,2,3,4),2,2))
plot(Model1_log_stepwise_renewed)

The regression diagnostics indicate the model has a normal-distributed residuals, the explanatory variables and inflow trips are linear relationship, and there are no influential points in the model. Hence, with the R*2 equals to 0.6758, the model could explain the inflow trips in Minneapolis efficiently.

Therefore, the inflow trips model is : ln(Inflow trips) = 1.559e-03commuters - 1.964e-03vehicle member - 2.634e-05median income + 1.687e+00edcuation +6.712e-03stop density + 6.685e-01

TripinModel<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
                  education + stopdensity, data = Model)
summary(TripinModel)

## 
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + 
##     education + stopdensity, data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.47359 -0.62628 -0.05064  0.54694  2.49811 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.685e-01  6.003e-01   1.114    0.268    
## commuters       1.559e-03  2.546e-04   6.123 1.45e-08 ***
## vehicle_number -1.964e-03  4.010e-04  -4.897 3.37e-06 ***
## Med_inc        -2.634e-05  5.696e-06  -4.624 1.03e-05 ***
## education       1.687e+00  2.541e-01   6.640 1.23e-09 ***
## stopdensity     6.712e-03  1.605e-03   4.183 5.80e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9634 on 110 degrees of freedom
## Multiple R-squared:  0.6899, Adjusted R-squared:  0.6758 
## F-statistic: 48.95 on 5 and 110 DF,  p-value: < 2.2e-16

Building Outflow Model

Procedure same as building the inflow model.

Model2<-lm(trips_out~commuters+vehicle_number+Med_inc+pop+pop_density
           +education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model2)#med_inc,sex ratio,med_age not significant

## 
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + Med_inc + 
##     pop + pop_density + education + sex_ratio + Med_Age + stopdensity, 
##     data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10230.3  -1159.3    119.8   1438.5  17129.4 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.395e+04  3.472e+03  -4.019 0.000110 ***
## commuters       4.371e+00  1.641e+00   2.664 0.008939 ** 
## vehicle_number -8.487e+00  1.543e+00  -5.500 2.65e-07 ***
## Med_inc         1.579e-02  2.071e-02   0.762 0.447554    
## pop             1.547e+00  5.088e-01   3.041 0.002973 ** 
## pop_density    -2.569e-01  6.738e-02  -3.813 0.000231 ***
## education       2.197e+03  1.057e+03   2.080 0.039981 *  
## sex_ratio       2.703e+03  1.908e+03   1.417 0.159460    
## Med_Age        -1.284e+00  7.028e+01  -0.018 0.985459    
## stopdensity     4.883e+01  5.612e+00   8.702 4.63e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3133 on 106 degrees of freedom
## Multiple R-squared:  0.7429, Adjusted R-squared:  0.7211 
## F-statistic: 34.04 on 9 and 106 DF,  p-value: < 2.2e-16

step(Model2)

## Start:  AIC=1877.11
## trips_out ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## - Med_Age         1      3277 1040737462 1875.1
## - Med_inc         1   5705904 1046440089 1875.8
## <none>                        1040734185 1877.1
## - sex_ratio       1  19709672 1060443857 1877.3
## - education       1  42458553 1083192738 1879.8
## - commuters       1  69654235 1110388420 1882.6
## - pop             1  90786063 1131520247 1884.8
## - pop_density     1 142737705 1183471890 1890.0
## - vehicle_number  1 296977565 1337711749 1904.2
## - stopdensity     1 743476693 1784210878 1937.6
## 
## Step:  AIC=1875.11
## trips_out ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## - Med_inc         1   5717343 1046454805 1873.8
## <none>                        1040737462 1875.1
## - sex_ratio       1  19942625 1060680086 1875.3
## - education       1  47796807 1088534269 1878.3
## - commuters       1  75820901 1116558362 1881.3
## - pop             1  91626174 1132363636 1882.9
## - pop_density     1 145810617 1186548078 1888.3
## - vehicle_number  1 317414916 1358152378 1904.0
## - stopdensity     1 744352482 1785089943 1935.7
## 
## Step:  AIC=1873.75
## trips_out ~ commuters + vehicle_number + pop + pop_density + 
##     education + sex_ratio + stopdensity
## 
##                  Df Sum of Sq        RSS    AIC
## <none>                        1046454805 1873.8
## - sex_ratio       1  19539025 1065993830 1873.9
## - commuters       1  78840787 1125295592 1880.2
## - pop             1 124932018 1171386823 1884.8
## - pop_density     1 159793202 1206248007 1888.2
## - education       1 179627496 1226082301 1890.1
## - vehicle_number  1 402629520 1449084325 1909.5
## - stopdensity     1 777374754 1823829559 1936.2

## 
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density + 
##     education + sex_ratio + stopdensity, data = Model)
## 
## Coefficients:
##    (Intercept)       commuters  vehicle_number             pop     pop_density  
##     -1.491e+04       3.710e+00      -7.841e+00       1.685e+00      -2.651e-01  
##      education       sex_ratio     stopdensity  
##      2.766e+03       2.671e+03       4.942e+01

Model2_stepwise<-lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density + 
     education + sex_ratio + stopdensity, data = Model)
summary(Model2_stepwise)

## 
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density + 
##     education + sex_ratio + stopdensity, data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10576.1  -1060.1    128.1   1441.4  17381.4 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.491e+04  3.044e+03  -4.898 3.41e-06 ***
## commuters       3.710e+00  1.301e+00   2.853 0.005199 ** 
## vehicle_number -7.841e+00  1.216e+00  -6.446 3.30e-09 ***
## pop             1.685e+00  4.693e-01   3.591 0.000498 ***
## pop_density    -2.651e-01  6.528e-02  -4.061 9.28e-05 ***
## education       2.766e+03  6.423e+02   4.306 3.68e-05 ***
## sex_ratio       2.671e+03  1.881e+03   1.420 0.158474    
## stopdensity     4.942e+01  5.517e+00   8.957 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3113 on 108 degrees of freedom
## Multiple R-squared:  0.7415, Adjusted R-squared:  0.7248 
## F-statistic: 44.26 on 7 and 108 DF,  p-value: < 2.2e-16

vif(Model2_stepwise)

##      commuters vehicle_number            pop    pop_density      education 
##      13.720438       5.237206       6.038607       1.526824       1.770243 
##      sex_ratio    stopdensity 
##       1.324355       1.345655

Model2_stepwise_renewed<-lm(formula = trips_out ~ vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model2_stepwise_renewed)

## 
## Call:
## lm(formula = trips_out ~ vehicle_number + pop + pop_density + 
##     education + stopdensity, data = Model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10879.8  -1129.6   -104.2   1446.8  17317.9 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.616e+04  2.012e+03  -8.031 1.17e-12 ***
## vehicle_number -5.328e+00  8.962e-01  -5.945 3.31e-08 ***
## pop             2.751e+00  2.777e-01   9.908  < 2e-16 ***
## pop_density    -1.582e-01  5.997e-02  -2.637  0.00957 ** 
## education       3.491e+03  6.193e+02   5.637 1.35e-07 ***
## stopdensity     5.618e+01  5.309e+00  10.583  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3262 on 110 degrees of freedom
## Multiple R-squared:  0.7109, Adjusted R-squared:  0.6978 
## F-statistic: 54.11 on 5 and 110 DF,  p-value: < 2.2e-16

vif(Model2_stepwise_renewed)

## vehicle_number            pop    pop_density      education    stopdensity 
##       2.589139       1.925317       1.173816       1.498684       1.134619

layout(matrix(c(1,2,3,4),2,2))
plot(Model2_stepwise_renewed)

Model$logtripsout<-log(Model$trips_out)
Model2_log<-lm(logtripsout~commuters+vehicle_number+Med_inc+pop+pop_density
               +education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model2_log)

## 
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + 
##     pop + pop_density + education + sex_ratio + Med_Age + stopdensity, 
##     data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7503 -0.7531 -0.1169  0.5225  3.4622 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -6.712e-01  1.248e+00  -0.538 0.591957    
## commuters       1.143e-03  5.901e-04   1.937 0.055373 .  
## vehicle_number -1.871e-03  5.549e-04  -3.372 0.001042 ** 
## Med_inc        -2.537e-05  7.446e-06  -3.407 0.000931 ***
## pop             1.197e-04  1.829e-04   0.654 0.514356    
## pop_density     1.113e-05  2.423e-05   0.459 0.646976    
## education       2.153e+00  3.799e-01   5.667 1.26e-07 ***
## sex_ratio       1.067e+00  6.860e-01   1.556 0.122725    
## Med_Age        -5.135e-02  2.527e-02  -2.032 0.044632 *  
## stopdensity     7.491e-03  2.018e-03   3.712 0.000329 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.127 on 106 degrees of freedom
## Multiple R-squared:  0.6874, Adjusted R-squared:  0.6608 
## F-statistic: 25.89 on 9 and 106 DF,  p-value: < 2.2e-16

step(Model2_log)

## Start:  AIC=37.21
## logtripsout ~ commuters + vehicle_number + Med_inc + pop + pop_density + 
##     education + sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq    RSS    AIC
## - pop_density     1     0.268 134.82 35.444
## - pop             1     0.543 135.10 35.681
## <none>                        134.56 37.214
## - sex_ratio       1     3.073 137.63 37.833
## - commuters       1     4.764 139.32 39.250
## - Med_Age         1     5.243 139.80 39.647
## - vehicle_number  1    14.434 148.99 47.034
## - Med_inc         1    14.730 149.29 47.264
## - stopdensity     1    17.496 152.05 49.393
## - education       1    40.767 175.32 65.913
## 
## Step:  AIC=35.44
## logtripsout ~ commuters + vehicle_number + Med_inc + pop + education + 
##     sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq    RSS    AIC
## - pop             1     0.443 135.27 33.825
## <none>                        134.82 35.444
## - sex_ratio       1     3.198 138.02 36.163
## - commuters       1     5.563 140.39 38.135
## - Med_Age         1     5.733 140.56 38.275
## - Med_inc         1    15.695 150.52 46.218
## - vehicle_number  1    15.975 150.80 46.434
## - stopdensity     1    17.869 152.69 47.881
## - education       1    41.085 175.91 64.300
## 
## Step:  AIC=33.83
## logtripsout ~ commuters + vehicle_number + Med_inc + education + 
##     sex_ratio + Med_Age + stopdensity
## 
##                  Df Sum of Sq    RSS    AIC
## <none>                        135.27 33.825
## - sex_ratio       1     2.822 138.09 34.220
## - Med_Age         1     5.421 140.69 36.383
## - Med_inc         1    16.081 151.35 44.855
## - stopdensity     1    17.500 152.77 45.938
## - vehicle_number  1    22.884 158.15 49.956
## - commuters       1    26.677 161.94 52.705
## - education       1    52.720 187.99 70.003

## 
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + 
##     education + sex_ratio + Med_Age + stopdensity, data = Model)
## 
## Coefficients:
##    (Intercept)       commuters  vehicle_number         Med_inc       education  
##     -1.348e-01       1.485e-03      -2.066e-03      -2.423e-05       2.035e+00  
##      sex_ratio         Med_Age     stopdensity  
##      9.933e-01      -5.121e-02       7.443e-03

Model2_log_stepwise<-lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + 
     education + sex_ratio + Med_Age + stopdensity, data = Model)
summary(Model2_log_stepwise)

## 
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + 
##     education + sex_ratio + Med_Age + stopdensity, data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8510 -0.7568 -0.1007  0.5830  3.4643 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -1.348e-01  1.006e+00  -0.134 0.893658    
## commuters       1.485e-03  3.217e-04   4.615 1.09e-05 ***
## vehicle_number -2.066e-03  4.834e-04  -4.274 4.15e-05 ***
## Med_inc        -2.423e-05  6.761e-06  -3.583 0.000511 ***
## education       2.035e+00  3.137e-01   6.488 2.70e-09 ***
## sex_ratio       9.933e-01  6.618e-01   1.501 0.136267    
## Med_Age        -5.121e-02  2.461e-02  -2.080 0.039850 *  
## stopdensity     7.443e-03  1.991e-03   3.738 0.000298 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.119 on 108 degrees of freedom
## Multiple R-squared:  0.6857, Adjusted R-squared:  0.6653 
## F-statistic: 33.66 on 7 and 108 DF,  p-value: < 2.2e-16

vif(Model2_log_stepwise)

##      commuters vehicle_number        Med_inc      education      sex_ratio 
##       6.494096       6.398672       3.691302       3.266617       1.268209 
##        Med_Age    stopdensity 
##       2.046143       1.355672

Model2_log_stepwise_renewed<-lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + education + Med_Age + stopdensity, data = Model)
summary(Model2_log_stepwise_renewed)

## 
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + 
##     education + Med_Age + stopdensity, data = Model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8913 -0.7736 -0.0959  0.7640  3.2131 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     6.854e-01  8.492e-01   0.807 0.421400    
## commuters       1.520e-03  3.227e-04   4.710 7.33e-06 ***
## vehicle_number -2.111e-03  4.853e-04  -4.349 3.09e-05 ***
## Med_inc        -2.565e-05  6.733e-06  -3.809 0.000231 ***
## education       2.064e+00  3.149e-01   6.555 1.92e-09 ***
## Med_Age        -4.798e-02  2.466e-02  -1.946 0.054276 .  
## stopdensity     8.484e-03  1.877e-03   4.520 1.58e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.126 on 109 degrees of freedom
## Multiple R-squared:  0.6791, Adjusted R-squared:  0.6615 
## F-statistic: 38.45 on 6 and 109 DF,  p-value: < 2.2e-16

vif(Model2_log_stepwise_renewed)

##      commuters vehicle_number        Med_inc      education        Med_Age 
##       6.459487       6.374968       3.619008       3.254371       2.030517 
##    stopdensity 
##       1.191183

anova(Model2_log_stepwise,Model2_log_stepwise_renewed)

## Analysis of Variance Table
## 
## Model 1: logtripsout ~ commuters + vehicle_number + Med_inc + education + 
##     sex_ratio + Med_Age + stopdensity
## Model 2: logtripsout ~ commuters + vehicle_number + Med_inc + education + 
##     Med_Age + stopdensity
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    108 135.27                           
## 2    109 138.09 -1   -2.8219 2.2531 0.1363

AIC(Model2_log_stepwise,Model2_log_stepwise_renewed)

##                             df      AIC
## Model2_log_stepwise          9 365.0189
## Model2_log_stepwise_renewed  8 365.4139

layout(matrix(c(1,2,3,4),2,2))
plot(Model2_log_stepwise_renewed)

TripoutModel<-lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + education + Med_Age + stopdensity, data = Model)

The outflow trips model is : ln(Outflow trips) = 1.520e-03commuters - 2.111e-03vehicle number - 2.565e-05median income + 2.064e+00edcuation -4.798e-02Median Age + 8.484e-03stop density + 6.854e-01

Inflow and Outflow Models

Inflow model: ln(Inflow trips) = 1.559e-03commuters - 1.964e-03vehicle number - 2.634e-05median income + 1.687e+00edcuation +6.712e-03stop density + 6.685e-01

Outflow model: ln(Outflow trips) = 1.520e-03commuters - 2.111e-03vehicle number - 2.565e-05median income + 2.064e+00edcuation -4.798e-02Median Age + 8.484e-03stop density + 6.854e-01

E-Scooter Trips Prediction in Philadelphia

Prepare variables

phillycensus<-read.csv("philly census.csv",stringsAsFactors=FALSE)
  #commuters
phillycensus$commuters<-phillycensus$Workers.16.Years.and.Over.-phillycensus$Workers.16.Years.and.Over..Worked.At.Home
  #vehicle number
phillycensus$vehicle_number<-phillycensus$Aggregate.Number.Of.Vehicles..Car.Truck.Or.Van..Used.In.Commuting
  #median income
phillycensus$Med_inc<-phillycensus$Median.Household.Income..In.2018.Inflation.Adjusted.Dollars.
  #education
phillycensus$education<-(1*phillycensus$X..Population.25.Years.and.Over..Less.than.High.School+2*phillycensus$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.
                         +3*phillycensus$X..Population.25.Years.and.Over..Some.College+4*phillycensus$X..Population.25.Years.and.Over..Bachelor.s.Degree
                         +5*phillycensus$X..Population.25.Years.and.Over..Master.s.Degree+6*phillycensus$X..Population.25.Years.and.Over..Professional.School.Degree
                         +7*phillycensus$X..Population.25.Years.and.Over..Doctorate.Degree)/100
  #median age

phillycensus2<-get_acs(geography = "tract", 
                           variables = c("B01003_001", "B19013_001", 
                                         "B02001_002", "B08013_001",
                                         "B08012_001", "B08301_001", 
                                         "B08301_010", "B01002_001"), 
                           year = 2018, 
                           state = "PA", 
                           geometry = TRUE, 
                           county=c("Philadelphia"),
                           output = "wide") %>%
  rename(Total_Pop =  B01003_001E,
         Med_Inc = B19013_001E,
         Med_Age = B01002_001E,
         White_Pop = B02001_002E,
         Travel_Time = B08013_001E,
         Num_Commuters = B08012_001E,
         Means_of_Transport = B08301_001E,
         Total_Public_Trans = B08301_010E) 
phillyage<-phillycensus2[,c("Med_Age","GEOID")]
phillycensus<-merge(phillycensus,phillyage,by.x="FIPS",by.y="GEOID",all=TRUE)
# Transit Accessibility
phillyTracts <- 
  phillycensus2 %>%
  as.data.frame() %>%
  distinct(GEOID, .keep_all = TRUE) %>%
  dplyr::select(GEOID, geometry) %>% 
  st_sf %>% 
  st_transform(102271)
Phillytracts_boundary<-st_read("Census_Tracts_2010.shp") %>%
  st_transform(crs=4326) 

Phillybusstops<-read.csv("SEPTA__Bus_Stops.csv")
busstops <- Phillybusstops %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
  st_transform(4326)
busstops_join <- st_intersection(busstops,Phillytracts_boundary) %>% 
  dplyr::select(-8)
busstop_counts <- busstops_join%>%
  mutate(busstops_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(busstops_join$GEOID10)%>%
  summarise(bustopcounts=n())

Phillytrolleystops<-read.csv("SEPTA__Trolley_Stops.csv")
trolleystops<-Phillytrolleystops %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
  st_transform(4326)
trolleystops_join <- st_intersection(trolleystops,Phillytracts_boundary) %>% 
  dplyr::select(-8)
trolleystop_counts <- trolleystops_join%>%
  mutate(trolleystops_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(trolleystops_join$GEOID10)%>%
  summarise(trolleystoopcounts=n())

PhillyMFL<-read.csv("SEPTA__MarketFrankford_Line_Stations.csv")
MFLstops<-PhillyMFL %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
  st_transform(4326)
MFL_join <- st_intersection(MFLstops,Phillytracts_boundary) %>% 
  dplyr::select(-8)
MFLstop_counts <- MFL_join%>%
  mutate(MFL_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(MFL_join$GEOID10)%>%
  summarise(MFLstopcounts=n())

PhillyBSL<-read.csv("SEPTA__Broad_Street_Line_Stations.csv")
BSLstops<-PhillyBSL %>% 
  st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
  st_transform(4326)
BSL_join <- st_intersection(BSLstops,Phillytracts_boundary) %>% 
  dplyr::select(-8)
BSLstop_counts <- BSL_join%>%
  mutate(BSL_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(BSL_join$GEOID10)%>%
  summarise(BSLstopscounts=n())

PhillyNHL<-read.csv("SEPTA__Norristown_Highspeed_Line_Stations.csv")
NHLstops<-PhillyNHL%>%
  st_as_sf(coords=c("LONGITUDE","LATITUDE"),crs=4326, agr="constant")%>%
  st_transform(4326)
NHL_join<-st_intersection(NHLstops,Phillytracts_boundary) %>% 
  dplyr::select(-8)#No NHL

PhillyRL<-read.csv("SEPTA__Regional_Rail_Stations.csv")
RLstops<-PhillyRL%>%
  st_as_sf(coords=c("LONGITUDE","LATITUDE"),crs=4326, agr="constant")%>%
  st_transform(4326)
RL_join<-st_intersection(RLstops,Phillytracts_boundary) %>% 
  dplyr::select(-8)
RLstop_counts <- RL_join%>%
  mutate(RL_join$GEOID10<-as.numeric(GEOID10))%>%
  group_by(RL_join$GEOID10)%>%
  summarise(RLstopscounts=n())

phillycensus<-merge(phillycensus,busstop_counts,by.x="FIPS", by.y="busstops_join$GEOID10",all=FALSE)
phillycensus<-merge(phillycensus,trolleystop_counts,by.x="FIPS",by.y="trolleystops_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,MFLstop_counts,by.x="FIPS",by.y="MFL_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,BSLstop_counts,by.x="FIPS",by.y="BSL_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,RLstop_counts,by.x="FIPS",by.y="RL_join$GEOID10",all=TRUE)

phillycensus[is.na(phillycensus)] <- 0

phillycensus$stopamount<-1*phillycensus$bustopcounts+2*phillycensus$trolleystoopcounts+3*(phillycensus$BSLstopscounts+phillycensus$MFLstopcounts)+4*phillycensus$RLstopscounts

phillyland<-read.csv("philly land.csv")
phillycensus<-merge(phillycensus,phillyland, by="FIPS",all=TRUE)
phillycensus$stopdensity<-phillycensus$stopamount/phillycensus$Area.Total..Area..Land.

Inflow Trips Prediction

phillycensus$log.tripsin<-TripinModel$coefficients[2]*phillycensus$commuters +TripinModel$coefficients[3]*phillycensus$vehicle_number+TripinModel$coefficients[4]*phillycensus$Med_inc+TripinModel$coefficients[5]*phillycensus$education+TripinModel$coefficients[6]*phillycensus$stopdensity+TripinModel$coefficients[1]
phillycensus$tripsin<-exp(phillycensus$log.tripsin)
options(scipen = 200)
phillycensus$tripsin<-round(phillycensus$tripsin)

philly_prediction<-phillycensus[,c("FIPS","tripsin")]
names(philly_prediction)<-c("GEOID","tripsin")
Phillytracts_boundary$GEOID10<-as.numeric(as.character(Phillytracts_boundary$GEOID10))
Phillytracts_boundary$INTPTLAT10<-as.numeric(as.character(Phillytracts_boundary$INTPTLAT10))
Phillytracts_boundary$INTPTLON10<-as.numeric(as.character(Phillytracts_boundary$INTPTLON10))
philly_prediction<-merge(Phillytracts_boundary,philly_prediction,by.x="GEOID10",by.y="GEOID",all=TRUE)

ggplot()+
  geom_sf(data=philly_prediction,aes(fill=ntile(tripsin,4)))+
  scale_fill_viridis('Inflow Trips - quantile breaks', # Scale label
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'D',  # Viridis has multiple palettes to choose from
                     alpha = 0.6,  # Transparency level
                     labels=as.character(quantile(philly_prediction$tripsin,
                                                  c(.25,.5,.75,1))))+
  labs(title= "Predicted E-Scooter Inflow Trips in Philadelphia") +
  theme.map

Outflow Trips Prediction

phillycensus$log.tripsout<-TripoutModel$coefficients[2]*phillycensus$commuters +TripoutModel$coefficients[3]*phillycensus$vehicle_number+TripoutModel$coefficients[4]*phillycensus$Med_inc+TripoutModel$coefficients[5]*phillycensus$education+TripoutModel$coefficients[6]*phillycensus$Med_Age+TripoutModel$coefficients[7]*phillycensus$stopdensity +TripinModel$coefficients[1]
phillycensus$tripsout<-exp(phillycensus$log.tripsout)
options(scipen = 200)
phillycensus$tripsout<-round(phillycensus$tripsout)

philly_prediction<-phillycensus[,c("FIPS","tripsout","tripsin")]
names(philly_prediction)<-c("GEOID","tripsout","tripsin")
Phillytracts_boundary$GEOID10<-as.numeric(as.character(Phillytracts_boundary$GEOID10))
Phillytracts_boundary$INTPTLAT10<-as.numeric(as.character(Phillytracts_boundary$INTPTLAT10))
Phillytracts_boundary$INTPTLON10<-as.numeric(as.character(Phillytracts_boundary$INTPTLON10))
philly_prediction<-merge(Phillytracts_boundary,philly_prediction,by.x="GEOID10",by.y="GEOID",all=TRUE)
ggplot()+
  geom_sf(data=philly_prediction,aes(fill=ntile(tripsout,4)))+
  scale_fill_viridis('Outflow Trips - quantile breaks', # Scale label
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'D',  # Viridis has multiple palettes to choose from
                     alpha = 0.6,  # Transparency level
                     labels=as.character(quantile(philly_prediction$tripsout,
                                                  c(.25,.5,.75,1))))+
  labs(title= "Predicted E-Scooter Outflow Trips in Philadelphia") +
  theme.map

Absolute Trips in Philadelphia

philly_prediction$abs_trip<-philly_prediction$tripsin-philly_prediction$tripsout
ggplot()+
  geom_sf(data=philly_prediction,aes(fill=ntile(abs_trip,4)))+
  scale_fill_viridis('Absolute Trips - quantile breaks', # Scale label
                     direction = -1, # Direction of the color palette - either 1 or -1
                     option = 'D',  # Viridis has multiple palettes to choose from
                     alpha = 0.6,  # Transparency level
                     labels=as.character(quantile(philly_prediction$abs_trip,
                                                  c(.25,.5,.75,1))))+
  labs(title= "Predicted E-Scooter Absolute Trips (Inflow - Outflow) in Philadelphia") +
  theme.map

The spatial pattern of inflow and outflow trips in Philadelphia are similar. University City and Center City are both popular origin and destination for the impending e-scooter in Philadelphia. While looking into the map of predicted absolute trips, which is the gap between inflow trips and outflow trips, the Center City, University City,Wynnefield, Powelton Village,Belmont Village are several main origins (yellow), while the outskirts of Philadelphia are mostly inflow trips.

Policy Recommendation

To better prepare for the upcoming e-scooter in Philadelphia:

City should regulate the e-scooter speed, right-of-way and other regulations to guarantee the safety of all road users;
City could assign boundary limits to e-scooter pilot program at the beginning of e-scooter rental markets coming to Philadelphia, such as University City and Center City;
E-scooter rental company should provide e-scooters based on the demand of residents;
E-scooter rental company should allocate more staff to collect e-scooters at the outskirts of Philadelphia and move them back to places where inflow trips are less than outflow trips

Further Study

Use hexagons as the units of the analysis, so that the distances between each units and the area are same;
Add land use area as a kind of factor;
Try negative binominal regression to predict the accurate number of trips