Modeling E-Scooter Spatial-Temporal Trip Patterns in Philadelphia: Minneapolis Case Study
Introduction
Micro-mobility is seen as a solution to alleviating the last-mile gridlock, especially shared and dock-less electric scooters. However, legalizing them in Pennsylvania is still at a stand still for two years due to safety concerns. COVID-19 might be a turning point for cities to reconsider e-scooters as a legal transportation method because it enables people to open to the air and keep socially distant at the meantime. This project aims to explore the spatial-temporal e-scooter trip pattern in Minneapolis and build the model to predict the probable spatial pattern of e-scooter in Philadelphia in the future. The project used stepwise regression method with log-transformed dependent variables, outflow trips and inflow trips, to build the models.
Set up
Install these libraries and create themes for graphs and maps
library(ggplot2)
library(dplyr)
library(sf)
library(sp)
library(raster)
library(rgdal)
library(tidycensus)
library(tidyverse)
library(conflicted)
library(viridis)
library(corrplot)
library(lubridate)
library(tigris)
library(car)
theme.graph <- function(base_size = 11, base_family = "Arial", lines_lwd = 0.50, plot_grid = TRUE, axis_font = base_family, title_size = base_size*1.5, legend_size = base_size*0.8,
bg_col = "white",title_font = base_family , base_col = "black", axis_lines = TRUE,
minor_grid = ifelse(plot_grid, TRUE, FALSE), vert_grid = ifelse(plot_grid, TRUE, FALSE), ticks_type = "outer", horz_grid = ifelse(plot_grid, TRUE, FALSE), alpha_leg = 0.1, bord_size = 0,
legend_bg = "white", strip_bg = "white", grid_thick = 1,
grid_type = "solid", ticks_xy = "xy", grid_cols = c("grey50", "grey70")){
theme_bw()+
ggplot2::theme(
plot.margin = grid::unit(c(1, 1, .5, .7), "cm"),
text = ggplot2::element_text(family = base_family, size = base_size),
axis.line = element_line(size = ifelse(axis_lines, grid::unit(lines_lwd, "mm"),0), color = "black"),
axis.ticks.length = grid::unit(ifelse(ticks_type == "outer", 0.15, -0.15), "cm"),
axis.ticks.x = element_line(size = ifelse(stringr::str_detect(ticks_xy, "x"), grid::unit(lines_lwd, "cm"),0), color = "black"),
axis.ticks.y = element_line(size = ifelse(stringr::str_detect(ticks_xy, "y"), grid::unit(lines_lwd, "cm") ,0), color = "black"),
axis.text.x = ggplot2::element_text(angle=0,hjust= .5, vjust= .5, size = base_size, colour = base_col , family = axis_font,margin=margin(ifelse(ticks_type == "inner", 11, 5),5,10,5,"pt")),
axis.text.y = ggplot2::element_text(angle = 0,size = base_size, colour = base_col , family = axis_font, margin=margin(5,ifelse(ticks_type == "inner", 11, 5),10,5,"pt")),
axis.title.y = ggplot2::element_text(angle=90,vjust = .5, margin = margin(r = 15),size = base_size, colour = base_col , family = axis_font),
axis.title.x = ggplot2::element_text(size = base_size,colour = base_col ,vjust = -.5, family = axis_font),
panel.background = ggplot2::element_rect(fill = bg_col),
plot.background = ggplot2::element_rect(fill = bg_col),
panel.border = ggplot2::element_rect(colour = "black", fill=NA, size = bord_size),
panel.grid.major.x = ggplot2::element_line(linetype = grid_type,colour = ifelse(vert_grid, grid_cols[1],bg_col), size = ifelse(vert_grid,0.25 * grid_thick, 0)),
panel.grid.minor.x = ggplot2::element_line(linetype = grid_type,colour = ifelse(vert_grid, ifelse(minor_grid, grid_cols[2 - (length(grid_cols) == 1) ],bg_col),bg_col), size = ifelse(vert_grid,0.15* grid_thick, 0)),
panel.grid.major.y = ggplot2::element_line(linetype = grid_type,colour = ifelse(horz_grid, grid_cols[1],bg_col), size = ifelse(horz_grid,0.25* grid_thick, 0)),
panel.grid.minor.y = ggplot2::element_line(linetype = grid_type,colour = ifelse(horz_grid, ifelse(minor_grid, grid_cols[2 - (length(grid_cols) == 1) ],bg_col),bg_col), size = ifelse(horz_grid,0.15* grid_thick, 0)),
plot.title = ggplot2::element_text(face="bold",vjust = 2, colour = base_col , size = title_size, family = title_font),
legend.background = ggplot2::element_rect(fill = scales::alpha(legend_bg, alpha_leg)), legend.key = ggplot2::element_blank(),
legend.text = ggplot2::element_text(size = legend_size, family = base_family),
legend.title = element_blank(),
strip.background = ggplot2::element_rect(colour = strip_bg, fill = strip_bg),
strip.text.x = ggplot2::element_text(size = base_size + 1),
strip.text.y = ggplot2::element_text(size = base_size + 1)
)
}
theme.widegraph<-theme.graph()+
theme(legend.position = c(.9,.55)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust =.4))
theme.map <-theme(text = element_text(family = "Arial"))+
theme(plot.title =element_text(size=11*1.5),
plot.subtitle = element_text(size=11),
plot.caption = element_text(size = 8),
axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_line(colour = 'transparent'),
panel.grid.minor=element_blank(),
legend.direction = "vertical",
legend.position = "right",
plot.margin = margin(1, 1, 1, 1, 'cm'),
legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))
The fastest speed for E-Scooter is 15 mph and the mximum trip duration is 2.5h.
Exploratory analysis
Distribution of Trip Duration and Distance
75% of e-scooter trips in Minneapolis are within 2 miles and under 15 mins.
Temporal Pattern: By Time of Day and Days by Week
trips$date<-substr(trips$StartTime,1,10)
trips$date<-as.Date(trips$date,"%Y-%m-%d")
trips$week<-weekdays(trips$date)
trips$time<-substr(trips$StartTime,12,16)
trips$time<-strptime(trips$time, format="%H:%M")
trips$time<-as.POSIXct.POSIXlt(trips$time)
daily_counts <- trips%>%
mutate(trips$time<-as.Date(trips$time))%>%
group_by(trips$time)%>%
summarise(dailyCounts=n())
daily_counts$`trips$time`<-as.character(daily_counts$`trips$time`)
daily_counts$time<-substr(daily_counts$`trips$time`,12,16)
ggplot(data=daily_counts,aes(x=time,y=dailyCounts))+geom_col()+labs(x="Time of Day",y="Total Number of Active Trips per Half an Hour")+ggtitle('Minneapolis E-scooter Usage by Time of Day ')+theme.widegraph
daily_countsweek <- trips%>%
mutate(trips$time<-as.Date(trips$time))%>%
group_by(trips$week,trips$time)%>%
summarise(dailyCounts=n())
daily_countsweek$`trips$time`<-as.character(daily_countsweek$`trips$time`)
daily_countsweek$time<-substr(daily_countsweek$`trips$time`,12,16)
daily_countsweek$`trips$week` <- factor(daily_countsweek$`trips$week` , levels = c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
ggplot(data=daily_countsweek,aes(x=time,y=dailyCounts))+geom_col()+labs(x="Time of Day",y="Total Number of Active Trips per Half an Hour")+facet_wrap(~daily_countsweek$`trips$week`,nrow=3,ncol=3)+theme(axis.text = element_blank())+theme.widegraph+ theme(axis.text.x=element_text(size=4))
Peak hours of e-scooter trips in Minneapolis are from Wednesday to Friday during 5pm to 6pm.
Spatial Pattern: Supply and Demand
Spatial Pattern of E-Scooter Supply
Import the shapefile datasets of Minneapolis street centerlines and city boundary; And obtain the census tracts boundary and census data from API.
MPLS_street<-st_read('MPLS_Centerline/MPLS_Centerline.shp')
MPLS_Boundary<-st_read("Minneapolisboundary/Minneapolisboundary.shp")%>%
st_transform(crs=4326)
census_api_key("2c3e9f9d2d65abab5f7b81fe418054415d363a43",install=TRUE,overwrite=TRUE)
Minneapoliscensus<-get_acs(geography = "tract",
variables = c("B01003_001", "B19013_001",
"B02001_002", "B08013_001",
"B08012_001", "B08301_001",
"B08301_010", "B01002_001"),
year = 2018,
state = "MN",
geometry = TRUE,
county=c("Hennepin"),
output = "wide") %>%
rename(Total_Pop = B01003_001E,
Med_Inc = B19013_001E,
Med_Age = B01002_001E,
White_Pop = B02001_002E,
Travel_Time = B08013_001E,
Num_Commuters = B08012_001E,
Means_of_Transport = B08301_001E,
Total_Public_Trans = B08301_010E) %>%
dplyr::select(Total_Pop, Med_Inc, White_Pop, Travel_Time,
Means_of_Transport, Total_Public_Trans,
Med_Age,
GEOID, geometry) %>%
mutate(Percent_White = White_Pop / Total_Pop,
Mean_Commute_Time = Travel_Time / Total_Public_Trans,
Percent_Taking_Public_Trans = Total_Public_Trans / Means_of_Transport)
Minneapolistract <-
Minneapoliscensus%>%
as.data.frame() %>%
distinct(GEOID, .keep_all = TRUE) %>%
dplyr::select(GEOID, geometry) %>%
st_sf %>%
st_transform(4326)
names(Minneapolistract)<-c("GEOID10","geometry")
Minneapolis <- st_intersection(Minneapolistract,MPLS_Boundary)
Spatial Pattern of E-Scooter Supply by Street Centerlines
availability$time<-substr(availability$PollTime,1,10)
avail<-availability %>%
group_by(time,ClosestCenterlineID)%>%
summarise(count=n(),
sum_avail=sum(NumberAvailable))
avail2<-avail %>%
group_by(ClosestCenterlineID)%>%
summarise(count=n(),
avg_avail=mean(sum_avail))
avail2$avg_avail<-round(avail2$avg_avail)
MPLS_street<-left_join(MPLS_street,avail2,by=c('GBSID'='ClosestCenterlineID'))
MPLS_street<-MPLS_street%>%
dplyr::select(OBJECTID,STREETALL,TYPE,SPEED_LIM,GBSID,count,avg_avail,geometry)
1-(sum(is.na(MPLS_street$avg_avail))/length(MPLS_street$avg_avail))# What percentage of streets in Minneapolis have been provided with e-scooters?
## [1] 0.69854
ggplot() +
geom_sf(data = Minneapolis) +
geom_sf(data = MPLS_street,aes(colour=ntile(avg_avail,4)), size=0.5,show.legend = 'line') +
scale_colour_viridis('E-Scooter Availability - Quantile Breaks', # Scale label
direction = -1, # Direction of the color palette - either 1 or -1
option = 'D', # Viridis has multiple palettes to choose from
alpha = 0.6, # Transparency level
labels=as.character(quantile(MPLS_street$avg_avail,
c(.25,.5,.75,1),na.rm=TRUE)))+
labs(title= "Daily E-Scooter Availability in Minneapolis ") +
theme.map
Spatial Pattern of E-Scooter Supply by Census Tracts
census_street<-st_join(Minneapolistract,MPLS_street)
census_street<-census_street%>%
group_by(GEOID10)%>%
summarize(Tot_avail=sum(avg_avail,na.rm = TRUE))
census_street2<-st_intersection(census_street,MPLS_Boundary)
ggplot() +
geom_sf(data=census_street2,aes(fill=Tot_avail))+
scale_fill_gradient(low='white',high='steelblue')+
labs(
title = "Minneapolis E-Scooter Availability by Census Tracts"
)+
theme.map
Both two maps show that most of e-scooters are provided in the center and northeast of the city, which are Gateway District, Southeast Como Neighborhood and Dinkytown where University of Minnesota is located.
Spatial Pattern of E-Scooter Demand
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10011 17078 18699 19274 21318 26148
Start<-table(trips$StartCenterlineID)
Startstreet<-as.data.frame(Start)
names(Startstreet)<-c("Street", "outtrips")
Endstreet<-as.data.frame(table(trips$EndCenterlineID))
names(Endstreet)<-c("Street", "intrips")
Trips_data<-merge(Startstreet,Endstreet,by="Street",all=TRUE)
Trips_data[is.na(Trips_data)] <- 0
Trips_data$absolute_trip<-Trips_data$intrips-Trips_data$outtrips
Trips_data$Street<-as.character(Trips_data$Street)
MPLS_street$GBSID<-as.character(MPLS_street$GBSID)
MPLS_street<-left_join(MPLS_street,Trips_data,by=c('GBSID'='Street'))
census_street_demand<-st_join(Minneapolistract,MPLS_street)
census_street_demand<-census_street_demand%>%
group_by(GEOID10)%>%
summarize(Abs_trips=sum(absolute_trip,na.rm = TRUE))
census_street_demand<-st_intersection(census_street_demand,MPLS_Boundary)
ggplot() +
geom_sf(data=census_street_demand,aes(fill=Abs_trips))+
scale_fill_viridis('Absolute Trips',
option='D',
alpha = .4)+
labs(
title = "Minneapolis E-Scooter Demands by Census Tracts"
)+
geom_sf_label(data=census_street_demand,aes(label=Abs_trips),size=1)+
theme.map
The absolute trips equal to inflow trips minus outflow trips. Hence, the negative value represent that the census tracts are mainly the origination for citizens, while the positive value represent that the census tracts are mainly the destination. From the map, we could see that Central Minneapolis are an important origin area in Minneapolis, and while the inflow trips are larger than outflow trips in other census tracts, Southeast Como Neighborhood and Dinkytown are the two main destinations in Minneapolis.
Model Building
Explanatory and Dependant Variables
Chosen explanatory variables include: 1. Number of commuters; 2. Number of vehicle in commuting; 3. Median household income; 4. Total population; 5. Population density (per sq mile); 6. Education level; 7. Sex ratio (Male/Female); 8. Median age; 9. Transit stops density
Chosen dependant variables include: 1. Inflow e-scooter trips; 2. Outflow e-scooter trips
ct<-read.csv("census tract.csv")
#commuting
commuting<-read.csv("commuting.csv")
commuting<-commuting[2:300,]
commuting$S0801_C01_001E<-as.character(commuting$S0801_C01_001E)
commuting$S0801_C01_001E<-as.numeric(commuting$S0801_C01_001E)
commuting$S0801_C01_013E<-as.character(commuting$S0801_C01_013E)
commuting$S0801_C01_013E<-as.numeric(commuting$S0801_C01_013E)
#commuters=total workers-work at home
commuting$commuters<-commuting$S0801_C01_001E-(commuting$S0801_C01_013E*commuting$S0801_C01_001E*0.01)
commuting$commuters<-round(commuting$commuters)
commuting$NAME<-as.character(commuting$NAME)
commuting$NAME<-str_sub(commuting$NAME,14,-23)
commuting$NAME<-str_sub(commuting$NAME,end=-6)
commuting$NAME<-str_sub(commuting$NAME,end=-2)
commuting$NAME<-as.numeric(commuting$NAME)
commuters<-commuting[,c("NAME","commuters")]
ct<-merge(ct,commuters,by.x="NAME10",by.y="NAME",all=FALSE)
social<-read.csv("social explorer data.csv")
social<-social[2:300,]
social$Census.Tract<-as.numeric(as.character(social$Census.Tract))
social$Census.Tract<-social$Census.Tract/100
#vehicle amount
vehicle<-social[,c("Census.Tract","Aggregate.Number.Of.Vehicles..Car.Truck.Or.Van..Used.In.Commuting")]
names(vehicle)<-c("tract","number of vehicle in commuting")
ct<-merge(ct,vehicle, by.x="NAME10", by.y="tract",all=FALSE)
#household income
income<-social[,c("Census.Tract","Median.Household.Income..In.2018.Inflation.Adjusted.Dollars.")]
names(income)<-c("tract","median household income")
ct<-merge(ct,income, by.x="NAME10", by.y="tract",all=FALSE)
#population and density
pop<-social[,c("Census.Tract","Total.Population","Population.Density..Per.Sq..Mile.")]
names(pop)<-c("tract","pop","pop density(per sq mile)")
ct<-merge(ct,pop, by.x="NAME10", by.y="tract",all=FALSE)
#Education: 1-less than high school,2-high school,3-college,4-bachelor,5-master, 6-professional school,7-doctorate
social$X..Population.25.Years.and.Over..Less.than.High.School<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Less.than.High.School))
social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.<-as.numeric(as.character(social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.))
social$X..Population.25.Years.and.Over..Some.College<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Some.College))
social$X..Population.25.Years.and.Over..Bachelor.s.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Bachelor.s.Degree))
social$X..Population.25.Years.and.Over..Master.s.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Master.s.Degree))
social$X..Population.25.Years.and.Over..Professional.School.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Professional.School.Degree))
social$X..Population.25.Years.and.Over..Doctorate.Degree<-as.numeric(as.character(social$X..Population.25.Years.and.Over..Doctorate.Degree))
social$education_level<-(1*social$X..Population.25.Years.and.Over..Less.than.High.School+2*social$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.
+3*social$X..Population.25.Years.and.Over..Some.College+4*social$X..Population.25.Years.and.Over..Bachelor.s.Degree
+5*social$X..Population.25.Years.and.Over..Master.s.Degree+6*social$X..Population.25.Years.and.Over..Professional.School.Degree
+7*social$X..Population.25.Years.and.Over..Doctorate.Degree)/100
Education<-social[,c("Census.Tract","education_level")]
names(Education)<-c("tract","education level")
ct<-merge(ct,Education, by.x="NAME10", by.y="tract",all=FALSE)
#Age
medianage<-Minneapoliscensus[,c("GEOID","Med_Age")]
ct<-merge(ct,medianage, by.x="GEOID10", by.y="GEOID",all=FALSE)
#Sex
social$Total.Population..Male<-as.numeric(as.character(social$Total.Population..Male))
social$Total.Population..Female<-as.numeric(as.character(social$Total.Population..Female))
social$sexratio<-social$Total.Population..Male/social$Total.Population..Female
sex<-social[,c("Census.Tract","sexratio")]
names(sex)<-c("tract","sex ratio")
ct<-merge(ct,sex,by.x="NAME10", by.y="tract",all=FALSE)
#Transit Accessibility
Minneapolisstops<-read.csv("stops.csv")
stops <- Minneapolisstops %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
st_transform(crs=4326)
stops_join <- st_intersection(stops,Minneapolis) %>%
dplyr::select(-8)
stops_join$GEOID10<-as.numeric(stops_join$GEOID10)
stop_counts <- stops_join%>%
mutate(stops_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(stops_join$GEOID10)%>%
summarise(stopcounts=n())
stop_counts
names(stop_counts)<-c("GEOID","stopcounts","geometry")
ct<-merge(ct,stop_counts,by.x="GEOID10", by.y="GEOID",all=FALSE)
ct$stopdensity<-ct$stopcounts/ct$LAND_SQML
Preliminary Tests
Model<-ct[,c("GEOID10","NAME10","NAMELSAD10","INTPTLAT10","INTPTLON10","Trips__abs","Trips__out","Trips__int","commuters","number of vehicle in commuting","median household income","pop","pop density(per sq mile)","education level", "sex ratio","Med_Age","stopdensity")]
Model$`number of vehicle in commuting`<-as.numeric(as.character(Model$`number of vehicle in commuting`))
Model$`median household income`<-as.numeric(as.character(Model$`median household income`))
Model$pop<-as.numeric(as.character(Model$pop))
Model$`pop density(per sq mile)`<-as.numeric(as.character(Model$`pop density(per sq mile)`))
names(Model)<-c("GEOID","tract","censustract","LAT","LON","trips_abs","trips_out","trips_in","commuters","vehicle_number","Med_inc","pop","pop_density","education","sex_ratio","Med_Age","stopdensity")
library(psych)
Explanatory<-Model[,9:17]
## [1] 114.7813
res1 <- cor.mtest(Explanatory, conf.level = .95)
corrplot(corr=explanatory_cor,method="color",order="AOE",type="upper",tl.pos = "d",tl.cex=0.4,p.mat = res1$p, sig.level = .01)
corrplot(corr=explanatory_cor,add=TRUE, type="lower",method="number",order="AOE",diag = FALSE,tl.pos="n", cl.pos="n",p.mat = res1$p, sig.level = .01)
The chart shows that there are significant correlation between commuters, population and number of vehicles. Using Kappa() function in R to calculate the IV matrix condition number k value could also testify the multicollinearity. The k value is 114.8 which is larger than 100 and smaller than 1000, indicating strong multicollinearity between independent variables.
Building Inflow Model
First, try the multiple linear regression of all the chosen explanatory variables and inflow trips.
Model1<-lm(trips_in~commuters+vehicle_number+Med_inc+pop+pop_density
+education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model1)
##
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + Med_inc +
## pop + pop_density + education + sex_ratio + Med_Age + stopdensity,
## data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9166.3 -1076.2 186.8 1349.6 14159.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.218e+04 3.074e+03 -3.962 0.000135 ***
## commuters 4.519e+00 1.453e+00 3.110 0.002402 **
## vehicle_number -8.190e+00 1.366e+00 -5.994 2.86e-08 ***
## Med_inc 1.108e-02 1.834e-02 0.605 0.546774
## pop 1.327e+00 4.505e-01 2.945 0.003965 **
## pop_density -2.321e-01 5.965e-02 -3.891 0.000175 ***
## education 2.175e+03 9.355e+02 2.325 0.021973 *
## sex_ratio 2.208e+03 1.689e+03 1.307 0.194092
## Med_Age -1.037e+01 6.222e+01 -0.167 0.867914
## stopdensity 4.057e+01 4.968e+00 8.165 7.19e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2774 on 106 degrees of freedom
## Multiple R-squared: 0.7527, Adjusted R-squared: 0.7317
## F-statistic: 35.86 on 9 and 106 DF, p-value: < 2.2e-16
## Start: AIC=1848.87
## trips_in ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - Med_Age 1 213908 816042329 1846.9
## - Med_inc 1 2812867 818641287 1847.3
## - sex_ratio 1 13144509 828972929 1848.7
## <none> 815828420 1848.9
## - education 1 41606912 857435332 1852.6
## - pop 1 66769784 882598204 1856.0
## - commuters 1 74452095 890280515 1857.0
## - pop_density 1 116500777 932329198 1862.3
## - vehicle_number 1 276549481 1092377902 1880.7
## - stopdensity 1 513160796 1328989216 1903.5
##
## Step: AIC=1846.9
## trips_in ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + stopdensity
##
## Df Sum of Sq RSS AIC
## - Med_inc 1 2713416 818755745 1845.3
## - sex_ratio 1 12931082 828973411 1846.7
## <none> 816042329 1846.9
## - education 1 44837694 860880023 1851.1
## - pop 1 66694264 882736592 1854.0
## - commuters 1 83166641 899208970 1856.2
## - pop_density 1 117653548 933695876 1860.5
## - vehicle_number 1 299217993 1115260321 1881.1
## - stopdensity 1 513079179 1329121507 1901.5
##
## Step: AIC=1845.29
## trips_in ~ commuters + vehicle_number + pop + pop_density + education +
## sex_ratio + stopdensity
##
## Df Sum of Sq RSS AIC
## - sex_ratio 1 12707718 831463463 1845.1
## <none> 818755745 1845.3
## - pop 1 88076757 906832502 1855.1
## - commuters 1 97488116 916243861 1856.3
## - pop_density 1 127046476 945802221 1860.0
## - education 1 148884533 967640278 1862.7
## - vehicle_number 1 398134129 1216889874 1889.2
## - stopdensity 1 533658800 1352414545 1901.5
##
## Step: AIC=1845.07
## trips_in ~ commuters + vehicle_number + pop + pop_density + education +
## stopdensity
##
## Df Sum of Sq RSS AIC
## <none> 831463463 1845.1
## - pop 1 77431353 908894816 1853.4
## - commuters 1 121615369 953078831 1858.9
## - pop_density 1 123500523 954963986 1859.1
## - education 1 142489154 973952616 1861.4
## - vehicle_number 1 429618243 1261081706 1891.4
## - stopdensity 1 653359213 1484822676 1910.3
##
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density +
## education + stopdensity, data = Model)
##
## Coefficients:
## (Intercept) commuters vehicle_number pop pop_density
## -1.069e+04 4.477e+00 -8.007e+00 1.291e+00 -2.328e-01
## education stopdensity
## 2.454e+03 4.296e+01
The summary of model 1 shows that median income, sex ratio and median age are not significantly correlated to inflow trips. Hence, using stepwise regression to select fitted variables. The stepwise method excludes median income, sex ratio and median age.
Model1_stepwise<-lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model1_stepwise)
##
## Call:
## lm(formula = trips_in ~ commuters + vehicle_number + pop + pop_density +
## education + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9595.9 -898.3 8.9 1183.1 13910.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.069e+04 2.033e+03 -5.257 7.35e-07 ***
## commuters 4.477e+00 1.121e+00 3.993 0.000119 ***
## vehicle_number -8.007e+00 1.067e+00 -7.505 1.78e-11 ***
## pop 1.291e+00 4.051e-01 3.186 0.001882 **
## pop_density -2.328e-01 5.785e-02 -4.024 0.000106 ***
## education 2.454e+03 5.678e+02 4.322 3.43e-05 ***
## stopdensity 4.296e+01 4.641e+00 9.255 2.14e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2762 on 109 degrees of freedom
## Multiple R-squared: 0.748, Adjusted R-squared: 0.7341
## F-statistic: 53.92 on 6 and 109 DF, p-value: < 2.2e-16
## commuters vehicle_number pop pop_density education
## 12.954674 5.118265 5.716493 1.523300 1.756836
## stopdensity
## 1.209620
Although all the explanatory variables in the stepwise Model 1 are significant, the vif result shows that the number of commuters has strong multicollnearity with other variables. Therefore, exclude the number of commuters in the following model.
Model1_stepwise_renewed<-lm(formula = trips_in ~ vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model1_stepwise_renewed)
##
## Call:
## lm(formula = trips_in ~ vehicle_number + pop + pop_density +
## education + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9677.3 -1134.1 -110.7 1249.9 14345.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.511e+04 1.816e+03 -8.321 2.62e-13 ***
## vehicle_number -5.013e+00 8.088e-01 -6.198 1.02e-08 ***
## pop 2.608e+00 2.506e-01 10.408 < 2e-16 ***
## pop_density -1.221e-01 5.412e-02 -2.257 0.026 *
## education 3.323e+03 5.589e+02 5.946 3.31e-08 ***
## stopdensity 4.757e+01 4.791e+00 9.929 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2944 on 110 degrees of freedom
## Multiple R-squared: 0.7111, Adjusted R-squared: 0.698
## F-statistic: 54.16 on 5 and 110 DF, p-value: < 2.2e-16
## vehicle_number pop pop_density education stopdensity
## 2.589139 1.925317 1.173816 1.498684 1.134619
After excluding the number of commuters, the vif values are all under 5, indicating poor or no multicollnearity with other variables. However, the regression diagnostics indicate that the model is not fitted for prediction. First, the first and second plots of ‘Residuals vs Fitted’ and ‘Scale-Location’ show that as the fitted values increase, the residuals decrease, indicating that the original data is not linear relationship; Second, the third plot of ‘Normal Q-Q’ shows that the residuals are non-normality; Third, the last plot of ‘Residuals vs Leverage’ indicates that there are many influential points in the model. Based on the reasons, try to replace dependant variable with log-transformed inflow trips.
Model$logtripsin<-log(Model$trips_in)
Model1_log<-lm(logtripsin~commuters+vehicle_number+Med_inc+pop+pop_density
+education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model1_log)
##
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
## pop + pop_density + education + sex_ratio + Med_Age + stopdensity,
## data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.19144 -0.58818 -0.08074 0.56144 3.02367
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.697e-01 1.068e+00 0.253 0.801049
## commuters 1.164e-03 5.046e-04 2.306 0.023060 *
## vehicle_number -1.680e-03 4.746e-04 -3.540 0.000596 ***
## Med_inc -2.570e-05 6.368e-06 -4.036 0.000103 ***
## pop 8.981e-05 1.565e-04 0.574 0.567163
## pop_density -3.416e-06 2.072e-05 -0.165 0.869369
## education 1.893e+00 3.249e-01 5.827 6.14e-08 ***
## sex_ratio 8.655e-01 5.867e-01 1.475 0.143138
## Med_Age -3.226e-02 2.161e-02 -1.493 0.138482
## stopdensity 6.116e-03 1.726e-03 3.544 0.000587 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9635 on 106 degrees of freedom
## Multiple R-squared: 0.7011, Adjusted R-squared: 0.6757
## F-statistic: 27.63 on 9 and 106 DF, p-value: < 2.2e-16
## Start: AIC=0.92
## logtripsin ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - pop_density 1 0.0252 98.436 -1.0455
## - pop 1 0.3059 98.717 -0.7152
## <none> 98.411 0.9248
## - sex_ratio 1 2.0202 100.431 1.2820
## - Med_Age 1 2.0687 100.479 1.3379
## - commuters 1 4.9366 103.347 4.6025
## - vehicle_number 1 11.6319 110.043 11.8841
## - stopdensity 1 11.6621 110.073 11.9159
## - Med_inc 1 15.1198 113.530 15.5037
## - education 1 31.5235 129.934 31.1587
##
## Step: AIC=-1.05
## logtripsin ~ commuters + vehicle_number + Med_inc + pop + education +
## sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - pop 1 0.3414 98.777 -2.6439
## <none> 98.436 -1.0455
## - sex_ratio 1 2.0001 100.436 -0.7121
## - Med_Age 1 2.0467 100.483 -0.6583
## - commuters 1 5.0324 103.468 2.7383
## - stopdensity 1 11.6419 110.078 9.9212
## - vehicle_number 1 11.9314 110.367 10.2259
## - Med_inc 1 15.2824 113.718 13.6955
## - education 1 31.5017 129.938 29.1618
##
## Step: AIC=-2.64
## logtripsin ~ commuters + vehicle_number + Med_inc + education +
## sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## <none> 98.777 -2.644
## - sex_ratio 1 1.729 100.506 -2.631
## - Med_Age 1 1.869 100.646 -2.469
## - stopdensity 1 11.372 110.150 7.997
## - Med_inc 1 15.896 114.674 12.666
## - vehicle_number 1 17.128 115.905 13.905
## - commuters 1 23.430 122.208 20.047
## - education 1 40.413 139.190 35.141
##
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
## education + sex_ratio + Med_Age + stopdensity, data = Model)
##
## Coefficients:
## (Intercept) commuters vehicle_number Med_inc education
## 5.613e-01 1.391e-03 -1.788e-03 -2.409e-05 1.782e+00
## sex_ratio Med_Age stopdensity
## 7.775e-01 -3.007e-02 6.000e-03
The summary of log-transformed model shows that total population, population density, sex ratio and median age are not significantly correlated to log-transformed inflow trips. Hence, using stepwise regression to select fitted variables. The stepwise method only excludes total population and population density.
Model1_log_stepwise<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
education + sex_ratio + Med_Age + stopdensity, data = Model)
summary(Model1_log_stepwise)
##
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
## education + sex_ratio + Med_Age + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1798 -0.5744 -0.1060 0.5694 2.9992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.613e-01 8.595e-01 0.653 0.51508
## commuters 1.391e-03 2.749e-04 5.061 1.72e-06 ***
## vehicle_number -1.788e-03 4.131e-04 -4.327 3.38e-05 ***
## Med_inc -2.409e-05 5.778e-06 -4.169 6.19e-05 ***
## education 1.782e+00 2.681e-01 6.647 1.26e-09 ***
## sex_ratio 7.775e-01 5.655e-01 1.375 0.17201
## Med_Age -3.007e-02 2.103e-02 -1.430 0.15572
## stopdensity 6.000e-03 1.701e-03 3.526 0.00062 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9563 on 108 degrees of freedom
## Multiple R-squared: 0.7, Adjusted R-squared: 0.6805
## F-statistic: 36 on 7 and 108 DF, p-value: < 2.2e-16
## commuters vehicle_number Med_inc education sex_ratio
## 6.494096 6.398672 3.691302 3.266617 1.268209
## Med_Age stopdensity
## 2.046143 1.355672
Model1_log_stepwise_renewed<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc + education + stopdensity, data = Model)
summary(Model1_log_stepwise_renewed)
##
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
## education + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.47359 -0.62628 -0.05064 0.54694 2.49811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.685e-01 6.003e-01 1.114 0.268
## commuters 1.559e-03 2.546e-04 6.123 1.45e-08 ***
## vehicle_number -1.964e-03 4.010e-04 -4.897 3.37e-06 ***
## Med_inc -2.634e-05 5.696e-06 -4.624 1.03e-05 ***
## education 1.687e+00 2.541e-01 6.640 1.23e-09 ***
## stopdensity 6.712e-03 1.605e-03 4.183 5.80e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9634 on 110 degrees of freedom
## Multiple R-squared: 0.6899, Adjusted R-squared: 0.6758
## F-statistic: 48.95 on 5 and 110 DF, p-value: < 2.2e-16
## commuters vehicle_number Med_inc education stopdensity
## 5.487730 5.941523 3.536103 2.891618 1.188357
## Analysis of Variance Table
##
## Model 1: logtripsin ~ commuters + vehicle_number + Med_inc + education +
## sex_ratio + Med_Age + stopdensity
## Model 2: logtripsin ~ commuters + vehicle_number + Med_inc + education +
## stopdensity
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 108 98.777
## 2 110 102.086 -2 -3.3092 1.8091 0.1687
## df AIC
## Model1_log_stepwise 9 328.5498
## Model1_log_stepwise_renewed 7 328.3724
Since the sex ratio and median age are not significantly correlated to inflow trips, exclude them from the stepwise log-transformed model to build a new one. The Anova analysis p value is 0.1687, which is larger than 0.05, indicating that excluding sex ratio and median age has no influence on the inflow trips. Besides, the AIC analysis indicates that the latter model is more fitted because it has a smaller AIC of the model without sex ratio and median age.
The regression diagnostics indicate the model has a normal-distributed residuals, the explanatory variables and inflow trips are linear relationship, and there are no influential points in the model. Hence, with the R*2 equals to 0.6758, the model could explain the inflow trips in Minneapolis efficiently.
Therefore, the inflow trips model is : ln(Inflow trips) = 1.559e-03commuters - 1.964e-03vehicle member - 2.634e-05median income + 1.687e+00edcuation +6.712e-03stop density + 6.685e-01
TripinModel<-lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
education + stopdensity, data = Model)
summary(TripinModel)
##
## Call:
## lm(formula = logtripsin ~ commuters + vehicle_number + Med_inc +
## education + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.47359 -0.62628 -0.05064 0.54694 2.49811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.685e-01 6.003e-01 1.114 0.268
## commuters 1.559e-03 2.546e-04 6.123 1.45e-08 ***
## vehicle_number -1.964e-03 4.010e-04 -4.897 3.37e-06 ***
## Med_inc -2.634e-05 5.696e-06 -4.624 1.03e-05 ***
## education 1.687e+00 2.541e-01 6.640 1.23e-09 ***
## stopdensity 6.712e-03 1.605e-03 4.183 5.80e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9634 on 110 degrees of freedom
## Multiple R-squared: 0.6899, Adjusted R-squared: 0.6758
## F-statistic: 48.95 on 5 and 110 DF, p-value: < 2.2e-16
Building Outflow Model
Procedure same as building the inflow model.
Model2<-lm(trips_out~commuters+vehicle_number+Med_inc+pop+pop_density
+education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model2)#med_inc,sex ratio,med_age not significant
##
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + Med_inc +
## pop + pop_density + education + sex_ratio + Med_Age + stopdensity,
## data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10230.3 -1159.3 119.8 1438.5 17129.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.395e+04 3.472e+03 -4.019 0.000110 ***
## commuters 4.371e+00 1.641e+00 2.664 0.008939 **
## vehicle_number -8.487e+00 1.543e+00 -5.500 2.65e-07 ***
## Med_inc 1.579e-02 2.071e-02 0.762 0.447554
## pop 1.547e+00 5.088e-01 3.041 0.002973 **
## pop_density -2.569e-01 6.738e-02 -3.813 0.000231 ***
## education 2.197e+03 1.057e+03 2.080 0.039981 *
## sex_ratio 2.703e+03 1.908e+03 1.417 0.159460
## Med_Age -1.284e+00 7.028e+01 -0.018 0.985459
## stopdensity 4.883e+01 5.612e+00 8.702 4.63e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3133 on 106 degrees of freedom
## Multiple R-squared: 0.7429, Adjusted R-squared: 0.7211
## F-statistic: 34.04 on 9 and 106 DF, p-value: < 2.2e-16
## Start: AIC=1877.11
## trips_out ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - Med_Age 1 3277 1040737462 1875.1
## - Med_inc 1 5705904 1046440089 1875.8
## <none> 1040734185 1877.1
## - sex_ratio 1 19709672 1060443857 1877.3
## - education 1 42458553 1083192738 1879.8
## - commuters 1 69654235 1110388420 1882.6
## - pop 1 90786063 1131520247 1884.8
## - pop_density 1 142737705 1183471890 1890.0
## - vehicle_number 1 296977565 1337711749 1904.2
## - stopdensity 1 743476693 1784210878 1937.6
##
## Step: AIC=1875.11
## trips_out ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + stopdensity
##
## Df Sum of Sq RSS AIC
## - Med_inc 1 5717343 1046454805 1873.8
## <none> 1040737462 1875.1
## - sex_ratio 1 19942625 1060680086 1875.3
## - education 1 47796807 1088534269 1878.3
## - commuters 1 75820901 1116558362 1881.3
## - pop 1 91626174 1132363636 1882.9
## - pop_density 1 145810617 1186548078 1888.3
## - vehicle_number 1 317414916 1358152378 1904.0
## - stopdensity 1 744352482 1785089943 1935.7
##
## Step: AIC=1873.75
## trips_out ~ commuters + vehicle_number + pop + pop_density +
## education + sex_ratio + stopdensity
##
## Df Sum of Sq RSS AIC
## <none> 1046454805 1873.8
## - sex_ratio 1 19539025 1065993830 1873.9
## - commuters 1 78840787 1125295592 1880.2
## - pop 1 124932018 1171386823 1884.8
## - pop_density 1 159793202 1206248007 1888.2
## - education 1 179627496 1226082301 1890.1
## - vehicle_number 1 402629520 1449084325 1909.5
## - stopdensity 1 777374754 1823829559 1936.2
##
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density +
## education + sex_ratio + stopdensity, data = Model)
##
## Coefficients:
## (Intercept) commuters vehicle_number pop pop_density
## -1.491e+04 3.710e+00 -7.841e+00 1.685e+00 -2.651e-01
## education sex_ratio stopdensity
## 2.766e+03 2.671e+03 4.942e+01
Model2_stepwise<-lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density +
education + sex_ratio + stopdensity, data = Model)
summary(Model2_stepwise)
##
## Call:
## lm(formula = trips_out ~ commuters + vehicle_number + pop + pop_density +
## education + sex_ratio + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10576.1 -1060.1 128.1 1441.4 17381.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.491e+04 3.044e+03 -4.898 3.41e-06 ***
## commuters 3.710e+00 1.301e+00 2.853 0.005199 **
## vehicle_number -7.841e+00 1.216e+00 -6.446 3.30e-09 ***
## pop 1.685e+00 4.693e-01 3.591 0.000498 ***
## pop_density -2.651e-01 6.528e-02 -4.061 9.28e-05 ***
## education 2.766e+03 6.423e+02 4.306 3.68e-05 ***
## sex_ratio 2.671e+03 1.881e+03 1.420 0.158474
## stopdensity 4.942e+01 5.517e+00 8.957 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3113 on 108 degrees of freedom
## Multiple R-squared: 0.7415, Adjusted R-squared: 0.7248
## F-statistic: 44.26 on 7 and 108 DF, p-value: < 2.2e-16
## commuters vehicle_number pop pop_density education
## 13.720438 5.237206 6.038607 1.526824 1.770243
## sex_ratio stopdensity
## 1.324355 1.345655
Model2_stepwise_renewed<-lm(formula = trips_out ~ vehicle_number + pop + pop_density + education + stopdensity, data = Model)
summary(Model2_stepwise_renewed)
##
## Call:
## lm(formula = trips_out ~ vehicle_number + pop + pop_density +
## education + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10879.8 -1129.6 -104.2 1446.8 17317.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.616e+04 2.012e+03 -8.031 1.17e-12 ***
## vehicle_number -5.328e+00 8.962e-01 -5.945 3.31e-08 ***
## pop 2.751e+00 2.777e-01 9.908 < 2e-16 ***
## pop_density -1.582e-01 5.997e-02 -2.637 0.00957 **
## education 3.491e+03 6.193e+02 5.637 1.35e-07 ***
## stopdensity 5.618e+01 5.309e+00 10.583 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3262 on 110 degrees of freedom
## Multiple R-squared: 0.7109, Adjusted R-squared: 0.6978
## F-statistic: 54.11 on 5 and 110 DF, p-value: < 2.2e-16
## vehicle_number pop pop_density education stopdensity
## 2.589139 1.925317 1.173816 1.498684 1.134619
Model$logtripsout<-log(Model$trips_out)
Model2_log<-lm(logtripsout~commuters+vehicle_number+Med_inc+pop+pop_density
+education+sex_ratio+ Med_Age+stopdensity,Model)
summary(Model2_log)
##
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc +
## pop + pop_density + education + sex_ratio + Med_Age + stopdensity,
## data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7503 -0.7531 -0.1169 0.5225 3.4622
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.712e-01 1.248e+00 -0.538 0.591957
## commuters 1.143e-03 5.901e-04 1.937 0.055373 .
## vehicle_number -1.871e-03 5.549e-04 -3.372 0.001042 **
## Med_inc -2.537e-05 7.446e-06 -3.407 0.000931 ***
## pop 1.197e-04 1.829e-04 0.654 0.514356
## pop_density 1.113e-05 2.423e-05 0.459 0.646976
## education 2.153e+00 3.799e-01 5.667 1.26e-07 ***
## sex_ratio 1.067e+00 6.860e-01 1.556 0.122725
## Med_Age -5.135e-02 2.527e-02 -2.032 0.044632 *
## stopdensity 7.491e-03 2.018e-03 3.712 0.000329 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.127 on 106 degrees of freedom
## Multiple R-squared: 0.6874, Adjusted R-squared: 0.6608
## F-statistic: 25.89 on 9 and 106 DF, p-value: < 2.2e-16
## Start: AIC=37.21
## logtripsout ~ commuters + vehicle_number + Med_inc + pop + pop_density +
## education + sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - pop_density 1 0.268 134.82 35.444
## - pop 1 0.543 135.10 35.681
## <none> 134.56 37.214
## - sex_ratio 1 3.073 137.63 37.833
## - commuters 1 4.764 139.32 39.250
## - Med_Age 1 5.243 139.80 39.647
## - vehicle_number 1 14.434 148.99 47.034
## - Med_inc 1 14.730 149.29 47.264
## - stopdensity 1 17.496 152.05 49.393
## - education 1 40.767 175.32 65.913
##
## Step: AIC=35.44
## logtripsout ~ commuters + vehicle_number + Med_inc + pop + education +
## sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## - pop 1 0.443 135.27 33.825
## <none> 134.82 35.444
## - sex_ratio 1 3.198 138.02 36.163
## - commuters 1 5.563 140.39 38.135
## - Med_Age 1 5.733 140.56 38.275
## - Med_inc 1 15.695 150.52 46.218
## - vehicle_number 1 15.975 150.80 46.434
## - stopdensity 1 17.869 152.69 47.881
## - education 1 41.085 175.91 64.300
##
## Step: AIC=33.83
## logtripsout ~ commuters + vehicle_number + Med_inc + education +
## sex_ratio + Med_Age + stopdensity
##
## Df Sum of Sq RSS AIC
## <none> 135.27 33.825
## - sex_ratio 1 2.822 138.09 34.220
## - Med_Age 1 5.421 140.69 36.383
## - Med_inc 1 16.081 151.35 44.855
## - stopdensity 1 17.500 152.77 45.938
## - vehicle_number 1 22.884 158.15 49.956
## - commuters 1 26.677 161.94 52.705
## - education 1 52.720 187.99 70.003
##
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc +
## education + sex_ratio + Med_Age + stopdensity, data = Model)
##
## Coefficients:
## (Intercept) commuters vehicle_number Med_inc education
## -1.348e-01 1.485e-03 -2.066e-03 -2.423e-05 2.035e+00
## sex_ratio Med_Age stopdensity
## 9.933e-01 -5.121e-02 7.443e-03
Model2_log_stepwise<-lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc +
education + sex_ratio + Med_Age + stopdensity, data = Model)
summary(Model2_log_stepwise)
##
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc +
## education + sex_ratio + Med_Age + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8510 -0.7568 -0.1007 0.5830 3.4643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.348e-01 1.006e+00 -0.134 0.893658
## commuters 1.485e-03 3.217e-04 4.615 1.09e-05 ***
## vehicle_number -2.066e-03 4.834e-04 -4.274 4.15e-05 ***
## Med_inc -2.423e-05 6.761e-06 -3.583 0.000511 ***
## education 2.035e+00 3.137e-01 6.488 2.70e-09 ***
## sex_ratio 9.933e-01 6.618e-01 1.501 0.136267
## Med_Age -5.121e-02 2.461e-02 -2.080 0.039850 *
## stopdensity 7.443e-03 1.991e-03 3.738 0.000298 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.119 on 108 degrees of freedom
## Multiple R-squared: 0.6857, Adjusted R-squared: 0.6653
## F-statistic: 33.66 on 7 and 108 DF, p-value: < 2.2e-16
## commuters vehicle_number Med_inc education sex_ratio
## 6.494096 6.398672 3.691302 3.266617 1.268209
## Med_Age stopdensity
## 2.046143 1.355672
Model2_log_stepwise_renewed<-lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc + education + Med_Age + stopdensity, data = Model)
summary(Model2_log_stepwise_renewed)
##
## Call:
## lm(formula = logtripsout ~ commuters + vehicle_number + Med_inc +
## education + Med_Age + stopdensity, data = Model)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8913 -0.7736 -0.0959 0.7640 3.2131
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.854e-01 8.492e-01 0.807 0.421400
## commuters 1.520e-03 3.227e-04 4.710 7.33e-06 ***
## vehicle_number -2.111e-03 4.853e-04 -4.349 3.09e-05 ***
## Med_inc -2.565e-05 6.733e-06 -3.809 0.000231 ***
## education 2.064e+00 3.149e-01 6.555 1.92e-09 ***
## Med_Age -4.798e-02 2.466e-02 -1.946 0.054276 .
## stopdensity 8.484e-03 1.877e-03 4.520 1.58e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.126 on 109 degrees of freedom
## Multiple R-squared: 0.6791, Adjusted R-squared: 0.6615
## F-statistic: 38.45 on 6 and 109 DF, p-value: < 2.2e-16
## commuters vehicle_number Med_inc education Med_Age
## 6.459487 6.374968 3.619008 3.254371 2.030517
## stopdensity
## 1.191183
## Analysis of Variance Table
##
## Model 1: logtripsout ~ commuters + vehicle_number + Med_inc + education +
## sex_ratio + Med_Age + stopdensity
## Model 2: logtripsout ~ commuters + vehicle_number + Med_inc + education +
## Med_Age + stopdensity
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 108 135.27
## 2 109 138.09 -1 -2.8219 2.2531 0.1363
## df AIC
## Model2_log_stepwise 9 365.0189
## Model2_log_stepwise_renewed 8 365.4139
The outflow trips model is : ln(Outflow trips) = 1.520e-03commuters - 2.111e-03vehicle number - 2.565e-05median income + 2.064e+00edcuation -4.798e-02Median Age + 8.484e-03stop density + 6.854e-01
Inflow and Outflow Models
Inflow model: ln(Inflow trips) = 1.559e-03commuters - 1.964e-03vehicle number - 2.634e-05median income + 1.687e+00edcuation +6.712e-03stop density + 6.685e-01
Outflow model: ln(Outflow trips) = 1.520e-03commuters - 2.111e-03vehicle number - 2.565e-05median income + 2.064e+00edcuation -4.798e-02Median Age + 8.484e-03stop density + 6.854e-01
E-Scooter Trips Prediction in Philadelphia
Prepare variables
phillycensus<-read.csv("philly census.csv",stringsAsFactors=FALSE)
#commuters
phillycensus$commuters<-phillycensus$Workers.16.Years.and.Over.-phillycensus$Workers.16.Years.and.Over..Worked.At.Home
#vehicle number
phillycensus$vehicle_number<-phillycensus$Aggregate.Number.Of.Vehicles..Car.Truck.Or.Van..Used.In.Commuting
#median income
phillycensus$Med_inc<-phillycensus$Median.Household.Income..In.2018.Inflation.Adjusted.Dollars.
#education
phillycensus$education<-(1*phillycensus$X..Population.25.Years.and.Over..Less.than.High.School+2*phillycensus$X..Population.25.Years.and.Over..High.School.Graduate..Includes.Equivalency.
+3*phillycensus$X..Population.25.Years.and.Over..Some.College+4*phillycensus$X..Population.25.Years.and.Over..Bachelor.s.Degree
+5*phillycensus$X..Population.25.Years.and.Over..Master.s.Degree+6*phillycensus$X..Population.25.Years.and.Over..Professional.School.Degree
+7*phillycensus$X..Population.25.Years.and.Over..Doctorate.Degree)/100
#median age
phillycensus2<-get_acs(geography = "tract",
variables = c("B01003_001", "B19013_001",
"B02001_002", "B08013_001",
"B08012_001", "B08301_001",
"B08301_010", "B01002_001"),
year = 2018,
state = "PA",
geometry = TRUE,
county=c("Philadelphia"),
output = "wide") %>%
rename(Total_Pop = B01003_001E,
Med_Inc = B19013_001E,
Med_Age = B01002_001E,
White_Pop = B02001_002E,
Travel_Time = B08013_001E,
Num_Commuters = B08012_001E,
Means_of_Transport = B08301_001E,
Total_Public_Trans = B08301_010E)
phillyage<-phillycensus2[,c("Med_Age","GEOID")]
phillycensus<-merge(phillycensus,phillyage,by.x="FIPS",by.y="GEOID",all=TRUE)
# Transit Accessibility
phillyTracts <-
phillycensus2 %>%
as.data.frame() %>%
distinct(GEOID, .keep_all = TRUE) %>%
dplyr::select(GEOID, geometry) %>%
st_sf %>%
st_transform(102271)
Phillytracts_boundary<-st_read("Census_Tracts_2010.shp") %>%
st_transform(crs=4326)
Phillybusstops<-read.csv("SEPTA__Bus_Stops.csv")
busstops <- Phillybusstops %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
st_transform(4326)
busstops_join <- st_intersection(busstops,Phillytracts_boundary) %>%
dplyr::select(-8)
busstop_counts <- busstops_join%>%
mutate(busstops_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(busstops_join$GEOID10)%>%
summarise(bustopcounts=n())
Phillytrolleystops<-read.csv("SEPTA__Trolley_Stops.csv")
trolleystops<-Phillytrolleystops %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
st_transform(4326)
trolleystops_join <- st_intersection(trolleystops,Phillytracts_boundary) %>%
dplyr::select(-8)
trolleystop_counts <- trolleystops_join%>%
mutate(trolleystops_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(trolleystops_join$GEOID10)%>%
summarise(trolleystoopcounts=n())
PhillyMFL<-read.csv("SEPTA__MarketFrankford_Line_Stations.csv")
MFLstops<-PhillyMFL %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
st_transform(4326)
MFL_join <- st_intersection(MFLstops,Phillytracts_boundary) %>%
dplyr::select(-8)
MFLstop_counts <- MFL_join%>%
mutate(MFL_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(MFL_join$GEOID10)%>%
summarise(MFLstopcounts=n())
PhillyBSL<-read.csv("SEPTA__Broad_Street_Line_Stations.csv")
BSLstops<-PhillyBSL %>%
st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326, agr = "constant")%>%
st_transform(4326)
BSL_join <- st_intersection(BSLstops,Phillytracts_boundary) %>%
dplyr::select(-8)
BSLstop_counts <- BSL_join%>%
mutate(BSL_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(BSL_join$GEOID10)%>%
summarise(BSLstopscounts=n())
PhillyNHL<-read.csv("SEPTA__Norristown_Highspeed_Line_Stations.csv")
NHLstops<-PhillyNHL%>%
st_as_sf(coords=c("LONGITUDE","LATITUDE"),crs=4326, agr="constant")%>%
st_transform(4326)
NHL_join<-st_intersection(NHLstops,Phillytracts_boundary) %>%
dplyr::select(-8)#No NHL
PhillyRL<-read.csv("SEPTA__Regional_Rail_Stations.csv")
RLstops<-PhillyRL%>%
st_as_sf(coords=c("LONGITUDE","LATITUDE"),crs=4326, agr="constant")%>%
st_transform(4326)
RL_join<-st_intersection(RLstops,Phillytracts_boundary) %>%
dplyr::select(-8)
RLstop_counts <- RL_join%>%
mutate(RL_join$GEOID10<-as.numeric(GEOID10))%>%
group_by(RL_join$GEOID10)%>%
summarise(RLstopscounts=n())
phillycensus<-merge(phillycensus,busstop_counts,by.x="FIPS", by.y="busstops_join$GEOID10",all=FALSE)
phillycensus<-merge(phillycensus,trolleystop_counts,by.x="FIPS",by.y="trolleystops_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,MFLstop_counts,by.x="FIPS",by.y="MFL_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,BSLstop_counts,by.x="FIPS",by.y="BSL_join$GEOID10",all=TRUE)
phillycensus<-merge(phillycensus,RLstop_counts,by.x="FIPS",by.y="RL_join$GEOID10",all=TRUE)
phillycensus[is.na(phillycensus)] <- 0
phillycensus$stopamount<-1*phillycensus$bustopcounts+2*phillycensus$trolleystoopcounts+3*(phillycensus$BSLstopscounts+phillycensus$MFLstopcounts)+4*phillycensus$RLstopscounts
phillyland<-read.csv("philly land.csv")
phillycensus<-merge(phillycensus,phillyland, by="FIPS",all=TRUE)
phillycensus$stopdensity<-phillycensus$stopamount/phillycensus$Area.Total..Area..Land.
Inflow Trips Prediction
phillycensus$log.tripsin<-TripinModel$coefficients[2]*phillycensus$commuters +TripinModel$coefficients[3]*phillycensus$vehicle_number+TripinModel$coefficients[4]*phillycensus$Med_inc+TripinModel$coefficients[5]*phillycensus$education+TripinModel$coefficients[6]*phillycensus$stopdensity+TripinModel$coefficients[1]
phillycensus$tripsin<-exp(phillycensus$log.tripsin)
options(scipen = 200)
phillycensus$tripsin<-round(phillycensus$tripsin)
philly_prediction<-phillycensus[,c("FIPS","tripsin")]
names(philly_prediction)<-c("GEOID","tripsin")
Phillytracts_boundary$GEOID10<-as.numeric(as.character(Phillytracts_boundary$GEOID10))
Phillytracts_boundary$INTPTLAT10<-as.numeric(as.character(Phillytracts_boundary$INTPTLAT10))
Phillytracts_boundary$INTPTLON10<-as.numeric(as.character(Phillytracts_boundary$INTPTLON10))
philly_prediction<-merge(Phillytracts_boundary,philly_prediction,by.x="GEOID10",by.y="GEOID",all=TRUE)
ggplot()+
geom_sf(data=philly_prediction,aes(fill=ntile(tripsin,4)))+
scale_fill_viridis('Inflow Trips - quantile breaks', # Scale label
direction = -1, # Direction of the color palette - either 1 or -1
option = 'D', # Viridis has multiple palettes to choose from
alpha = 0.6, # Transparency level
labels=as.character(quantile(philly_prediction$tripsin,
c(.25,.5,.75,1))))+
labs(title= "Predicted E-Scooter Inflow Trips in Philadelphia") +
theme.map
Outflow Trips Prediction
phillycensus$log.tripsout<-TripoutModel$coefficients[2]*phillycensus$commuters +TripoutModel$coefficients[3]*phillycensus$vehicle_number+TripoutModel$coefficients[4]*phillycensus$Med_inc+TripoutModel$coefficients[5]*phillycensus$education+TripoutModel$coefficients[6]*phillycensus$Med_Age+TripoutModel$coefficients[7]*phillycensus$stopdensity +TripinModel$coefficients[1]
phillycensus$tripsout<-exp(phillycensus$log.tripsout)
options(scipen = 200)
phillycensus$tripsout<-round(phillycensus$tripsout)
philly_prediction<-phillycensus[,c("FIPS","tripsout","tripsin")]
names(philly_prediction)<-c("GEOID","tripsout","tripsin")
Phillytracts_boundary$GEOID10<-as.numeric(as.character(Phillytracts_boundary$GEOID10))
Phillytracts_boundary$INTPTLAT10<-as.numeric(as.character(Phillytracts_boundary$INTPTLAT10))
Phillytracts_boundary$INTPTLON10<-as.numeric(as.character(Phillytracts_boundary$INTPTLON10))
philly_prediction<-merge(Phillytracts_boundary,philly_prediction,by.x="GEOID10",by.y="GEOID",all=TRUE)
ggplot()+
geom_sf(data=philly_prediction,aes(fill=ntile(tripsout,4)))+
scale_fill_viridis('Outflow Trips - quantile breaks', # Scale label
direction = -1, # Direction of the color palette - either 1 or -1
option = 'D', # Viridis has multiple palettes to choose from
alpha = 0.6, # Transparency level
labels=as.character(quantile(philly_prediction$tripsout,
c(.25,.5,.75,1))))+
labs(title= "Predicted E-Scooter Outflow Trips in Philadelphia") +
theme.map
Absolute Trips in Philadelphia
philly_prediction$abs_trip<-philly_prediction$tripsin-philly_prediction$tripsout
ggplot()+
geom_sf(data=philly_prediction,aes(fill=ntile(abs_trip,4)))+
scale_fill_viridis('Absolute Trips - quantile breaks', # Scale label
direction = -1, # Direction of the color palette - either 1 or -1
option = 'D', # Viridis has multiple palettes to choose from
alpha = 0.6, # Transparency level
labels=as.character(quantile(philly_prediction$abs_trip,
c(.25,.5,.75,1))))+
labs(title= "Predicted E-Scooter Absolute Trips (Inflow - Outflow) in Philadelphia") +
theme.map
The spatial pattern of inflow and outflow trips in Philadelphia are similar. University City and Center City are both popular origin and destination for the impending e-scooter in Philadelphia. While looking into the map of predicted absolute trips, which is the gap between inflow trips and outflow trips, the Center City, University City,Wynnefield, Powelton Village,Belmont Village are several main origins (yellow), while the outskirts of Philadelphia are mostly inflow trips.
Policy Recommendation
To better prepare for the upcoming e-scooter in Philadelphia:
City should regulate the e-scooter speed, right-of-way and other regulations to guarantee the safety of all road users;
City could assign boundary limits to e-scooter pilot program at the beginning of e-scooter rental markets coming to Philadelphia, such as University City and Center City;
E-scooter rental company should provide e-scooters based on the demand of residents;
E-scooter rental company should allocate more staff to collect e-scooters at the outskirts of Philadelphia and move them back to places where inflow trips are less than outflow trips
Further Study
Use hexagons as the units of the analysis, so that the distances between each units and the area are same;
Add land use area as a kind of factor;
Try negative binominal regression to predict the accurate number of trips