Reproduction of: Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA
Original study by Kang, J. Y., A. Michels, F. Lyu, Shaohua Wang, N. Agbodo, V. L. Freeman, and Shaowen Wang. 2020. Rapidly measuring spatial accessibility of COVID-19 healthcare resources: a case study of Illinois, USA. International Journal of Health Geographics 19 (1):1–17. DOI:10.1186/s12942-020-00229-x.
Reproduction Authors: Joe Holler, Derrick Burt, and Kufre Udoh, with contributions from Peter Kedron, Drew An-Pham, and the Spring 2021 Open Source GIScience class at Middlebury
Reproduction Materials Available at: github.com/HEGSRR/RPr-Kang-2020
Created: 2021-06-01
Revised: 2021-08-19
The original study by Kang et al. measured accessibility to healthcare resources for COVID-19 in the state of Illinois and in Chicago. The goal was to determine a service-to-patient ratio for each area in the study. The authors used a network graph of the Chicago area road network and ran a two-step floating catchment area analysis around hospitals to delineate the catchment areas of those hospitals for 10-, 20-, and 30-minute drive times. They then determined the ratio of service (either ICU beds or ventilators) to patients (either reported COVID cases or population older than 50). The service ratios were then overlaid on a hexagonal grid to visualize spatial accessibility. The original study conducted this analysis for the entire state of Illinois; our reanalysis focuses only on the Chicago area.
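The service-to-patient ratio described above can be illustrated with a small worked example. This is only a sketch under stated assumptions: the population counts and bed count are invented for illustration, and the only values taken from this notebook are the distance-decay weights (1.0, 0.68, 0.22) that are set further down.

```python
# Toy illustration of the service-to-population ratio for one hospital.
# All numbers except the weights are hypothetical.

# Distance-decay weights for the 10, 20, and 30 minute zones
# (the same values this notebook assigns to `weights` later on).
weights = [1.0, 0.68, 0.22]

# Hypothetical at-risk population found in each travel-time zone
population_by_zone = [1200, 3500, 8000]

# Hypothetical number of ICU beds at the hospital
icu_beds = 36

# Weight each zone's population, then sum to get the effective demand
weighted_population = sum(p * w for p, w in zip(population_by_zone, weights))

# Ratio of service to weighted population for this hospital's catchment
service_ratio = icu_beds / weighted_population

print(f"weighted population: {weighted_population:.1f}")
print(f"ICU beds per weighted person: {service_ratio:.5f}")
```

Because nearer zones carry larger weights, people close to the hospital count more toward its demand than people 30 minutes away.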
To perform the enhanced two-step floating catchment area (E2SFCA) method, three types of data are required: (1) road network, (2) population, and (3) hospital information. The road network can be obtained from OpenStreetMap using the Python library OSMnx. The population data is available from the American Community Survey. Lastly, hospital information is publicly available from the Homeland Infrastructure Foundation-Level Data (HIFLD).
Import the libraries needed to run this model.
See environment.yml for the library versions used for this analysis.
# Import modules
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import osmnx as ox
import re
from shapely.geometry import Point, LineString, Polygon
import matplotlib.pyplot as plt
from tqdm import tqdm
import multiprocessing as mp
import folium
import itertools
import os
import time
import warnings
import IPython
import requests
from IPython.display import display, clear_output
warnings.filterwarnings("ignore")
print('\n'.join(f'{m.__name__}=={m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
numpy==1.22.0 pandas==1.3.5 geopandas==0.10.2 networkx==2.6.3 osmnx==1.1.2 re==2.2.1 folium==0.12.1.post1 IPython==8.3.0 requests==2.27.1
Because we have restructured the repository for replication, we need to check our working directory and make necessary adjustments.
# Check working directory
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020/procedure/code'
# Use to set work directory properly
if os.path.basename(os.getcwd()) == 'code':
    os.chdir('../../')
os.getcwd()
'/home/jovyan/work/RPr-Kang-2020'
If you would like to use the data generated from the pre-processing scripts, use the following code:
covid_data = gpd.read_file('./data/raw/public/Pre-Processing/covid_pre-processed.shp')
atrisk_data = gpd.read_file('./data/raw/public/Pre-Processing/atrisk_pre-processed.shp')
# Read in at risk population data
atrisk_data = gpd.read_file('./data/raw/public/PopData/Illinois_Tract.shp')
atrisk_data.head()
GEOID | STATEFP | COUNTYFP | TRACTCE | NAMELSAD | Pop | Unnamed_ 0 | NAME | OverFifty | TotalPop | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17091011700 | 17 | 091 | 011700 | Census Tract 117 | 3688 | 588 | Census Tract 117, Kankakee County, Illinois | 1135 | 3688 | POLYGON ((-87.88768 41.13594, -87.88764 41.136... |
1 | 17091011800 | 17 | 091 | 011800 | Census Tract 118 | 2623 | 220 | Census Tract 118, Kankakee County, Illinois | 950 | 2623 | POLYGON ((-87.89410 41.14388, -87.89400 41.143... |
2 | 17119400951 | 17 | 119 | 400951 | Census Tract 4009.51 | 5005 | 2285 | Census Tract 4009.51, Madison County, Illinois | 2481 | 5005 | POLYGON ((-90.11192 38.70281, -90.11128 38.703... |
3 | 17119400952 | 17 | 119 | 400952 | Census Tract 4009.52 | 3014 | 2299 | Census Tract 4009.52, Madison County, Illinois | 1221 | 3014 | POLYGON ((-90.09442 38.72031, -90.09360 38.720... |
4 | 17135957500 | 17 | 135 | 957500 | Census Tract 9575 | 2869 | 1026 | Census Tract 9575, Montgomery County, Illinois | 1171 | 2869 | POLYGON ((-89.70369 39.34803, -89.69928 39.348... |
# Read in covid case data
covid_data = gpd.read_file('./data/raw/public/PopData/Chicago_ZIPCODE.shp')
covid_data['cases'] = covid_data['cases']
covid_data.head()
ZCTA5CE10 | County | State | Join | ZONE | ZONENAME | FIPS | pop | cases | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 60660 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 43242 | 78 | POLYGON ((-87.65049 41.99735, -87.65029 41.996... |
1 | 60640 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 69715 | 117 | POLYGON ((-87.64645 41.97965, -87.64565 41.978... |
2 | 60614 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 71308 | 134 | MULTIPOLYGON (((-87.67703 41.91845, -87.67705 ... |
3 | 60712 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 12539 | 42 | MULTIPOLYGON (((-87.76181 42.00465, -87.76156 ... |
4 | 60076 | Cook County | IL | Cook County IL | IL_E | Illinois East | 1201 | 31867 | 114 | MULTIPOLYGON (((-87.74782 42.01540, -87.74526 ... |
Note that 999 is treated as a "NULL"/"NA" so these hospitals are filtered out. This data contains the number of ICU beds and ventilators at each hospital.
# Read in hospital data
hospitals = gpd.read_file('./data/raw/public/HospitalData/Chicago_Hospital_Info.shp')
hospitals.head()
FID | Hospital | City | ZIP_Code | X | Y | Total_Bed | Adult ICU | Total Vent | geometry | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | Methodist Hospital of Chicago | Chicago | 60640 | -87.671079 | 41.972800 | 145 | 36 | 12 | MULTIPOINT (-87.67108 41.97280) |
1 | 4 | Advocate Christ Medical Center | Oak Lawn | 60453 | -87.732483 | 41.720281 | 785 | 196 | 64 | MULTIPOINT (-87.73248 41.72028) |
2 | 13 | Evanston Hospital | Evanston | 60201 | -87.683288 | 42.065393 | 354 | 89 | 29 | MULTIPOINT (-87.68329 42.06539) |
3 | 24 | AMITA Health Adventist Medical Center Hinsdale | Hinsdale | 60521 | -87.920116 | 41.805613 | 261 | 65 | 21 | MULTIPOINT (-87.92012 41.80561) |
4 | 25 | Holy Cross Hospital | Chicago | 60629 | -87.690841 | 41.770001 | 264 | 66 | 21 | MULTIPOINT (-87.69084 41.77000) |
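The note above says that 999 stands in for a missing value and that such hospitals are filtered out. The filtering step itself is not shown in this notebook, so the following is a hypothetical sketch of the idea with invented records and field names, not the code used to prepare Chicago_Hospital_Info.shp.

```python
# Hypothetical sketch: drop hospitals whose resource counts use 999 as a null code.
NULL_CODE = 999

# Invented records for illustration only
toy_hospitals = [
    {"name": "Hospital A", "icu_beds": 36, "vents": 12},
    {"name": "Hospital B", "icu_beds": 999, "vents": 999},  # unknown capacity
    {"name": "Hospital C", "icu_beds": 66, "vents": 21},
]

# Keep only hospitals where neither resource count is the null code
usable = [
    h for h in toy_hospitals
    if NULL_CODE not in (h["icu_beds"], h["vents"])
]

print([h["name"] for h in usable])
```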
# Plot hospital data
m = folium.Map(location=[41.85, -87.65], tiles='cartodbpositron', zoom_start=10)
for i in range(0, len(hospitals)):
    folium.CircleMarker(
        location=[hospitals.iloc[i]['Y'], hospitals.iloc[i]['X']],
        popup="{}{}\n{}{}\n{}{}".format('Hospital Name: ',hospitals.iloc[i]['Hospital'],
                                        'ICU Beds: ',hospitals.iloc[i]['Adult ICU'],
                                        'Ventilators: ', hospitals.iloc[i]['Total Vent']),
        radius=5,
        color='blue',
        fill=True,
        fill_opacity=0.6,
        legend_name = 'Hospitals'
    ).add_to(m)
legend_html = '''<div style="position: fixed; width: 20%; height: auto;
                 bottom: 10px; left: 10px;
                 solid grey; z-index:9999; font-size:14px;
                 "> Legend<br>'''
m
# Read in and plot grid file for Chicago
grid_file = gpd.read_file('./data/raw/public/GridFile/Chicago_Grid.shp')
grid_file.plot(figsize=(8,8))
<AxesSubplot:>
If Chicago_Network_Buffer.graphml does not already exist, this cell will either query the road network from OpenStreetMap (when OSM is set to True) or download a saved copy of the network from the OSF project.
Each of the road network code blocks may take a few minutes to run.
%%time
# To create a new graph from OpenStreetMap, delete or rename data/raw/private/Chicago_Network_Buffer.graphml
# (if it exists), and set OSM to True
OSM = False
# if buffered street network is not saved, and OSM is preferred,
# generate a new graph from OpenStreetMap and save it
if not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml") and OSM:
    print("Loading buffered Chicago road network from OpenStreetMap. Please wait... runtime may exceed 9min...", flush=True)
    G = ox.graph_from_place('Chicago', network_type='drive', buffer_dist=24140.2)
    print("Saving Chicago road network to raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
    ox.save_graphml(G, './data/raw/private/Chicago_Network_Buffer.graphml')
    print("Data saved.")
# otherwise, if buffered street network is not saved, download graph from the OSF project
elif not os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
    print("Downloading buffered Chicago road network from OSF...", flush=True)
    url = 'https://osf.io/download/z8ery/'
    r = requests.get(url, allow_redirects=True)
    print("Saving buffered Chicago road network to file...", flush=True)
    with open('./data/raw/private/Chicago_Network_Buffer.graphml', 'wb') as f:
        f.write(r.content)
# if the buffered street network is already saved, load it
if os.path.exists("./data/raw/private/Chicago_Network_Buffer.graphml"):
    print("Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...", flush=True)
    G = ox.load_graphml('./data/raw/private/Chicago_Network_Buffer.graphml')
    print("Data loaded.")
else:
    print("Error: could not load the road network from file.")
Loading buffered Chicago road network from raw/private/Chicago_Network_Buffer.graphml. Please wait...
Data loaded.
CPU times: user 51.7 s, sys: 2.86 s, total: 54.5 s
Wall time: 54.2 s
%%time
ox.plot_graph(G, node_size = 1, bgcolor = 'white', node_color = 'black', edge_color = "#333333", node_alpha = 0.5, edge_linewidth = 0.5)
CPU times: user 1min 27s, sys: 501 ms, total: 1min 28s Wall time: 1min 28s
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Display all the unique speed limit values and count how many network edges (road segments) have each value. We will compare this to our cleaned network later.
%%time
# Turn nodes and edges into geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Get unique counts of road segments for each speed limit
print(edges['maxspeed'].value_counts())
print(str(len(edges)) + " edges in graph")
# can we also visualize highways / roads with higher speed limits to check accuracy?
# the code above converts the graph into an edges geodataframe, which could theoretically be filtered
# by fast road segments and mapped, e.g. in folium
25 mph                        4793
30 mph                        3555
35 mph                        3364
40 mph                        2093
45 mph                        1418
20 mph                        1155
55 mph                         614
60 mph                         279
50 mph                         191
40                              79
15 mph                          76
70 mph                          71
65 mph                          54
10 mph                          38
[40 mph, 45 mph]                27
[30 mph, 35 mph]                26
45,30                           24
[40 mph, 35 mph]                22
70                              21
25                              20
[55 mph, 45 mph]                16
25, east                        14
[45 mph, 35 mph]                13
[30 mph, 25 mph]                10
[45 mph, 50 mph]                 8
50                               8
[40 mph, 30 mph]                 7
[35 mph, 25 mph]                 6
[55 mph, 60 mph]                 5
20                               4
[70 mph, 60 mph]                 3
[65 mph, 60 mph]                 3
[40 mph, 45 mph, 35 mph]         3
[70 mph, 65 mph]                 2
[70 mph, 45 mph, 5 mph]          2
[40, 45 mph]                     2
[35 mph, 50 mph]                 2
35                               2
[55 mph, 65 mph]                 2
[40 mph, 50 mph]                 2
[15 mph, 25 mph]                 2
[40 mph, 35 mph, 25 mph]         2
[15 mph, 40 mph, 30 mph]         2
[20 mph, 25 mph]                 2
[30 mph, 25, east]               2
[65 mph, 55 mph]                 2
[20 mph, 35 mph]                 2
[55 mph, 55]                     2
55                               2
[15 mph, 30 mph]                 2
[45 mph, 30 mph]                 2
[15 mph, 45 mph]                 2
[55 mph, 45, east, 50 mph]       2
[20 mph, 30 mph]                 1
[5 mph, 45 mph, 35 mph]          1
[55 mph, 35 mph]                 1
[5 mph, 35 mph]                  1
[55 mph, 50 mph]                 1
Name: maxspeed, dtype: int64
384240 edges in graph
CPU times: user 43.7 s, sys: 321 ms, total: 44 s
Wall time: 43.9 s
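The counts above show that raw maxspeed values come in several shapes: plain strings with units ("25 mph"), bare numbers ("40"), comma-separated values ("45,30", "25, east"), and lists from merged segments. The sketch below shows one possible way to reduce such values to a single number. The function name `parse_maxspeed_mph` is hypothetical, averaging multiple values is an assumption made here for illustration (it is not what Kang et al. or OSMnx do), and bare numbers are assumed to be in mph.

```python
import re

def parse_maxspeed_mph(raw, default=None):
    """Return one speed in mph from a raw OSM maxspeed value, or `default`.

    Hypothetical helper: averages all numbers it finds and assumes mph.
    """
    # A list or tuple holds several tagged values; parse each and average them.
    if isinstance(raw, (list, tuple)):
        speeds = [parse_maxspeed_mph(item) for item in raw]
        speeds = [s for s in speeds if s is not None]
        return sum(speeds) / len(speeds) if speeds else default
    if raw is None:
        return default
    # Pull every number out of strings such as "25 mph", "45,30", or "25, east"
    numbers = re.findall(r"\d+(?:\.\d+)?", str(raw))
    if not numbers:
        return default  # e.g. "signals" or NaN
    values = [float(n) for n in numbers]
    return sum(values) / len(values)

for example in ["25 mph", "45,30", "25, east", ["40 mph", "45 mph"], "signals"]:
    print(repr(example), "->", parse_maxspeed_mph(example))
```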
edges.head()
u | v | key | osmid | highway | oneway | length | name | geometry | lanes | ref | bridge | maxspeed | access | service | tunnel | junction | width | area |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
261095436 | 261095437 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90237 42.10571, -87.90198 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095437 | 261095439 | 0 | 24067717 | residential | False | 46.317 | NaN | LINESTRING (-87.90198 42.10540, -87.90159 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095437 | 261095436 | 0 | 24067717 | residential | False | 46.873 | NaN | LINESTRING (-87.90198 42.10540, -87.90237 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095437 | 261109275 | 0 | 24069424 | residential | False | 34.892 | NaN | LINESTRING (-87.90198 42.10540, -87.90227 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
261095437 | 261109274 | 0 | 24069424 | residential | False | 47.866 | NaN | LINESTRING (-87.90198 42.10540, -87.90156 42.1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
Cleans the OSMNX network to work better with drive-time analysis.
Instead of keeping only the few speed limits recorded in OpenStreetMap and filling in every other road as 30 or 35 mph, the add_edge_speeds function fills in missing speeds with the mean OSM speed for the given road type. For example, 'residential' might have an average speed of 20 mph. The goal of this step is to account for highways that have no speed listed in OSM and would otherwise be assigned 35 mph and, at the other extreme, small alleys with no listed speed that should be closer to 20 mph.
Args:
Returns:
G = ox.speed.add_edge_speeds(G)
# two things about this function:
# 1) the work to remove nodes is hardly worth it now that OSMnx cleans graphs by default
# the function is now only pruning < 300 nodes
# 2) try using the OSMnx speed module for setting speeds, travel times
# https://osmnx.readthedocs.io/en/stable/user-reference.html#module-osmnx.speed
# just be careful about units of speed and time!
# the remainder of this code expects 'time' to be measured in minutes
# def network_setting(network):
# _nodes_removed = len([n for (n, deg) in network.out_degree() if deg ==0])
# network.remove_nodes_from([n for (n, deg) in network.out_degree() if deg ==0])
# for component in list(nx.strongly_connected_components(network)):
# if len(component)<10:
# for node in component:
# _nodes_removed+=1
# network.remove_node(node)
# for u, v, k, data in tqdm(G.edges(data=True, keys=True),position=0):
# if 'maxspeed' in data.keys():
# speed_type = type(data['maxspeed'])
# if (speed_type==str):
# # Add in try/except blocks to catch maxspeed formats that don't fit Kang et al's cases
# try:
# if len(data['maxspeed'].split(','))==2:
# data['maxspeed_fix']=float(data['maxspeed'].split(',')[0])
# elif data['maxspeed']=='signals':
# data['maxspeed_fix']=30.0 # drive speed setting as 35 miles
# else:
# data['maxspeed_fix']=float(data['maxspeed'].split()[0])
# except:
# data['maxspeed_fix']=30.0 #miles
# else:
# try:
# data['maxspeed_fix']=float(data['maxspeed'][0].split()[0])
# except:
# data['maxspeed_fix']=30.0 #miles
# else:
# data['maxspeed_fix']=30.0 #miles
# data['maxspeed_meters'] = data['maxspeed_fix']*26.8223 # convert mile per hour to meters per minute
# data['time'] = float(data['length'])/ data['maxspeed_meters'] # meters / meters per minute = minutes
# print("Removed {} nodes ({:2.4f}%) from the OSMNX network".format(_nodes_removed, _nodes_removed/float(network.number_of_nodes())))
# print("Number of nodes: {}".format(network.number_of_nodes()))
# print("Number of edges: {}".format(network.number_of_edges()))
# return(network)
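The comments above warn about units: the rest of this notebook expects 'time' in minutes, and the commented-out function converts mph to meters per minute with the factor 26.8223. The sketch below spells out that arithmetic with hypothetical helper names. As far as we know, OSMnx's add_edge_speeds stores speed_kph and add_edge_travel_times stores travel_time in seconds, so a conversion like this would be needed before using those values as minutes.

```python
METERS_PER_MILE = 1609.344

def travel_time_minutes(length_m, speed, unit="mph"):
    """Minutes needed to drive `length_m` meters at `speed` (mph or kph)."""
    if unit == "mph":
        meters_per_minute = speed * METERS_PER_MILE / 60  # about 26.8224 per mph
    elif unit == "kph":
        meters_per_minute = speed * 1000 / 60
    else:
        raise ValueError("unit must be 'mph' or 'kph'")
    return length_m / meters_per_minute

def seconds_to_minutes(seconds):
    """Convert a travel time in seconds to minutes."""
    return seconds / 60

# One mile at 30 mph and one kilometer at 60 kph
one_mile_at_30mph = travel_time_minutes(1609.344, 30, unit="mph")
one_km_at_60kph = travel_time_minutes(1000, 60, unit="kph")
print(one_mile_at_30mph, one_km_at_60kph, seconds_to_minutes(90))
```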
%%time
#G, hospitals, grid_file, pop_data = file_import (population_dropdown.value)
#G = network_setting(G)
# Create point geometries for each node in the graph, to make constructing catchment area polygons easier
for node, data in G.nodes(data=True):
    data['geometry']=Point(data['x'], data['y'])
# Modify code to react to processor dropdown (got rid of file_import function)
100%|██████████| 383911/383911 [00:02<00:00, 165838.76it/s]
Removed 274 nodes (0.0019%) from the OSMNX network Number of nodes: 142044 Number of edges: 383911 CPU times: user 9.98 s, sys: 859 ms, total: 10.8 s Wall time: 10.7 s
Display all the unique speed limit values and count how many network edges (road segments) have each value. Compare to the previous results.
%%time
## Get unique counts for each road network
# Turn nodes and edges in geodataframes
nodes, edges = ox.graph_to_gdfs(G, nodes=True, edges=True)
# Count
edges['speed_mph'] = edges['speed_kph']*0.621371
G = ox.graph_from_gdfs(nodes, edges)
print(edges['speed_mph'].value_counts())
print(str(len(edges)) + " edges in graph")
24.357743    291413
30.012219     29822
35.231736     26353
37.344397     14985
24.979114      5604
34.983187      3364
53.624317      2200
40.016292      2093
20.008146      1872
26.656816      1793
44.987260      1418
43.371696       654
54.991334       606
55.985527       565
60.024439       277
50.020366       191
31.689921       118
24.854840        80
14.975041        76
70.028512        61
64.995407        42
10.004073        38
15.534275        34
42.253228        29
32.311292        26
28.148106        24
37.282260        24
43.495970        21
39.767744        18
49.709680        16
27.340324        12
34.796776         9
47.224196         8
31.068550         8
29.825808         8
22.369356         6
57.166132         5
59.651616         4
44.117341         4
12.427420         4
64.622584         3
19.883872         3
44.738712         3
27.961695         3
62.137100         3
32.559840         2
34.175405         2
67.108068         2
21.747985         2
32.932663         2
52.195164         1
Name: speed_mph, dtype: int64
383911 edges in graph
CPU times: user 1min 54s, sys: 1.59 s, total: 1min 56s
Wall time: 1min 55s
def hospital_setting(hospitals, G):
    # Create an empty column
    hospitals['nearest_osm']=None
    # Fill the nearest_osm column with each hospital's nearest OSM node
    for i in tqdm(hospitals.index, desc="Find the nearest network node from hospitals", position=0):
        # .loc avoids pandas chained-assignment problems
        hospitals.loc[i, 'nearest_osm'] = ox.get_nearest_node(G, [hospitals['Y'][i], hospitals['X'][i]], method='euclidean') # find the nearest node from hospital location
    print ('hospital setting is done')
    return(hospitals)
Converts geodata to centroids
Args:
Returns:
def pop_centroid (pop_data, pop_type):
    pop_data = pop_data.to_crs({'init': 'epsg:4326'})
    # If pop is selected in dropdown, keep tracts where the at-risk population is 0 or greater
    if pop_type =="pop":
        pop_data=pop_data[pop_data['OverFifty']>=0]
    # If covid is selected in dropdown, keep zip codes where covid cases are 0 or greater
    if pop_type =="covid":
        pop_data=pop_data[pop_data['cases']>=0]
    pop_cent = pop_data.centroid # converts each polygon to a point and drops all other attributes
    # Convert to gdf
    pop_centroid = gpd.GeoDataFrame()
    i = 0
    for point in tqdm(pop_cent, desc='Pop Centroid File Setting', position=0):
        if pop_type== "pop":
            pop = pop_data.iloc[i]['OverFifty']
            code = pop_data.iloc[i]['GEOID']
        if pop_type =="covid":
            pop = pop_data.iloc[i]['cases']
            code = pop_data.iloc[i].ZCTA5CE10
        pop_centroid = pop_centroid.append({'code':code,'pop': pop,'geometry': point}, ignore_index=True)
        i = i+1
    return(pop_centroid)
Function written by Joe Holler + Derrick Burt. It is a more efficient way to calculate distance-weighted catchment areas for each hospital and runs faster than the original algorithm ("calculate_catchment_area"). It first creates a dictionary of all nodes within a 30 minute drive time of the hospital, mapping each node to its drive time (using the single_source_dijkstra_path_length function). Two more dictionaries, for the 20 and 10 minute drive times, are then constructed by querying the first one. From these dictionaries, single-part convex hulls are created for each drive time interval and appended to a single list (one list with 3 polygon geometries). Within the list, the polygons are differenced from each other to produce three catchment areas.
Args:
Returns:
def dijkstra_cca_polygons(G, nearest_osm, distances, distance_unit = "time"):
    '''
    Before running: must assign point geometries to street nodes

    # create point geometries for the entire graph
    for node, data in G.nodes(data=True):
        data['geometry']=Point(data['x'], data['y'])
    '''
    ## CREATE DICTIONARIES
    # create dictionary of nearest nodes
    nearest_nodes_30 = nx.single_source_dijkstra_path_length(G, nearest_osm, distances[2], distance_unit) # the largest set of nodes, from which the 10 and 20 minute drive times can be extracted
    # extract values within 20 and 10 (respectively) minutes drive times
    nearest_nodes_20 = dict()
    nearest_nodes_10 = dict()
    for key, value in nearest_nodes_30.items():
        if value <= 20:
            nearest_nodes_20[key] = value
        if value <= 10:
            nearest_nodes_10[key] = value
    ## CREATE POLYGONS FOR 3 DISTANCE CATEGORIES (10 min, 20 min, 30 min)
    # 30 MIN
    # If the graph already has a geometry attribute with point data,
    # this line will create a GeoPandas GeoDataFrame from the nearest_nodes_30 dictionary
    points_30 = gpd.GeoDataFrame(gpd.GeoSeries(nx.get_node_attributes(G.subgraph(nearest_nodes_30), 'geometry')))
    # This line converts the nearest_nodes_30 dictionary into a Pandas data frame and joins it to points
    # left_index=True and right_index=True are options for merge() to join on the index values
    points_30 = points_30.merge(pd.Series(nearest_nodes_30).to_frame(), left_index=True, right_index=True)
    # Re-name the columns and set the geodataframe geometry to the geometry column
    points_30 = points_30.rename(columns={'0_x':'geometry','0_y':'z'}).set_geometry('geometry')
    # Create a convex hull polygon from the points
    polygon_30 = gpd.GeoDataFrame(gpd.GeoSeries(points_30.unary_union.convex_hull))
    polygon_30 = polygon_30.rename(columns={0:'geometry'}).set_geometry('geometry')
    # 20 MIN
    # Select nodes less than or equal to 20
    points_20 = points_30.query("z <= 20")
    # Create a convex hull polygon from the points
    polygon_20 = gpd.GeoDataFrame(gpd.GeoSeries(points_20.unary_union.convex_hull))
    polygon_20 = polygon_20.rename(columns={0:'geometry'}).set_geometry('geometry')
    # 10 MIN
    # Select nodes less than or equal to 10
    points_10 = points_30.query("z <= 10")
    # Create a convex hull polygon from the points
    polygon_10 = gpd.GeoDataFrame(gpd.GeoSeries(points_10.unary_union.convex_hull))
    polygon_10 = polygon_10.rename(columns={0:'geometry'}).set_geometry('geometry')
    # Create empty list and append polygons
    polygons = []
    # Append
    polygons.append(polygon_10)
    polygons.append(polygon_20)
    polygons.append(polygon_30)
    # Clip the overlapping distance polygons (create two donuts + hole)
    for i in reversed(range(1, len(distances))):
        polygons[i] = gpd.overlay(polygons[i], polygons[i-1], how="difference")
    return polygons
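The first step of the function above, a single Dijkstra search limited to the largest drive time and then split by travel time, can be sketched without any GIS libraries. This is a toy illustration, not the notebook's implementation: the graph, node names, and helper names are hypothetical, and it assigns each node to one disjoint band rather than building convex hulls and differencing them.

```python
import heapq

def travel_times_within(graph, source, cutoff):
    """Shortest travel time from `source` to every node reachable within `cutoff`."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, edge_time in graph.get(node, {}).items():
            new_time = time + edge_time
            if new_time <= cutoff and new_time < best.get(neighbor, float("inf")):
                best[neighbor] = new_time
                heapq.heappush(queue, (new_time, neighbor))
    return best

def split_into_bands(times, distances):
    """Assign each node to the smallest band whose upper limit covers its time."""
    bands = {d: [] for d in distances}
    for node, t in times.items():
        for d in sorted(distances):
            if t <= d:
                bands[d].append(node)
                break
    return bands

# Hypothetical directed graph; edge values are travel times in minutes
toy_graph = {
    "hospital": {"a": 4, "b": 12},
    "a": {"c": 5},
    "b": {"d": 15},
    "c": {"e": 9},
    "d": {},
    "e": {"f": 20},
}

times = travel_times_within(toy_graph, "hospital", cutoff=30)
bands = split_into_bands(times, [10, 20, 30])
print(bands)
```

Running one search with the largest cutoff and filtering the result avoids repeating the shortest-path search for each drive time.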
Measures the effect of a single hospital on the surrounding area. (Uses dijkstra_cca_polygons)
Args:
Returns:
def hospital_measure_acc (_thread_id, hospital, pop_data, distances, weights):
    # Create polygons
    polygons = dijkstra_cca_polygons(G, hospital['nearest_osm'], distances)
    # Calculate accessibility measurements
    num_pops = []
    for j in pop_data.index:
        point = pop_data['geometry'][j]
        # Weight each population point by the catchment zone it falls in
        for k in range(len(polygons)):
            if len(polygons[k]) > 0: # Skip empty zones (convex hull is not a polygon)
                if (point.within(polygons[k].iloc[0]["geometry"])):
                    num_pops.append(pop_data['pop'][j]*weights[k])
    total_pop = sum(num_pops)
    for i in range(len(distances)):
        polygons[i]['time']=distances[i]
        polygons[i]['total_pop']=total_pop
        polygons[i]['hospital_icu_beds'] = float(hospital['Adult ICU'])/polygons[i]['total_pop'] # ICU beds per weighted population across all zones
        polygons[i]['hospital_vents'] = float(hospital['Total Vent'])/polygons[i]['total_pop'] # ventilators per weighted population across all zones
        polygons[i].crs = { 'init' : 'epsg:4326'}
        polygons[i] = polygons[i].to_crs({'init':'epsg:32616'})
    print('{:.0f}'.format(_thread_id), end=" ", flush=True)
    return(_thread_id, [ polygon.copy(deep=True) for polygon in polygons ])
Parallel implementation of accessibility measurement.
Args:
Returns:
def hospital_acc_unpacker(args):
    return hospital_measure_acc(*args)

# WHERE THE RESULTS ARE POOLED AND THEN REAGGREGATED
def measure_acc_par (hospitals, pop_data, network, distances, weights, num_proc = 4):
    catchments = []
    for distance in distances:
        catchments.append(gpd.GeoDataFrame())
    pool = mp.Pool(processes = num_proc)
    hospital_list = [ hospitals.iloc[i] for i in range(len(hospitals)) ]
    print("Calculating", len(hospital_list), "hospital catchments...\ncompleted number:", end=" ")
    results = pool.map(hospital_acc_unpacker, zip(range(len(hospital_list)), hospital_list, itertools.repeat(pop_data), itertools.repeat(distances), itertools.repeat(weights)))
    pool.close()
    results.sort()
    results = [ r[1] for r in results ]
    for i in range(len(results)):
        for j in range(len(distances)):
            catchments[j] = catchments[j].append(results[i][j], sort=False)
    return catchments
Calculates and aggregates accessibility statistics for one catchment on our grid file.
Args:
Returns:
from collections import Counter
def overlap_calc(_id, poly, grid_file, weight, service_type):
    value_dict = Counter() #set up value_dict
    if type(poly.iloc[0][service_type])!=type(None): #make sure the catchment has a value for ICU beds or ventilators
        value = float(poly[service_type])*weight #weight depending on catchment area
        intersect = gpd.overlay(grid_file, poly, how='intersection') #take intersections of catchment area and hexagons
        intersect['overlapped']= intersect.area #calculate the area of fragments that are both catchment area and hexagon
        intersect['percent'] = intersect['overlapped']/intersect['area'] #calculate the share of each hexagon's area that lies within the catchment area
        intersect=intersect[intersect['percent']>=0.5] #keep only hexagons with at least 50% of their area inside the catchment
        intersect_region = intersect['id']
        for intersect_id in intersect_region:
            try:
                value_dict[intersect_id] +=value
            except:
                value_dict[intersect_id] = value
    return(_id, value_dict)

def overlap_calc_unpacker(args):
    return overlap_calc(*args)
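The accumulation logic in overlap_calc can be illustrated without geometry. In this sketch the overlap areas are given directly instead of being computed with gpd.overlay, and the function name, hexagon ids, areas, and ratios are all hypothetical. It also shows that a Counter returns 0 for missing keys, so `+=` works without a try/except.

```python
from collections import Counter

def accumulate_service(overlaps, min_share=0.5):
    """Sum weighted service ratios per hexagon.

    overlaps: iterable of (hex_id, hex_area, overlap_area, weighted_ratio).
    A hexagon only receives a catchment's value when at least `min_share`
    of the hexagon's area lies inside that catchment.
    """
    totals = Counter()
    for hex_id, hex_area, overlap_area, weighted_ratio in overlaps:
        if overlap_area / hex_area >= min_share:
            totals[hex_id] += weighted_ratio  # missing keys start at 0
    return totals

# Hypothetical overlaps between two hexagons and several catchments
toy_overlaps = [
    (1, 100.0, 100.0, 0.010),  # hexagon 1 fully inside a catchment
    (1, 100.0, 60.0, 0.004),   # 60% inside another catchment, counted
    (2, 100.0, 40.0, 0.010),   # only 40% inside, ignored
    (2, 100.0, 50.0, 0.002),   # exactly 50%, counted
]

totals = accumulate_service(toy_overlaps)
print(dict(totals))
```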
Calculates how all catchment areas overlap with and affect the accessibility of each grid in our grid file.
Args:
Returns:
def overlapping_function (grid_file, catchments, service_type, weights, num_proc = 4):
    grid_file[service_type]=0
    pool = mp.Pool(processes = num_proc)
    acc_list = []
    for i in range(len(catchments)):
        acc_list.extend([ catchments[i][j:j+1] for j in range(len(catchments[i])) ])
    acc_weights = []
    for i in range(len(catchments)):
        acc_weights.extend( [weights[i]]*len(catchments[i]) )
    results = pool.map(overlap_calc_unpacker, zip(range(len(acc_list)), acc_list, itertools.repeat(grid_file), acc_weights, itertools.repeat(service_type)))
    pool.close()
    results.sort()
    results = [ r[1] for r in results ]
    service_values = results[0]
    for result in results[1:]:
        service_values+=result
    for intersect_id, value in service_values.items():
        grid_file.loc[grid_file['id']==intersect_id, service_type] += value
    return(grid_file)
Normalizes our result (Geodataframe) for a given resource (res).
def normalization (result, res):
    result[res]=(result[res]-min(result[res]))/(max(result[res])-min(result[res]))
    return result
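The normalization above is a min-max rescaling to the range 0 to 1. A plain-Python sketch of the same arithmetic follows. The function name is hypothetical, and the guard for the case where all values are equal is an addition for illustration that the notebook's function does not include.

```python
def min_max_normalize(values):
    """Rescale a list of numbers so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Added guard: avoid dividing by zero when every value is identical
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([2, 4, 6]))
```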
Imports all files we need to run our code and pulls the Illinois network from OSMNX if it is not present (will take a while).
NOTE: even if we calculate accessibility for just Chicago, we want to use the Illinois network (or at least we should not use the Chicago network) because using the Chicago network will result in hospitals near but outside of Chicago having an infinite distance (unreachable because roads do not extend past Chicago).
Args:
Returns:
def output_map(output_grid, base_map, hospitals, resource):
    ax=output_grid.plot(column=resource, cmap='PuBuGn',figsize=(18,12), legend=True, zorder=1)
    # The next two lines set bounds for the x- and y-axes because a stray
    # point at the bottom left of the map distorts the frame (Maja)
    ax.set_xlim([314000, 370000])
    ax.set_ylim([540000, 616000])
    base_map.plot(ax=ax, facecolor="none", edgecolor='gray', lw=0.1)
    hospitals.plot(ax=ax, markersize=10, zorder=1, c='blue')
Below you can customize the input of the model:
import ipywidgets
from IPython.display import display
processor_dropdown = ipywidgets.Dropdown( options=[("1", 1), ("2", 2), ("3", 3), ("4", 4)],
value = 4, description = "Processor: ")
population_dropdown = ipywidgets.Dropdown( options=[("Population at Risk", "pop"), ("COVID-19 Patients", "covid") ],
value = "pop", description = "Population: ")
resource_dropdown = ipywidgets.Dropdown( options=[("ICU Beds", "hospital_icu_beds"), ("Ventilators", "hospital_vents") ],
value = "hospital_icu_beds", description = "Resource: ")
hospital_dropdown = ipywidgets.Dropdown( options=[("All hospitals", "hospitals"), ("Subset", "hospital_subset") ],
value = "hospitals", description = "Hospital:")
display(processor_dropdown,population_dropdown,resource_dropdown,hospital_dropdown)
Dropdown(description='Processor: ', index=3, options=(('1', 1), ('2', 2), ('3', 3), ('4', 4)), value=4)
Dropdown(description='Population: ', options=(('Population at Risk', 'pop'), ('COVID-19 Patients', 'covid')), …
Dropdown(description='Resource: ', options=(('ICU Beds', 'hospital_icu_beds'), ('Ventilators', 'hospital_vents…
Dropdown(description='Hospital:', options=(('All hospitals', 'hospitals'), ('Subset', 'hospital_subset')), val…
if population_dropdown.value == "pop":
    pop_data = pop_centroid(atrisk_data, population_dropdown.value)
elif population_dropdown.value == "covid":
    pop_data = pop_centroid(covid_data, population_dropdown.value)
distances=[10,20,30] # Distances in travel time
weights=[1.0, 0.68, 0.22] # Weights where weights[0] is applied to distances[0]
# Other weighting options representing different distance decays
# weights1, weights2, weights3 = [1.0, 0.42, 0.09], [1.0, 0.75, 0.5], [1.0, 0.5, 0.1]
# it is surprising how long this function takes just to calculate centroids.
# why not do it with the geopandas/pandas functions rather than iterating through every item?
Pop Centroid File Setting: 100%|██████████| 3121/3121 [05:42<00:00, 9.10it/s]
If you have already run this code and changed the Hospital selection, rerun the Load Hospital Data block.
# Set hospitals according to hospital dropdown
if hospital_dropdown.value == "hospital_subset":
    hospitals = hospital_setting(hospitals[:1], G)
else:
    hospitals = hospital_setting(hospitals, G)
resources = ["hospital_icu_beds", "hospital_vents"] # resources
# this is also slower than it needs to be; if network nodes and hospitals are both
# geopandas data frames, it should be possible to do a much faster spatial join rather than iterating through every hospital
Find the nearest network node from hospitals: 100%|██████████| 66/66 [01:50<00:00, 1.67s/it]
hospital setting is done
# # Create point geometries for entire graph
# # what is the pupose of the following two lines? Can this be deleted?
# for node, data in G.nodes(data=True):
# data['geometry']=Point(data['x'], data['y'])
# which hospital to visualize?
fighosp = 0
# Create catchment for hospital 0
poly = dijkstra_cca_polygons(G, hospitals['nearest_osm'][fighosp], distances)
# Reproject polygons
for i in range(len(poly)):
    poly[i].crs = { 'init' : 'epsg:4326'}
    poly[i] = poly[i].to_crs({'init':'epsg:32616'})
# Reproject hospitals
# Possible to map from the hospitals data rather than creating hospital_subset?
hospital_subset = hospitals.iloc[[fighosp]].to_crs(epsg=32616)
fig, ax = plt.subplots(figsize=(12,8))
min_10 = poly[0].plot(ax=ax, color="royalblue", label="10 min drive")
min_20 = poly[1].plot(ax=ax, color="cornflowerblue", label="20 min drive")
min_30 = poly[2].plot(ax=ax, color="lightsteelblue", label="30 min drive")
hospital_subset.plot(ax=ax, color="red", legend=True, label = "hospital")
# Add legend
ax.legend()
<matplotlib.legend.Legend at 0x7fcff54b52b0>
poly
%%time
catchments = measure_acc_par(hospitals, pop_data, G, distances, weights, num_proc=processor_dropdown.value)
Calculating 66 hospital catchments... completed number: 0 15 10 5 1 16 611 2 7 12 17 3 8 13 18 4 9 14 19 20 25 30 35 21 26 31 22 36 27 32 37 23 28 33 38 24 29 39 34 40 55 45 50 41 46 56 42 51 47 57 43 52 48 58 53 44 59 49 54 60 65 61 62 63 64 CPU times: user 4.09 s, sys: 1.37 s, total: 5.46 s Wall time: 2min 55s
%%time
for j in range(len(catchments)):
    catchments[j] = catchments[j][catchments[j][resource_dropdown.value]!=float('inf')]
result=overlapping_function(grid_file, catchments, resource_dropdown.value, weights, num_proc=processor_dropdown.value)
CPU times: user 12.4 s, sys: 944 ms, total: 13.3 s Wall time: 33.1 s
%%time
result = normalization(result, resource_dropdown.value)
CPU times: user 7.78 ms, sys: 2.1 ms, total: 9.87 ms Wall time: 7.91 ms
result.head()
| | left | top | right | bottom | id | area | geometry | hospital_vents |
|---|---|---|---|---|---|---|---|---|
| 0 | 440843.416087 | 4.638515e+06 | 441420.766356 | 4.638015e+06 | 4158 | 216661.173 | POLYGON ((440843.416 4638265.403, 440987.754 4... | 0.921406 |
| 1 | 440843.416087 | 4.638015e+06 | 441420.766356 | 4.637515e+06 | 4159 | 216661.168 | POLYGON ((440843.416 4637765.403, 440987.754 4... | 0.921406 |
| 2 | 440843.416087 | 4.639515e+06 | 441420.766356 | 4.639015e+06 | 4156 | 216661.169 | POLYGON ((440843.416 4639265.403, 440987.754 4... | 0.956446 |
| 3 | 440843.416087 | 4.639015e+06 | 441420.766356 | 4.638515e+06 | 4157 | 216661.171 | POLYGON ((440843.416 4638765.403, 440987.754 4... | 0.925965 |
| 4 | 440843.416087 | 4.640515e+06 | 441420.766356 | 4.640015e+06 | 4154 | 216661.171 | POLYGON ((440843.416 4640265.403, 440987.754 4... | 0.963625 |
While the OpenStreetMap data that we imported contained speed limits for some road segments, the vast majority of edges in our network graph did not have a speed limit attribute. This attribute matters to the network analysis because we need it to compute travel times to and from hospitals. The original study dealt with this by setting the speed limit of every road segment without an assigned speed limit to 35 mph. In the reanalysis of the study, all unassigned speed limits were set to 30 mph. Setting almost all of the roads to 30 or 35 mph dramatically decreases the speeds of highways that lack speed limits in OpenStreetMap and increases the speeds of residential roads. To solve this issue, we decided to use the osmnx.speed module to assign speed limit values. This approach kept the speed limits of segments that had listed values and assigned every other segment the mean speed limit for its road type, e.g. highway, residential, primary, or secondary. This gave us a much more nuanced set of speed limits. The most common speed limit turned out to be around 25 mph, which makes sense because this is the assumed speed limit for most residential roads, and these make up a large share of all road segments.
The major issue with the earlier ways of assigning speed limits was the assumption that all roads essentially behave in the same way. We still face this issue to an extent. We can call this threat to validity an assumption of spatial homogeneity: we assume that all roads of a given type are the same, and we ignore features like one-way roads, traffic, or modes of transportation other than the car that may add friction to our model. This frictionless view of a street network does not reflect geographic reality, but we have made strides to improve the accuracy of our road network.
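The road-type imputation described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the osmnx implementation: the `impute_speeds` helper, the edge records, the 30 mph fallback, and all of the example speeds and lengths are hypothetical values chosen for the sketch.

```python
from collections import defaultdict
from statistics import mean


def impute_speeds(edges, fallback_mph=30.0):
    """Fill missing speeds with the mean observed speed of the same road type.

    Hypothetical helper for illustration only. Each edge is a dict with
    'highway' (road type), 'length_mi', and an optional 'speed_mph'.
    """
    # Collect the speeds that are actually tagged, grouped by road type
    observed = defaultdict(list)
    for edge in edges:
        if edge.get("speed_mph") is not None:
            observed[edge["highway"]].append(edge["speed_mph"])
    type_means = {kind: mean(values) for kind, values in observed.items()}

    filled = []
    for edge in edges:
        speed = edge.get("speed_mph")
        if speed is None:
            # Use the road-type mean; fall back to a flat value if the
            # road type has no tagged speeds at all
            speed = type_means.get(edge["highway"], fallback_mph)
        travel_min = edge["length_mi"] / speed * 60
        filled.append({**edge, "speed_mph": speed, "travel_min": travel_min})
    return filled


# Made-up edges: two tagged motorways, one tagged residential street,
# and three segments with no speed limit
edges = [
    {"highway": "motorway", "length_mi": 2.0, "speed_mph": 55.0},
    {"highway": "motorway", "length_mi": 1.0, "speed_mph": 65.0},
    {"highway": "motorway", "length_mi": 3.0, "speed_mph": None},
    {"highway": "residential", "length_mi": 0.5, "speed_mph": 25.0},
    {"highway": "residential", "length_mi": 0.25, "speed_mph": None},
    {"highway": "service", "length_mi": 0.1, "speed_mph": None},
]

filled = impute_speeds(edges)
for edge in filled:
    print(edge["highway"], edge["speed_mph"], round(edge["travel_min"], 2))
```

In this toy input the untagged motorway segment inherits the motorway mean rather than a flat 30 or 35 mph, while the untagged service road, which has no tagged neighbors of its type, still falls back to the flat value. That fallback case is where the homogeneity assumption remains strongest.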
%%time
# Project both layers to EPSG:26971 (NAD83 / Illinois East) for mapping
hospitals = hospitals.to_crs(epsg=26971)
result = result.to_crs(epsg=26971)
output_map(result, pop_data, hospitals, resource_dropdown.value)
CPU times: user 4.62 s, sys: 592 ms, total: 5.21 s Wall time: 4.6 s
Classified Accessibility Outputs
Our adjustment of the OSM speed limit data to create a more detailed model of road speeds may change how we should view the original Kang et al. study. The piece of the study that we should perhaps be most wary of is the construction of catchment areas. The catchment area calculations depend heavily on the road network graph, and in the original study almost all of the roads had a speed limit set to 35 mph. Given that our most recent reanalysis found that most road segments should have a speed limit closer to 25 mph, it is probable that the original study's catchment areas were larger than they should have been. For example, you could reach a hospital in 20 minutes from farther away if you were driving at 35 mph than at 25 mph. Therefore, it is possible that Kang et al. overestimate spatial accessibility to COVID-19 resources. However, in our most recent reanalysis more highways have speed limits of 60 mph instead of 35 mph, so the original catchment areas may instead underestimate spatial accessibility to healthcare resources for those who live close to a large highway that takes them close to a hospital. Overall, our adjustment of the speed limit data illustrates the struggle of adding complexity to a network model while knowing that we cannot include every variable that could cause friction in a transportation network. Chicago is not an isotropic plane, and while our network graph introduces more complexity, we are unable to perfectly measure transportation without additional data.
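To give a rough sense of scale for the 35 mph versus 25 mph comparison, the arithmetic can be worked through as follows. This is a back-of-the-envelope sketch that assumes constant speed and an idealized circular catchment, which is exactly the simplification the paragraph above cautions against; the `reach_miles` helper is hypothetical and is not part of the notebook.

```python
def reach_miles(speed_mph, minutes):
    """Distance covered at a constant speed, ignoring stops, turns, and traffic."""
    return speed_mph * minutes / 60


# Compare the reach at the two speed assumptions for each travel-time threshold
for minutes in (10, 20, 30):
    far = reach_miles(35, minutes)
    near = reach_miles(25, minutes)
    # For an idealized circular catchment, area scales with the radius squared
    area_ratio = (far / near) ** 2
    print(f"{minutes} min: {far:.2f} mi at 35 mph, {near:.2f} mi at 25 mph, "
          f"area ratio {area_ratio:.2f}")
```

Under these assumptions the 20-minute reach shrinks from about 11.7 miles to about 8.3 miles, and the idealized catchment area at 35 mph is roughly twice that at 25 mph. Real network catchments will differ because they follow the street layout and mix road types.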
Luo, W., & Qi, Y. (2009). An enhanced two-step floating catchment area (E2SFCA) method for measuring spatial accessibility to primary care physicians. Health & Place, 15(4), 1100–1107.