Abstract¶
This study is a reproduction of a Middlebury Geography introductory GIS lab problem titled "Exposure to Environmental Hazards: Hurricane Harvey." The original study compared levels of flooding across block groups with different majority demographics. It used the desktop GIS QGIS to determine the majority racial group in every block group in Harris County, Texas, and then compared these data to the extent of flooding from Hurricane Harvey. This reproduction study aims to reproduce the results of the original lab problem using a Python computational notebook (ipynb) instead of a desktop GIS workflow. The notebook may also serve as a demonstration of using Python to complete simple GIS problems in the context of an introductory Human Geography with GIS class.
Study metadata¶
Key words
: Comma-separated list of keywords (tags) for searchability. Geographers often use one or two keywords each for: theory, geographic context, and methods.
Subject
: select from the BePress Taxonomy
Date created
: November 23, 2023
Date modified
: December 17, 2023
Spatial Coverage
: Harris County, Texas OpenStreetMap Link
Spatial Resolution
: Census Block Group Level
Spatial Reference System
: EPSG:6587
Temporal Coverage
: September 2017
Temporal Resolution
: ACS 5 year estimates
Funding Name
: Middlebury College
Funding Title
: N/A
Award info URI
: N/A
Award number
: N/A
Original study spatio-temporal metadata¶
Spatial Coverage
: Harris County, Texas OpenStreetMap Link
Spatial Resolution
: Census Block Group Level
Spatial Reference System
: EPSG:6587
Temporal Coverage
: September 2017
Temporal Resolution
: ACS 5 year estimates
Study design¶
The study is set up as an educational example. The primary research question is which regions, defined by majority racial/ethnic group, had the most flooding during Hurricane Harvey. The analysis uses a zonal statistic to calculate the amount of flooding in each region of Harris County. The original study is a QGIS workflow, and this reproduction uses a Python notebook (ipynb).
Materials and procedure¶
Computational environment¶
Maintaining a reproducible computational environment requires some conscious choices in package management.
Please refer to requirements.txt
for details.
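As a complement to requirements.txt, the versions actually installed in the running environment can be recorded from inside the notebook. The sketch below is one possible way to do that with the standard library; the package list is an assumption based on the imports used in this notebook, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names are assumptions based on the modules this notebook imports.
PACKAGES = ["pyhere", "pandas", "geopandas", "folium", "matplotlib",
            "numpy", "rasterio", "fiona", "rasterstats", "mapclassify"]


def installed_versions(packages):
    """Return {package: version string}, with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this output next to the results makes it easier to tell whether a later difference comes from the data or from a changed package version.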
# Import modules, define directories
from pyhere import here
import pandas as pd
import geopandas as gpd
import folium
import matplotlib
import numpy as np
import rasterio
from rasterio.plot import show
import fiona
from rasterstats import zonal_stats
from matplotlib import pyplot as plt
# You can define your own shortcuts for file paths:
path = {
    "dscr": here("data", "scratch"),
    "drpub": here("data", "raw", "public"),
    "drpriv": here("data", "raw", "private"),
    "ddpub": here("data", "derived", "public"),
    "ddpriv": here("data", "derived", "private"),
    "rfig": here("results", "figures"),
    "roth": here("results", "other"),
    "rtab": here("results", "tables"),
    "dmet": here("data", "metadata")
}
Data and variables¶
Each of the next subsections describes one data source.
blockgroups.shp¶
Title
: blockgroups.shp
Abstract
: Shapefile containing the geometry and GEOID of every Census block group in Harris County, Texas.
Spatial Coverage
: Harris County, Texas. OpenStreetMap Link
Spatial Resolution
: Census Block Group
Spatial Reference System
: EPSG:6587
Temporal Coverage
: N/A
Temporal Resolution
: N/A
Lineage
: United States Census Bureau delineation, gathered through the US Census API https://www.census.gov/developers/
Distribution
: Distributed publicly indefinitely by the US Census.
Constraints
: None
Data Quality
: Opening in a graphical GIS like QGIS and verifying existence of all block groups.
Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
Label
: GEOID
Alias
: GEOID
Definition
: Unique identifier for each block group
Type
: Integer
Accuracy
: One per block group.
Domain
: 482011000001 to 482019801001
Missing Data Value(s)
: N/A
Missing Data Frequency
: None for GEOID
Other variables are not significant for this analysis.
Prior observation:
- [x] metadata and descriptive statistics have been observed
Import blockgroups.shp¶
blockgroups = gpd.read_file(here(path["drpub"], "blockgroups.shp"))
blockgroups = gpd.GeoDataFrame(blockgroups)
blockgroups.head()
| | STATEFP | COUNTYFP | TRACTCE | BLKGRPCE | AFFGEOID | GEOID | LSAD | ALAND | AWATER | GEONAME | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 48 | 201 | 311000 | 1 | 1500000US482013110001 | 482013110001 | BG | 616969.0 | 47009.0 | Block Group 1, Census Tract 3110, Harris Count... | POLYGON ((958404.137 4217699.295, 958413.144 4... |
1 | 48 | 201 | 311000 | 4 | 1500000US482013110004 | 482013110004 | BG | 408595.0 | 25333.0 | Block Group 4, Census Tract 3110, Harris Count... | POLYGON ((957048.814 4217692.784, 957496.427 4... |
2 | 48 | 201 | 311100 | 1 | 1500000US482013111001 | 482013111001 | BG | 1018525.0 | 213804.0 | Block Group 1, Census Tract 3111, Harris Count... | POLYGON ((958975.179 4217311.881, 958892.738 4... |
3 | 48 | 201 | 311100 | 3 | 1500000US482013111003 | 482013111003 | BG | 484061.0 | 36045.0 | Block Group 3, Census Tract 3111, Harris Count... | POLYGON ((958773.280 4216120.376, 959779.865 4... |
4 | 48 | 201 | 311100 | 4 | 1500000US482013111004 | 482013111004 | BG | 547376.0 | 0.0 | Block Group 4, Census Tract 3111, Harris Count... | POLYGON ((958756.220 4216618.402, 959659.603 4... |
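The rows above show that each block group GEOID is the concatenation of STATEFP (2 digits), COUNTYFP (3), TRACTCE (6), and BLKGRPCE (1). The data quality check described for this layer could also be scripted; the following is a minimal sketch, assuming GEOIDs are handled as 12-character strings. `split_geoid` and `check_geoids` are hypothetical helper names, not part of the original workflow.

```python
def split_geoid(geoid):
    """Split a 12-character block group GEOID into its component codes."""
    geoid = str(geoid)
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {geoid!r}")
    return {
        "STATEFP": geoid[:2],
        "COUNTYFP": geoid[2:5],
        "TRACTCE": geoid[5:11],
        "BLKGRPCE": geoid[11],
    }


def check_geoids(geoids, state="48", county="201"):
    """Return a list of problems: duplicates or IDs outside the expected county."""
    problems = []
    seen = set()
    for g in geoids:
        parts = split_geoid(g)
        if (parts["STATEFP"], parts["COUNTYFP"]) != (state, county):
            problems.append(f"{g}: outside state {state}, county {county}")
        if g in seen:
            problems.append(f"{g}: duplicate")
        seen.add(g)
    return problems


# First row of the table above
print(split_geoid("482013110001"))
```

In the notebook, something like `check_geoids(blockgroups["GEOID"])` would be expected to return an empty list if every block group is unique and belongs to Harris County.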
blockgroup_demographic_data.csv¶
Title
: blockgroup_demographic_data.csv
Abstract
: Data table of American Community Survey demographic data by Census block group for Harris County, Texas.
Spatial Coverage
: Harris County, Texas. OpenStreetMap Link
Spatial Resolution
: Census Block Groups
Spatial Reference System
: None
Temporal Coverage
: 2012-2017
Temporal Resolution
: ACS 5-year estimates
Lineage
: https://www.census.gov/programs-surveys/acs/guidance/estimates.html
Distribution
: Distributed publicly indefinitely by the US Census.
Constraints
: None
Data Quality
: None
Variables
: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
Label
: variable name as used in the data or code
Alias
: intuitive natural language name
Definition
: Short description or definition of the variable. Include measurement units in description.
Type
: data type, e.g. character string, integer, real
Accuracy
: e.g. uncertainty of measurements
Domain
: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
Missing Data Value(s)
: Values used to represent missing data and frequency of missing data observations
Missing Data Frequency
: Frequency of missing data observations: not yet known for data to be collected
Label | Alias | Definition | Type | Accuracy | Domain | Missing Data Value(s) | Missing Data Frequency |
---|---|---|---|---|---|---|---|
GEOID | GEOID | Unique identifier for each block group | Integer | N/A | 482011000001 to 482019801001 | N/A | None |
B03002_001 | Total Population | Number of people in the block group | Integer | See ACS | 9-21758 | N/A | None |
B03002_003 | White Population | Number of people in the White racial/ethnic group in the block group | Integer | See ACS | 0-9199 | N/A | None |
B03002_004 | Black Population | Number of people in the Black racial/ethnic group in the block group | Integer | See ACS | 0-5258 | N/A | None |
B03002_006 | Asian Population | Number of people in the Asian racial/ethnic group in the block group | Integer | See ACS | 0-3418 | N/A | None |
B03002_012 | Latinx Population | Number of people in the Latinx racial/ethnic group in the block group | Integer | See ACS | 0-11408 | N/A | None |
blockgroup_demographic_data = pd.read_csv(here(path["drpub"],'blockgroup_demographic_data.csv'), dtype=str,encoding='latin-1')
blockgroup_demographic_data.head()
| | GEOID | B03002_001 | B03002_002 | B03002_003 | B03002_004 | B03002_005 | B03002_006 | B03002_007 | B03002_008 | B03002_009 | ... | B03002_012 | B03002_013 | B03002_014 | B03002_015 | B03002_016 | B03002_017 | B03002_018 | B03002_019 | B03002_020 | B03002_021 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 482013110001 | 583 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 583 | 573 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 |
1 | 482013110004 | 1869 | 22 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1847 | 1818 | 0 | 0 | 0 | 0 | 29 | 0 | 0 | 0 |
2 | 482013111001 | 1046 | 11 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1035 | 895 | 4 | 0 | 0 | 0 | 136 | 0 | 0 | 0 |
3 | 482013111003 | 1639 | 112 | 112 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1527 | 1192 | 0 | 0 | 0 | 0 | 315 | 20 | 0 | 20 |
4 | 482013111004 | 1759 | 48 | 0 | 16 | 0 | 32 | 0 | 0 | 0 | ... | 1711 | 1476 | 11 | 7 | 0 | 0 | 173 | 44 | 44 | 0 |
5 rows × 22 columns
actual_flood_10.tif¶
Title
: actual_flood_10.tif
Abstract
: Raster image of Harris County where 1s represent area flooded by Hurricane Harvey and nodata (the raster equivalent of NULL) fills all other locations.
Spatial Coverage
: Harris County, Texas. OpenStreetMap Link
Spatial Resolution
: 10 meter resolution
Spatial Reference System
: EPSG:6587
Temporal Coverage
: September 2017
Temporal Resolution
: A specific day after the flooding; worth investigating further.
Lineage
: I received this data from the GEOG 120 professors. The original sources are the Flood Observatory https://floodobservatory.colorado.edu/ and the Harris County Flood Control District https://www.hcfcd.org/Hurricane-Harvey. The steps taken from the original sources to the version I received are unclear.
Distribution
: This exact file may not be publicly available.
Constraints
: Middlebury Course Material
Data Quality
: Inspect in QGIS.
Variables
: Only has one band. 1 = Flooded. Nodata = Not flooded.
actual_flood_10 = rasterio.open(here(path["drpub"],'actual_flood_10.tif'))
ax = show((actual_flood_10, 1))
Bias and threats to validity¶
One of the main geographic threats to validity in this study is the modifiable areal unit problem. The data sources in the original study are aggregated at the level of census block groups, and the unit of aggregation used for a study can dramatically change the output of an analysis. For example, a map of majority racial groups in Harris County aggregated at the block group level would be much more complex than a map at the tract level, but it would be less detailed, and may show different trends, than a map produced with data at the block level. The unit size in the original study is based upon the scale of data available, which often determines the unit used. However, we cannot be sure that the results of our analysis would be the same with a finer unit of analysis, so the unit of aggregation is a threat to validity in this study.
Another threat is the confusion of spatial and aspatial causation. Flooding is often considered to depend exclusively on the physical geography of a place. This may lead to a lack of focus on variables such as emergency preparedness that may also affect the extent of flooding.
A third threat is the common assumption that all locations within a delineated region are the same. In the Harvey flooding study we may see that a region with a predominantly white population has a lot of flooding, but we are not taking into consideration the distribution of population within that area. We cannot tell whether the flooding in that region overlaps with the white population or whether other factors are at play, such as flooding in undeveloped wetlands within the region, which would not have a significant human geography impact.
Finally, there may be a boundary effect from the border of Harris County. The northwest border of Harris County is a river, so the lowland area surrounding it would be more likely to flood. Because the study does not extend beyond the county, we may see higher flooded percentages in regions that touch this border.
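The modifiable areal unit problem can be illustrated with a toy example. The sketch below uses two hypothetical block groups (the counts are invented, not taken from the study data) and the 60% majority threshold used later in this notebook; `majority_label` is a hypothetical helper. Classified separately, the two units receive different majority labels, but once they are aggregated into a single larger unit neither group reaches the threshold.

```python
def majority_label(counts, threshold=60.0):
    """Return the group whose share meets the threshold, or 'Mixed' if none does."""
    total = sum(counts.values())
    if total == 0:
        return "Mixed"
    for group, n in counts.items():
        if n / total * 100 >= threshold:
            return group
    return "Mixed"


# Hypothetical counts for two neighbouring block groups (not study data)
block_groups = [
    {"White": 900, "Latinx": 100},
    {"White": 100, "Latinx": 900},
]

# Fine unit: classify each block group on its own
fine = [majority_label(bg) for bg in block_groups]

# Coarse unit: sum the counts into one "tract" and classify that
tract = {group: sum(bg[group] for bg in block_groups) for group in block_groups[0]}
coarse = majority_label(tract)

print(fine, coarse)
```

The same population therefore maps to a White region and a Latinx region at one scale and to a single Mixed region at another, which is why the choice of aggregation unit can change the flooded percentages reported per group.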
Data transformations¶
Goal 1: Load census data into block groups¶
bg_merged = blockgroups.merge(blockgroup_demographic_data, on = "GEOID", how = "left")
bgData = gpd.GeoDataFrame(bg_merged)
bgData.rename(
    columns={
        'B03002_001': 'Total',
        'B03002_003': 'White',
        'B03002_004': 'Black',
        'B03002_006': 'Asian',
        'B03002_012': 'Latinx',
    },
    inplace=True
)
bgData = bgData.drop(columns=['STATEFP', 'COUNTYFP', 'TRACTCE', 'BLKGRPCE', 'AFFGEOID','LSAD', 'ALAND', 'AWATER', 'GEONAME','B03002_002','B03002_005','B03002_007', 'B03002_008','B03002_009', 'B03002_010', 'B03002_011','B03002_013','B03002_014', 'B03002_015', 'B03002_016', 'B03002_017', 'B03002_018','B03002_019', 'B03002_020', 'B03002_021'])
# bgData = bgData.astype(
# {'GEOID': 'int','Total': 'float', 'White': 'float', 'Black': 'float', 'Asian': 'float','Latinx': 'float'}).dtypes
bgData[['Total', 'White', 'Black', 'Asian','Latinx']] = bgData[['Total', 'White', 'Black', 'Asian','Latinx']].apply(pd.to_numeric)
print(bgData.dtypes)
#bgData.columns
#bgData.plot()
#print(bgData)
GEOID         object
geometry    geometry
Total          int64
White          int64
Black          int64
Asian          int64
Latinx         int64
dtype: object
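A left merge keeps every block group even when no demographic row matches its GEOID, so unmatched keys would only surface later as missing values. The keys also have to be the same type on both sides: the CSV is read with `dtype=str`, and the printed dtypes show GEOID as `object` (string). The following sketch shows one way to compare the two key sets before merging; `compare_keys` is a hypothetical helper and the sample keys come from the tables above.

```python
def compare_keys(left_keys, right_keys):
    """Summarise which join keys appear on only one side and how many match."""
    left, right = set(left_keys), set(right_keys)
    return {
        "left_only": sorted(left - right),
        "right_only": sorted(right - left),
        "matched": len(left & right),
    }


# String keys on both sides match as expected
print(compare_keys(["482013110001", "482013110004"], ["482013110001", "482013110004"]))

# A string key never equals an integer key, so a type mismatch silently matches nothing
print(compare_keys(["482013110001"], [482013110001]))
```

In the notebook this could be called as `compare_keys(blockgroups["GEOID"], blockgroup_demographic_data["GEOID"])`; empty `left_only` and `right_only` lists would indicate a complete one-to-one join.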
Goal 2: Create regions by majority groups¶
## Calculate percentages of each majority group
bgData["pctAsian"] = bgData.Asian / bgData.Total * 100
bgData["pctBlack"] = bgData.Black / bgData.Total * 100
bgData["pctLatinx"] = bgData.Latinx / bgData.Total * 100
bgData["pctWhite"] = bgData.White / bgData.Total * 100
bgData.plot(column='pctLatinx', legend=True)
<Axes: >
## Create majority group field
def assign_major_group(row):
    if row['pctAsian'] >= 60:
        return 'Asian'
    elif row['pctBlack'] >= 60:
        return 'Black'
    elif row['pctLatinx'] >= 60:
        return 'Latinx'
    elif row['pctWhite'] >= 60:
        return 'White'
    else:
        return 'Mixed'
bgData['majorGrp'] = bgData.apply(assign_major_group, axis=1)
bgData.head(10)
| | GEOID | geometry | Total | White | Black | Asian | Latinx | pctAsian | pctBlack | pctLatinx | pctWhite | majorGrp |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 482013110001 | POLYGON ((958404.137 4217699.295, 958413.144 4... | 583 | 0 | 0 | 0 | 583 | 0.000000 | 0.000000 | 100.000000 | 0.000000 | Latinx |
1 | 482013110004 | POLYGON ((957048.814 4217692.784, 957496.427 4... | 1869 | 22 | 0 | 0 | 1847 | 0.000000 | 0.000000 | 98.822900 | 1.177100 | Latinx |
2 | 482013111001 | POLYGON ((958975.179 4217311.881, 958892.738 4... | 1046 | 11 | 0 | 0 | 1035 | 0.000000 | 0.000000 | 98.948375 | 1.051625 | Latinx |
3 | 482013111003 | POLYGON ((958773.280 4216120.376, 959779.865 4... | 1639 | 112 | 0 | 0 | 1527 | 0.000000 | 0.000000 | 93.166565 | 6.833435 | Latinx |
4 | 482013111004 | POLYGON ((958756.220 4216618.402, 959659.603 4... | 1759 | 0 | 16 | 32 | 1711 | 1.819215 | 0.909608 | 97.271177 | 0.000000 | Latinx |
5 | 482013131001 | POLYGON ((948490.088 4214006.390, 948838.150 4... | 2744 | 1567 | 347 | 434 | 375 | 15.816327 | 12.645773 | 13.666181 | 57.106414 | Mixed |
6 | 482013111002 | POLYGON ((958538.363 4215935.056, 958611.621 4... | 1092 | 7 | 0 | 0 | 1085 | 0.000000 | 0.000000 | 99.358974 | 0.641026 | Latinx |
7 | 482013131002 | POLYGON ((947852.561 4213019.383, 947966.900 4... | 652 | 304 | 85 | 119 | 88 | 18.251534 | 13.036810 | 13.496933 | 46.625767 | Mixed |
8 | 482015431001 | POLYGON ((896848.263 4250165.408, 897077.183 4... | 3498 | 1442 | 322 | 19 | 1490 | 0.543168 | 9.205260 | 42.595769 | 41.223556 | Mixed |
9 | 482015502001 | POLYGON ((943861.867 4241534.858, 944139.303 4... | 2256 | 143 | 1368 | 0 | 745 | 0.000000 | 60.638298 | 33.023050 | 6.338652 | Black |
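The classification rule can be checked by hand against two rows of the table above. The sketch below re-implements the 60% threshold on raw counts rather than on the percentage columns; the guard for a zero total is an added assumption (the notebook itself does not include one), and the counts are copied from rows 5 and 9.

```python
GROUPS = ("Asian", "Black", "Latinx", "White")  # same order as the notebook's if/elif chain


def assign_major_group(counts, total, threshold=60.0):
    """Return the group whose share of the total meets the threshold, else 'Mixed'."""
    if total <= 0:  # assumption: treat empty block groups as 'Mixed'
        return "Mixed"
    for group in GROUPS:
        if counts[group] / total * 100 >= threshold:
            return group
    return "Mixed"


# Counts copied from rows 5 and 9 of the table above
rows = {
    "482013131001": (2744, {"White": 1567, "Black": 347, "Asian": 434, "Latinx": 375}),
    "482015502001": (2256, {"White": 143, "Black": 1368, "Asian": 0, "Latinx": 745}),
}

labels = {geoid: assign_major_group(counts, total) for geoid, (total, counts) in rows.items()}
print(labels)
```

Because the four categories are mutually exclusive and the threshold is above 50%, at most one group can qualify in any block group, so the order of the checks does not affect the result.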
## Group by majority groups and dissolve geometry
bgData['blockGroups'] = 1
#bgData['majorGrp2'] = bgData.majorGrp
group_sums = bgData.groupby('majorGrp')[['blockGroups', 'Total', 'White', 'Black', 'Asian', 'Latinx']].sum().reset_index()
# Step 2: Merge the sum information with the original GeoDataFrame
bgDataWithSums = pd.merge(bgData, group_sums, on='majorGrp', how = 'inner')
# Step 3: Dissolve based on 'majorGrp' and calculate the sum
dissolved = bgDataWithSums.dissolve('majorGrp', aggfunc='sum')
# Step 4: Create a new GeoDataFrame with the dissolved result
major_grps = gpd.GeoDataFrame(dissolved)
major_grps = major_grps.drop(columns=['blockGroups_y','Total_y','White_y','Black_y','Asian_y','Latinx_y'])
major_grps.rename(
    columns={
        'blockGroups_x': 'blockGrps',
        'Total_x': 'Total',
        'White_x': 'White',
        'Black_x': 'Black',
        'Asian_x': 'Asian',
        'Latinx_x': 'Latinx'
    },
    inplace=True
)
major_grps.head()
#bgData.plot(column='majorGrp', legend=True)
/Users/colmanbashore/anaconda3/envs/flooding/lib/python3.9/site-packages/geopandas/geodataframe.py:1676: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function. aggregated_data = data.groupby(**groupby_kwargs).agg(aggfunc)
majorGrp | geometry | Total | White | Black | Asian | Latinx | pctAsian | pctBlack | pctLatinx | pctWhite | blockGrps |
---|---|---|---|---|---|---|---|---|---|---|---|
Asian | MULTIPOLYGON (((926751.990 4209490.058, 926783... | 3814 | 557 | 157 | 2431 | 637 | 192.226138 | 16.971301 | 47.205665 | 41.411103 | 3 |
Black | MULTIPOLYGON (((949698.842 4206031.461, 949639... | 264578 | 10961 | 199801 | 3261 | 47251 | 180.146080 | 13451.646456 | 3023.093058 | 650.237696 | 175 |
Latinx | MULTIPOLYGON (((965411.937 4198913.656, 965055... | 1219893 | 112180 | 109390 | 27012 | 962544 | 1195.685395 | 4933.749778 | 51520.813364 | 6210.126096 | 643 |
Mixed | MULTIPOLYGON (((970393.326 4194192.233, 969688... | 2213832 | 661709 | 497957 | 224801 | 777050 | 8318.759023 | 19525.925998 | 30700.430404 | 25805.788462 | 864 |
White | MULTIPOLYGON (((978544.008 4193972.941, 978565... | 823402 | 601169 | 30980 | 49604 | 123053 | 2669.432784 | 1483.902465 | 6311.791195 | 34404.114891 | 459 |
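Because the dissolve aggregates every numeric column with `aggfunc='sum'`, the `pct` columns in the table above are sums of block-group percentages (for example, 13451.6 for pctBlack in the Black region) rather than region-level percentages. They are not used in the later analysis, but if region-level shares are wanted they can be recomputed from the summed counts. A minimal sketch, with counts copied from the table above and `region_percentages` as a hypothetical helper:

```python
# Summed population counts per region, copied from the dissolved table above
regions = {
    "Asian":  {"Total": 3814,    "White": 557,    "Black": 157,    "Asian": 2431,   "Latinx": 637},
    "Black":  {"Total": 264578,  "White": 10961,  "Black": 199801, "Asian": 3261,   "Latinx": 47251},
    "Latinx": {"Total": 1219893, "White": 112180, "Black": 109390, "Asian": 27012,  "Latinx": 962544},
    "Mixed":  {"Total": 2213832, "White": 661709, "Black": 497957, "Asian": 224801, "Latinx": 777050},
    "White":  {"Total": 823402,  "White": 601169, "Black": 30980,  "Asian": 49604,  "Latinx": 123053},
}


def region_percentages(row):
    """Recompute each group's share of a region's total population, in percent."""
    total = row["Total"]
    return {group: round(n / total * 100, 2) for group, n in row.items() if group != "Total"}


for name, row in regions.items():
    print(name, region_percentages(row))
```

Each recomputed share falls between 0 and 100, and the named group in each majority region should come out at or above the 60% threshold, since every block group in that region individually met it.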
Analysis¶
Goal 3: Find flooded area in each group and calculate pct¶
# Zonal Statistics
# Filter out invalid geometries
major_grps = major_grps[major_grps.geometry.is_valid]
major_grps.to_file(here(path["ddpub"],'major_grps.shp'), driver='ESRI Shapefile')
# Specify the path to the raster file
raster_file = here(path["drpub"],'actual_flood_10.tif')
# Read raster data using rasterio
with rasterio.open(raster_file) as src:
    # Get raster values as a NumPy array
    raster_data = src.read(1)
with fiona.open(here(path["ddpub"],'major_grps.shp')) as src:
    zs = zonal_stats(src, raster_file, stats="count", all_touched=True)
print(zs)
flood_pixels = [value for dictionary in zs for value in dictionary.values()]
bgMajorFld = gpd.read_file(here(path["ddpub"],'major_grps.shp'))
bgMajorFld['fl_count'] = flood_pixels
# Display the GeoDataFrame with the new "flood" column
print(bgMajorFld['fl_count'])
/Users/colmanbashore/anaconda3/envs/flooding/lib/python3.9/site-packages/rasterstats/main.py:151: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead. if 'Point' in geom.type:
[{'count': 2165}, {'count': 645888}, {'count': 3181885}, {'count': 11415356}, {'count': 4784908}]
0        2165
1      645888
2     3181885
3    11415356
4     4784908
Name: fl_count, dtype: int64
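The `count` statistic reports the number of valid (non-nodata) cells inside each zone. Since the flood raster stores 1 for flooded cells and nodata everywhere else, the count equals the number of flooded pixels. The idea can be sketched on a toy grid in plain Python; the grid, zone labels, and `zonal_count` helper below are invented for illustration and do not reproduce how rasterstats is implemented.

```python
NODATA = None  # stand-in for the raster's nodata value

# Toy flood raster: 1 = flooded, NODATA = not flooded
flood = [
    [1,      1,      1,      NODATA],
    [1,      NODATA, NODATA, 1],
    [NODATA, NODATA, 1,      1],
]

# Toy zone raster: each cell is labelled with the region that covers it
zones = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]


def zonal_count(values, zone_ids):
    """Count the valid (non-nodata) cells that fall in each zone."""
    counts = {}
    for value_row, zone_row in zip(values, zone_ids):
        for value, zone in zip(value_row, zone_row):
            counts.setdefault(zone, 0)
            if value is not NODATA:
                counts[zone] += 1
    return counts


print(zonal_count(flood, zones))
```

One difference from this toy version is that real zones are polygons: with `all_touched=True`, every cell a polygon touches is included rather than only cells whose centers fall inside it, so cells along a shared boundary may be counted in more than one region.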
# Field Calculator Flooded Area
bgMajorFld['flArea'] = bgMajorFld.fl_count * 10 * 10
bgMajorFld['totArea'] = (bgMajorFld.geometry.area).round(2)
bgMajorFld['pctFlood'] = (bgMajorFld.flArea / bgMajorFld.totArea * 100).round(2)
bgMajorFld.head()
columns_to_export = ['majorGrp', 'blockGrps','flArea', 'totArea', 'pctFlood']
bgMajorFld[columns_to_export].to_csv(here(path["rtab"],'MajGrpFld.csv'), index=False)
bgMajorFld.to_csv(here(path["ddpub"],'bgMajorFld.csv'), index=False)
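The field calculation above converts a pixel count into an area by multiplying by the cell dimensions, then divides by the region's area. The arithmetic can be checked by hand with the pixel counts from the zonal statistics output and the total areas reported in the final table. This sketch assumes the coordinate system's units are meters, which is consistent with the 10 meter resolution in the raster metadata; `percent_flooded` is a hypothetical helper.

```python
CELL_SIZE_M = 10  # raster resolution from the metadata; assumes CRS units are meters


def percent_flooded(pixel_count, total_area_m2, cell_size=CELL_SIZE_M):
    """Return (flooded area in square meters, percent of the region flooded)."""
    flooded_area = pixel_count * cell_size * cell_size
    return flooded_area, round(flooded_area / total_area_m2 * 100, 2)


# Asian-majority region: 2165 flooded pixels, 952855.79 m2 total area
asian_area, asian_pct = percent_flooded(2165, 952855.79)

# Black-majority region: 645888 flooded pixels, 206749828.71 m2 total area
black_area, black_pct = percent_flooded(645888, 206749828.71)

print(asian_area, asian_pct)
print(black_area, black_pct)
```

These values match the flArea and pctFlood columns reported for the same two regions in the final table.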
Results¶
The final results of this study are a final table showing the number of block groups, flooded area, total area, and percent flooded in each majority group region; a map of percent flooded by region; and separate maps of the majority group regions and of the flooding extent.
# Final Table
Final_Table = pd.read_csv(here(path["rtab"],'MajGrpFld.csv'), dtype=str,encoding='latin-1')
Final_Table.head()
| | majorGrp | blockGrps | flArea | totArea | pctFlood |
---|---|---|---|---|---|
0 | Asian | 3 | 216500 | 952855.79 | 22.72 |
1 | Black | 175 | 64588800 | 206749828.71 | 31.24 |
2 | Latinx | 643 | 318188500 | 789346589.94 | 40.31 |
3 | Mixed | 864 | 1141535600 | 2477275416.41 | 46.08 |
4 | White | 459 | 478490800 | 1114263551.36 | 42.94 |
The final table shows that most of the flooding occurred in the region that did not have a dominant racial/ethnic group. However, this region also had the largest number of block groups and the most total area. Among the regions with a majority group, White and Latinx had the highest percentages of area flooded. This final table should be nearly the same as the final table from the original study. The only differences were the exact flooded area numbers, which were calculated using different zonal statistics tools. The final flood percentages were very close to the originals.
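The ranking described above can be reproduced from the table values. Note that the table is read back with `dtype=str`, so the columns need to be converted to numbers before sorting or summing; otherwise comparisons would be lexicographic. The sketch below uses the standard library on an inline copy of the table rather than the CSV file, so that it stands alone.

```python
import csv
import io

# Inline copy of the final table shown above
TABLE = """majorGrp,blockGrps,flArea,totArea,pctFlood
Asian,3,216500,952855.79,22.72
Black,175,64588800,206749828.71,31.24
Latinx,643,318188500,789346589.94,40.31
Mixed,864,1141535600,2477275416.41,46.08
White,459,478490800,1114263551.36,42.94
"""

rows = list(csv.DictReader(io.StringIO(TABLE)))

# Convert to float before sorting; the values are strings when first read
ranked = sorted(rows, key=lambda r: float(r["pctFlood"]), reverse=True)
order = [r["majorGrp"] for r in ranked]

# Share of all flooded area that falls in the Mixed region
total_flooded = sum(float(r["flArea"]) for r in rows)
mixed_share = next(float(r["flArea"]) for r in rows if r["majorGrp"] == "Mixed") / total_flooded * 100

print(order)
print(round(mixed_share, 1))
```

The Mixed region accounts for more than half of the total flooded area, which supports the first sentence of the interpretation, while the ordering by percent flooded places White and Latinx next.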
# Final Map
Final_Map = bgMajorFld.plot(column = 'pctFlood', legend = True, cmap = 'Blues', scheme = 'FisherJenks')
print("Percent Flooded by Block Groups")
plt.savefig(here(path["rfig"],'bgMajorFld.png'))
Percent Flooded by Block Groups
/Users/colmanbashore/anaconda3/envs/flooding/lib/python3.9/site-packages/mapclassify/classifiers.py:1860: UserWarning: Numba not installed. Using slow pure python version. warnings.warn(
Percent Flooded by Block Groups¶
This map is designed to match the final map from the original study. It shows the percentage flooded in each majority group. Besides small stylistic elements it matches the original.
# Map of Majority Groups
bgMajorFld.plot(column='majorGrp', legend = True)
plt.savefig(here(path["rfig"],'majorGrps.png'))
Majority Groups in Harris County, Texas¶
# Map of Flooding Extent
with rasterio.open(here(path["drpub"],'actual_flood_10.tif')) as src:
    # Read the raster data
    raster_data = src.read(1)
# Plot the raster image
plt.imshow(raster_data, cmap='Blues')
#plt.colorbar(label='Pixel Values')
plt.title('Flooding Extent')
# Save the plot as a PNG file
plt.savefig(here(path["rfig"],'actualFlood.png'))
# Show the plot (optional)
plt.show()
Flooding extent from Hurricane Harvey in Harris County, Texas¶
The previous two maps are designed to fill the place of another map deliverable from the original study. The original study produced a map of majority groups with a flooding layer on top. This reproduction was not able to replicate this specific figure but the two previous figures show the same layers.
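One possible route to the combined figure is to draw both layers on the same matplotlib axes. The function below is an untested sketch rather than part of this reproduction: `plot_flood_over_groups` is a hypothetical name, it assumes the polygons and raster share a coordinate reference system (both are listed as EPSG:6587), and whether non-flooded cells render as transparent depends on the raster's nodata metadata being read as a mask.

```python
def plot_flood_over_groups(regions_path, raster_path, out_path=None):
    """Draw majority-group polygons with the flood raster on top, on one set of axes."""
    # Imports live inside the function so the sketch can be loaded without the GIS stack.
    import geopandas as gpd
    import matplotlib.pyplot as plt
    import rasterio
    from rasterio.plot import show

    regions = gpd.read_file(regions_path)

    fig, ax = plt.subplots(figsize=(8, 8))
    # Bottom layer: polygons colored by majority group
    regions.plot(column="majorGrp", legend=True, ax=ax)

    with rasterio.open(raster_path) as src:
        # Top layer: flooded cells; vmin/vmax keep the single value 1 at the dark end of the ramp
        show(src, ax=ax, cmap="Blues", vmin=0, vmax=1, alpha=0.6)

    ax.set_title("Majority groups and Hurricane Harvey flooding")
    if out_path is not None:
        fig.savefig(out_path)
    return ax
```

In this notebook it might be called with the derived shapefile `major_grps.shp` and `actual_flood_10.tif`; the colors and transparency would likely need adjusting to approach the cartography of the original figure.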
Discussion¶
The success of the original study and of this reproduction is challenging to evaluate given the design of the study as an educational problem. The scientific research question concerns the environmental justice implications and spatial patterns of flooding from Hurricane Harvey. The reproduction found the same patterns and similar percentages flooded as the original study. Both studies found the highest percentage of flooding in Mixed majority group regions, followed by White and Latinx regions. In this regard, the reproduction was a success. The main goal was to translate the existing QGIS workflow into Python code so that this notebook could be used as a learning example in future courses teaching Python as a GIS tool. This reproduction successfully translated each data transformation and analysis step into reproducible Python code. The visualization of results is not entirely complete, and further reproductions of this study could improve it through visualization and cartography in Python. However, this computational notebook successfully implements a simple GIS problem in a reproducible way and can serve as a building block for future Python for GIS educational projects.
Integrity Statement¶
The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research.
Acknowledgements¶
This project was made possible as part of the Middlebury Geography course Open Source GIScience taught by Professor Joseph Holler.
This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:10.17605/OSF.IO/W29MQ
References¶
Middlebury Geography Department