Column Name | Description | Type |
---|---|---|
pin | Parcel Identification Number (PIN) | Plain Text |
tax_year | Tax year | Number |
class | Property class | Plain Text |
township_code | Township code | Plain Text |
township_name | Township name | Plain Text |
mailed_bldg | Assessor mailed building value | Number |
mailed_land | Assessor mailed land value | Number |
mailed_tot | Assessor mailed total value | Number |
certified_bldg | Assessor certified building value | Number |
certified_land | Assessor certified land value | Number |
certified_tot | Assessor certified total value | Number |
board_bldg | Board of Review certified building value | Number |
board_land | Board of Review certified land value | Number |
board_tot | Board of Review certified total value | Number |
Part 1 (Due 2/13, 11:59pm)
You have been tasked with undertaking a multi-part analysis of homes in Cook County, Illinois. You are provided with a database to facilitate this analysis. This database was constructed from the Cook County Open Data portal. More information is included in the database section below. Note that the database must be downloaded.
Four tables are provided:
assessments - 2021 to 2023 (not finalized)
See: https://datacatalog.cookcountyil.gov/Property-Taxation/Assessor-Assessed-Values/uzyt-m557/about_data
View Table Columns
characteristics - Tax year 2023 characteristics. See: https://datacatalog.cookcountyil.gov/Property-Taxation/Assessor-Single-and-Multi-Family-Improvement-Chara/x54s-btds/about_data
View Table Columns
Column Name | Description | Type |
---|---|---|
pin | Parcel Identification Number (PIN) | Plain Text |
tax_year | Tax year | Number |
card_num | Card number. Each card is an improvement/building on the parcel | Number |
class | Property class | Plain Text |
township_code | Township code | Plain Text |
proration_key_pin | Tieback key PIN. Prorated properties (whose value is split across multiple PINs) have a “main” or key PIN | Plain Text |
pin_proration_rate | Tieback proration rate. Prorated properties (whose value is split across multiple PINs) pay taxes on the proportion of value on their PIN. In other words, assessed value is multiplied by proration rate to determine taxable assessed value | Number |
card_proration_rate | Card proration rate. Prorated parcels (whose value is split across multiple cards) pay taxes on the proportion of value on their card. In other words, assessed value is multiplied by proration rate to determine taxable assessed value. Cards are divisions within parcels, such as one of multiple buildings on a single parcel. | Number |
cdu | Condition, Desirability, and Utility code. Not well maintained. | Plain Text |
pin_is_multicard | Multicard PIN. Indicates whether the parcel contains more than one building (ADU, coach house, etc.) | Checkbox |
pin_num_cards | Number of cards on this parcel. Each card is an improvement/building | Number |
pin_is_multiland | Multiland PIN. Indicates whether parcel has more than one landline | Checkbox |
pin_num_landlines | Number of landlines on a parcel. The sum of all landline square footage should be equal to the total square footage of the parcel. Each landline can correspond to a different land price/rate | Number |
year_built | Year built | Number |
building_sqft | Building square feet. Square footage of the building, as measured from the exterior | Number |
land_sqft | Land square feet. Square footage of the land (not just the building) of the property. Note that a single PIN can have multiple landlines, meaning it can be associated with more than one land price/rate | Number |
num_bedrooms | Number of bedrooms | Number |
num_rooms | Rooms. Number of total rooms in the building (excluding baths). Not to be confused with bedrooms | Number |
num_full_baths | Full baths. Defined as having a bath or shower. If this value is missing, the default value is set to 1 | Number |
num_half_baths | Half baths. Defined as bathrooms without a shower or bathtub | Number |
num_fireplaces | Fireplaces. Counted as the number of flues one can see from the outside of the building | Number |
type_of_residence | Type of residence | Plain Text |
construction_quality | Construction quality | Plain Text |
num_apartments | Apartments. Number of apartments for class 211 and 212 properties | Plain Text |
attic_finish | Attic finish | Plain Text |
garage_attached | Garage 1 attached | Plain Text |
garage_area_included | Is Garage 1 physically included within the building area? If yes, the garage area is subtracted from the building square feet calculation by the field agent | Plain Text |
garage_size | Garage 1 size | Plain Text |
garage_ext_wall_material | Garage 1 exterior wall material | Plain Text |
attic_type | Attic type | Plain Text |
basement_type | Basement type | Plain Text |
ext_wall_material | Exterior wall material | Plain Text |
central_heating | Central heating | Plain Text |
repair_condition | Repair condition | Plain Text |
basement_finish | Basement finish | Plain Text |
roof_material | Roof material | Plain Text |
single_v_multi_family | Single vs. multi-family use | Plain Text |
site_desirability | Site desirability | Plain Text |
num_commercial_units | Number of commercial units on the parcel (the vast majority are for properties with class 212) | Plain Text |
renovation | Renovation | Plain Text |
recent_renovation | Renovation in last 3 years | Checkbox |
porch | Porch | Plain Text |
central_air | Central air conditioning | Plain Text |
design_plan | Design plan | Plain Text |
geospatial_universe - Information on latitude/longitude and neighborhood code from tax year 2022 (released on a delay). Only a subset of columns is selected. See: https://datacatalog.cookcountyil.gov/Property-Taxation/Assessor-Parcel-Universe/nj4t-kc8j/about_data
View Table Columns
Column Name | Description | Type |
---|---|---|
pin | Parcel Identification Number (PIN) | Plain Text |
pin10 | Parcel Identification Number (10-digit) | Plain Text |
tax_year | Tax year | Number |
class | Property class | Plain Text |
triad_name | Triad name. Reassessment of property in Cook County is done within a triennial cycle, meaning it occurs every three years. The Cook County Assessor’s Office alternates reassessments between triads: the north and west suburbs, the south and west suburbs and the City of Chicago. | Plain Text |
triad_code | Triad code. Reassessment of property in Cook County is done within a triennial cycle, meaning it occurs every three years. The Cook County Assessor’s Office alternates reassessments between triads: the north and west suburbs, the south and west suburbs and the City of Chicago. | Plain Text |
township_name | Township name | Plain Text |
township_code | Township code | Plain Text |
neighborhood_code | Assessor neighborhood code, first two digits are township, last three are neighborhood | Plain Text |
tax_district_code | Tax district code, as seen on individual property tax bills (Not currently up-to-date) | Plain Text |
zip_code | Property zip code | Plain Text |
longitude | Parcel centroid longitude | Number |
latitude | Parcel centroid latitude | Number |
centroid_x_crs_3435 | Parcel centroid X coordinate (CRS 3435) | Number |
centroid_y_crs_3435 | Parcel centroid Y coordinate (CRS 3435) | Number |
census_block_group_geoid | Census block group GEOID | Plain Text |
census_block_geoid | Census block GEOID | Plain Text |
census_congressional_district_geoid | Census congressional district GEOID | Plain Text |
census_county_subdivision_geoid | Census county subdivision GEOID | Plain Text |
census_place_geoid | Census place GEOID | Plain Text |
census_puma_geoid | Census PUMA GEOID | Plain Text |
census_school_district_elementary_geoid | Census school district (elementary) GEOID | Plain Text |
census_school_district_secondary_geoid | Census school district (secondary) GEOID | Plain Text |
census_school_district_unified_geoid | Census school district (unified) GEOID | Plain Text |
census_state_representative_geoid | Census state representative GEOID | Plain Text |
census_state_senate_geoid | Census state senate GEOID | Plain Text |
census_tract_geoid | Census tract GEOID | Plain Text |
census_zcta_geoid | Census ZCTA GEOID | Plain Text |
census_data_year | Census data year | Number |
census_acs5_congressional_district_geoid | Census ACS5 congressional district GEOID | Plain Text |
census_acs5_county_subdivision_geoid | Census ACS5 county subdivision GEOID | Plain Text |
census_acs5_place_geoid | Census ACS5 place GEOID | Plain Text |
census_acs5_puma_geoid | Census ACS5 PUMA GEOID | Plain Text |
census_acs5_school_district_elementary_geoid | Census ACS5 school district (elementary) GEOID | Plain Text |
census_acs5_school_district_secondary_geoid | Census ACS5 school district (secondary) GEOID | Plain Text |
census_acs5_school_district_unified_geoid | Census ACS5 school district (unified) GEOID | Plain Text |
census_acs5_state_representative_geoid | Census ACS5 state representative GEOID | Plain Text |
census_acs5_state_senate_geoid | Census ACS5 state senate GEOID | Plain Text |
census_acs5_tract_geoid | Census ACS5 tract GEOID | Plain Text |
census_acs5_data_year | Census ACS5 data year | Number |
board_of_review_district_num | Board of Review district number | Plain Text |
board_of_review_district_data_year | Board of Review district data year | Number |
commissioner_district_num | Commissioner district number | Plain Text |
commissioner_district_data_year | Commissioner district data year | Number |
judicial_district_num | Judicial district number | Plain Text |
judicial_district_data_year | Judicial district data year | Number |
ward_num | Ward number | Plain Text |
ward_chicago_data_year | Chicago ward data year | Number |
ward_evanston_data_year | Evanston ward data year | Number |
chicago_community_area_num | Chicago community area number | Plain Text |
chicago_community_area_name | Chicago community area name | Plain Text |
chicago_community_area_data_year | Chicago community area data year | Number |
chicago_industrial_corridor_num | Chicago industrial corridor number | Plain Text |
chicago_industrial_corridor_name | Chicago industrial corridor name | Plain Text |
chicago_industrial_corridor_data_year | Chicago industrial corridor data year | Number |
chicago_police_district_num | Chicago police district number | Plain Text |
chicago_police_district_data_year | Chicago police district data year | Number |
coordinated_care_area_num | Coordinated Care Area number | Plain Text |
coordinated_care_area_data_year | Coordinated Care Area data year | Number |
enterprise_zone_num | Enterprise Zone number | Plain Text |
enterprise_zone_data_year | Enterprise Zone data year | Number |
industrial_growth_zone_num | Industrial Growth Zone number | Plain Text |
industrial_growth_zone_data_year | Industrial Growth Zone data year | Number |
qualified_opportunity_zone_num | Qualified Opportunity Zone number | Plain Text |
qualified_opportunity_zone_data_year | Qualified Opportunity Zone data year | Number |
flood_fema_sfha | FEMA Special Flood Hazard Area (SFHA) indicator | Checkbox |
flood_fema_data_year | FEMA Special Flood Hazard Area (SFHA) data year | Number |
flood_fs_factor | First Street Flood Factor | Number |
flood_fs_risk_direction | First Street flood risk direction | Number |
flood_fs_data_year | First Street data year | Number |
ohare_noise_contour_no_buffer_bool | O’Hare noise contour indicator (no buffer). Indicates whether or not a parcel’s centroid is within O’Hare’s 65 DNL noise contour | Checkbox |
ohare_noise_contour_half_mile_buffer_bool | O’Hare noise contour indicator (1/2 mile buffer). Indicates whether or not a parcel’s centroid is within O’Hare’s 65 DNL noise contour, buffered by 1/2 mile | Checkbox |
ohare_noise_contour_data_year | O’Hare noise contour data year. The “omp” value corresponds to the projected noise contour upon completion of the O’Hare Modernization Project | Number |
airport_noise_dnl | Airport continuous noise surface estimated DNL | Number |
airport_noise_data_year | Airport continuous noise surface estimated data year | Plain Text |
school_elementary_district_geoid | School district (elementary) GEOID, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_elementary_district_name | School district (elementary) name, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_secondary_district_geoid | School district (secondary) GEOID, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_secondary_district_name | School district (secondary) name, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_unified_district_geoid | School district (unified) GEOID, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_unified_district_name | School district (unified) name, derived from Cook County and City of Chicago shapefiles. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
school_year | School year | Plain Text |
school_data_year | School data year | Number |
tax_municipality_num | Municipality number | Plain Text |
tax_municipality_name | Municipality name | Plain Text |
tax_school_elementary_district_num | School district (elementary) number, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_school_elementary_district_name | School district (elementary) name, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_school_secondary_district_num | School district (secondary) number, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_school_secondary_district_name | School district (secondary) name, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_school_unified_district_num | School district (unified) number, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_school_unified_district_name | School district (unified) name, derived from tax district. Chicago Public Schools are associated with attendance areas where suburban schools are associated with districts. | Plain Text |
tax_community_college_district_num | Community college district number | Plain Text |
tax_community_college_district_name | Community college district name | Plain Text |
tax_fire_protection_district_num | Fire protection district number | Plain Text |
tax_fire_protection_district_name | Fire protection district name | Plain Text |
tax_library_district_num | Library district number | Plain Text |
tax_library_district_name | Library district name | Plain Text |
tax_park_district_num | Park district number | Plain Text |
tax_park_district_name | Park district name | Plain Text |
tax_sanitation_district_num | Sanitation district number | Plain Text |
tax_sanitation_district_name | Sanitation district name | Plain Text |
tax_special_service_area_num | Special Service Area number | Plain Text |
tax_special_service_area_name | Special Service Area name | Plain Text |
tax_tif_district_num | TIF district number | Plain Text |
tax_tif_district_name | TIF district name | Plain Text |
tax_districts_data_year | Data year for municipality, school, community college, fire, library, park, sanitary, special service area, and tax increment financing tax districts. | Number |
cmap_walkability_grid_id | CMAP walkability grid ID. From CMAP’s ON TO 2050 spatial data files | Plain Text |
cmap_walkability_no_transit_score | CMAP walkability score (no transit). From CMAP’s ON TO 2050 spatial data files | Number |
cmap_walkability_total_score | CMAP walkability total score. From CMAP’s ON TO 2050 spatial data files | Number |
cmap_walkability_data_year | CMAP walkability data year | Number |
subdivision_id | Subdivision ID | Plain Text |
subdivision_data_year | Subdivision data year | Number |
sales - Information on sales from 2021 to present (current mid-September 2023) See: https://datacatalog.cookcountyil.gov/Property-Taxation/Assessor-Parcel-Sales/wvhk-k5uv/about_data
View Table Columns
Column Name | Description | Type |
---|---|---|
pin | Parcel Identification Number (PIN) | Plain Text |
year | Year | Number |
township_code | Township code | Plain Text |
neighborhood_code | Assessor neighborhood code, first two digits are township, last three are neighborhood | Plain Text |
class | Property class | Plain Text |
sale_date | Sale date (recorded, not executed) | Date & Time |
is_mydec_date | Indicates whether the sale date has been overwritten with a more precise value from IDOR (Illinois Department of Revenue). In the past the Assessor’s ingest process truncated sale dates to the first of the month. Not all sales can be updated with dates from IDOR. | Checkbox |
sale_price | Sale price | Number |
sale_document_num | Sale document number. Corresponds to Clerk’s document number | Plain Text |
sale_deed_type | Sale deed type | Plain Text |
mydec_deed_type | Deed type from MyDec, more granular than CCAO deed type. | Plain Text |
sale_seller_name | Sale seller name | Plain Text |
is_multisale | Indicates whether a parcel was sold individually or as part of a larger group of PINs | Checkbox |
num_parcels_sale | The number of parcels that were part of the sale | Number |
sale_buyer_name | Sale buyer name | Plain Text |
sale_type | Sale type | Plain Text |
sale_filter_same_sale_within_365 | Remove sale with the same value (for the same PIN) within 365 days | Checkbox |
sale_filter_less_than_10k | Indicator for whether sale is less than $10K FMW | Checkbox |
sale_filter_deed_type | Indicator for quit claim, executor, beneficiary and missing deed types | Checkbox |
Conduct an exploratory data analysis
First let’s load our data and make some joins to sales data
Code
library(tidyverse)
<- DBI::dbConnect(RSQLite::SQLite(), "../data/cook.sqlite")
con
<- tbl(con, 'sales') %>%
sfh_sales collect() %>%
filter(class %in% c(202, 203, 204, 205, 206, 207, 208, 209, 210, 234, 278)) %>%
mutate(sale_date = as_datetime(sale_date)) %>%
mutate(township_code = as.character(township_code)) %>%
distinct(doc_no, .keep_all = TRUE)
<- read_csv("../data/township_codes.csv") %>%
township_codes mutate(across(where(is.numeric), as.character))
<- tbl(con, 'characteristics') %>%
sfh_characteristics collect() %>%
distinct(pin, .keep_all = T) %>%
select(-c(class, township_code)) %>%
right_join(sfh_sales, by = join_by(pin, year == year)) %>%
right_join(township_codes)
<- tbl(con, 'assessments') %>%
sfh_assessments collect() %>%
mutate(township_code = as.character(township_code)) %>%
distinct(pin, .keep_all = T) %>%
select(-c(class, township_code)) %>%
right_join(sfh_sales, by = join_by(pin, tax_year == year))
Maps
The distributions of average sale price for single family homes by township lines up with most people’s preconceptions of Cook County. We can see that average sale prices are highest in townships in northern Cook County in Chicago and the more wealthy northern suburbs. Within these very same townships, we can see that single family homes both tend to be larger and tend to cost more per square foot. Lastly, we can see that the oldest households tend be within the city of Chicago, while the mean age of houses within Cook County seems to lower, the farther you get from the city of Chicago.
Code
library(sf)
library(leaflet)
library(scales)
<- read_sf("https://gis.cookcountyil.gov/traditional/rest/services/politicalBoundary/MapServer/3/query?outFields=*&where=1%3D1&f=geojson") %>%
townships mutate(ORIGOID = as.character(ORIGOID)) %>%
right_join(township_codes, by = join_by(ORIGOID == origoid))
<- sfh_characteristics %>%
sfh_mean_by_township group_by(township_code) %>%
reframe(sale_price_mean = mean(sale_price, na.rm = T),
bldg_sf_mean = mean(char_bldg_sf, na.rm = T),
sale_price_per_sf = mean(sale_price/char_bldg_sf, na.rm = T),
years_old_mean = mean(2023 - char_yrblt, na.rm = T)) %>%
right_join(townships) %>%
st_as_sf()
<- colorQuantile("Greens", sfh_mean_by_township$sale_price_mean, n = 5)
qpal
%>%
sfh_mean_by_township leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~ qpal(sale_price_mean),
fillOpacity = 0.7,
color = "Black",
weight = 0.5,
opacity = 0.5,
highlightOptions = highlightOptions(
weight = 2,
color = "Black",
fillOpacity = 1,
bringToFront = T),
label = sprintf(
"<strong>%s</strong><br>Average Sale: %s<br/>",
$NAME,
sfh_mean_by_townshiplabel_currency()(sfh_mean_by_township$sale_price_mean)) %>%
lapply(htmltools::HTML),
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "12px",
direction = "auto")) %>%
addLegend(
pal = qpal,
values = sfh_mean_by_township$sale_price_mean,
opacity = 0.7,
title = "Mean Sale Price of </br>Single Family Homes </br>in Cook County by </br>Township",
position = "topright",
na.label = "Insufficient Data",
labFormat = function(type, cuts, p) {
= length(cuts)
n = paste0(round(p * 100), '%')
p = paste0(label_currency()(cuts[-n]), " - ", label_currency()(cuts[-1]))
cuts paste0(
'<span title="', p[-n], " - ", p[-1], '">', cuts,
'</span>'
) })
Code
<- colorQuantile("Oranges", sfh_mean_by_township$bldg_sf_mean, n = 5)
qpal
%>%
sfh_mean_by_township leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~ qpal(bldg_sf_mean),
fillOpacity = 0.7,
color = "Black",
weight = 0.5,
opacity = 0.5,
highlightOptions = highlightOptions(
weight = 2,
color = "Black",
fillOpacity = 1,
bringToFront = T),
label = sprintf(
"<strong>%s</strong><br>Mean Building sqft: %s<br/>",
$NAME,
sfh_mean_by_townshiplabel_comma(1)(sfh_mean_by_township$bldg_sf_mean)) %>%
lapply(htmltools::HTML),
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "12px",
direction = "auto")) %>%
addLegend(
pal = qpal,
values = sfh_mean_by_township$bldg_sf_mean,
opacity = 0.7,
title = "Mean Building Square Feet</br> of Single Family Homes </br>in Cook County by </br>Township",
position = "topright",
na.label = "Insufficient Data",
labFormat = function(type, cuts, p) {
= length(cuts)
n = paste0(round(p * 100), '%')
p = paste0(label_comma(1)(cuts[-n]), " - ", label_comma(1)(cuts[-1]))
cuts paste0(
'<span title="', p[-n], " - ", p[-1], '">', cuts,
'</span>'
) })
Code
<- colorQuantile("Blues", sfh_mean_by_township$sale_price_per_sf, n = 5)
qpal
%>%
sfh_mean_by_township leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~ qpal(sale_price_per_sf),
fillOpacity = 0.7,
color = "Black",
weight = 0.5,
opacity = 0.5,
highlightOptions = highlightOptions(
weight = 2,
color = "Black",
fillOpacity = 1,
bringToFront = T),
label = sprintf(
"<strong>%s</strong><br>%s per sqft<br/>",
$NAME,
sfh_mean_by_townshiplabel_currency()(sfh_mean_by_township$sale_price_per_sf)) %>%
lapply(htmltools::HTML),
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "12px",
direction = "auto")) %>%
addLegend(
pal = qpal,
values = sfh_mean_by_township$sale_price_per_sf,
opacity = 0.7,
title = "Dollars per Square Foot of </br>Single Family Homes in </br>Cook County by Township",
position = "topright",
na.label = "Insufficient Data",
labFormat = function(type, cuts, p) {
= length(cuts)
n = paste0(round(p * 100), '%')
p = paste0(label_currency()(cuts[-n]), " - ", label_currency()(cuts[-1]))
cuts paste0(
'<span title="', p[-n], " - ", p[-1], '">', cuts,
'</span>'
) })
Code
<- colorQuantile("Reds", sfh_mean_by_township$years_old_mean, n = 5)
qpal
%>%
sfh_mean_by_township leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(
fillColor = ~ qpal(years_old_mean),
fillOpacity = 0.7,
color = "Black",
weight = 0.5,
opacity = 0.5,
highlightOptions = highlightOptions(
weight = 2,
color = "Black",
fillOpacity = 1,
bringToFront = T),
label = sprintf(
"<strong>%s</strong><br>%s years old<br/>",
$NAME,
sfh_mean_by_townshiplabel_number(1)(sfh_mean_by_township$years_old_mean)) %>%
lapply(htmltools::HTML),
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "3px 8px"),
textsize = "12px",
direction = "auto")) %>%
addLegend(
pal = qpal,
values = sfh_mean_by_township$years_old_mean,
opacity = 0.7,
title = "Mean Years Old of </br>Single Family Homes </br>in Cook County by </br>Township",
position = "topright",
na.label = "Insufficient Data",
labFormat = function(type, cuts, p) {
= length(cuts)
n = paste0(round(p * 100), '%')
p = paste0(label_number(1)(cuts[-n]), " - ", label_number(1)(cuts[-1]))
cuts paste0(
'<span title="', p[-n], " - ", p[-1], '">', cuts,
'</span>'
) })
Sale Price Distributions
It seems that single family homes built in the last 30 - 40 years tend to be more expensive, but beyond that sale prices do not seem to be associated with sale prices. As expected, higher square footage single family homes tend to sell for more, but there is still a large amount variability in sale prices. Larger land square footage single family homes seem to sell for more up until about 15,000 square feet where additional land does not seem to be associated with much greater sale prices. More beds, rooms, and bathrooms seem to all be associated with greater sale prices. However, there seems to be some odd outliers with 0 bedrooms or 2 rooms that are selling for more than would be expected. This may be something we would want to investigate later. These could be simply a result of faulty assessor data.
Code
%>%
sfh_characteristics filter(sale_price <= 10^7) %>%
ggplot(aes(x = char_yrblt, y = sale_price)) +
geom_point(size = 0.1) +
geom_smooth(method = "gam", color = "springgreen4") +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Sale Price and Year Built in Cook County, IL",
x = "Year Built",
y = "Sale Price")
Code
%>%
sfh_characteristics filter(sale_price <= 10^7) %>%
ggplot(aes(x = char_bldg_sf, y = sale_price)) +
geom_point(size = 0.1) +
geom_smooth(method = "gam", color = "springgreen4") +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Building Square Footage and Sale Price \nin Cook County, IL",
x = "Building Square Footage",
y = "Sale Price")
Code
%>%
sfh_characteristics filter(sale_price <= 10^7) %>%
filter(char_land_sf <= 10^5) %>%
ggplot(aes(x = char_land_sf, y = sale_price)) +
geom_point(size = 0.1) +
geom_smooth(method = "gam", color = "springgreen4") +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Building Square Footage and Sale Price \nin Cook County, IL",
x = "Land Square Footage",
y = "Sale Price")
Code
%>%
sfh_characteristics ggplot() +
geom_boxplot(aes(x = char_beds, y = sale_price, fill = factor(char_beds))) +
scale_x_continuous(n.breaks = 9) +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Number of Bedrooms and Sale Price \nin Cook County, IL",
x = "Number of Bedrooms",
y = "Sale Price") +
theme(legend.position="none")
Code
%>%
sfh_characteristics ggplot() +
geom_boxplot(aes(x = char_rooms, y = sale_price, fill = factor(char_rooms))) +
scale_x_continuous(n.breaks = 20) +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Number of Rooms and Sale Price \nin Cook County, IL",
x = "Number of Rooms",
y = "Sale Price") +
theme(legend.position="none")
Code
%>%
sfh_characteristics ggplot() +
geom_boxplot(aes(x = char_fbath + char_hbath * 0.5, y = sale_price, fill = factor(char_fbath + char_hbath * 0.5))) +
scale_x_continuous(n.breaks = 20) +
scale_y_continuous(labels = label_currency()) +
labs(title = "Single Family Home Total Number of Baths and Sale Price \nin Cook County, IL",
x = "Total Number of Baths",
y = "Sale Price") +
theme(legend.position="none")
Use cmfproperty
to conduct a sales ratio study across the relevant time period
Let’s get our data into a format that the cmfproperty
package can use.
Code
library(cmfproperty)
<- DBI::dbConnect(RSQLite::SQLite(), "../data/cook.sqlite")
con
<- sfh_sales %>%
sfh_sales_cmf select(pin, year, sale_price, doc_no) %>%
distinct(doc_no, .keep_all = TRUE) %>%
select(-doc_no) %>%
filter(as.numeric(sale_price) > 2500)
<- tbl(con, "assessments") %>%
assessments_cmf collect() %>%
select(pin, tax_year, certified_tot)
<- sfh_sales_cmf %>%
sale_assess_cmf left_join(assessments_cmf, by = join_by(pin, year == tax_year)) %>%
rename(PIN = pin, SALE_YEAR = year, SALE_PRICE = sale_price, ASSESSED_VALUE = certified_tot) %>%
mutate(ASSESSED_VALUE = 10 * ASSESSED_VALUE)
<- reformat_data(sale_assess_cmf,
ratios sale_col = "SALE_PRICE",
assessment_col = "ASSESSED_VALUE",
sale_year_col = "SALE_YEAR")
<- calc_iaao_stats(ratios) stats
Overall, we can see that at every sale price decile, single family homes are underassessed compared to their true sale price. However, the degree to which single family homes are underassessed is not equal. Regardless of efforts of the county, it is clear that in 2023 (for the months of Jan-Sep), houses within the lowest deciles tend to be assessed at higher ratios to their true sales price. The overall picture from 2021 to Sep 2023 is more enocouraging, but nevertheless the lowest decile tends to be assessed at higher rates than all other deciles and the 2nd to 5th lowest deciles seem to be asssed at lower rates than the higher deciles.
Code
<- binned_scatter(ratios,
binned min_reporting_yr = 2021,
max_reporting_yr = 2023,
jurisdiction_name = "Cook County, IL")
::asis_output(htmltools::htmlPreserve(binned[[1]]))
knitr
2]] binned[[
In 2023, the most expensive homes (the top decile) were assessed at 76.2% of their value and the least expensive homes (the bottom decile) were assessed at 93.1%. In other words, the least expensive homes were assessed at 1.22 times the rate applied to the most expensive homes. Across our sample from 2021 to 2023, the most expensive homes were assessed at 77.4% of their value and the least expensive homes were assessed at 84.6%, which is 1.09 times the rate applied to the most expensive homes.
Code
<- pct_over_under(ratios,
pct_over min_reporting_yr = 2021,
max_reporting_yr = 2023,
jurisdiction_name = "Cook County, IL")
::asis_output(htmltools::htmlPreserve(pct_over[[1]]))
knitr
2]] pct_over[[
In Cook County, IL, 58% of the lowest value homes are overassessed and 52% of the highest value homes are overassessed.
Code
<- iaao_graphs(stats,
iaao_rslt
ratios,min_reporting_yr = 2021,
max_reporting_yr = 2023,
jurisdiction_name = "Cook County, Illinois")
::asis_output(htmltools::htmlPreserve(iaao_rslt[[1]]))
knitr
2]] iaao_rslt[[
For 2023, the COD in Cook County, Illinois was 19.6 which did not meet the IAAO standard for uniformity.
Code
::asis_output(htmltools::htmlPreserve(iaao_rslt[[3]]))
knitr
4]] iaao_rslt[[
In 2023, the PRD in Cook County, Illinois, was 1.074 which does not meet the IAAO standard for vertical equity.
Code
::asis_output(htmltools::htmlPreserve(iaao_rslt[[5]]))
knitr
6]] iaao_rslt[[
In 2023, the PRB in Cook County, Illinois was 0.004 which indicates that sales ratios increase by 0.4% when home values double. This meets the IAAO standard.
Explore trends and relationships with property sales using simple regressions
Let’s evaluate a couple different linear models (linear models were chosen simply because of the lower computing power necessary to train them). First, we’ll split our data.
Code
library(tidymodels)
tidymodels_prefer()
set.seed(1)
<- initial_split(sfh_characteristics)
split <- training(split)
train_set <- testing(split)
test_set
set.seed(2)
<- bootstraps(train_set) train_resamples
We’ll preprocess our data.
Code
<- recipe(sale_price ~ char_yrblt + char_bldg_sf + char_land_sf + char_beds + char_rooms + char_fbath + char_hbath + township_code, data = sfh_characteristics) %>%
basic_rec step_mutate(years_old = 2023 - char_yrblt, role = "predictor") %>%
remove_role(char_yrblt, old_role = "predictor") %>%
step_string2factor(township_code) %>%
step_naomit(all_predictors()) %>%
step_novel(all_nominal_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_zv(all_predictors())
<- basic_rec %>%
normalized_rec step_normalize(all_predictors())
Save the specifications of our models.
Code
<- linear_reg() %>%
ols_spec set_engine("lm") %>%
set_mode("regression")
<- linear_reg(penalty = tune(), mixture = 0) %>%
ridge_spec set_engine("glmnet") %>%
set_mode("regression")
<- linear_reg(penalty = tune(), mixture = 1) %>%
lasso_spec set_engine("glmnet") %>%
set_mode("regression")
<- linear_reg(penalty = tune(), mixture = tune()) %>%
enet_spec set_engine("glmnet") %>%
set_mode("regression")
Create our workflows.
Code
<- workflow_set(
all_workflows preproc = list(normalize = normalized_rec),
models = list(
ols = ols_spec,
ridge = ridge_spec,
lasso = lasso_spec,
enet = enet_spec))
Run our models.
Code
<- control_grid(
ctrl_grid save_pred = TRUE,
save_workflow = TRUE)
<- all_workflows %>%
res_grid workflow_map(
resamples = train_resamples,
grid = 20,
control = ctrl_grid,
metrics = metric_set(rmse, rsq, ccc),
verbose = TRUE)
Fit our best model (Elastic Net).
Code
<- res_grid %>%
res_ranks rank_results('rmse') %>%
filter(.metric == 'rmse') %>%
select(wflow_id, model, .config, rmse = mean, rank) %>%
group_by(wflow_id) %>%
slice_min(rank, with_ties = FALSE) %>%
ungroup() %>%
arrange(rank)
<- res_ranks %>%
wflow_id_best slice_min(rank, with_ties = FALSE) %>%
pull(wflow_id)
<- res_grid %>%
wf_best extract_workflow_set_result(wflow_id_best) %>%
select_best(metric = 'rmse')
<- res_grid %>%
fit_best extract_workflow(wflow_id_best) %>%
finalize_workflow(wf_best) %>%
last_fit(split = split)
Let’s look at how well our model performs
Code
<- res_grid %>%
train_fit extract_workflow(wflow_id_best) %>%
finalize_workflow(wf_best) %>%
fit(data = train_set)
<- augment(train_fit, test_set) test_augment
Code
mape(test_augment, truth = sale_price, estimate = .pred) %>% knitr::kable()
.metric | .estimator | .estimate |
---|---|---|
mape | standard | 434.6188 |
Code
rmse(test_augment, truth = sale_price, estimate = .pred) %>% knitr::kable()
.metric | .estimator | .estimate |
---|---|---|
rmse | standard | 264690.7 |
It looks like our model underestimates higher sale price homes. This would make our model regressive, which is concerning.
Code
ggplot(test_augment, aes(x = sale_price, y = sale_price - .pred)) +
geom_point() +
labs(title = "Elastic Net Residuals and Sale Price in Cook County, IL",
x = "Sale Price",
y = "Residuals")
Let’s see what variables have the largest impact on sale_price.
Notice that building square feet seems to have the largest impact on the sale price. This makes implicit sense because larger single family homes tend to sell for more. We can also see that certain townships (ex: New Trier, Lake View, North Chicago) can have large impacts on the sale price of single family homes.
Although our exploratory data analysis indicated that the number of years old a single family home was and how much land it has was associated with changes in sale prices within certain ranges it seems that the model was still able to pick up on some association.
Interestingly, the number of beds that a single family home has was only the fifth strongest non-township predictor within this model. My guess is that this is because there is some multicolinearity between it and building square feet and number of full baths (two of our strongest non-township predictors). Elastic net takes some characteristics from ridge regression which means that it may shrink coefficients for parameters that exibit multicollinearity. In the future, we should take some caution to address concerns that may stem out of the fact that some of our parameters may exhibit multicollinearity.
Code
%>%
fit_best extract_fit_parsnip() %>%
tidy() %>%
arrange(desc(abs(estimate))) %>%
select(-penalty) %>%
::kable() knitr
term | estimate |
---|---|
(Intercept) | 411430.1509 |
char_bldg_sf | 173102.1930 |
township_code_X23 | 122590.9790 |
township_code_X73 | 100464.1254 |
township_code_X74 | 98948.7028 |
char_fbath | 57515.9458 |
township_code_X71 | 44273.8193 |
township_code_X77 | 40171.8266 |
township_code_X25 | 37585.7469 |
township_code_X17 | 37127.8833 |
years_old | -36310.3012 |
township_code_X32 | -29412.3966 |
township_code_X37 | -26423.4514 |
char_land_sf | 25928.7813 |
township_code_X12 | -24590.2254 |
char_beds | -23844.3942 |
township_code_X27 | 23571.0122 |
township_code_X21 | 22396.3105 |
township_code_X22 | 18702.2171 |
township_code_X24 | 18433.0435 |
township_code_X33 | 17375.9446 |
township_code_X13 | -15277.9551 |
char_hbath | 13864.3441 |
township_code_X28 | -13663.1957 |
township_code_X70 | -11151.1800 |
township_code_X18 | -10018.7998 |
township_code_X16 | 9804.5747 |
township_code_X75 | 9622.6161 |
township_code_X26 | 9527.2867 |
township_code_X38 | 8921.9784 |
township_code_X39 | -7649.2568 |
township_code_X76 | 7577.2306 |
township_code_X30 | -7055.2081 |
township_code_X34 | 6767.3835 |
township_code_X72 | -6149.2097 |
township_code_X19 | -5829.4375 |
township_code_X14 | -5619.0157 |
township_code_X31 | 5335.3295 |
township_code_X20 | 5057.9514 |
char_rooms | -3085.1155 |
township_code_X29 | -2818.9065 |
township_code_X11 | 2720.7418 |
township_code_X35 | -464.7437 |
township_code_X36 | -291.2669 |
township_code_X15 | 0.0000 |