Network Science Institute, Northeastern University
NETS 7983 Computational Urban Science
Published
March 31, 2025
In today’s lab, we will build on our progress from Lab 2 (analysis of census data). In that lab, we used dimensionality reduction with Principal Component Analysis to identify the key axes differentiating areas in Boston based on a subset of census variables.
Today, we will re-analyze some of the Boston census data, aiming to quantify the degree of spatial autocorrelation (clustering) in the underlying census features.
We will then conduct a geodemographic analysis, with the aim of identifying spatially contiguous geographic areas with similar demographic features. Geodemographics has a long history in creating meaningful spatial groupings of individuals with similar demographic characteristics and has been used for informing public policy, identifying potential consumer segments, and characterizing urban dynamics.
For our geodemographic clustering, we will compare the results of non-spatial clustering (K-means) with a spatially-aware method (Spatial ’K’luster Analysis by Tree Edge Removal - SKATER [1]).
The census geospatial boundary API is down (as of Jan 4th 2025)! Luckily, we can use pre-processed census data for Boston, and spatial boundaries we have already downloaded and saved.
census <-read_parquet("/data/CUS/labs/5/14460_acs_2021.parquet")cbg <-read_parquet("/data/CUS/labs/5/14460_2021_cbg_geo.parquet")
To do this, we just need to add spatial boundaries to our familiar census data. In 14460_2021_cbg_geo.parquet, geometries are stored as in “Well-Known-Text” format. We can parse this with the sf package.
Next, we need to join our spatial boundaries onto our census data, using the GEOID column. While we are at it, we can remove rows with missing median_income values present for Census Block Groups (CBGs) with low or no population. We can also remove redundant population columns.
census <- census |>drop_na(median_income) |>left_join(cbg, by="GEOID") |>st_as_sf() |>select(-race.total, -employment.total, -geometry_wkt) |>select(-matches("^(industry|travel|trans)"))
Let’s quickly visualize our data to make sure everything looks OK:
census |>leaflet() |>addProviderTiles(provider=providers$CartoDB.Positron) |>addPolygons()