Spatial inference
Spatial inference is the process of understanding the relationship between spatially distributed variables.
Key concepts:
Spatial autocorrelation: the value of a variable in one location is correlated with the variable’s value in nearby locations.
Spatial heterogeneity: the relationship between variables can change over space or time.
Modifiable areal unit problem (MAUP): the relationship between variables can change depending on the spatial scale at which they are measured.
Our inference models should incorporate those concepts.
Methods for Spatial Inference
Methods:
Exploratory Spatial Data Analysis (ESDA): we can use spatial statistics to understand the spatial distribution of our variables, and to identify spatial patterns.
- We showed some of these methods in the last lecture, like Moran’s I, LISA, SKATER, etc.
Spatial regression or machine learning: we can use spatial regression models to understand the relationship between variables and how they change over space.
- We will show some of these methods in this lecture.
Inference through Spatial Regression
Regression models are a powerful tool to understand the relationship between variables. In most papers/research, we see the following equation:
\[ y = \alpha + \beta X + \varepsilon \]
Where \(y\) is the dependent variable (vector), \(X\) are the independent variables (matrix), and \(\varepsilon\) is the error term (vector). Each of the components of \(y\) is \(y_i\) where \(i\) refers to a geographical unit, like a city, a neighborhood, CBG, etc.
- The goal is to estimate the coefficients \(\beta\) that best explain the relationship between \(y_i\) and \(X_i\).
- Methods like OLS, GLM, etc., are used to estimate these coefficients.
- One of the key assumptions of these models is that the error term \(\varepsilon_i\) is independent and identically distributed (i.i.d).
- In most situations, though, \(y_i\), \(X_i\) and \(\varepsilon_i\) are spatially correlated. This means that the value of \(y_i\) in a location is correlated with the value of \(y_j\) in a nearby location.
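One quick way to see this last point is to test the residuals of a non-spatial model for spatial autocorrelation. The sketch below (toy numbers; a simple chain of units, where each unit's neighbors are the adjacent ones) computes Moran's I for a vector of residuals; a value well above zero signals spatially correlated errors:

```python
import numpy as np

# Toy example: Moran's I for residuals on a chain of 5 units,
# where each unit's neighbors are the adjacent units.
W = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

e = np.array([2.0, 1.5, 0.5, -1.0, -3.0])  # hypothetical residuals
z = e - e.mean()                            # center the residuals

n = len(e)
S0 = W.sum()                                # sum of all weights
I = (n / S0) * (z @ W @ z) / (z @ z)        # Moran's I
print(round(I, 3))                          # positive: residuals cluster in space
```

Here the residuals decrease smoothly along the chain, so Moran's I comes out clearly positive, which is exactly the situation where the i.i.d. assumption breaks down.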
Spatial Weight Matrix
How can we incorporate spatial correlation in our regression models? Since we find that the variables in one place are related to nearby places, we can use relationships between neighbors.
To do that, we are going to define a spatial weight matrix \(W\) that defines the relationship between locations.
If a location \(i\) is a neighbor of location \(j\), then \(w_{ij} \neq 0\) (typically \(w_{ij} = 1\) for binary contiguity); otherwise \(w_{ij} = 0\).
Most of the time, we use “contiguity” to define neighbors. For example, here are the Rook, Bishop, and Queen contiguity matrices for a 3x3 grid:
Figure: contiguity-based spatial weight matrices (Rook, Bishop, Queen) on a 3x3 grid.
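The three contiguity rules can be sketched in a few lines (pure-Python toy; grid cells indexed row by row):

```python
# Build contiguity weights for a grid of cells indexed 0..n-1 row by row.
def contiguity(nrows, ncols, kind="rook"):
    n = nrows * ncols
    W = [[0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == dc == 0:
                        continue
                    if kind == "rook" and dr != 0 and dc != 0:
                        continue  # rook: shared edges only
                    if kind == "bishop" and (dr == 0 or dc == 0):
                        continue  # bishop: shared corners only
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        W[i][rr * ncols + cc] = 1  # queen: edges and corners
    return W

rook = contiguity(3, 3, "rook")
bishop = contiguity(3, 3, "bishop")
queen = contiguity(3, 3, "queen")
print(sum(map(sum, rook)))    # 24 directed links
print(sum(map(sum, bishop)))  # 16 directed links
print(sum(map(sum, queen)))   # 40 directed links
```

Note that queen contiguity is the union of rook and bishop neighborhoods, which the link counts confirm (24 + 16 = 40).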
Spatial Weight Matrix
- Another way is to define neighbors based on distance. For example, we can define a weight matrix \(W\) where
\[w_{ij} = 1/d_{ij}\] where \(d_{ij}\) is the distance between locations \(i\) and \(j\), or use a kernel function to define the relationship between locations:
\[
w_{ij} = f(d_{ij})
\]
- The choice of the weight matrix \(W\) is a model choice. It can have a big impact on the results of our model, and thus, it has to be chosen carefully and based on our knowledge of the data.
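Both distance-based schemes can be sketched as follows (toy coordinates made up for illustration; row standardization, a common normalization so each row of \(W\) sums to 1, is shown for the inverse-distance case):

```python
import numpy as np

# Three point locations (illustrative coordinates).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Inverse-distance weights w_ij = 1/d_ij (zero on the diagonal).
W_inv = np.where(d > 0, 1.0 / np.where(d > 0, d, 1.0), 0.0)

# Gaussian-kernel weights w_ij = f(d_ij), bandwidth h.
h = 1.0
W_ker = np.exp(-d**2 / (2 * h**2))
np.fill_diagonal(W_ker, 0.0)  # no self-neighbor

# Row standardization: each row sums to 1, so Wy is a neighbor average.
W_row = W_inv / W_inv.sum(axis=1, keepdims=True)
print(W_row.round(3))
```

Row standardization is itself a modeling choice: it makes the spatial lag \(Wy\) a weighted average of the neighbors, which is how it is usually interpreted.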
Spatial Lag Models
Once we have defined the weight matrix \(W\), we can incorporate it into our regression model. The most common way to do this is to use a spatial lag model (SLM):
\[ y = \alpha + \rho W y + \beta X + \varepsilon \] Where \(y\) is the dependent variable, \(X\) are the independent variables, \(\varepsilon\) is the error term, and \(\alpha\) is a constant. The term \(\rho W y\) is the spatial lag: the weighted average of the dependent variable in the neighboring locations.
The parameter \(\rho\) is the spatial autoregressive parameter, which measures the strength of the spatial autocorrelation in the dependent variable.
This model assumes that a spatial process affects the dependent variable. For example, we might think that the income in one area will affect the income in nearby areas (spillover). SLM is a principled way to include those potential spillovers.
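A minimal simulation sketch shows how SLM data can be generated through the reduced form \(y = (I - \rho W)^{-1}(\alpha + \beta X + \varepsilon)\) (toy ring geometry; \(\rho\), \(\beta\) are made-up values for illustration):

```python
import numpy as np

# Simulate a spatial lag model on a ring of n units.
rng = np.random.default_rng(0)
n, rho, alpha, beta = 50, 0.5, 1.0, 2.0

# Row-standardized ring: each unit has its two adjacent units as neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = rng.normal(size=n)
eps = rng.normal(scale=0.1, size=n)

# Reduced form: y = (I - rho W)^{-1} (alpha + beta X + eps)
y = np.linalg.solve(np.eye(n) - rho * W, alpha + beta * X + eps)
lag = W @ y  # spatial lag term: the neighbor average of y
print(lag.shape)
```

By construction, the simulated `y` satisfies the SLM equation \(y = \alpha + \rho W y + \beta X + \varepsilon\) exactly, which is a useful sanity check when testing estimation code.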
Spatial Lag Models
One of the main problems with SLM is that it assumes that the relationship between the dependent and independent variables is the same in all locations. This assumption is known as spatial homogeneity.
Also, SLM assumes that the spatial relationship is linear. This might not be the case in practice, as the relationship between variables can be non-linear. However, this can be addressed by using a non-linear spatial lag model.
More importantly, SLM does not allow us to interpret \(\beta\) as the effect of the independent variable \(X_i\) on the dependent variable \(y_i\) at location \(i\). This is because the spatial lag term \(\rho W y\) is a function of the dependent variable \(y\) itself. Thus, \(X_j\) at neighboring locations will affect \(y_i\) at location \(i\). (see practical)
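This point can be made concrete: for a single covariate, the matrix of marginal effects is \(\beta (I - \rho W)^{-1}\), whose off-diagonal entries are the spillovers from \(X_j\) to \(y_i\). A toy computation (made-up \(\rho\) and \(\beta\), 4-unit ring):

```python
import numpy as np

# In an SLM, dy/dX = beta (I - rho W)^{-1}: a change in X_j also moves
# y at other locations, so beta alone is not "the" effect.
n, rho, beta = 4, 0.4, 1.0
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5  # row-standardized ring

effects = beta * np.linalg.inv(np.eye(n) - rho * W)

direct = float(np.diag(effects).mean())                    # effect of X_i on y_i
indirect = float((effects.sum() - np.trace(effects)) / n)  # spillovers to others
print(round(direct, 3), round(indirect, 3))
```

Even with \(\beta = 1\), the average direct effect exceeds 1 (feedback through neighbors) and the indirect (spillover) effect is far from zero, which is why SLM coefficients need this extra decomposition to be interpreted.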
Spatial Error Models
We can use a spatial error model (SEM) to address this issue. In SEM, we assume that the spatial process is affecting the error term \(\varepsilon\):
\[ y = \alpha + \beta X + u \] \[ u = \lambda W u + \varepsilon \] Where \(u\) is the error term, and \(\lambda\) is the spatial autoregressive parameter that measures the strength of the spatial autocorrelation in the error term.
In this model, we assume that there is a spatial process that affects the error term but does not directly affect the dependent variable.
This allows us to interpret \(\beta\) as the effect of the independent variable \(X_i\) on the dependent variable \(y_i\) at location \(i\).
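A short sketch of the SEM data-generating process (toy ring geometry again; the intercept is dropped for brevity and all parameter values are made up):

```python
import numpy as np

# SEM error process: u = (I - lambda W)^{-1} eps, so the error is spatially
# correlated but X still enters y only directly (no spatial lag of y).
rng = np.random.default_rng(1)
n, lam, beta = 30, 0.6, 2.0

W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5  # row-standardized ring

eps = rng.normal(size=n)
u = np.linalg.solve(np.eye(n) - lam * W, eps)  # spatially correlated error
X = rng.normal(size=n)
y = beta * X + u
print(y.shape)
```

Because the spatial process lives entirely in `u`, the coefficient `beta` keeps its usual local interpretation, which is the contrast with the SLM case above.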
Other Spatial Models
There are many other spatial regression models that we can use to incorporate spatial correlation in our models; see [4] [5]. Some of the most common ones are:
SLX: Spatial Lag of X model, where the spatial structure is introduced in the independent (exogenous) variables.
SDM: Spatial Durbin Model, where the spatial structure is introduced in the dependent and independent variables.
Figure: relationship between spatial regression models.
Spatial Heterogeneous Models
The models considered so far are homogeneous; that is, they assume that the relationship between variables is the same in all locations. But we know this might not be the case in spatial data. One way to address this issue is to use a spatial heterogeneity model. In these models, we assume that the relationship between variables can change over space. Remember our OLS linear model:
\[y_i = \alpha + \sum_k \beta_k x_{ik} + \varepsilon_i\] There are two ways to incorporate spatial heterogeneity:
- Spatial Fixed Effects models: Allowing geographical variations of \(\alpha\).
\[ y_i = a_{C(i)} + \sum_k \beta_k x_{ik} + \varepsilon_i \] where \(C(i)\) is the geographical unit to which location \(i\) belongs to.
Spatial Fixed Effects
- At the core of spatial fixed effects is the idea that there are geographical effects on the dependent variable that we should consider.
- For example, consider income: it is very different across neighborhoods, counties, and states in the US. Thus, a model explaining it should take into account those differences.
- If you think about it, a spatial fixed effects model is a way to “normalize” income by geographical unit: once the unit-level effect \(a_{C(i)}\) is removed, the effect of the covariates is assumed to be the same everywhere.
\[ y_i - a_{C(i)} = \sum_k \beta_k x_{ik} + \varepsilon_i \]
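This “within” transformation can be sketched directly (made-up income values and regions for illustration):

```python
# Spatial fixed effects by "within" demeaning: subtract the group (region)
# mean a_{C(i)} from each observation before fitting the regression.
from collections import defaultdict

y = [10.0, 12.0, 11.0, 30.0, 33.0, 27.0]   # e.g. incomes
region = ["A", "A", "A", "B", "B", "B"]    # C(i): the region of unit i

# Group means a_{C(i)}
totals, counts = defaultdict(float), defaultdict(int)
for yi, r in zip(y, region):
    totals[r] += yi
    counts[r] += 1
means = {r: totals[r] / counts[r] for r in totals}

# Demeaned dependent variable: regional level differences are removed.
y_within = [yi - means[r] for yi, r in zip(y, region)]
print(y_within)
```

After demeaning, the large level gap between regions A and B disappears, and only the within-region variation is left to be explained by the covariates.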
Geographically Weighted Regression
- Another way to allow for spatial heterogeneity is to allow the coefficients \(\beta_k\) to change over space. This is typically known as a spatial Random Effect model or a Geographically Weighted Regression (GWR).
\[
y_i = \alpha + \sum_k \beta_{k}(lat_i, lon_i) x_{ik} + \varepsilon_i
\]
In this model, we are allowing the coefficients \(\beta_k\) to change over space. This allows us to capture the spatial heterogeneity in the relationship between variables.
The coefficients \(\beta_k\) are estimated using a local weighted regression, moving a local search window from one regression point to the next.
Figure: GWR with fixed spatial kernel. Source: [6].
Geographically Weighted Regression
Thus, we need to define a kernel function that describes the relationship between locations. The most common kernel functions are Gaussian, Exponential, and Uniform, and they can be fixed or adaptive:
- Fixed kernel: the same bandwidth is used at every location. This is the simplest choice, but it might not be the best one when the density of observations changes over space.
- Adaptive kernel: the bandwidth changes from location to location (for example, covering a fixed number of nearest neighbors). This captures the spatial heterogeneity of the data better, but it is more computationally expensive.
For example, here are the weights for a fixed Gaussian kernel
\[
w_{ij} = \exp\left(-\frac{d_{ij}^2}{2h^2}\right)
\] GWR methods find the best bandwidth \(h\) that minimizes the error of the model.
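A single GWR step can be sketched as a kernel-weighted least squares fit at one regression point (synthetic data in which the true slope varies with the x-coordinate; the bandwidth `h` is fixed here rather than optimized):

```python
import numpy as np

# One GWR step: weighted least squares at a single regression point,
# with fixed Gaussian kernel weights w_ij = exp(-d^2 / (2 h^2)).
rng = np.random.default_rng(2)
n, h = 100, 0.3
coords = rng.uniform(size=(n, 2))
X = rng.normal(size=n)
# True slope varies in space: beta(u, v) = 1 + u
y = (1 + coords[:, 0]) * X + rng.normal(scale=0.1, size=n)

point = np.array([0.5, 0.5])                  # regression point
d = np.linalg.norm(coords - point, axis=1)
w = np.exp(-d**2 / (2 * h**2))                # fixed Gaussian kernel weights

A = np.column_stack([np.ones(n), X])          # local design matrix [1, X]
beta_local = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * y))
print(beta_local.round(2))                    # local [intercept, slope]
```

Repeating this fit at every regression point (and searching over `h`) is what makes the full GWR procedure expensive; the local slope recovered here should sit near the true value \(1 + 0.5 = 1.5\) at this point.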
Geographically Weighted Regression
GWR is computationally expensive, as it requires estimating the coefficients \(\beta_k\) at every location. This can be a problem when the number of locations is large.
GWR is also sensitive to the choice of kernel function and bandwidth, which can greatly impact the model results.
GWR is particularly affected by problems like local multicollinearity, edge effects, etc.
GWR is difficult to interpret and to compare with other models.
For these reasons, GWR is generally recommended as an exploratory technique rather than a confirmatory one.
Software for Spatial Inference
Software:
- In R, we can use the `spatialreg` package to estimate spatial regression models.
- For GWR models, we can use the `spgwr` package.
- In Python, we can use the `pysal` package to estimate both spatial regression models and GWR models.
We will use those packages in our practical session.
Machine Learning in CUS
Machine learning is a powerful tool in CUS. Again, we need to be aware of the spatial properties of our data when using machine learning models. There are three areas of machine learning which are affected by spatial data:
Feature Engineering: we can use spatial relationships between variables and geography to create new features to help us build better models.
Model Training: since data is spatially correlated, we have to use spatial cross-validation to prevent data leakage and overfitting.
Model Selection: Most models are based on statistical tests that assume the data is independent and identically distributed. However, this is not the case in spatial data, so we have to use spatial tests to select the best model.
Feature engineering in CUS
Figure from [7].
Model Training in CUS
Cross-validation in CUS
There are many methods to do spatial cross-validation, like:
- CV by geographical area: using neighborhoods, counties, states, or rectangular or hexagonal regions to split the data.
- CV by spatial distance: using the distance between data points to split the data.
- CV by spatial clustering: using spatial clustering to split the data into groups.
The choice of the method depends on the spatial properties of the data and the research question.
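For instance, CV by geographical area can be sketched by blocking points into coarse grid cells and holding out whole cells at a time, so that test points are never immediate neighbors of training points (toy points; the block size is an arbitrary choice here):

```python
# Spatial block cross-validation sketch: assign points to coarse grid cells
# and hold out entire cells as test folds.
import random

random.seed(0)
points = [(random.random(), random.random()) for _ in range(20)]

def grid_cell(p, size=0.5):
    # 0.5 x 0.5 blocks -> a 2x2 grid over the unit square
    return (int(p[0] / size), int(p[1] / size))

folds = {}
for i, p in enumerate(points):
    folds.setdefault(grid_cell(p), []).append(i)

for cell, test_idx in folds.items():
    train_idx = [i for i in range(len(points)) if i not in test_idx]
    # ... fit the model on train_idx, evaluate on test_idx ...
    print(cell, len(test_idx), len(train_idx))
```

Compared with random K-fold CV, this blocking makes the held-out error a much more honest estimate of performance on genuinely new areas.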
Figure: hyperparameter tuning using spatial CV. From Geocomputation with R.
Cross-validation in CUS
In general, using spatial data to train models without considering its spatial properties can lead to over-optimistic and biased results. For example, a large Random Forest model was trained and validated to predict aboveground biomass (AGB) in Africa, reporting \(R^2 = 0.53\).
However, when spatial cross-validation was used, the \(R^2\) dropped to 0.14. See [8]
Network models in CUS
Example: Community finding algorithms in spatial networks. As we saw in the previous lectures, social networks have a strong spatial component. In particular, the weights of the edges in the network can be related to the spatial distance between the nodes.
Community finding algorithms try to find partitions that maximize Modularity, a measure of the quality of the partition. Modularity is defined as:
\[
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
\]
where \(A_{ij}\) is the weight of the edge between nodes \(i\) and \(j\), \(k_i\) is the degree of node \(i\), \(m\) is the total weight of the edges in the network, and \(\delta(c_i, c_j)\) is the Kronecker delta function that is 1 if nodes \(i\) and \(j\) are in the same community, and zero otherwise.
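The definition can be computed directly on a toy network (two triangles joined by a single edge, partitioned with one community per triangle):

```python
# Modularity Q of a partition, computed directly from the definition
# Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j).
def modularity(A, comm):
    n = len(A)
    two_m = sum(sum(row) for row in A)   # 2m for an undirected adjacency matrix
    k = [sum(A[i]) for i in range(n)]    # (weighted) degrees
    Q = 0.0
    for i in range(n):
        for j in range(n):
            if comm[i] == comm[j]:
                Q += A[i][j] - k[i] * k[j] / two_m
    return Q / two_m

# Two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 3))  # one community per triangle
```

The natural two-community partition gives \(Q = 5/14 \approx 0.357\), while putting everything in one community gives \(Q = 0\), as it must by construction of the null model.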
Note that modularity depends on a null model: here, the configuration model, in which the expected weight of the edge between \(i\) and \(j\) is \(k_i k_j / 2m\).
Network models in CUS
Using that definition, most community algorithms find contiguous spatial communities. Here are the communities that Expert et al. [9] found in Belgium using the social network of calls between areas:
Network models in CUS
However, we know that social relationships between areas are stronger when the areas are closer. Thus, we can use as a null model a gravity-like model that incorporates the spatial distance between areas, where \(N_i\) measures the importance (e.g., the population) of area \(i\) and \(f(d_{ij})\) is a deterrence function of the distance:
\[
P_{ij} = N_i N_j f(d_{ij})
\]
and modify the modularity accordingly:
\[
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - P_{ij} \right) \delta(c_i, c_j)
\] Using this new definition of modularity, Expert et al. [9] found a different partition of the Belgian areas, one more related to the linguistic separation of the country.
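The spatially-corrected modularity can be sketched with toy numbers (made-up flows, populations, and distances; the deterrence function \(f(d) = 1/d^2\) is an illustrative choice). In this example, two distant areas share a strong flow that the gravity null cannot explain, so they form a community under the new definition:

```python
import numpy as np

# Spatially-corrected modularity: replace the configuration-model null
# with a gravity-like null P_ij = N_i N_j f(d_ij).
def spatial_modularity(A, P, comm):
    two_m = A.sum()
    same = (comm[:, None] == comm[None, :])  # delta(c_i, c_j)
    return float(((A - P) * same).sum() / two_m)

A = np.array([[0, 5, 1],
              [5, 0, 1],
              [1, 1, 0]], dtype=float)      # observed flows (toy)
N = np.array([2.0, 2.0, 1.0])               # area "importance" (toy populations)
d = np.array([[0, 2, 1],
              [2, 0, 1],
              [1, 1, 0]], dtype=float)      # areas 0 and 1 are far apart

f = np.where(d > 0, 1.0 / d**2, 0.0)        # deterrence function f(d) = 1/d^2
P = N[:, None] * N[None, :] * f             # gravity-like null model
P *= A.sum() / P.sum()                      # normalize to the total flow

comm = np.array([0, 0, 1])                  # group the two distant areas
q = spatial_modularity(A, P, comm)
print(round(q, 3))
```

The flow between the two distant areas (5) far exceeds what the gravity null predicts, so this partition scores a high spatial modularity; with the plain configuration-model null, the same structure would be masked by the overall flow pattern.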