Network Science Institute | Northeastern University
NETS 7983 Computational Urban Science
2025-03-31
This week:
Mobility models for urban behavior
Data -> Methods -> Models -> Applications.
Mobility models are used to describe the movement of individuals in a city.
They are essential for urban planning, traffic management, and public transportation planning.
They are used to predict the spread of diseases, the distribution of resources, and the impact of new policies.
They are used to understand the behavior of individuals and their relationship with the structure of cities.
They serve as null models to investigate the impact of different factors on the movement of individuals.
There are two main types of mobility models:
Population-level models are used to describe the behavior of many individuals in a population.
Aggregating over a large number of individuals has several consequences:
At the aggregate level, human mobility is affected by:
Population-level models typically describe the origin-destination (OD) matrix of trips in a city.
The OD matrix is a square matrix that describes the number of trips or people \(T_{ij}\) between each pair of zones or places.
These matrices are used in many applications, such as (see [1])
There are many sources of OD matrices:
Gravity models are a class of population-level models that describe the number of trips between two zones as a function of the characteristics of the zones and the distance between them.
They are inspired by Newton’s law of gravitation and by the idea that distance impedes movement and opportunities create it. Although they can be derived from the principle of maximum entropy [7], they are typically phenomenological. The simplest form of a gravity model is:
\[ T_{ij} = k \frac{P_i O_j}{f(d_{ij})} \]
where \(T_{ij}\) is the number of trips from zone \(i\) to zone \(j\), \(P_i\) is the population of zone \(i\), \(O_j\) is the number of opportunities in zone \(j\), \(d_{ij}\) is the distance between the zones, and \(f(d_{ij})\) is an (increasing) function of the distance.
The function \(f(d_{ij})\) is typically a power-law function of the distance:
\[ f(d_{ij}) = d_{ij}^\beta \]
where \(\beta\) is a parameter that describes how the number of trips decreases with distance. Other functions can be used, such as
Also, the distance can be the travel time between the zones, or include any other type of demographic or economic distance between areas \(i\) and \(j\).
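As a minimal sketch of the simplest gravity model with a power-law deterrence function, the flows can be computed directly from the populations, opportunities, and distance matrix. The function name `gravity_flows`, the toy zones, and the values of `k` and `beta` are illustrative assumptions, not fitted values.

```python
import numpy as np

def gravity_flows(pop, opp, dist, k=1.0, beta=2.0):
    """Unconstrained gravity model: T_ij = k * P_i * O_j / d_ij**beta."""
    pop = np.asarray(pop, dtype=float)
    opp = np.asarray(opp, dtype=float)
    dist = np.asarray(dist, dtype=float)
    flows = np.zeros_like(dist)
    # Skip the diagonal (i == j), where d_ii = 0 would divide by zero.
    off_diag = ~np.eye(len(pop), dtype=bool)
    flows[off_diag] = k * np.outer(pop, opp)[off_diag] / dist[off_diag] ** beta
    return flows

# Toy example: three zones on a line, 1 km apart (made-up numbers).
pop = [1000, 500, 2000]
opp = [300, 800, 100]
dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
T = gravity_flows(pop, opp, dist, k=1e-3, beta=2.0)
print(T)
```

Replacing `dist` with a travel-time matrix gives the travel-time variant without changing the code.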
Generalization: Gravity models can be generalized to include more complex dependencies on the population and opportunities of the zones. Also, assuming that \(O_j \sim P_j\), we get
\[ T_{ij} = k \frac{P_i^{\alpha_i} P_j^{\alpha_j}}{f(d_{ij})} \]
Fitting the gravity model: As we will see in our practical, there are many ways to get the exponents defining the gravity models. Typically we use a log-linear regression to fit the model:
\[ \log T_{ij} \sim \log k + \alpha_i \log P_i + \alpha_j \log P_j - \beta \log d_{ij} \]
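The log-linear regression can be sketched with ordinary least squares in NumPy. This is one possible implementation under simple assumptions: the helper name `fit_gravity_loglinear` is hypothetical, the data are synthetic and noise-free, and pairs with zero flow are dropped because their logarithm is undefined.

```python
import numpy as np

def fit_gravity_loglinear(flows, pop, dist):
    """OLS fit of log T_ij = log k + a_i log P_i + a_j log P_j - beta log d_ij."""
    flows = np.asarray(flows, dtype=float)
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    # Keep only pairs with positive flow and distance (the log needs them).
    i, j = np.nonzero((flows > 0) & (dist > 0))
    X = np.column_stack([
        np.ones(len(i)),      # intercept: log k
        np.log(pop[i]),       # origin population
        np.log(pop[j]),       # destination population
        -np.log(dist[i, j]),  # minus sign so the coefficient is beta
    ])
    y = np.log(flows[i, j])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    log_k, alpha_i, alpha_j, beta = coef
    return np.exp(log_k), alpha_i, alpha_j, beta

# Synthetic check: generate flows with known parameters and recover them.
rng = np.random.default_rng(0)
n = 20
pop = rng.uniform(1e3, 1e5, size=n)
xy = rng.uniform(0, 50, size=(n, 2))
dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
flows = 0.01 * np.outer(pop**0.8, pop**0.7) / (dist + np.eye(n)) ** 1.5
np.fill_diagonal(flows, 0.0)
k, a_i, a_j, beta = fit_gravity_loglinear(flows, pop, dist)
print(k, a_i, a_j, beta)
```

With real data the flows are noisy and many pairs have zero trips, so discarding zeros can bias the estimates.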
There are other ways to fit the model, including:
The fitted model might reproduce individual flows \(T_{ij}\) well. However, it might not reproduce exactly the total number of trips originating from an area. To solve this problem, we can use a constrained gravity model that keeps fixed \(T_i\), the total number of trips from area \(i\) to all other areas:
\[ T_{ij} = T_i \frac{P_j / f(d_{ij})}{\sum_k P_k / f(d_{ik})} \]
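A minimal sketch of the origin-constrained model, assuming a power-law \(f(d) = d^\beta\) and assuming that the sum over \(k\) excludes the origin itself (the formula leaves this open). The function name and the toy numbers are illustrative.

```python
import numpy as np

def constrained_gravity(out_trips, pop, dist, beta=2.0):
    """Origin-constrained gravity: T_ij = T_i * (P_j / d_ij**beta) / sum_k (P_k / d_ik**beta)."""
    out_trips = np.asarray(out_trips, dtype=float)
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    n = len(pop)
    weights = np.zeros((n, n))
    off_diag = ~np.eye(n, dtype=bool)  # assumption: no trips from a zone to itself
    dest_pop = np.broadcast_to(pop, (n, n))  # dest_pop[i, j] = P_j
    weights[off_diag] = dest_pop[off_diag] / dist[off_diag] ** beta
    probs = weights / weights.sum(axis=1, keepdims=True)
    return out_trips[:, None] * probs

pop = [1000, 500, 2000]
dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
out_trips = [100, 50, 200]
T = constrained_gravity(out_trips, pop, dist, beta=2.0)
print(T.sum(axis=1))  # each row adds up to the corresponding T_i
```

By construction the row sums match the observed totals \(T_i\), which the unconstrained fit does not guarantee.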
The Huff model is a particular example of the gravity model that is used to describe the distribution of trips to retail locations. It gives the probability \(P_{ij}\) that a person in zone \(i\) visits store \(j\):
\[ P_{ij} = \frac{A_j^\alpha/f(d_{ij})}{\sum_{k} A_k^\alpha/f(d_{ik})} \]
where \(A_j\) is the attractiveness of store \(j\), and the sum runs over all stores in the city.
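The Huff probabilities can be sketched as a row-normalized matrix of zones by stores. The example assumes \(f(d) = d^\beta\); the attractiveness values (for instance, floor area) and distances are made up for illustration.

```python
import numpy as np

def huff_probabilities(attract, dist, alpha=1.0, beta=2.0):
    """P_ij = (A_j**alpha / d_ij**beta) / sum_k (A_k**alpha / d_ik**beta)."""
    attract = np.asarray(attract, dtype=float)  # one value per store
    dist = np.asarray(dist, dtype=float)        # shape (n_zones, n_stores), all > 0
    utility = attract[None, :] ** alpha / dist ** beta
    return utility / utility.sum(axis=1, keepdims=True)

attract = [100, 200, 50]           # hypothetical attractiveness of three stores
dist = np.array([[1.0, 2.0, 4.0],  # distances from zone 0 to each store
                 [3.0, 1.0, 2.0]]) # distances from zone 1 to each store
P = huff_probabilities(attract, dist)
print(P)
```

Each row is a probability distribution over stores, so multiplying row \(i\) by the number of shoppers in zone \(i\) gives expected visits per store.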
Strengths:
Limitations:
In gravity models, we assume a direct relationship between distance and mobility: trips to faraway places are less frequent. However, in many cases, mobility is determined not only by distance but also by the opportunities between the origin and destination zones.
This is the famous model of intervening opportunities [8], where the probability of going from \(i\) to \(j\) is proportional to the opportunities in \(j\) and inversely proportional to the number of opportunities between them.
For example, think about deciding which grocery store to go to. The probability of choosing a store probably increases with how good it is for you and decreases with the number of stores in between that you have passed over.
In some sense, the intervening opportunities idea redefines the distance between places \(i\) and \(j\) as the number of opportunities in between them.
The radiation model [9] is a gravity model that uses the idea of intervening opportunities to explain the flows between areas \(i\) and \(j\):
\[ p_{ij} = \frac{P_i P_j}{(P_i + s_{ij})(P_i + P_j + s_{ij})} \]
where \(s_{ij}\) is the total population of the areas within a circle of radius \(d_{ij}\) centered at \(i\), excluding the populations of \(i\) and \(j\).
If we multiply by the total number of flows from \(i\) to all other areas, we get the number of trips from \(i\) to \(j\):
\[ T_{ij} = T_i \frac{P_i P_j}{(P_i + s_{ij})(P_i + P_j + s_{ij})} \]
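A minimal sketch of the radiation model, computing \(s_{ij}\) as the population of all zones no farther from \(i\) than \(j\) is, excluding \(i\) and \(j\) themselves. Counting zones at exactly the same distance as \(j\) inside the circle is an assumption of this sketch, and the toy numbers are illustrative.

```python
import numpy as np

def radiation_flows(out_trips, pop, dist):
    """T_ij = T_i * P_i * P_j / ((P_i + s_ij) * (P_i + P_j + s_ij))."""
    out_trips = np.asarray(out_trips, dtype=float)
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    n = len(pop)
    flows = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # s_ij: population within distance d_ij of i, excluding i and j.
            within = dist[i] <= dist[i, j]
            within[i] = False
            within[j] = False
            s_ij = pop[within].sum()
            flows[i, j] = out_trips[i] * pop[i] * pop[j] / (
                (pop[i] + s_ij) * (pop[i] + pop[j] + s_ij)
            )
    return flows

pop = [100, 200, 300]
dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
out_trips = [10, 20, 30]
T = radiation_flows(out_trips, pop, dist)
print(T)
```

Unlike the gravity model, no exponent has to be fitted: only populations, distances, and the outgoing totals are needed. In this toy example the rows sum to less than \(T_i\), because the set of zones is finite.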
Note that:
Of course, we can use deep learning methods to predict the flows between areas from the properties of those areas. The recent Deep Gravity model [10] is an example. Apart from population and distances, it uses land use, road networks, types of POIs, and many other features of the areas and, as expected, performs better than simple gravity models:
Architecture of the Deep Gravity model, from [10]
In a recent paper, we used data and Bayesian symbolic regression to find the best model to describe commuting flows in the US [11]. We found that flows can still be accurately described by simple closed-form gravity-like models.
Examples of gravity-like models found in [11]
To compare the models, we have to define their error. We typically use the Common Part of Commuters (CPC):
\[ CPC = \frac{2 \sum_{ij} \min(T_{ij},\hat T_{ij})}{\sum_{ij} T_{ij} + \sum_{ij} \hat T_{ij}} \]
where \(T_{ij}\) is the observed flow and \(\hat T_{ij}\) is the predicted flow.
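The CPC can be sketched in a few lines of NumPy. It equals 1 when predicted and observed flows coincide and 0 when they share no trips. The function name and the toy matrices are illustrative assumptions.

```python
import numpy as np

def common_part_of_commuters(observed, predicted):
    """CPC = 2 * sum(min(T, T_hat)) / (sum(T) + sum(T_hat))."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    total = observed.sum() + predicted.sum()
    if total == 0:
        return 0.0  # convention chosen here for two empty matrices
    return 2.0 * np.minimum(observed, predicted).sum() / total

T_obs = np.array([[0, 10], [5, 0]])
T_pred = np.array([[0, 8], [7, 0]])
cpc = common_part_of_commuters(T_obs, T_pred)
print(cpc)
```

Here 13 of the 15 trips in each matrix overlap, so the score is 26/30.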
For example, in [12], they found that the gravity law performs better than the radiation model in predicting commuting flows.
Comparison between models to describe commuting flows, from [12]
In [10], the Deep Gravity model was better than gravity models at describing the commuting and daily flows between areas in the UK, US, and Italy.
Deep Gravity performance, from [10]
However, using Bayesian symbolic regression [11], we found that simple gravity models are better than deep learning models at describing the commuting flows in the US.
Comparison between models to describe commuting flows, from [11]
Individual-level models describe the movement of individuals in a city and incorporate more detailed information about the behavior of individuals.
They are based on the idea of simulating agents that move around the city according to their preferences and constraints, leading to a degree of stochasticity in the trip patterns.
Minimal models thus borrow the concept of random walks or probabilistic methods in the city, where individuals move from one place to another according to probability distributions and are subject to some constraints like home, work, etc.
However, as we will see, several studies have shown that the movement of individuals is not random, but instead follows some patterns that can be described by simple rules and can be exploited to predict the movement of individuals.
The initial studies of large-scale mobility data show precisely that:
The area covered by an individual \(R(t)\) and the number of different places visited \(S(t)\) do not scale with time like \(R(t) \sim t^{1/2}\) and \(S(t) \sim t\), as in random walks, but rather like \(R(t) \sim t^{\alpha}\) and \(S(t)\sim t^\mu\), where \(\alpha < 1/2\) (subdiffusive) and \(\mu < 1\) [13].
Humans tend to return to the same places over and over again, a returning behavior that random walks do not capture [14].
As opposed to random walks, the trajectories of individuals are highly predictable. Around 80-90% of our movements are predictable [14].
Individuals spend most of their time in a few places. In fact, the probability of visiting a place decays like \(P(n) \sim 1/n^\xi\) [15], as opposed to the uniform distribution obtained in random walks.
All these phenomena show that models based on Continuous Time Random Walks (CTRW) [13] cannot reproduce the returning behavior, the predictability, and the subdiffusive character of human movement.
A more realistic model is the Exploration and Preferential Return model [14], where individuals explore new places and return to the places they have visited before.
The model is based on the idea that individuals have a memory of the places they have visited before, and the probability of visiting a place is proportional to the number of times they have visited it before.
The model can reproduce the subdiffusive character of human movement, the returning behavior, and the predictability of human trajectories.
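The EPR model has two processes. If \(S_T\) is the number of places visited at time \(T\) and \(f_i\) is the frequency of visits to place \(i\) by the user, at time \(T+1\) the user either explores a new place, with probability \(P_{new} = \rho S_T^{-\gamma}\), or returns to a previously visited place \(i\), with probability \(1 - P_{new}\), choosing it with probability proportional to \(f_i\).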
The model can be solved analytically and the number of places visited by the user at time \(T\) after \(N\) movements is given by:
\[ S_T = [1+ \rho (1+\gamma)N]^{1/(1+\gamma)} \]
In the asymptotic regime, the probability that the user visits a place \(n\) times is given by:
\[ P(n) \sim \frac{1}{n^{(2+\gamma)/(1+\gamma)}} \]
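A short mean-field sketch of where the expression for \(S_T\) comes from, assuming that a new place is explored with probability \(P_{new} = \rho S^{-\gamma}\) at each move [14], that the user starts with one visited place, and that \(S\) can be treated as a continuous function of the number of moves \(N\):

\[ \frac{dS}{dN} = \rho S^{-\gamma} \quad\Rightarrow\quad \int_1^{S} s^{\gamma}\, ds = \rho N \quad\Rightarrow\quad \frac{S^{1+\gamma} - 1}{1+\gamma} = \rho N \quad\Rightarrow\quad S = [1+ \rho (1+\gamma)N]^{1/(1+\gamma)} \]

For large \(N\) this gives \(S \sim N^{1/(1+\gamma)}\), which grows sublinearly with the number of moves whenever \(\gamma > 0\).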
The predictability of the EPR model is, of course, inversely related to \(\rho\).
As we can see, the model reproduces the subdiffusive character of human movement, the frequency of visitation of places, and the predictability of human trajectories with only two parameters, \(\rho\) and \(\gamma\).
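A minimal simulation sketch of the two EPR processes, assuming the exploration probability \(P_{new} = \rho S^{-\gamma}\) reported in [14] and a return probability proportional to past visits. It tracks only which place is visited and leaves out the spatial part of the model (jump lengths) and waiting times. The function name, seed, and default parameter values are illustrative.

```python
import numpy as np

def simulate_epr(n_moves, rho=0.6, gamma=0.21, seed=0):
    """Simulate visit counts under exploration and preferential return."""
    rng = np.random.default_rng(seed)
    visits = [1]    # visits[i] = number of visits to place i; start at one place
    distinct = [1]  # number of distinct places after each move
    for _ in range(n_moves):
        s = len(visits)
        if rng.random() < rho * s ** (-gamma):
            visits.append(1)  # exploration: move to a new place
        else:
            counts = np.array(visits, dtype=float)
            place = rng.choice(s, p=counts / counts.sum())
            visits[place] += 1  # preferential return
        distinct.append(len(visits))
    return np.array(distinct), np.array(visits)

rho, gamma, n_moves = 0.6, 0.21, 5000
distinct, visits = simulate_epr(n_moves, rho, gamma)
moves = np.arange(n_moves + 1)
analytic = (1 + rho * (1 + gamma) * moves) ** (1 / (1 + gamma))
print(distinct[-1], analytic[-1])
```

The simulated number of distinct places can be compared against the analytical \(S_T\), and the sorted `visits` array gives the visitation frequencies of the simulated user.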
Different papers have found that real data matches the microscopic details of the EPR model and the macroscopic descriptions of human mobility. For example, in [14], using CDR data, they found that \(P_{new} \sim \rho/S_T^\gamma\) with \(\gamma = 0.21 \pm 0.02\) and that \(\rho\) is normally distributed with mean \(\rho \simeq 0.6\).
Comparison between the EPR model and real data, from [14]
The same results were found by Moro et al. [16] using mobile phone data. Remarkably, using a different dataset they found that \(\gamma = 0.23 \pm 0.01\).
Comparison between the EPR model and real data, from [16]
The EPR model has been extended to incorporate more complex human behaviors, like
The social-EPR model from [16]
The TimeGeo framework from [20]
The EPR models are based on places. However, the model does not take into account the fact that places are not isolated. For example, we might decide to go to a mall, shop in a store within it, and have dinner in another place. That is, places have hierarchies. The Container model [21] incorporates that hierarchy in which places are part of a container, and the distance in the hierarchy gives the distance between containers.
The model reproduces the area covered by users, the number of places visited, and the predictability of human trajectories much better. Furthermore, it gives an intuition as to why humans seem to have power-law distributions in their traveled distance.
The container model [21]
The models we have seen are based on simple rules and can reproduce the main features of human mobility. However, they are not able to capture the complexity of human behavior. In particular
They do not include the complex sequential transition regularities. For example, a transition from home to the office is more likely on workday mornings than on weekend mornings.
As we mentioned before, there is often multi-level periodicity (daily, weekly, monthly) in human mobility that simple models do not capture.
Finally, mobility data is often noisy and incomplete, which makes it difficult to train algorithms to predict the movement of individuals.
For these reasons, in recent years there has been an increasing use of deep learning models to model and predict human mobility, because of their ability to capture complex patterns in the data. Methods like RNNs, LSTMs, GANs, and, more recently, LLMs with their attention mechanisms can easily incorporate the exploration and returning behavior of individuals and the periodicity of human mobility, and can tackle the noise in the data.
Some methods:
Those methods have been shown to generate synthetic trajectories that resemble real data more closely, but they lack the interpretability of simple models.
Understanding human mobility in urban areas is critical for understanding urban behavior and designing better cities.
Human mobility is very complex and depends on many factors, such as the area’s population, the area’s opportunities, the distance between the areas, the time of the day, the day of the week, and the season.
But most of our mobility is very repetitive and can be described by simple rules, such as the Exploration and Preferential Return model.
At aggregated levels, gravity models are widely used to describe the origin-destination matrix of trips in a city.
Mobility models can predict the spread of diseases, the distribution of resources, and the impact of new policies.
They can also serve as null models to investigate the impact of different factors on the movement of individuals.
The aggregated and individual models considered have several limitations.
CUS 2025, ©SUNLab group socialurban.net/CUS