Lecture 4-1
Social Media Data
for CUS

Esteban Moro

Network Science Institute | Northeastern University

NETS 7983 Computational Urban Science

2025-03-31

Welcome!

This week:

Introduction to Urban Data - Social Media Data as a tool for understanding urban systems

Aims

  • Understand the potential of social media data for urban science.
  • Learn about social media data and how to extract insights from them.
  • Understand the challenges of social media data for urban science.
  • Analyze social media data to understand social connectivity in cities.

Social Media Data

Why Social Media Data

Social media data is (was?) a powerful source of information for understanding urban challenges. The main reasons are:

  • Since their introduction in the early 2000s, social media platforms have become prevalent in our societies. Some platforms, like Facebook, have billions of users, and others, like Twitter, have hundreds of millions.

Data from https://wearesocial.com/us/blog/2023/01/digital-2023/

Why Social Media Data

  • Users also spend a significant amount of time on these platforms.

Why Social Media Data

  • Social media data contains many different types of information. A tweet, for example, can contain information about the user, the location, the time, the content, the sentiment of the text, the analysis of the pictures, etc.

Why Social Media Data

  • Social media data is also a source of real-time information, which makes it very useful for monitoring events, like natural disasters, as they unfold.

Temporal and spatial distributions of tweets after the Tohoku earthquake. From [1]

Why Social Media Data

  • Until recently, most social media platforms were open, meaning the data was available to researchers to analyze through their API. This has changed in recent years, with many platforms closing their data to researchers.

Hedonometer project, at the University of Vermont

Information in Social Media Data

Social media data is probably the semantically richest data source available. It contains information about:

  • Geolocation of users, and thus, we can analyze the mobility of users (like LBS data).
  • Interaction and communication between users, and thus, we can analyze users’ social network.
  • The content of the messages, and thus, we can analyze users’ opinions and sentiments about a specific topic or event.
  • The images and videos shared by users, and thus, we can analyze their visual content.
  • The hashtags used by users, and thus, we can analyze users’ topics of interest.
  • The time of the messages, and thus, we can analyze the temporal patterns of users.

And many other meta-data about the user, their activities, their device, etc.
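
As a rough illustration of how these layers of information can be pulled out of raw posts, here is a minimal Python sketch. The records and their field names (`user`, `text`, `created_at`, `lat`, `lon`) are hypothetical and do not follow any platform's actual schema; the sketch only uses the standard library.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical posts: field names and values are illustrative, not a real platform schema.
posts = [
    {"user": "ana", "text": "Stuck on the #OrangeLine again @mbta",
     "created_at": "2024-05-01T08:15:00", "lat": 42.35, "lon": -71.06},
    {"user": "ben", "text": "Sunny afternoon at the park #Boston",
     "created_at": "2024-05-01T15:40:00", "lat": 42.36, "lon": -71.07},
    {"user": "ana", "text": "Great coffee near the station #boston @ben",
     "created_at": "2024-05-02T08:50:00", "lat": 42.34, "lon": -71.05},
]

HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def extract_features(post):
    """Return topics, mentioned accounts, hour of day, and location for one post."""
    created = datetime.fromisoformat(post["created_at"])
    return {
        "user": post["user"],
        "hashtags": [h.lower() for h in HASHTAG.findall(post["text"])],  # topics of interest
        "mentions": [m.lower() for m in MENTION.findall(post["text"])],  # interactions
        "hour": created.hour,                                            # temporal pattern
        "location": (post["lat"], post["lon"]),                          # geolocation
    }

features = [extract_features(p) for p in posts]

# Aggregate across posts: most used hashtags and activity by hour of day.
hashtag_counts = Counter(h for f in features for h in f["hashtags"])
posts_per_hour = Counter(f["hour"] for f in features)
```

Here `hashtag_counts` and `posts_per_hour` give a first look at topics and temporal patterns; sentiment and image analysis would need additional tools that are not shown.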

Uses of Social Media

Social media data is used by different groups of organizations:

  • Companies: to understand customers’ opinions about the company and its products, to analyze the sentiment of their customers, etc. The platforms themselves use this data to improve their services and to target ads to users.
  • Governments: to monitor events, to analyze the sentiment of the population, to analyze the opinion of the population about their policies, etc.
  • Researchers: to understand urban challenges, to monitor events, to analyze the sentiment of the population, etc.

Consumer confidence index from surveys and Twitter. From https://www.ecb.europa.eu/pub/pdf/scpsps/ecbsp5.en.pdf

Uses of Social Media Data in Urban Science

Similarly to mobile phone data, social media data has been used in many different urban science studies. Some examples are:

  • Natural disasters: social media, especially Twitter, has been used to monitor natural disasters in real time, detect evacuation patterns, identify the most affected areas using the content of the tweets, etc. [2]

  • Urban mobility: social media data has been used to analyze the mobility of users, traffic congestion, public transportation usage, and the mobility patterns of residents versus tourists. [3]

  • Urban health: mapping how food environments influence health outcomes, and detecting trends in food consumption, obesity, depression, and other health-related conditions in urban areas. [4]

  • Environmental sustainability, like studying visits and usage of parks and green spaces, or the sentiment of the population about environmental issues like climate change. [5]

Uses of Social Media Data in Urban Science

  • Social equity: studying socio-spatial segregation and inequalities offline and online, understanding evolving urban demographics and community interactions. [6]

  • Economic development: analyze people’s behavior to understand unemployment, identify popular attractions and points of interest, etc. [7]

  • Social networks: analyze the social networks of users, to understand how people connect with each other, how information spreads in cities, etc. [8]

Main Social Media Data Platforms

The main social media platforms are:

  • Facebook: the biggest social media platform, with more than 2 billion users. It contains a lot of information about the users, their friends, their likes, etc.

Facebook Data for Good webpage

Main Social Media Data Platforms

  • Twitter (now X): a microblogging platform where users can post short messages.
    • It is (was?) widely used by journalists, politicians, celebrities, and institutions, so it has been widely used in political science.
    • It also contains a lot of information about users and events in real time, so it has been used to study natural disasters, event detection and analysis, etc.
    • Due to its information-sharing nature, Twitter has been used by many researchers working on information spreading, misinformation, and disinformation.
    • Since Twitter also has information about how people follow or mention each other, it has been used to analyze social networks and communities.
    • Twitter has also ended free data access for researchers in recent years.
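
To make the idea of a mention network concrete, the following is a minimal Python sketch that builds a weighted, directed edge list from tweet text. The `tweets` list is made-up toy data, and the choice to ignore self-mentions is an assumption of this sketch rather than a standard rule.

```python
import re
from collections import Counter, defaultdict

# Toy (author, text) pairs; invented for illustration only.
tweets = [
    ("ana", "Thanks @ben and @carla for the tip!"),
    ("ben", "@carla see you at the market"),
    ("carla", "@ben on my way"),
    ("dan", "Lovely sunset over the river"),
    ("ana", "@ben running late"),
]

MENTION = re.compile(r"@(\w+)")

# Directed edges author -> mentioned user, weighted by the number of mentions.
edge_weights = Counter()
for author, text in tweets:
    for target in MENTION.findall(text):
        if target != author:  # assumption: self-mentions are ignored
            edge_weights[(author, target)] += 1

# In-degree: how many distinct users mentioned each account.
in_degree = defaultdict(int)
for (source, target) in edge_weights:
    in_degree[target] += 1

# Reciprocated pairs (A mentions B and B mentions A), a common proxy for stronger ties.
reciprocal = {tuple(sorted(edge)) for edge in edge_weights
              if (edge[1], edge[0]) in edge_weights}
```

The resulting `edge_weights` can be loaded into a graph library for community detection; that step is not shown here.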

Main Social Media Data Platforms

  • Foursquare: a location-based social network (LBSN) where users can check in at different places.
    • It has been used to analyze the mobility of users, the popularity of different places, the urban infrastructure, etc.
    • Foursquare has also closed its data to researchers in recent years and shut down part of the platform in 2024.
    • However, it recently published a public dataset of 11 million places of interest.
  • Yelp: a platform where users can review different places. It has been used to analyze the economic development of different areas, the popularity of different places, etc.
    • Although Yelp has an API, it has also ended free data access for researchers in recent years.
    • It does, however, offer some open datasets to work with; see Yelp Open Datasets.

Main Social Media Data Platforms

  • LinkedIn: a professional social network where users can share their professional experience, connect with other professionals, etc.
    • It has been used to analyze the professional networks of users, to analyze the job market, etc.
    • LinkedIn had a program for accessing its data, called LinkedIn Economic Graph.
  • Reddit: a social news aggregation, web content rating, and discussion website. It is widely used by people looking for news and information.
    • Reddit has been used to analyze the sentiment of the population, to analyze the opinion of the population about different topics, etc.
    • Reddit has also closed its data to researchers in recent years, due to the use of its data to train LLMs.

Examples of uses of Social Media Data

Urban segregation: Use physical mobility and conversations on Twitter to investigate the segregation of different groups in the city [6]. The authors found that people are as segregated online as they are in physical space.

Movements in the physical space (left) and Twitter mentions (right), from [6]

Examples of uses of Social Media Data

Urban planning: Use check-ins collected from Foursquare to understand the real neighborhoods in our cities [3]

Examples of uses of Social Media Data

Natural disasters: Use tweets to assess the impact of natural disasters in different areas almost in real time [2]

Correlation between Twitter activity and damage reported during Hurricane Sandy. From [2]
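
A relationship like the one in the figure can be explored with a simple correlation between per-capita activity and per-capita damage. The Python sketch below uses invented toy numbers that are NOT taken from [2], and plain Pearson correlation is an assumption of this sketch, not necessarily the statistic used by the authors.

```python
from math import sqrt

# Toy values for illustration only; they are NOT data from [2].
areas = {
    "A": {"population": 10_000, "tweets": 500, "damage_usd": 2_000_000},
    "B": {"population": 20_000, "tweets": 400, "damage_usd": 1_000_000},
    "C": {"population": 5_000, "tweets": 400, "damage_usd": 1_500_000},
    "D": {"population": 8_000, "tweets": 80, "damage_usd": 160_000},
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Normalize by population so large areas do not dominate simply because of their size.
tweets_per_capita = [a["tweets"] / a["population"] for a in areas.values()]
damage_per_capita = [a["damage_usd"] / a["population"] for a in areas.values()]

r = pearson(tweets_per_capita, damage_per_capita)
```

With real data, a rank correlation may be more robust to the heavy-tailed distributions typical of both activity and damage.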

Examples of uses of Social Media Data

Public health: Use tweets to detect changes in mood of people during and after visiting a park [4]

Mood changes in Twitter users before and after visiting a park. From [4]
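
One common way to quantify mood from text is to average per-word happiness scores from a lexicon. The Python sketch below illustrates that idea only; the word scores and tweets are hypothetical, and it does not reproduce the method or data of [4].

```python
# Hypothetical word scores on a 1-9 happiness scale; values are invented for illustration.
word_scores = {
    "love": 8.4, "sunny": 7.5, "beautiful": 7.9,
    "tired": 3.3, "traffic": 3.7, "late": 3.6,
}

def mean_happiness(texts, scores):
    """Average score of all lexicon words found in the texts (None if no word matches)."""
    values = [scores[word]
              for text in texts
              for word in text.lower().split()
              if word in scores]
    return sum(values) / len(values) if values else None

# Toy tweets by the same users before and during a park visit.
before_visit = ["tired of this traffic", "running late again"]
during_visit = ["love this sunny day", "beautiful trees"]

happiness_before = mean_happiness(before_visit, word_scores)
happiness_during = mean_happiness(during_visit, word_scores)
mood_change = happiness_during - happiness_before  # positive means a happier vocabulary
```

Word-averaging ignores negation and context, so it is more reliable on large collections of tweets than on individual messages.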

Examples of uses of Social Media Data

Economic development: Use Yelp reviews to understand the economic development of different areas in the city [7]

Coverage of restaurants reviewed in Yelp by zip code. From [7]

Challenges of Social Media Data

Social media data is secondary data and has the same challenges we have seen in mobile phone data (e.g., bias, drifting) and some others which are specific to social media data:

  • Dirty: Social media data is noisy and plagued by spam, bots, and fake accounts. This makes it challenging to analyze the data and extract insights.

  • Data accessibility: In recent years, most social media platforms have closed their data to researchers, making it difficult for researchers to access it.

  • Algorithmic confounding: Social media platforms rely on algorithms to capture users’ attention, so the data overrepresents some activities, messages, etc. For example, popularity rankings bias social connectivity toward celebrities.

Challenges of Social Media Data

  • Privacy: Social media data contains a lot of information about users, making it difficult to analyze the data without infringing on their privacy.
    • For example, just by looking at a user’s Facebook likes, it is possible to learn some highly sensitive personal attributes, such as sexual orientation, ethnicity, religious or political views, etc. [9].
    • Even a user’s behavior on social media can be predicted using only information from 8 or 9 of their friends [10].
  • Complexity: social media data requires spatial analysis, network science, NLP, computer vision, and many other complex analytical and visualization tools.

Prediction accuracy of classification for attributes expressed by the AUC. From [9].
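
For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The Python sketch below computes it by direct pairwise comparison; the labels and scores are made up and are unrelated to the results in [9].

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = user has the attribute, score = model's predicted probability.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.8, 0.65]
example_auc = auc(labels, scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. The pairwise loop is quadratic, so a library implementation is preferable for large datasets.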

Read more

Resources for social media data

References

[1]
L. Burks, M. Miller, and R. Zadeh, “Rapid estimate of ground shaking intensity by combining simple earthquake characteristics with tweets,” in 10th U.S. National Conference on Earthquake Engineering, Frontiers of Earthquake Engineering, Anchorage, AK, USA, Jul. 21–25, 2014.
[2]
Y. Kryvasheyeu et al., “Rapid assessment of disaster damage using social media activity,” Science Advances, vol. 2, no. 3, p. e1500779, Mar. 2016, doi: 10.1126/sciadv.1500779.
[3]
J. Cranshaw, R. Schwartz, J. Hong, and N. Sadeh, “The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City,” Proceedings of the International AAAI Conference on Web and Social Media, vol. 6, no. 1, pp. 58–65, 2012, doi: 10.1609/icwsm.v6i1.14278.
[4]
A. J. Schwartz, P. S. Dodds, J. P. M. O’Neil-Dunne, T. H. Ricketts, and C. M. Danforth, “Gauging the happiness benefit of US urban parks through Twitter,” PLOS ONE, vol. 17, no. 3, p. e0261056, Mar. 2022, doi: 10.1371/journal.pone.0261056.
[5]
R. T. Ilieva and T. McPhearson, “Social-media data for urban sustainability,” Nature Sustainability, vol. 1, no. 10, pp. 553–565, Oct. 2018, doi: 10.1038/s41893-018-0153-6.
[6]
X. Dong et al., “Segregated interactions in urban and online space,” EPJ Data Science, vol. 9, no. 1, pp. 1–22, Dec. 2020, doi: 10.1140/epjds/s13688-020-00238-7.
[7]
E. L. Glaeser, H. Kim, and M. Luca, “Nowcasting the local economy: Using Yelp data to measure economic activity,” National Bureau of Economic Research, 2017.
[8]
E. Cho, S. A. Myers, and J. Leskovec, “Friendship and mobility: User movement in location-based social networks,” in Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, in KDD ’11. New York, NY, USA: Association for Computing Machinery, Aug. 2011, pp. 1082–1090. doi: 10.1145/2020408.2020579.
[9]
M. Kosinski, D. Stillwell, and T. Graepel, “Private traits and attributes are predictable from digital records of human behavior,” Proceedings of the National Academy of Sciences, vol. 110, no. 15, pp. 5802–5805, Apr. 2013, doi: 10.1073/pnas.1218772110.
[10]
J. P. Bagrow, X. Liu, and L. Mitchell, “Information flow reveals prediction limits in online social activity,” Nature Human Behaviour, vol. 3, no. 2, pp. 122–128, Feb. 2019, doi: 10.1038/s41562-018-0510-5.
[11]
J. Bao, Y. Zheng, D. Wilkie, and M. Mokbel, “Recommendations in location-based social networks: A survey,” GeoInformatica, vol. 19, no. 3, pp. 525–565, Jul. 2015, doi: 10.1007/s10707-014-0220-8.
[12]
P. Martí, L. Serrano-Estrada, and A. Nolasco-Cirugeda, “Social Media data: Challenges, opportunities and limitations in urban studies,” Computers, Environment and Urban Systems, vol. 74, pp. 161–174, Mar. 2019, doi: 10.1016/j.compenvurbsys.2018.11.001.
[13]
M. J. Salganik, Bit by bit: Social research in the digital age. Princeton University Press, 2019.