Home Location Leakage via Weather-Related Social Media Posts
by Akitaka Yamashita, Keishi Tajima
Abstract
We analyze the risk of home location leakage via social media posts
about current weather at the user's location. To quantify this risk,
we develop a two-step location estimation method: (1) identifying
posts mentioning current rain or snow at the user's location, and (2)
ranking locations by matching the post timestamps against nationwide
precipitation data. To train a post classifier for Step (1), we
collect posts including the words "rain" or "snow" from users with
known home locations, and automatically label them as follows: if
there was no precipitation at the user's home location at the time of
posting, the post is not about the current weather of the user's home
location; otherwise, it may or may not be about it. Thus, the problem
corresponds to Positive-Unlabeled learning under the Selected At
Random (SAR) assumption with a known labeling mechanism, where the
labeling probability depends on the precipitation rate at the user's
location. For Step (2), to avoid bias towards areas with higher
precipitation rates, we design a probabilistic model of users' posting
behavior and rank locations based on the likelihood that the observed set
of posts was generated at each location. Our experiment on X data
demonstrates a non-negligible privacy vulnerability: our method
successfully identified the home locations of 68% of users with 20
posts about precipitation.
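
As a rough illustration of the two steps, the sketch below encodes the automatic labeling rule and a likelihood-based ranking in Python. The labeling rule follows the abstract. The posting model is an assumption made only for this example and is not the model from the paper: with probability 1 - noise a post falls uniformly on one of a location's wet hours, and with probability noise it falls uniformly on any hour. The names auto_label, log_likelihood, rank_locations, and noise, as well as the toy data, are all hypothetical.

```python
import math

def auto_label(precip_at_home: bool) -> str:
    """Labeling rule from the abstract: a post made while it was dry at the
    user's home is not about current weather there; otherwise it is unknown."""
    return "unlabeled" if precip_at_home else "negative"

def log_likelihood(post_hours, wet_hours, total_hours, noise=0.1):
    """Log-likelihood of the post times under one candidate location.

    Assumed model (not from the paper): a mixture of a uniform distribution
    over the location's wet hours and a uniform distribution over all hours.
    """
    ll = 0.0
    for h in post_hours:
        p = noise / total_hours                      # background component
        if h in wet_hours:
            p += (1.0 - noise) / len(wet_hours)      # wet-hour component
        ll += math.log(p)
    return ll

def rank_locations(post_hours, precip, total_hours, noise=0.1):
    """Return location names sorted from most to least likely."""
    scores = {loc: log_likelihood(post_hours, wet, total_hours, noise)
              for loc, wet in precip.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: hours (0..23) with precipitation at three hypothetical locations.
precip = {
    "A": {3, 4, 10, 11},
    "B": set(range(24)),   # rains all day, so it matches every post
    "C": {15, 16},
}
posts = [3, 4, 10]          # hours at which the user posted about rain
ranking = rank_locations(posts, precip, total_hours=24)
print(ranking)              # ['A', 'B', 'C']
```

Because each wet hour's probability is divided by the number of wet hours at that location, location B matches every post yet scores below A. This is one simple way to avoid favoring areas with higher precipitation rates, compared with merely counting matches.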
Keywords
social network analysis; user profiling; geographic information
Published in Proc. of ACM Conference on Web Science, 6 pages, Braunschweig, Germany, 2026