Hirotaka Nagashima from our group gave a presentation at The Web Conference 2024.
The presentation was about identifying descriptions that indicate the “expiration time of information value.” For example, suppose a text includes phrases like those below.
The informational value of such a text should drop sharply once the time mentioned in these phrases has passed. Identifying phrases that describe “the expiration time of informational value” therefore has many applications, such as:
However, phrases that include time expressions do not always describe the expiration time of informational value. For example, the phrase below does not.
Therefore, we need a method for classifying phrases that include time expressions into those that describe the expiration time of some information and those that do not.
With recent advancements in machine learning and natural language processing, such classification seems possible. However, this requires a dataset labeled with “expressions indicating expiration time” and “expressions that do not.” The goal of our research is to automatically generate such a dataset using data from X (formerly Twitter).
On X, many posts contain phrases including time expressions. Some of them indicate expiration, while others do not. Additionally, on X, we can track how many times each post has been reposted or liked. By monitoring changes in the number of reposts and likes for posts containing time expressions, we can obtain graphs like the ones below. Each of these three graphs represents the changes in repost counts (blue line) and like counts (red line) over time for a single post.
Looking at these graphs:
By manually checking the content of these posts, we found:
Based on this observation, we developed a method to automatically create a dataset of time expressions indicating expiration time and time expressions not indicating it. We first collect X posts that contain a time expression and have a certain number of reposts. We then label them as follows.
This automated labeling method is the key contribution of our study.
The graphs shown above also include the number of likes (red line), but in the middle graph the likes do not show a clear stopping point the way the reposts do. This suggests that while the growth of reposts slows down significantly after the information value expires, the growth of likes does not necessarily follow the same pattern. For this reason, our research focuses only on repost counts and does not use like counts in the analysis.
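To make the idea of "repost growth slowing down after the mentioned time" concrete, here is a minimal sketch in Python. It is only an illustration and not the labeling rule from our paper: the function names (`count_at`, `label_post`), the 24-hour comparison window, and the 10% threshold are all hypothetical choices made for this example. It assumes each post comes with a time-sorted list of (observation time, cumulative repost count) pairs and a time already extracted from its time expression.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (observation time, cumulative repost count), sorted by time (assumed format)
Snapshot = Tuple[datetime, int]


def count_at(snapshots: List[Snapshot], t: datetime) -> int:
    """Return the latest cumulative repost count observed at or before t."""
    count = 0
    for ts, c in snapshots:
        if ts > t:
            break
        count = c
    return count


def label_post(
    snapshots: List[Snapshot],
    mentioned_time: datetime,
    window: timedelta = timedelta(hours=24),  # hypothetical window
    max_ratio: float = 0.1,                   # hypothetical threshold
) -> str:
    """Compare repost growth just before and just after the mentioned time.

    Returns "expiration" if growth nearly stops afterwards,
    "non-expiration" if it continues, and "unknown" if the data
    does not cover the window or there was no growth beforehand.
    """
    if not snapshots or snapshots[-1][0] < mentioned_time + window:
        return "unknown"
    at_time = count_at(snapshots, mentioned_time)
    before = at_time - count_at(snapshots, mentioned_time - window)
    after = count_at(snapshots, mentioned_time + window) - at_time
    if before <= 0:
        return "unknown"
    return "expiration" if after <= max_ratio * before else "non-expiration"


# Made-up example data: snapshots every 6 hours after posting.
posted = datetime(2024, 5, 1, 0, 0)
deadline = posted + timedelta(hours=24)  # time mentioned in the post


def series(counts: List[int]) -> List[Snapshot]:
    return [(posted + timedelta(hours=6 * i), c) for i, c in enumerate(counts)]


stops = series([0, 40, 80, 120, 160, 162, 163, 163, 163])
continues = series([0, 40, 80, 120, 160, 200, 240, 280, 320])

print(label_post(stops, deadline))      # growth nearly stops after the deadline
print(label_post(continues, deadline))  # growth continues at the same pace
```

Comparing growth over equal windows before and after the mentioned time keeps the rule independent of how popular a post is overall, which is one reason a ratio was used here rather than an absolute count.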
The conference was held at the Resorts World Sentosa Convention Centre on Sentosa Island, Singapore.
The photo below was taken from the airplane just before landing at Singapore Changi Airport. The Strait of Malacca (or is this area actually the Singapore Strait?) is as crowded as the main gate of Kyoto University during the break between classes in the first week of April.
Sentosa Island is a tourist destination with beaches, Universal Studios Singapore, and other attractions, and can be reached from mainland Singapore by monorail, on foot, or by cable car. The photo below was taken from the mainland Singapore side, and the building behind the semi-cylindrical building in the center is the conference hall. The cable car is also visible on the right.
The distance is short, so it was easy to walk to the island. You can see the monorail on the left.
The monorail station on the island side. The entrance to the island looks like the entrance to a theme park.
Main hall during the lunch break.
Professor Kleinberg from Cornell University, the developer of the HITS algorithm, gave the keynote speech on the second day.
Every year at this conference, one paper presented over a decade ago is selected and honored for its significant impact over time. This year, the award went to Topic-Sensitive PageRank, a paper presented at this conference 22 years ago in 2002 (press release).
Our paper was accepted as a short paper and presented as a poster.
Explaining our work to visitors as they come and go.
The room for the poster session looks like this.
After the last day of the conference, we went to eat Singapore’s famous chili crab.
Singapore is a city with many places to go at night, including Universal Studios Singapore, the Marina Bay light and water show, casinos, and a zoo with a “Night Safari”.
Many interesting items were sold around the hotel in Chinatown.