Categorization of Cooking Actions Based on Textual/Visual Similarity

by Yixing Zhang, Yoko Yamakata, Keishi Tajima


In this paper, we propose a method for automatically categorizing cooking actions appearing in recipe data. We extract verbs from textual descriptions of cooking procedures in recipe data and vectorize them using word embeddings. These vectors provide a way to compute contextual similarity between verbs. We also extract images associated with each step of the procedures and vectorize them using a standard feature extraction method. For each verb, we collect the images associated with the steps whose descriptions include the verb and calculate the average of their vectors. These vectors provide a way to compute visual similarity between verbs. However, one type of action is sometimes represented by several types of images in recipe data. In such cases, the average of the associated image vectors is not an appropriate representation of the action. To mitigate this problem, we propose yet another way to vectorize verbs. We first cluster all the images in the recipe data into 20 clusters. For each verb, we calculate the ratio of each cluster within the set of images associated with the verb, and create a 20-dimensional vector representing the distribution over the 20 clusters. We calculate the similarity of verbs using these three kinds of vector representations. We conducted a preliminary experiment comparing these three approaches, and the results show that each of them is useful for categorizing cooking actions.
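The third representation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the images associated with a verb have already been assigned cluster labels (e.g., by k-means over the image feature vectors), and the function and variable names are hypothetical.

```python
import numpy as np
from collections import Counter

N_CLUSTERS = 20  # the paper clusters all recipe images into 20 clusters

def verb_distribution(cluster_ids, n_clusters=N_CLUSTERS):
    """Given the cluster labels of all images associated with a verb,
    return a 20-dimensional vector of per-cluster ratios."""
    counts = Counter(cluster_ids)
    vec = np.array([counts.get(c, 0) for c in range(n_clusters)], dtype=float)
    return vec / vec.sum()

def cosine(u, v):
    """Cosine similarity between two verb vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example (illustrative labels, not real data): two verbs whose
# associated images fall largely into the same clusters.
v1 = verb_distribution([0, 0, 3, 0, 3])
v2 = verb_distribution([0, 3, 3, 0, 19])
print(cosine(v1, v2))
```

The same `cosine` function applies to the other two representations as well (word-embedding vectors and averaged image-feature vectors); only the way each verb's vector is built differs.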



Keywords: recipe data; text understanding; vectorization; word embedding
Published in Proc. of the 5th International Workshop on Multimedia Assisted Dietary Management (MADiMa), in conjunction with ACM Multimedia, pp. 42-49, Nice, France, 2019

tajima@i.kyoto-u.ac.jp / Fax: +81(Japan) 75-753-5978 / Office: Research Bldg. #7, room 404