In this paper, we introduce MIRecipe (Multimedia-Instructional Recipe),
a new recipe dataset. It provides both text and image data for every
cooking step, whereas conventional recipe datasets contain only
final-dish images and/or images for only some of the steps.
It consists of 26,725 recipes, which include 239,973 steps in
total. Recognizing ingredients in the images associated with
cooking steps poses a new challenge: because ingredients are processed
during cooking, the appearance of the same ingredient differs greatly
between the beginning and final stages of cooking.
General object recognition methods, which assume that an object's
appearance is constant, do not perform well on such ingredients.
To address this problem, we propose two stage-aware techniques:
stage-wise model learning, which trains a separate model for each
stage, and stage-aware curriculum learning, which starts with the
training data from the beginning stage and proceeds to the later
stages. Experiments on our dataset show that the proposed techniques
achieve higher accuracy than a model trained on all the data without
considering the stages. Our dataset is available at our GitHub
repository.
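
The two stage-aware techniques can be illustrated with a minimal sketch.
Everything named here is hypothetical: the function names, the choice of
three stages, and the rule that assigns a step to a stage by its relative
position in the recipe are assumptions made for illustration, not details
taken from the dataset or the experiments. The toy trainer only counts
labels and stands in for an actual image recognition model.

```python
from collections import Counter, defaultdict


def assign_stage(step_index, num_steps, num_stages=3):
    """Map a 0-based step index to a stage by relative position (assumed rule)."""
    return min(step_index * num_stages // num_steps, num_stages - 1)


def group_by_stage(samples):
    """Group (image, label, stage) samples into {stage: [(image, label), ...]}."""
    groups = defaultdict(list)
    for image, label, stage in samples:
        groups[stage].append((image, label))
    return dict(groups)


def train_stage_wise(samples, train_fn):
    """Stage-wise model learning: one independently trained model per stage."""
    return {
        stage: train_fn(None, data)
        for stage, data in group_by_stage(samples).items()
    }


def train_curriculum(samples, train_fn):
    """Stage-aware curriculum: update one model on the stages in cooking order."""
    model = None
    for _stage, data in sorted(group_by_stage(samples).items()):
        model = train_fn(model, data)
    return model


def count_labels(model, data):
    """Toy stand-in for a real trainer: the 'model' is a label counter."""
    counts = Counter() if model is None else Counter(model)
    counts.update(label for _image, label in data)
    return counts


# Hypothetical samples: (image id, ingredient label, stage)
samples = [("img0", "onion", 0), ("img1", "onion", 2), ("img2", "egg", 0)]
per_stage_models = train_stage_wise(samples, count_labels)
curriculum_model = train_curriculum(samples, count_labels)
```

In this sketch the curriculum variant passes the model from one stage to
the next, so later stages refine what was learned on earlier ones, while
the stage-wise variant keeps the stages fully separate. Whether earlier
stages' data is revisited during later stages is left open here.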