In this paper, we report our experience in constructing a cooking
recipe text corpus. We describe the problems we encountered and explain how we
handled them. One of the problems we faced in constructing our
recipe corpus was the difficulty of establishing a clear, stable, and
complete guideline instructing annotators how to annotate. During
annotation, we found many unexpected cases for which the pre-defined
guideline was not clear enough, and even cases for which it
provided no guidance at all. As a result, we
needed to update the guideline twice during the annotation, and also
needed to revise the annotations we had made before the updates. During
that process, we faced several trade-offs, and it was not easy to decide
when and how often we should revise the annotations. It was even
unclear whether we should revise them or instead spend the human
resources on annotating more data. We present an experiment whose
result suggests that we should revise the old annotations. Another
problem we had was managing the versions of the guideline, the sets of
annotations corresponding to each version, and communication among
participants.
Keywords: recipe data; dataset creation; corpus creation; corpus construction;
data annotation; annotation guideline; annotation support
Published in Proc. of HMData (collocated with IEEE BigData), pp.3564-3567, Seattle, WA, 2018