A Cache-based Approach to Dynamic Switching between Different Dataflows in Crowdsourcing

by Yusuke Suzuki, Masaki Matsubara, Keishi Tajima, Toshiyuki Amagasa, Atsuyuki Morishima


A composite dataflow in crowdsourcing sometimes needs to be rerun for various reasons, possibly while it is still only partly complete. Rerunning the dataflow takes more time and incurs monetary costs for the additional work that crowd workers must complete. This time and cost can be reduced by reusing complete or intermediate results from the previous run. However, such results sometimes cannot be used as is (e.g., when the dataflow has been changed), and some additional tasks must be completed in the old dataflow to make them reusable in the new dataflow. The benefit of reusing results from the previous run may or may not be worth the cost of these additional tasks. This paper gives a general framework for formulating this problem and proposes a method to estimate the additional costs. The simulation result shows that it is worth devising optimization techniques to identify feasible (namely, cost-effective) plans.
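To make the reuse-versus-rerun trade-off concrete, here is a minimal Python sketch under strong simplifying assumptions. It is not the authors' framework or their cost-estimation method. The `CachedResult` record, its `fresh_cost` and `completion_cost` fields, and the `choose_plan` and `full_rerun_cost` helpers are hypothetical. Costs are plain numbers (e.g., total worker payment), and each cached result is decided independently, whereas in a real dataflow reusing an upstream result can change the cost of downstream ones.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CachedResult:
    """Hypothetical record for one intermediate result from the previous run."""
    name: str
    fresh_cost: float       # assumed cost to produce it from scratch in the new dataflow
    completion_cost: float  # assumed cost of the extra old-dataflow tasks that make it reusable


def full_rerun_cost(results: List[CachedResult]) -> float:
    """Baseline: ignore the cache and recompute every result in the new dataflow."""
    return sum(r.fresh_cost for r in results)


def choose_plan(results: List[CachedResult]) -> Tuple[Dict[str, str], float]:
    """Greedy per-result choice (simplification: results treated as independent).

    A cached result is reused only when finishing it in the old dataflow is
    strictly cheaper than recomputing it; ties fall back to a rerun.
    """
    plan: Dict[str, str] = {}
    total = 0.0
    for r in results:
        if r.completion_cost < r.fresh_cost:
            plan[r.name] = "reuse"
            total += r.completion_cost
        else:
            plan[r.name] = "rerun"
            total += r.fresh_cost
    return plan, total


if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    results = [
        CachedResult("labels", fresh_cost=100.0, completion_cost=20.0),
        CachedResult("votes", fresh_cost=50.0, completion_cost=60.0),
    ]
    plan, total = choose_plan(results)
    print(plan, total)               # {'labels': 'reuse', 'votes': 'rerun'} 70.0
    print(full_rerun_cost(results))  # 150.0
```

In this toy example, reusing `labels` pays off while completing `votes` would cost more than recomputing it, which mirrors the abstract's point that reuse "may or may not be worth" the additional tasks. Capturing interactions between results would require searching over whole plans rather than deciding each result greedily.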



Keywords: crowdsourcing; workflow; optimization
Published in Proc. of HMData (co-located with IEEE BigData), pp. 3552-3554, Seattle, WA, 2018
