Task and Workflow Design

From CrowdsourcingSeminar

Reading Responses

You only need to respond to Greg Little's paper.

Sally Ahn

This paper presents an in-depth comparison of iterative and parallel processes on Mechanical Turk. The authors report their findings on three different problem tasks: writing (describing an image), brainstorming (suggesting company names), and transcription (deciphering blurred text). Their results indicate that iteration increases the average quality, but the highest-quality work tends to come from parallel processes.

My first thought regarding this paper was how these results can be generalized. As interesting and revealing as the results are in the context of the three problems, the fact that these problems had to be carefully selected and designed for this experiment makes me wonder how these results can be applied to designing completely different tasks.

I like the distinction they make between creation and decision tasks, but I think their comparison method may be subject to individual biases of the workers who perform the rating tasks. For example, in Figure 2, the lightning photograph, the iterative result was longer and received higher ratings (Little et al. note the strong correlation between longer answers and higher scores in this problem domain), but I noticed some redundant words in the description ("white", "blue"). While the parallel process answer was shorter, there was no redundant description, and it provided details missing in the iterative result ("wispy clouded," "bottom fifth...silhouette"). Furthermore, the parallel result contained a small typing error, which doesn't really impact the quality of the overall description but may nevertheless bias a rating-task turker to give that description a lower score. It's good that they take the precaution to separate the task workers from the rating workers, but I wonder if there is a way to frame the rating task better to yield more objective results. For example, we could ask the worker to count the number of unique accurate description-to-noun pairs provided in the description.

The disparity between iterative and parallel processes raises interesting questions about prior workers' influence--how to provide inspirations while minimizing inhibition of creativity. These seem to be two opposing forces present in the iterative process, and I wonder if there has been research in the psychology department from which we can draw more insight into these questions.

Dave Rolnitzky

This paper uses Mechanical Turk and TurKit to run experiments comparing iterative and parallel human computation processes. I found the paper to be quite valuable for a number of reasons. I thought that the experiments were straightforward and logical, applied to a variety of areas, and had implications for design. There is a lot of good foundational research here that could be built on for any crowdsourcing platform, or more generally for many decision and creation tasks.

The article discusses the trade-offs between iterative and parallel processes. With creation tasks like brainstorming, there is a tradeoff between increasing average response quality and the probability of the best response. I always assumed that an iterative process was best, but based on this experiment, I may want to rethink that assumption. One specific (though admittedly simple) implication from this experiment for our group project is to try a different approach to naming our project. We've done a couple brainstorming sessions to try and come up with a good name. And we've come up with some satisfactory names. This paper suggests that using a parallel process might be more effective (on average) for producing the highest quality name.

Beth Trushkowsky

"Exploring Iterative and Parallel" describes a disciplined approach to comparing the quality of iterative and parallel human computation processing for three tasks: writing, brainstorming, and transcription. It appeared that the point was to show that turkers employed in an iterative manner have advantages over turkers working in isolation. It would have been an interesting extension to see if solo turkers would benefit from seeing examples of what good work would be for their particular task, as turkers see what other turkers have done in the iterative technique. However, I liked how they changed their analysis of which technique is better, e.g. in the brainstorming task the parallel experiment had the "best" company name while the iterative had the highest average rating. I also liked their analysis of alternatives for not restricting creativity in the transcription task, e.g. showing multiple opinions of an already-transcribed word; in general we should consider how task design impacts quality. As with all the papers detailing experiments on Mechanical Turk, I wonder if the various nuances are fundamental human nature or just turker laziness. For example, only adding/editing a little to a textual description or only brainstorming additional ideas based on ideas that already exist.

Wesley Willett

The "Exploring Iterative and Parallel Human Computation Processes" paper expands on one of the tracks of research featured in the TurKit paper (by the same authors), in which they explored how crowd workers perform on content creation and decision tasks, specifically comparing parallel and iterative versions of them. As someone who's been thinking about task decomposition and parallelizability quite a bit lately, I found the authors' model for human computation processes refreshingly clear and simple. Thinking about tasks either as creation or decision operations that can be chained in various iterative or parallel combinations seems like a nice abstraction that generalizes well across a wide range of human computation tasks. There are edge cases between these that get pretty fuzzy (for example, is editing and selectively deleting from a block of text creation or decision?) and there are many more nuanced and domain-specific versions of these operations. However, I think that by steering clear of those details and defining the space at a high level, they provide a way of thinking about human computation that isn't so bogged down in the details and provides a simple high-level language for talking about workflows that involve many operations. This may seem obvious or straightforward to others, but I found it illuminating.

As for their specific experiments - they're an interesting sample of a set of tasks one might think about in this way, but I don't feel as though I learned too much from them. The utility of parallelizing or iterating seems very domain-specific and I think we need to either (1) consider the usefulness of both kinds of workflows in very specific applications or (2) consider their application in very discrete subtasks that can be composed. Their examples skew towards the second, but still seem too complex for the results to really tell us something broadly about the usefulness of employing parallel or iterative workers.

Nicholas Kong

This paper presents a framework for thinking about how to subdivide a task, and specifically the task pipelines one might use. The authors make a distinction between parallel and iterative processes. Workers work on tasks separately in parallel processes, whereas they build on previous work in iterative processes. The authors then experiment with these processes in three different tasks: an image description task, a company name brainstorming task, and a blurry text recognition task.
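
To make the two pipelines concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `create` and `decide` are hypothetical stand-ins for a creation task and a decision task, the workers are simulated rather than hired on Mechanical Turk, and "quality" is just a number. It is not the authors' TurKit code.

```python
import random

def create(rng, seed_quality=None):
    """Hypothetical creation task: a simulated worker produces a result.
    If shown earlier work (seed_quality), the worker perturbs it;
    otherwise the worker starts from scratch at an assumed baseline."""
    base = 5.0 if seed_quality is None else seed_quality
    return base + rng.gauss(0.0, 1.0)

def decide(a, b):
    """Hypothetical decision task: keep the better of two results.
    A real process would ask other workers to vote or rate."""
    return a if a >= b else b

def iterative(rng, n_workers):
    """Each worker sees the best result so far and tries to improve it."""
    best = create(rng)
    for _ in range(n_workers - 1):
        candidate = create(rng, seed_quality=best)
        best = decide(best, candidate)
    return best

def parallel(rng, n_workers):
    """Every worker starts from scratch; the best result is picked at the end."""
    results = [create(rng) for _ in range(n_workers)]
    best = results[0]
    for r in results[1:]:
        best = decide(best, r)
    return best
```

The simulated numbers say nothing about which process is better; only the control flow is the point. In `iterative` each creation step depends on the previous decision, while in `parallel` all creation steps are independent and a decision happens only at the end.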

I liked the authors' distinction between creation tasks and decision tasks and the way they used decision tasks to maintain quality throughout their process. I would imagine the use of Turkers to decide quality would constrain the domain of the tasks that are computable with these processes, however. The tasks the authors used had rather straightforward quality metrics, but this might not be true for more complex tasks.

In the brainstorming task, I wasn't completely convinced by the conclusion that a parallel process is more likely to result in the best name due to higher variance. Ratings are almost never normally distributed across the entire range, so I agree with the authors that a different model would be more appropriate.
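
The variance argument in question can be illustrated with a small simulation. This is a sketch under stated assumptions: ratings are drawn from a normal distribution (the very assumption questioned above), both processes share the same mean, and the standard deviations 1.0 and 2.0, the mean 6.0, and the count of 30 names are made-up values, not the paper's data.

```python
import random
import statistics

def expected_best(mean, stdev, n_names, trials, seed):
    """Average, over many simulated runs, of the highest rating among n_names."""
    rng = random.Random(seed)
    bests = []
    for _ in range(trials):
        ratings = [rng.gauss(mean, stdev) for _ in range(n_names)]
        bests.append(max(ratings))
    return statistics.mean(bests)

# Same mean, different spread (both values are assumptions).
low_var = expected_best(mean=6.0, stdev=1.0, n_names=30, trials=2000, seed=0)
high_var = expected_best(mean=6.0, stdev=2.0, n_names=30, trials=2000, seed=0)
```

Under these assumptions the higher-variance process yields the higher expected best rating even though the means are equal. If real ratings are bounded or skewed, a normal model may misstate the size of this effect, so the sketch supports the intuition rather than the conclusion.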

Manas Mittal

I think this paper provides some models for combining work from Mechanical Turk to improve the quality of the output. I have questions about the authors' evaluation. I believe that a more interesting control case would have been to present the same task, twice, to the same turker. It is possible that just spending more time might make the solution better. Another alternative control would be to present 2 people with the same task and pick the better result (it's likely that you'll get better results). No big surprises in the paper except that there really was no surprise: i.e., the benefits of combining are almost insignificant (albeit "Statistically Significant"). Also, it is hard to generalize with so few samples (30 images, 3-4 tasks).

I think a lot of people found the authors' model of task decomposition "refreshing" and intuitive. I'd think that this model is a first approximation. The value of a model is that it lets us predict the behavior of systems. However, I am not entirely convinced that is true with the current model. Still, I do think it's an interesting area to think about, and the authors have started a discussion...

Yaron Singer

This paper gives a methodology for comparing iterative and parallel crowdsourcing tasks. In the iterative model, workers receive tasks that have been processed by previous workers; in the parallel model, each worker works on one segment of the task.

The authors show a nice distinction between two different kinds of tasks, which is based on previous research by Malone et al. and Kosorukoff, who also make a distinction between workers working dependently and independently, but at a higher level. It seems like these are thoughtful experiments that can further show the difference between cognitive tasks in collective intelligence systems.

Kristal Curtis

This paper explores the impact of task design on result quality in a few different domains. In particular, the authors are interested in whether workers should approach tasks independently or whether they should attempt to build off the work of others. On the one hand, collaboration can lead to synergy. On the other hand, it can also lead to groupthink. The blurred word transcription task is a good illustration of each of these phenomena -- a group of workers is able to do better than one alone, yet mistakes early on can derail future workers. It would be interesting to know more about related work in the social sciences, as it seems like the questions here are more about people than computation. I was surprised at how small the difference in quality between the two process types was. It seems that this work doesn't really provide strong conclusions about which task design is better suited for each process. It would definitely be interesting if the authors explored hybrid series/parallel approaches -- this seems like a very natural extension. I did think the idea of optimizing the average vs. the best was interesting, and I'd also like to see more work there. I also liked the idea that with the parallel approach, you may be more likely to get the correct answer in your pool, although it may not receive the most votes; this suggests a need for more sophisticated ways of combining Turker contributions.

Philipp Gutheim

Exploring Iterative and Parallel Human Computation Processes

The paper explores how task design, either simultaneous or sequential/iterative, impacts the quality of responses. The results presented for the task of deciphering the blurred text are very interesting. It is great to see what workers are able to solve when requesters fully leverage the wisdom of the crowd. In general, the paper presents some interesting results indicating that more complex, creative work (e.g. brainstorming, writing) benefits from iterative solving. However, it seems that they could have chosen a set of more diverse tasks that would have illustrated this finding a bit more. Also, it is interesting to see the tradeoff between the average (reward the crowd) and the best (reward the best individual response). This raises an important question of what each person's contribution to a solution is.

Prayag Narula

The paper talks about two different design approaches for posting tasks on MTurk: iterative and parallel. The authors used three different types of tasks: 1. Describing a picture 2. Brainstorming a company name 3. Recognizing a blurry text. The authors found that description and recognition work best with the iterative process, whereas the parallel process produces better names. The authors attribute this to the fact that for better brainstorming, you should come up with more content.

The authors talk about increasing the variance of the answers. This is very similar to stochastic algorithms used in artificial intelligence, in which a random factor is introduced to promote variation and to delay convergence to a local maximum rather than the global maximum. I am not an expert but I would like to know if such parallels exist.
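
One algorithm of the kind alluded to here is random-restart hill climbing. The sketch below is only an illustration of that idea on a toy problem: the `score` function, the integer search space 0 to 10, and all function names are assumptions made up for this example, and nothing in it comes from the paper.

```python
import random

def score(x):
    """Toy objective with a local maximum at x=2 (height 3)
    and a global maximum at x=8 (height 5)."""
    return max(3 - abs(x - 2), 5 - abs(x - 8))

def hill_climb(start):
    """Greedy search over integers: move to a better neighbor until stuck."""
    x = start
    while True:
        best_neighbor = max((x - 1, x + 1), key=score)
        if score(best_neighbor) <= score(x):
            return x
        x = best_neighbor

def random_restarts(n_starts, rng):
    """Run hill climbing from several random starting points and keep the best."""
    starts = [rng.randint(0, 10) for _ in range(n_starts)]
    return max((hill_climb(s) for s in starts), key=score)
```

A single greedy run started near x=0 gets stuck on the lower peak, while independent random starts make it likely that at least one run reaches the higher peak. Whether independent workers in a parallel process behave like independent restarts is exactly the open question raised above.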

As an aside, the following statement is hilarious: "The other turker suggested names that could be considered offensive: “the galloping coed” and “stick a fork in me”".

Chulki Lee

"Exploring Iterative and Parallel Human Computation Processes" applied iterative and parallel processes to different types of tasks. The authors categorized tasks into creation tasks and decision tasks. Although they cited related work for the classification, I'm curious how different types of tasks affect how workers work. Maybe psychology and cognitive science would be relevant. For example, how might providing previous work or more money decrease the quality of creation task results?

Kurtis Heimerl

Exploring Iterative and Parallel Human Computation Processes

What's an iteration in a parallel design process? I'm not grokking this well. So, the difference between the models is that in each "iteration", the parallel model hands two people an image and then votes on the better one, while in the iterative model we hand one user an image and then hand their result to another user. I think. The wording here is terrible. Why don't they have the head-to-head comparison of the parallel and iterative models for brainstorming? Why is there a systematic pro-iteration bias to this paper? Are they trying to argue that the current state (primarily parallel) is wrong?

Surprisingly interesting paper. What was most surprising was that the two models were generally not that different. Given that understanding the nuances of the differences was difficult for me, it's likely slightly related to the fact that the two models aren't that different. However, the core idea is. That such a fundamental difference struggled to demonstrate basic statistical significance is really strange. Perhaps the workers feel so empowered that the model ceases to matter? As they explained, many users would completely disregard the previous iteration, making the model a lot more like parallel. If this were the issue, you'd have to keep shoving people deeper and deeper into the box, and I think it would remove any of the joy inherent in these tasks. The nice thing about a parallel task is that it is mine, you know?

I really liked this paper.