Assignment1-WesleyWillett

Assignment 1 - Create and Perform HITs on Mechanical Turk

Worker

I completed several product categorization and rating tasks through CrowdFlower, as well as some transcription and address verification tasks. I'd also gone through a similar exercise when I was first starting out with Turk to try to get a feel for the system.

Aside from the monotony of most of the available tasks and the trivial compensation most of them offer, my strongest sentiments all relate to the usability of the Mechanical Turk platform itself. Given how painful the platform is to use, and how little it has changed since its debut a few years ago, I can't help but think it's destined to be swept away by competing systems or private clouds in the not-too-distant future. Almost every large-scale task seems to be hosted externally, and I get the sense that almost no requesters use Amazon's tools for generating, posting, hosting, or monitoring their HITs. As a result, it seems like a fairly trivial step for those requesters to move to a competing system or to their own private cloud. If all Amazon offers is a (crappy) listing and a payment system, that doesn't really seem worth the overhead for a high-volume requester.

On a related note, I've always found Turk's user interface very cumbersome, and I find myself wondering how many Turkers actually use it to find and carry out tasks. The much-maligned HIT listings are indeed very bad and make it pretty difficult to find HITs. (Sorting on one dimension at a time or searching by keywords does NOT give a good sense of the space of available tasks.) It's also cumbersome to switch between tasks and impossible to queue multiple tasks for future completion. This makes me think that there must be alternate front-ends for the system that make it easier on workers. Being able to sort along multiple dimensions, queue multiple HITs for future completion, and see ratings, approval rates, comments, and other metadata for HITs and requesters all seem like natural things that workers would want. Are there 3rd-party tools that provide some of this functionality? At the very least, it seems like crowdsourcing shops in places like India would want to provide custom front-ends to improve the efficiency with which they push workers through tasks.
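To make the sorting and queuing wish a bit more concrete, here is a minimal sketch of what a third-party front-end might do once it had HIT metadata in hand. Everything in it is hypothetical: the fields, values, and ranking criteria are invented for illustration, and nothing below comes from Amazon's actual listing or API.

 from collections import deque
 
 # Hypothetical HIT metadata; the fields and values are invented for illustration.
 hits = [
     {"title": "Categorize products", "reward": 0.05, "requester_approval": 0.98, "hours_left": 4},
     {"title": "Transcribe receipts", "reward": 0.10, "requester_approval": 0.91, "hours_left": 30},
     {"title": "Verify addresses",    "reward": 0.03, "requester_approval": 0.99, "hours_left": 12},
 ]
 
 # Sort along several dimensions at once: best-paying first, then by the
 # requester's track record, then by how soon the HIT expires.
 ranked = sorted(hits, key=lambda h: (-h["reward"], -h["requester_approval"], h["hours_left"]))
 
 # Queue a few HITs for future completion instead of hunting them down again later.
 todo = deque(ranked[:2])
 while todo:
     print("next up:", todo.popleft()["title"])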

Requester

I've actually been posting visual analysis tasks on Turk pretty regularly for the past month or two. I've been trying to find ways of breaking down fairly complex, sometimes esoteric tasks, and I feel as though I'm still working out what the right kinds of prompt/response formats are. Answers for most of the tasks have come fairly quickly (within a few hours), but in a few cases timing has proved variable and tasks have dragged out for as long as a day, presumably once they get lost deep in the HIT list. Response quality has also been pretty variable, which seems consistent with the results reported in most of the publications I've read and with what I've picked up from talking with other requesters. I get the sense that developing protocols that help rule out bad results is a crucial piece of designing any crowdsourced task, and this seems like a particularly valuable area for future research.
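As a sketch of the kind of protocol I have in mind (not something I've actually deployed, with all of the answers, worker IDs, and thresholds invented for illustration): seed each batch with a few gold questions whose answers are already known, drop workers who miss too many of them, and take a majority vote over the redundant responses that remain.

 from collections import Counter
 
 # Invented example data: each worker's answers, keyed by question id.
 responses = {
     "workerA": {"q1": "cat", "q2": "dog", "gold1": "blue"},
     "workerB": {"q1": "cat", "q2": "dog", "gold1": "blue"},
     "workerC": {"q1": "dog", "q2": "cat", "gold1": "red"},  # misses the gold question
 }
 gold = {"gold1": "blue"}     # questions with known answers, mixed into the HIT
 min_gold_accuracy = 1.0      # arbitrary threshold for this sketch
 
 def passes_gold(answers):
     """Keep a worker only if they get enough of the gold questions right."""
     correct = sum(1 for q, a in gold.items() if answers.get(q) == a)
     return correct / len(gold) >= min_gold_accuracy
 
 trusted = {w: a for w, a in responses.items() if passes_gold(a)}
 
 # Majority vote over the remaining (redundant) responses to each real question.
 results = {}
 for q in ("q1", "q2"):
     votes = Counter(a[q] for a in trusted.values() if q in a)
     results[q] = votes.most_common(1)[0][0]
 
 print(results)  # {'q1': 'cat', 'q2': 'dog'}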

Amazon's requester interface - like the worker one - is pretty atrocious. It clearly wants to be an accessible end-user tool, but the poor documentation and erratic behavior mean that it's not something I can imagine a lot of novices using to build tasks. At the same time it's woefully underpowered for a more expert user, as evidenced by the paucity of high-volume tasks that are actually built using it.

Lastly, I find the Accept/Reject dichotomy for scoring responses really unsatisfying. It gives a requester no good way to separate a worker's understanding of a task, the effort they put into it, and the correctness and completeness of the answer. This is exacerbated by the heavy weight placed on approval rates as a qualification (many HITs require 95% or greater). If someone completes a task almost entirely but, say, leaves out one key piece of information, the requester must either 1) accept an incomplete response in order to retain a potentially valuable worker, or 2) reject the response despite it being nearly complete. A mechanism by which requesters could provide more nuanced feedback, clarify tasks, or request corrections seems like it would be very valuable. I also get the feeling that workers have very little recourse when tasks are rejected, while requesters are free to go ahead and use their contributions regardless. This seems like it opens the door to scammers in a big way.
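The API reflects the same dichotomy. Here is a minimal sketch, assuming boto3's MTurk client (the assignment ID and feedback strings are placeholders, not real values): the only levers per assignment are approve or reject, with a free-text feedback field as the sole nuance.

 import boto3
 
 # A minimal sketch of the binary outcome, assuming boto3's MTurk client.
 # The assignment ID and feedback strings are placeholders, not real values.
 mturk = boto3.client("mturk", region_name="us-east-1")
 
 assignment_id = "EXAMPLE_ASSIGNMENT_ID"
 
 # Outcome 1: approve (the worker gets paid and their approval rate ticks up) ...
 mturk.approve_assignment(
     AssignmentId=assignment_id,
     RequesterFeedback="Thanks, though one requested field was missing.",
 )
 
 # ... or outcome 2: reject (no payment, and the worker's approval rate takes the hit).
 # There is no middle ground such as partial credit or a request for corrections.
 # mturk.reject_assignment(
 #     AssignmentId=assignment_id,
 #     RequesterFeedback="Missing the key piece of information requested.",
 # )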