Annotations and Participatory Sensing


Paper 1: Utility data annotation with Amazon Mechanical Turk. Alexander Sorokin, David Forsyth. First IEEE Workshop on Internet Vision at CVPR 2008.
Paper 2: Participatory Sensing: A citizen-powered approach to illuminating the patterns that shape our world Jeffrey Goldman et al. 2009.

Discussant's Slides and Materials

Reading Responses

Dave Rolnitzky

Utility data annotation with Amazon Mechanical Turk

The article discusses data annotation, specifically for images, and shows that it can be efficiently outsourced to an online worker community (specifically Mechanical Turk, though they do discuss some other games).

The most interesting thing about this article for me was the cost data associated with image annotation obtained through some experiments. Based on these experiments, they arrive at about $3 per hour. At this rate, this becomes a potentially viable source of at least supplemental income in the developing world (and so there is some extra evidence that image annotation may be viable at some point for our course project). It's certainly a limited experimental data set, but interesting nonetheless.

Participatory Sensing Whitepaper

This foundational research paper introduced the concept of Participatory Sensing, explored five hypothetical situations in which it might have real-world application, and suggested some measures that could shape the concept in the future.

I really appreciated the real-world application of Participatory Sensing described in this paper and some of the case-studies. I could envision using a few of these services (perhaps after some wrinkles are ironed out) and could certainly see the business potential with many of these vignettes. I was quite glad that the authors spent time addressing the privacy implications--immediately when I read the first couple pages, I thought of the enormous privacy implications with many of these hypothetical situations. So I was glad to see it addressed later in the paper.

I did feel that some of the applications were over-engineered, e.g. bike sensing for bike commuters. In that example, the authors mention how bikers "...document hazards and impediments along their way by taking photos with a mobile phone or sending text messages..." As a frequent bike commuter, I just don't see this type of task as realistic for the number of commuters that would need to use the system to make it valuable enough for the community. There is something of a chicken-and-egg problem with that type of scenario.

Kristal Curtis

Utility data annotation with Amazon Mechanical Turk: The authors described their exploration of good techniques for gathering image annotations on MTurk. This is a very natural application for MTurk -- I believe it's one of the canonical HIT types. It's interesting how the authors noted that you can save money while maintaining annotation quality by using gold data to provide feedback to the user or by using a separate batch of workers to verify annotations instead of just using several assignments per image and using majority voting. I experienced the gold data + feedback approach when I was working on HITs for CrowdFlower as part of our course homework assignment, and I thought it was a good idea. It was also interesting how the authors explored which types of annotation tasks were easier for people to complete.
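As a rough illustration of the two quality-control ideas mentioned above (several assignments per image with majority voting, and scoring a worker against gold data), here is a minimal Python sketch. The function names, the tie-handling rule, and the data layout are assumptions made for illustration; they are not taken from the paper or from any MTurk or CrowdFlower API.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None if the list is empty
    or two labels tie for first place (hypothetical tie rule)."""
    if not labels:
        return None
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def gold_accuracy(worker_answers, gold):
    """Fraction of gold items that a worker answered correctly.

    worker_answers and gold both map an item id to a label.
    Returns None if the worker saw no gold items.
    """
    scored = [item for item in gold if item in worker_answers]
    if not scored:
        return None
    correct = sum(worker_answers[item] == gold[item] for item in scored)
    return correct / len(scored)
```

With majority voting, every image costs several assignments; with gold data, only the gold items add cost, and the accuracy score can be fed back to the worker or used to filter their other answers.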

Participatory Sensing Whitepaper: The authors of this work presented a vision for combining social networks and multi-sensor-equipped mobile phones to create next-generation software applications for personal and societal use. I really enjoyed the authors' vision. Their apps touched on some themes that have come up quite a bit during our conversations about crowd computing, such as apps that benefit from "many pairs of eyes" and also the idea of using a specialized crowd (eg, people with a special interest in botany). I also really liked their idea about algorithms that can prompt people in the crowd to gather extra data as needed. This is related to the machine learning idea of active learning, which seems very interesting. The authors didn't really touch on how to provide quality control, which is a very important issue given the volume of crowd-generated data they expect to acquire. The apps do explain how to incentivize people to participate, since the users would be gaining either personal benefit (as in the case of the personalized apps) or feel that they are contributing to a social good (as in the case of the diesel traffic tracking app).

Kurtis Heimerl

Utility data annotation with Amazon Mechanical Turk

A very functional paper on how to do image annotation using Mechanical Turk. No magic, they used the basic methods that everyone else does: statistical, gold, and qualification. I'm somewhat confused as to what the result was with regard to this, but that's probably due to my lack of deep reading more than anything else. This seems like the kind of task that is simple enough (less than 1 USD/hr) to put into games rather than actually paying people. When it's 1 USD, you're effectively not paying people, so let's not even pretend.

Participatory Sensing: A citizen-powered approach to illuminating the patterns that shape our world

What always bothers me about these "sensing world" projects is the ubiquity of their first-world problems. It's always health (mostly jogging/biking), pollution, or traffic. This is not what people need. If it was, it would be sensed already, like traffic. While I understand that cell-phones change this in some way, I have yet to be convinced that data collection is the problem. Now, this sort of sensing could make more sense if you're improving lives in a more direct sense: catching taxis together to save money or finding the cheapest cigarettes in a local area. Traffic is again the example of something that people actually want, and that was done years ago without crowdsourcing.

There are, no doubt, thousands of blogs that are never read. Billions of posts on message boards that are mostly ignored. We could count all of the stuff in this paper with simpler, non-distributed sensors, any day now. It's just not feasible because no one wants it. The problem isn't data collection (which cell phones enable), but data munging and visualization. I know smart people are working on that, so I don't feel so bad. This is the big data we've been talking about, and I don't think it has much to do with crowdsourcing.

Philipp Gutheim

Utility data annotation with Amazon Mechanical Turk

The paper is about image annotation using MTurk. In general, this paper does not differ significantly from previous papers, and the threefold quality inference process it describes is similar to related work. It would have been interesting to get some information about the payment in detail. For instance, did they use double entry (or more), how many gold standard tasks did they spread across a set of tasks, did one HIT contain several annotation tasks, etc.? If one annotation task cost 1 cent, it would have been interesting to see how the above-mentioned decisions impact the price (e.g., double entry = 2 cents, gold standard = 2.2 cents, etc.).
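The price arithmetic in the example above can be sketched in a few lines of Python. This is only one possible reading: it assumes the 2.2-cent figure means double entry plus gold tasks that add 10% overhead (one gold task per ten real tasks). The function name and its parameters are hypothetical, and none of the numbers come from the paper.

```python
def cost_per_annotation(base_cents, assignments=1, gold_per_real_task=0.0):
    """Cents paid per usable annotation under a simple overhead model.

    base_cents: price of a single assignment (hypothetical).
    assignments: number of workers labeling each item (2 = double entry).
    gold_per_real_task: gold tasks mixed in per real task
        (0.1 = one gold task for every ten real tasks).
    """
    return base_cents * assignments * (1 + gold_per_real_task)


# Single entry, double entry, and double entry plus 10% gold overhead.
for label, args in [
    ("single entry", (1,)),
    ("double entry", (1, 2)),
    ("double entry + gold", (1, 2, 0.1)),
]:
    print(f"{label}: {cost_per_annotation(*args):.1f} cents")
```

Under this model the costs multiply, so each additional quality-control decision scales the price of every annotation rather than adding a fixed amount.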

Participatory Sensing Whitepaper

This is an interesting yet endless and somewhat questionable paper. The paper explains participatory sensing and illustrates potential scenarios and use cases. While the paper gives several highly valuable use cases that sound quite convincing, an important obstacle arises while reading:

We have seen new technologies or ideas spread among researchers and practitioners, get hyped, and eventually fail or never turn into reality. An example is context awareness. Hundreds of papers have shown us what we can do with context-aware cell phones and what benefits these would deliver. Although it is technologically somewhat feasible in theory, it has not really been implemented. Why? I believe an important reason is that we failed to go beyond the "look, that's what we can do with it" stage and actually start to push standards, drive market adoption, and build actual systems. In a nutshell: we failed to consider the non-technological environment (social, political, economic) needed to push context awareness beyond endless use-case papers. Reading this article, I feel that this field could run the risk of experiencing similar challenges.

Nicholas Kong

Utility data annotation with Mechanical Turk

This paper describes crowdsourcing image annotation tasks in the context of building datasets for computer vision. In terms of crowdsourcing, I don't think it breaks any new ground. It uses standard techniques for quality control, such as voting and use of known results for comparison. However, it's an interesting proof-of-concept that annotation tasks aren't necessarily too difficult or intensive for obtaining good-quality results from MTurk. It's also an immediately obvious application that will produce a necessary resource quickly and cheaply. They found a 30% rate of poor annotations, which seems to agree well with the other papers we've read.

There is one specious claim in the caption of Figure 3, where the authors attribute time-to-completion solely to the amount of pay. People completed the human body part annotation tasks quicker than the site-clicking and boundary-tracing tasks. However, I would suspect the difficulty of the task had some influence.

Participatory sensing whitepaper

This paper outlines a vision of participatory sensing, which is essentially the sensemaking process from data collection to visualization, with the twist of using mobile devices as sensors. It's a description of the potential power of distributed data collection and cognition, which it reifies using five scenarios.

One thing I liked about this paper is how it focused on empowering communities who already have a vested interest in an issue to collect data pertaining to that issue. Here, unlike in MTurk, the incentive is personal or community gain. Although the approach might not scale, the results are likely to be of higher quality, and the emphasis on community makes participants more accountable.

An element that seems to be missing from all the systems is the notion of distributed analysis. Users in these systems collect and aggregate data together, but seem to analyze the data separately (e.g., CycleSense and PEIR) or cede analysis to others (e.g., the plant study). I think it would be helpful if participants, who are likely motivated and somewhat expert, could engage in analysis earlier in the sensemaking pipeline.

Beth Trushkowsky

"Utility data annotation" describes an effort to use AMT workers to annotate images. In general, this paper didn't discuss more than we've already discussed in class, namely that AMT makes it easy to get a lot of work done quickly. There were a number of distracting things about the paper, like when the figures/tables don't appear on the same or subsequent page as their discussion. Also, I wasn't sure what to make of their conclusion that a fair price is $3/hour based on some undisclosed number of worker comments that said it should be that way. I have been intrigued about how to determine the "correct" price for a task, but the authors' evaluation could have been more rigorous.

The "Participatory Sensing" article gave example scenarios in which people can collect and/or analyze data using mobile devices. They classify such applications based on whether participants analyze data in addition to collecting it, and whether they leverage the social aspect. While the article was light on technical details, it did make me think again about crowdsourcing beyond AMT. What I wonder is whether it should be our goal as crowdsourcing researchers to devise a framework like AMT that developers could use to support the different types of applications described in the article. If so, the framework would have to have built-in features such as knobs for specifying privacy and data cleaning.

Sally Ahn

The Utility data annotation with Amazon Mechanical Turk paper details a classic use of crowdsourcing for gathering annotated image datasets for computer vision. Many of the ideas presented in this paper, such as collecting multiple annotations, creating separate grading tasks, and building a "gold standard", have become common practice in Mechanical Turk today. Most of the tasks I performed on Mechanical Turk implemented the "gold standard" method in a training mode, and the grading tasks are reminiscent of the verify step in Soylent's find-fix-verify paradigm. Despite these familiar ideas, it was interesting to see an analysis of various annotation protocols because it shows task-related trends. For example, the boundary-tracing task attracted many more low-quality annotators than the site-marking protocols.

The Participatory Sensing paper presents several scenarios in which members of a community contribute to data collection with the aid of mobile devices. While the scenarios they describe illustrate the potential of participatory data collection, I think they are idealized visions that overlook some challenges that often arise in human-powered systems. One such challenge is commitment: these scenarios all work under the assumption that the members remain committed to their data collection duties. Granted, the nature of participatory sensing suggests that the members have a personal interest in the goals of each project, but taking human nature into account, we can't assume that intention always leads to action; even for dedicated members, maintaining consistent participation after the first few weeks or even days may be difficult. The authors argue that current technology is enough to accommodate participatory sensing, but I think they understate the challenge that lies in the human realm (i.e. recruiting members and maintaining their participation), which is just as important as the technology realm in the scenarios they describe.

Manas Mittal

Participatory Sensing Paper: Interesting, broad whitepaper targeted at the general public. I would have liked them to clearly distinguish the work they've already done from the vision. One thing that stood out to me was that while individuals can serve as the last-mile sensors, the data collection and organization mechanisms are still fragmented. I would suggest an app-store approach for enabling data collection and consumption. People with varying interest levels can install applications that enable them to submit sensed data, and also look at the data collected by others. This introduces several new research questions in data management, interface design and privacy controls.

Utility data annotation with Amazon Mechanical Turk: A paper that talks about using mTurk for dataset labeling. I would love to see a discussion about how task design and description would influence the crowd's response, especially since one of the stated motivations for the image labeling was the 'fun' factor. The task reminded me of Doug Lenat's Cyc and Push Singh's open-mind project, which are both databases for building AI systems and which require considerable human input. In this case, the task was quite well defined and deterministic. The paper would also benefit from a more quantitative characterization of some of the claims. For example, the $3 per hour figure for ideal worker price is suggested because "some comments suggested" that number; was that the mode, the median, or something else? It would also be interesting to compare such paid work to voluntary platforms such as mechanical zoo, along the axes of quality of responses, number of contributors and average engagement duration. Also, the formatting did not mention the contributions up-front, and those are not discussed in the text.

Prayag Narula

Utility Data Annotation

Image annotation is one of the most common applications of MTurk. It was good to see a systematic study of such an application. I am not an expert in image recognition but would be interested in knowing if these are the most common methods of image annotation. If not, what are the other methods? I would’ve also wanted the authors to change only one variable at a time, i.e. the image annotation method or the wages per task, but clearly that wasn’t how they had designed the experiment. I would be interested in reading a full paper about the work.

Participatory Sensing

I was disappointed by the participatory sensing paper. The only redeeming aspect of this reading was that it was a welcome break from papers about MTurk (it sometimes feels that we are researching MTurk and not crowdsourcing). The reason I didn’t like the paper was that there were no new or innovative proposals in it. A lot of what was presented has been done multiple times. Case in point, Nokia’s experiment of turning mobile phones into traffic sensors was done in Feb 2008.

On top of that, the authors recommended investment in participatory sensing on the scale of what went into developing Internet technologies. However, their case was not very compelling. Hardly any application discussed in the paper seemed that important. So, you help bikers select a better route. Big deal! The only compelling case was for elder care, which wasn’t as participatory as the rest of the applications described.

Yaron Singer

Utility data annotation with Amazon Mechanical Turk

This paper seems to be an introduction to Mechanical Turk for the vision community. The authors don't claim to have any general results for crowdsourcing, and report in a rather straightforward manner how to effectively use MTurk for image annotation. It is quite remarkable that they received 300 annotations an hour when paying $1 per hour.

Participatory Sensing

This is a nice piece on participatory sensing and its different applications. The basic idea is to use data collected from people's phones (e.g. location, images and timestamps) for analysis. I think this piece is interesting, though it does not seem to fall into the crowdsourcing category. It seems like crowdsourcing typically uses the crowd to solve a problem, while the platforms suggested here do not necessarily solve problems, they collect data. It would have been interesting if this piece included a section about incentives. For example, this paper from Nokia Research is completely focused on the incentive issues for these tasks:

Siamak Faridani

I enjoyed both papers. The first one, "Participatory Sensing", was an interesting paper in the sense that they did not look at low-quality systems like Mechanical Turk and instead tried to leverage the theories of behavior change in order to get the participants interested in using their system. The authors have tried to make their system fun and interesting, and that would probably encourage more participation than even a paid system. Another interesting factor about designing systems that have intrinsic incentives for participation is that one can get a high quality of responses for free. Such a system can save us the time and money that is spent on QA in other paid systems.

The other paper, "Utility data annotation with Amazon Mechanical Turk", looks at a use case for Mechanical Turk, annotating images, and presents a number of methods that the authors used to annotate large amounts of visual data. The work is certainly interesting and is a great contribution. One problem that I have with this paper, though, is that the authors assume that for each HIT there is an appropriate price, and they go on to say that we can predict this price. Other works suggest that the price of a HIT is independent of the quality and only determines the rate of HIT submission. If we assume those results are true, then there is a "desired completion time" that is easy to determine, and each completion time can be associated with a HIT price (if we set the price too low we will not finish on time, and if we set a higher reward we will finish sooner but overpay). This whole trade-off between reward and completion time is overlooked in the publication.
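The reward versus completion time argument above can be made concrete with a small Python sketch. It assumes, as the cited line of work suggests, that the reward only changes the submission rate, and that a requester has somehow measured submissions per hour at a few reward levels. The function name is hypothetical and the rates in the usage example are made-up placeholders, not measurements from either paper.

```python
def cheapest_reward_meeting_deadline(rate_by_reward, num_hits, deadline_hours):
    """Pick the lowest reward whose submission rate finishes the batch in time.

    rate_by_reward: dict mapping a reward (cents per HIT) to the
        submissions per hour observed at that reward (must be measured).
    num_hits: number of HITs in the batch.
    deadline_hours: the desired completion time.
    Returns None if no listed reward is fast enough.
    """
    for reward in sorted(rate_by_reward):
        rate = rate_by_reward[reward]
        if rate > 0 and num_hits / rate <= deadline_hours:
            return reward
    return None


# Placeholder rates for illustration only.
example_rates = {1: 50, 2: 120, 5: 400}
print(cheapest_reward_meeting_deadline(example_rates, 1000, 10))
```

Any reward below the returned value misses the deadline, and any reward above it finishes sooner at a higher cost, which is the trade-off described in the response.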