- 1 Bjoern's Slides
- 2 Extra Materials
- 3 Discussant's Materials
- 4 Reading Responses
- 5 Valkyrie Savage - 9/22/2011 17:33:41
- 6 Hanzhong (Ayden) Ye - 9/25/2011 0:42:55
- 7 Laura Devendorf - 9/25/2011 12:00:47
- 8 Steve Rubin - 9/25/2011 16:20:51
- 9 Yun Jin - 9/25/2011 16:58:06
- 10 Amanda Ren - 9/25/2011 17:00:21
- 11 Alex Chung - 9/25/2011 18:41:41
- 12 Viraj Kulkarni - 9/25/2011 20:23:12
- 13 Derrick Coetzee - 9/25/2011 21:57:29
- 14 Galen Panger - 9/25/2011 22:21:23
- 15 Cheng Lu - 9/25/2011 23:02:00
- 16 Yin-Chia Yeh - 9/25/2011 23:45:00
- 17 Apoorva Sachdev - 9/26/2011 0:41:18
- 18 Hong Wu - 9/26/2011 1:06:35
- 19 Suryaveer Singh Lodha - 9/26/2011 1:40:24
- 20 Allie - 9/26/2011 1:46:21
- 21 Ali Sinan Koksal - 9/26/2011 3:09:39
- 22 Sally Ahn - 9/26/2011 3:32:21
- 23 Donghyuk Jung - 9/26/2011 3:52:41
- 24 Peggy Chi - 9/26/2011 6:46:40
- 25 Jason Toy - 9/26/2011 7:21:10
- 26 Shiry Ginosar - 9/26/2011 8:31:19
- 27 Vinson Chuong - 9/26/2011 8:33:41
- 28 Manas Mittal - 9/26/2011 8:41:49
- 29 Rohan Nagesh - 9/26/2011 8:59:50
Valkyrie Savage - 9/22/2011 17:33:41
Humans have idle cycles which can be exploited to a variety of ends; they can solve problems that are traditionally very hard for computers (e.g. spatial reasoning and NLP tasks). This field has many dimensions and can be applied to a variety of ends.
The “Human computation: a survey and taxonomy of a growing field” paper was well-balanced. They presented a variety of definitions to set the stage, and tried to break down work that has been conducted in this field into discrete categories. They compared and contrasted it with similar fields (data mining and crowdsourcing), and pretty effectively pigeonholed it. That is both the strength and the weakness of this paper: as humans, we naturally want to understand things in the context of other things, but this isn’t always the most effective way of really grokking them. Although the tail end of the paper describes a variety of ways in which to re-combine and re-ponder the categories that they provided, it doesn’t seem to indicate that its structure is in any way flawed. I don’t know... I’m not sure about pigeonholing, myself. Isn’t there much talk about how something set forth formally is much less easily criticized than something presented informally?
The “GWAP” paper was interesting, though! I went and tried out a couple of the games (Verbosity and the ESP game). I don’t know whether they’re reaching the end of their useful, easy, low-hanging fruit cycles, but I found them really irritating. Verbosity, for instance, gave me almost entirely verbs, and all the sentences that one can fill in for them seem much more suited to nouns. My partner, also, seemed to be frustrated, and filled in clues like “rhymes goat” for sentences in which it was not grammatically correct. I’m not, therefore, certain that all the data they are hoping to cleanse out is being effectively cleansed out, or that the data they are collecting is all valid. Teaching a machine that “boat is like rhymes goat” doesn’t seem useful.
In that paper’s favor, though, it does seem to be true that people have cycles that they are willing to devote to this kind of exercise. Protein-folding is almost undeniably a better use of them than “Angry Birds” (though you’ll have to talk to the developers of the latter to confirm that). Adding some purpose to Internet downtime is a noble goal, which will never be solved with lolcats. I desperately want to link to today’s Dinosaur Comics, of which the reader need only appreciate the last panel: http://www.qwantz.com/index.php?comic=2048
The third paper, the “Soylent” paper, I would like to remark on very briefly: I liked that they ate their own dogfood at the end and had their conclusion run through their Shortn service. That was commendable for sure.
Hanzhong (Ayden) Ye - 9/25/2011 0:42:55
Reading response for: Human Computation: A Survey and Taxonomy of a Growing Field. Alexander J. Quinn, Benjamin B. Bederson. CHI 2011. Soylent: A Word Processor with a Crowd Inside. Bernstein, M., Little, G., Miller, R.C., Hartmann, B., Ackerman, M., Karger, D.R., Crowell, D., and Panovich, K. In Proceedings of UIST 2010. Designing games with a purpose. von Ahn, L. and Dabbish, L. Communications of the ACM 51, 8 (Aug. 2008), p. 58-67.
The reading for this week is very interesting in many ways. The topic of crowdsourcing has long attracted me, and I gained much new insight from these three interesting papers.
The first paper talks widely about different classifications and terms that emerge in the area of human computation. The paper first differentiates several related ideas such as groupsourcing, human computation and social computing. The taxonomy the paper discusses is a beneficial guide for future research work in human computation. Many of the examples are typical of their categories and represent their unique attributes well. Aspects such as motivation, the evaluation process and the requesting process are some of the most essential ones used to define a particular category of groupsourcing.
The paper introducing Soylent is an original initiative to deploy online resources as a way to improve writing efficiency. The work is innovative as well as practical enough to be implemented. Tests of three useful features, Shortn, Crowdproof and The Human Macro, illustrate different procedures which can be groupsourced. Although the results are not yet satisfactory enough to be put directly into daily use, we can still look forward to future development of such approaches. In my view, word processing is one of the most important and promising areas in which more applications with groupsourcing features can be developed in the future.
The third article, introducing the concept of GWAP, talks about the development of GWAP and existing examples of such games. I have tried playing those games and found them both interesting and productive. The analysis of the design process is organized by specific kinds of games, and the concept of expected contribution is established on top of it to measure both productivity and entertainment capacity. The article gives me a good introduction to GWAP, and also lends me insight into the power of games and design, which could bring unlimited motivation for players to participate in human computation activities.
-By Ayden (Sep 25th, 2011)
Laura Devendorf - 9/25/2011 12:00:47
Together, the articles on crowdsourcing provide a taxonomy of the field, a look into applications of game playing to assist computational tasks, and a method of leveraging crowdsourced labor to improve the quality of editing in word processing. After reading these articles, I feel as though I have learned a lot of new information about implementations and applications, but I am also left with some lingering doubts about the ethics of some of the implementations.
Designing Games with a Purpose discusses a series of game designs and evaluation metrics that allow the crowd to enhance AI systems. I was struck by a bit of an "icky" feeling in reading this article as it, to me, poses some serious ethical concerns which were not discussed. I would hope that users are well informed that the games they are playing are contributing to a knowledge-base and that the knowledge-base will serve a variety of uses. In my opinion, in order to be ethical, the player should be well informed of the possible uses for the information they are providing. If those are not yet known, it should be stated. It also seems a bit like smoke and mirrors to get free labor. For instance, why are users on Mechanical Turk getting paid for similar types of activities, and game players aren't? People contribute to Wikipedia for free, but I would argue that they do so willingly because they support the mission of the Wikipedia project. I think the article does present a number of productive examples that will be extremely useful to many researchers. I second their encouragement for better evaluation metrics in the conclusion, as I find their measure of "fun" to be a bit skewed. Amount of time spent playing isn't always directly linked to enjoyment; it could be attributed to boredom or to how users find out about the game.
Human Computation outlines the current technologies as a way to solidify meanings, assist future researchers in developing human computing applications and identify "holes" in the logic. In the conclusion, the article briefly addresses my previous issues of ethical responsibility and labor relations in saying: "Beyond new algorithms and designs, there is also a pressing need to address issues related to ethics and labor standards. It is possible that as technology obviates the jobs of some unskilled workers, future human computation systems may offer a viable employment option for them." While the quote may have been thrown out to add a sort of "silver lining" to the issue, it also leads me down a slippery slope of thought leading to the question "what is the function and future of technology in society?" Will human centered computing lead to smarter machines that obviate more "unskilled" workers? While I realize that these discussions may be off-topic, I think it's important to make note of not only the papers' contributions to HCI, but also the effects of such technologies on communities beyond HCI.
Soylent manages to dodge my ethical critique by implementing a paid structure where responders are knowledgeable of the task and their role within the task. Good job Soylent. Soylent presents three tools, Shortn, Crowdproof and The Human Macro, that use the Find-Fix-Verify pattern to ensure quality. Through the use of crowd-sourced labor, this model presents solutions to problems that have been difficult to solve with artificial intelligence. I believe that this research presents a level of editing that's better than what currently exists within systems like Microsoft Word. The paper discusses many of the limitations of the system, such as individual writing styles and domain specific language. I think the intention of the writer is also an important factor to consider when editing. The writer's ability to choose from multiple results may allow them to pick the result that best matches their intention and intended audience. The current scenario also depends heavily on the success of Amazon's Mechanical Turk. The paper makes assumptions about the tool's future and pay structure to support its findings. I am also wondering who the target audience for Soylent would be. Since it's a paid service, the quality of the work would need to be superior to what you could get for free. In summation, the research is a step in an interesting direction and provides solid ground for future researchers.
Steve Rubin - 9/25/2011 16:20:51
For this class, I read Quinn and Bederson's taxonomy of human computation, and Bernstein et al.'s "Soylent." The area of human computation (and crowdsourcing) is interesting to me for two primary reasons: first, the field is in its infancy, and thus research is wide open, and second, a foundational assumption in the field is that there are many things that humans can do better than computers. This is exciting because computers have been increasingly making human work obsolete.
The taxonomy was enlightening to read as a reference, but as it is a survey, I have no significant comments to make on its methodology. Suffice it to say, Figure 3 is going to be useful when considering further research in the field.
"Soylent" proposes a novel method for harnessing the crowd (via MTurk) to modify documents in Microsoft Word. The paper places special attention, and rightly so, on the new Find-Fix-Verify design pattern. This, I think, is the most important contribution of this paper. The actual techniques for editing documents are interesting and useful, but not without flaws. As the paper mentions, Shortn, Crowdproof, and The Human Macro all take a certain amount of time that may be too long for tight deadlines. The issue of privacy is briefly touched on, but it may be a bigger factor than the authors realize. I would not be surprised if, psychologically, people had problems with letting strangers proofread their writing.
Regardless of the actual word processing applications, the Find-Fix-Verify design pattern provides developers with an approach to harnessing the crowd on Mechanical Turk while avoiding the "30% problem." I suspect that this design pattern will see common use in further crowdsourcing research. It would be nice to read more papers that proposed generalized frameworks rather than single-use applications.
Yun Jin - 9/25/2011 16:58:06
Human Computation: A Survey and Taxonomy of a Growing Field
In this paper, the researchers classify human computation systems to help identify distinctions and similarities among various projects and to reveal blind spots in the existing work that new research could address. To better position human computation, the paper also discusses several related topics. The classification makes several contributions. First, it can be used to stimulate new directions of research in human computation. Second, it helps users see the significance of human computation as a means of solving computational problems. Third, it helps us explore the position of human computation with respect to the related topics. Finally, the paper notes a significant need to address issues related to ethics and labor standards. That is, designers may have the power to orient the systems to encourage positive working arrangements.
Soylent: A Word Processor with a Crowd Inside
This paper introduces a word processing interface named Soylent, which aims to enable writers to call on Mechanical Turk workers to shorten, proofread, and edit parts of their documents. The paper also presents the Find-Fix-Verify pattern, evaluates the feasibility of the three components, and concludes with a discussion of privacy issues and inherent limitations of the approach. Soylent is a new kind of interactive user interface in which users have direct access to a crowd of workers for assistance with their tasks, instead of having to ask the people around them for help. Thus, Soylent helps users correct their documents. Moreover, the evaluation of the accuracy of its corrections suggests that Soylent can be an effective way to help users with their papers. However, the new word processor also has some limitations. First, lag times in the current implementation are still on the order of minutes to hours, due to worker demographics, worker availability, and the relative attractiveness of the tasks. Second, Soylent requires that all workers be paid even if what they did was of no use to the final work product, so it incurs significant cost. Third, Soylent exposes authors’ documents to third-party workers, which leads to a lack of privacy for the authors. Soylent also raises questions over legal ownership of the resulting text. Finally, the anonymous workers may not have the knowledge necessary to make a useful contribution. To reduce or eliminate these limitations, some future work could be done. First are new crowd-driven features for word processing. Second are new techniques for optimizing crowd-programmed algorithms to reduce time and cost. Finally, the researchers should focus on integrating on-demand crowd work into other authoring interfaces.
Amanda Ren - 9/25/2011 17:00:21
The Quinn paper discusses the classification of human computation and compares it to similar topics.
Human computation is defined as using human processing power to solve problems that computers cannot yet solve. The paper compares it to three related terms: crowdsourcing, social computing, and collective intelligence. Human computation systems are classified along six dimensions: motivation, quality control, aggregation, human skill, process order, and task request cardinality. For these systems, we also need to consider the quality of the participants' work and how their results are aggregated. This paper is important because the classification system it introduces helps drive new directions for research in human computation.
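As a rough illustration of how the six dimensions listed above could be used to compare systems, the sketch below records one value per dimension and reports where two systems differ. The class name `SystemProfile`, the helper `differing_dimensions`, and the example values are assumptions made for this sketch; only the paid motivation and the Find-Fix-Verify quality control for Soylent, and enjoyment as the motivation for games, come from the responses on this page. Unknown dimensions are left unset rather than guessed.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class SystemProfile:
    """One human computation system described along the six dimensions.

    A dimension left as None means "not recorded here", not "absent".
    """
    name: str
    motivation: Optional[str] = None
    quality_control: Optional[str] = None
    aggregation: Optional[str] = None
    human_skill: Optional[str] = None
    process_order: Optional[str] = None
    task_request_cardinality: Optional[str] = None


# The six dimension names, in declaration order (everything except `name`).
DIMENSIONS = [f.name for f in fields(SystemProfile) if f.name != "name"]


def differing_dimensions(a: SystemProfile, b: SystemProfile) -> list:
    """Return dimensions on which both systems have a value and disagree."""
    return [
        d for d in DIMENSIONS
        if getattr(a, d) is not None
        and getattr(b, d) is not None
        and getattr(a, d) != getattr(b, d)
    ]


# Illustrative (hypothetical) profiles; only a few dimensions are filled in.
soylent = SystemProfile("Soylent", motivation="pay",
                        quality_control="Find-Fix-Verify")
esp_game = SystemProfile("ESP Game", motivation="enjoyment")

print(differing_dimensions(soylent, esp_game))
```

Filling in more dimensions for more systems would make it possible to look for combinations that no existing system occupies, which is the kind of gap the taxonomy is meant to expose.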
The Soylent paper talks about using crowdsourcing and integrating contributions into a word processing interface.
The Soylent interface consists of three components: text shortening, spelling and grammar checker, and general word processing tasks. This paper is important because it evaluates the feasibility of crowdsourcing editing and integrating it into a user interface. I thought it was interesting that the paper mentioned the "lazy turker" and the "eager beaver" - both are harmful to the reliability of the work because their goal is to signal they have completed the work. To solve this problem, the Soylent interface will split the find and fix tasks. I'm a little skeptical that users will be willing to pay for all the workers, even those whose work is not used in the final product.
The von Ahn paper talks about the development and evaluation of a class of games known as games with a purpose.
This paper is important because it talks about one way in which we can use humans to solve tasks that are hard for computers. Researchers are aware of the benefits of adding enjoyment to user interfaces. GWAPs are motivated by three things: much of the population has access to the Internet, there are tasks that are easy for humans but not for computers, and people spend a lot of time playing games. It's interesting how, besides these three things, games can also provide a challenge (by keeping score) to increase enjoyment. This paper is relevant to today's technology because we can see the popularity of games grow even more through mobile games and social network games.
Alex Chung - 9/25/2011 18:41:41
Is gaming a waste of time?
The authors of the article “Designing Games With A Purpose”, Luis von Ahn and Laura Dabbish, don’t think so. Instead they believe game play can be constructed to channel human brainpower to good use.
Players spend numerous hours on games because they wish to be entertained. While Games with a purpose (GWAPs) must provide entertainment to capture gamers’ engagement, game designers also have to define rules to associate game interaction and the work to be accomplished.
To solve the HIV protein folding mystery, Dr. David Baker turned to the biology puzzle game Foldit because humans have much better spatial reasoning skills than today’s computers. While scientists had been toiling for years to resolve the physical structure of the retroviral proteases, the winning team found the solution in 10 days during three weeks of play, with hundreds of teams generating over a million structure predictions. In this example, games provide a framework to bring humans and computers together to solve problems that could not be solved before.
Besides promoting game play with social interaction, von Ahn and Dabbish also list other key aspects of a successful game: 1) timed response; 2) score keeping; 3) player skill level; 4) high-score lists; 5) randomness. In Foldit's case, the game motivated players with competition and social interaction between teams. The game rules are based on physics, and scores are given based on how closely the player’s model adheres to those rules. Thus players are motivated to assemble the best possible model in order to get the most points against a large number of players. In the end, the winning teams received higher rankings for their achievement. More importantly, the teams also received co-authorship on the protease structure paper that was recently published in Nature Structural & Molecular Biology.
Crowdsourcing is beginning to gain the attention of academics as well as industry. Yet most of today’s GWAPs focus on problems that are easily divided into subtasks. The challenge for future development will be translating the goal into a set of well-specified and challenging game mechanics that will motivate players to engage with a higher level of effort and performance. However, designers must also entice gamers with a level of enjoyment that keeps them coming back for more.
Viraj Kulkarni - 9/25/2011 20:23:12
'Human Computation: A Survey and Taxonomy of a Growing Field' is a brief classification of different systems of human computation. 'Soylent' describes the word processor that uses crowdsourcing to perform tasks such as shortening paragraphs, proofreading and doing other small tasks with the help of macros. 'Designing games with a purpose' is about engaging users in playing games which are designed with the intention of completing human intelligence tasks.
'Human Computation: A Survey and Taxonomy of a Growing Field' summarizes the field of human computation. The interesting thing about this reading, as opposed to most of the other readings assigned in class, is the breadth of it. It's meant to be a concise collection of everything that is happening in the world in this field. It offers a classification system based on six dimensions which I found to be pretty comprehensive. The paper aims to provide a broad overview of Human Computation with the intention of stimulating research in this field. The secondary aim of the paper is to provide a standard nomenclature and vocabulary in order to avoid confusion amongst researchers.
'Soylent' introduces a system that harnesses the power of crowds to perform certain editing tasks. The paper elaborates the three kinds of tasks it supports and then provides the details of how these tasks are crowdsourced. Along with this, it also presents the Find-Fix-Verify pattern which improves the quality of the final results by filtering out incorrect/unnecessary edits made by turkers. The paper lists examples and it does seem that Soylent does a pretty good job editing documents.
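To make the filtering idea behind Find-Fix-Verify concrete, here is a minimal sketch of the three stages operating on already-collected worker responses. It is not the Soylent implementation: the function names, the data shapes, the 50% agreement threshold in the Find stage, and the majority-rejection rule in the Verify stage are all assumptions chosen for illustration.

```python
from collections import Counter


def find_stage(flags_per_worker, min_agreement=0.5):
    """Keep patches flagged by at least `min_agreement` of the Find workers.

    flags_per_worker: one list of patch ids per worker.
    The 0.5 threshold is an illustrative assumption.
    """
    counts = Counter(p for flags in flags_per_worker for p in set(flags))
    needed = min_agreement * len(flags_per_worker)
    return sorted(p for p, c in counts.items() if c >= needed)


def verify_stage(candidates, rejections_per_worker):
    """Drop candidate rewrites that a majority of Verify workers rejected.

    rejections_per_worker: one list of rejected candidates per worker.
    """
    rejected = Counter(r for votes in rejections_per_worker for r in set(votes))
    majority = len(rejections_per_worker) / 2
    return [c for c in candidates if rejected[c] <= majority]


def find_fix_verify(find_flags, fixes_by_patch, rejections_by_patch):
    """Run the three stages: agree on patches, collect fixes, filter fixes."""
    result = {}
    for patch in find_stage(find_flags):
        candidates = fixes_by_patch.get(patch, [])    # Fix stage output
        votes = rejections_by_patch.get(patch, [])    # Verify stage input
        result[patch] = verify_stage(candidates, votes)
    return result


# Hypothetical responses: four Find workers, then fixes and rejection votes.
find_flags = [["s1", "s2"], ["s1", "s2"], ["s1"], ["s3"]]
fixes = {"s1": ["a", "b"], "s2": ["c"]}
rejections = {"s1": [["b"], ["b"], []], "s2": [[], []]}

print(find_fix_verify(find_flags, fixes, rejections))
```

In this toy run, patch `s3` is dropped because only one of four Find workers flagged it, and rewrite `b` is dropped because two of three Verify workers rejected it. Separating the stages means no single worker both chooses where to edit and decides whether the edit is acceptable.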
The third paper is about developing games to complete human intelligence tasks or solve problems which are difficult/impossible for a computer but are trivial for humans. It presents a set of guidelines that may be followed in developing games with a purpose (GWAPs). The paper also presents three game templates that can be used for developing such games. The attractive thing about most of these games is that they are very easy to play and don't require a lot of time. The games are such that they can be easily played through a browser or even a cellphone for that matter. With the kind of changes happening in today's world, people have very small attention spans. The fact that these games would be easily accessible and can be played in short breaks helps this case.
Derrick Coetzee - 9/25/2011 21:57:29
This week's papers focused on human computation systems, in which large numbers of unskilled people are leveraged to solve problems cheaply that computers cannot solve automatically.
The first work, by Quinn and Bederson, sought to define human computation systems and offer a taxonomy of attributes to describe them. Although very effective in laying out the large body of existing work along many interesting dimensions, the scope of the paper seemed needlessly narrow: tasks that involve "free will" or subjective decisions, described therein as "not computation", can still be effectively exploited to solve computational problems - it seems difficult to effectively distinguish the free will involved in, say, contributing to Wikipedia, and the free will involved in selecting what Turk task to complete, or what aspect of an image to describe. Moreover, it is a pervasive myth that social networking performs no computation, yet clearly a computer system designed to do what Facebook users routinely do is beyond the reach of modern AI. For example, Facebook effectively exploits their customers to tag images of people in photos in lieu of face recognition software. The discussion of tasks using search aggregation also neglected that negative information could effectively narrow the search, perhaps enough to enable automated approaches.
The paper on Soylent focused in on a narrow solution to a narrow problem: editing documents with the aid of Amazon Mechanical Turk workers. The paper covered significant new ground in three ways: it integrated the interactive Turk functions directly into the word processor interface, it empirically demonstrated latency of less than an hour at reasonable cost, and it described a successful way of breaking down the problems into multiple stages with the Find-Fix-Verify pattern to minimize errors. Although this was just one method in Quinn and Bederson's hierarchy of reducing error, it seems especially well-suited to complex tasks and may in principle allow human computation systems to scale to arbitrarily complex problems.
Some of the limitations of Soylent: restricted to using a single system, they could not experiment with user groups with particular skill sets appropriate to the document at hand, nor experiment with how size of the population influences waiting time. Soylent had a particular focus on minimizing latency with acceptable quality and price, but in some situations the author has time available (e.g. going to bed, back in 8 hours) where longer latency is acceptable. In general, I'd like to see a clearer empirical analysis of the quality/price/latency tradeoff. I'm also not fully confident that the legal issues around copyright transfer are adequately resolved.
Finally, the third paper summarized Games With a Purpose, human computation using enjoyment as a vehicle. GWAPs have been successful at exploiting latent human resources at very low cost. I tried out one of the GWAPs myself and it was successful at exploiting good game design to make the game entertaining. However, as noted by Quinn and Bederson, designing a good game can be far harder than just designing a human computation system, and some tasks do not seem to easily lend themselves to this. Moreover, it's misleading to suggest players are motivated only by enjoyment: the GWAP website specifically highlights the benefits to others, invoking altruistic motivations, and has a high score board that incorporates aspects of reputation motivation as well. The same type of mixed motivations are common in other crowdsourcing projects like Wikipedia.
Galen Panger - 9/25/2011 22:21:23
I appreciated the taxonomy, because it’s helpful to know how these varying concepts are coming to be defined. Briefly mentioned were ethical issues, and you do have to stop and think about what it means when we substitute humans for a role that would traditionally be filled by a computer—i.e. the very definition of human computing. We treat computers like slaves, but should we transfer our attitudes about functional division and the efficiency of algorithms over to human labor? I know there is an important ongoing debate about this, but find it interesting that the taxonomy piece did not spend even a paragraph discussing ethical approaches to each of the human computation, crowdsourcing, social computing, etc. fields.
GWAP presents somewhat of an interesting contrast to the concept of human computation, because over and over the taxonomy piece discusses the inclusion of humans into computer tasks as a sort of lament—it’s for problems “which cannot be solved by computers,” “that computers cannot yet solve,” and for which “no known efficient computer algorithms can yet solve.” One could interpret this “humans-as-a-last-resort” attitude as either a lament about the limits of computational power, or a lament about the ethical dubiousness of using humans as stand-ins for computers (more likely it’s the former). GWAP, however, dances past both the potential laments about computational power and about labor ethics by being fun—something intrinsically valuable to human subjects even though it is also useful to computers.
Speaking of intrinsic benefits—my first thought going into the GWAP article was that there was a contradiction between our ethical obligation to alert users that their “fun” was part of a larger work task, and users’ internal/intrinsic motivations that could be crushed by tying the task to an extrinsic motivation. Everyone knows that extrinsic motivations crush intrinsic motivations (to state it in black and white terms), and so my thought was that the game would cease to be fun and would acquire a drudging-through quality to it by virtue of being attached to and “for” a larger work purpose. However, this concern was immediately put to bed when I logged onto gwap.com and played around. I had a lot of fun. And I completely forgot that it was for a “good cause.” So... I was wrong. Good to know.
Finally, I want to say a quick word about Quinn and Bederson’s definition of collective intelligence. I don’t think it belongs as a supercategory. The idea of a “collective” intelligence implies a sense of relatedness that is most certainly not present in our patterns of linking for data mining by PageRank (or, at least, it wasn't before we all knew about PageRank). In addition to providing that feeling of connection to something bigger than ourselves (which is the “relatedness” I’d say more about if I had the time and space), I think collective intelligence is also broader than focusing on the production of goal-oriented results. I think collective intelligence is also a representation of how we all think/are thinking/are feeling at any point in time. Collective intelligence is about understanding subjective experience as well as producing objective results. IMHO.
Cheng Lu - 9/25/2011 23:02:00
The first survey paper of Human Computation presents a classification system for human computation systems that highlights the distinctions and similarities among various projects. The goal is to reveal the structure of the design space, thus helping new researchers understand the landscape and discover unexplored or underexplored areas of opportunity. The rapid growth of human computation within research and industry has produced many novel ideas aimed at organizing web users to do great things. Since human computation is often confused with “crowdsourcing” and other terms, the paper explores the position of human computation with respect to these related topics.
Many tasks are trivial for humans but continue to challenge even the most sophisticated computer programs. Traditional computational approaches to solving such problems focus on improving artificial intelligence algorithms. The second paper, “Designing Games With A Purpose”, advocates a different approach: the constructive channeling of human brainpower through computer games. Toward this goal, the paper presents general design principles for the development and evaluation of a class of games called “games with a purpose”, or GWAPs, in which people, as a side effect of playing, perform tasks computers are unable to perform. The GWAP approach represents a promising opportunity for everyone to contribute to the progress of AI. By leveraging the human time spent playing games online, GWAP game developers are able to capture large sets of training data that express uniquely human perceptual capabilities. This data can contribute to the goal of developing computer programs and automated systems with advanced perceptual or intelligence skills.
Yin-Chia Yeh - 9/25/2011 23:45:00
I read the human computation survey paper and the Soylent paper. The human computation paper aims at two points. First, it unifies the terminology by reviewing the recent literature. It clarifies the difference between human computation, crowdsourcing, and social computing. (Data mining is also mentioned but is not the major focus.) These three terms generally belong to the area of collective intelligence. Second, it proposes six dimensions to classify human computation works: motivation, quality control, aggregation, human skill, process order and task request cardinality. The human computation paper is good to read, because since last week’s CSCW reading I had felt a little bit confused about the categories of groupware. The three categories described in this paper give a clear definition and include most applications I can think of. The second part of the paper (classification of human computation works) gives a lot of examples to illustrate each dimension. I think this example-based description conveys its ideas much more easily than the direct manipulation paper we’ve read earlier.
The Soylent paper proposes a crowdsourcing word processing application embedded in Microsoft Office. The three features proposed are Shortn, Crowdproof, and The Human Macro. The major contribution is the Find-Fix-Verify pattern, which improves the reliability and performance of crowd workers. I think this pattern could be applied to many other crowdsourcing applications. There are also some other interesting findings in this paper, such as “paying less does not really lower the performance though it slows the task down” and “workers are usually better at verification than they are at authoring.” One thing I wish to know more about is how to improve the Lazy Turker’s performance in the Find stage. I understand that forcing a worker to fix a certain patch can improve the overall performance of editing. However, it is unclear to me how to prevent workers from all identifying similar patches to be fixed. I am also curious about why, in the Shortn application, the authors pay more money for the Find task than for the Fix task.
Apoorva Sachdev - 9/26/2011 0:41:18
Reading Response: This week’s reading was about human computation, crowdsourcing and Games with a purpose (GWAP) which describes how games are being used to harness computational power from the crowd.
“Human Computation: A Survey and Taxonomy of a Growing Field” tries to define what exactly human computation is and how it differs from related categories like crowdsourcing and social computing. The authors compare these related ideas and then define the dimensions along which human computation systems are classified. I don’t completely agree with some of their definitions. For instance, when they describe data mining, they seem to suggest that if a system just extracts patterns from data, it doesn’t count as human computation. One example where I think this definition fails is the spam filter used by Gmail. Gmail relies on people marking certain mail as “spam” to identify spam and to improve the machine learning process that sorts mail (improving the classifier by studying their input). This is an example of data mining that harnesses people’s ability to differentiate good mail from spam and uses the results to improve the AI spam filter for everyone, so it should be classified as human computation. Most of the examples presented in the paper were very interesting, and by laying out the various dimensions clearly, the authors encourage other people to mix and match categories to create new human computation interfaces.
On the other hand, Soylent describes a word processing program with a crowd inside. The authors describe the situations in which a crowd would come in handy (namely proofreading, finding facts and shortening passages), analyze the problems associated with Mechanical Turk, and show how results can be improved with the Find-Fix-Verify method. My main concern with some of the program's features is the stylistic changes that the crowd would bring in. Every person has their own style of writing, and splitting one task into smaller chunks and having multiple people edit it may not leave the text coherent. As the saying goes, “too many cooks spoil the broth”: the advantages of having a huge crowd working for you might not outweigh the disadvantages. Also, if the writer needs to spend time verifying and re-editing work done by Turkers, it might be easier to just do the work ourselves in the first place. Additionally, the current inaccuracy rates of the Turkers are fairly high (30%), so I feel we could introduce more game mechanics into the system to encourage greater accuracy and faster responses. For instance, one could have tasks whose price drops as time goes by, so people are encouraged to finish a task as soon as it is posted rather than later. Also, one could have an active feedback/rating system to make sure that, despite the anonymity, people still feel responsible for the work they do and strive to get better ratings and reviews. This could be done by creating an easy way for requesters to review and rate the Turkers’ work, with Turkers able to use those ratings to unlock more tasks.
I actually tried playing a few GWAP online and really enjoyed them. The motivation offered by games to get correct answers is much greater than in Mechanical Turk so we could definitely take some elements of game mechanics and try to incorporate them into other crowdsourcing interfaces. Additionally, since GWAPs have been so successful, more research can be done to design more projects that take advantage of this incredible resource.
Hong Wu - 9/26/2011 1:06:35
The three papers are all talking about Human Computation.
“Human Computation: A Survey and Taxonomy of a Growing Field” gives definitions of human computation and describes how it differs from crowdsourcing, social computing and data mining. “Designing Games with a Purpose” shows that we can design games that use human computation power: people play games and, in the process, solve problems that are difficult for computers. “Soylent: A Word Processor with a Crowd Inside” illustrates how human computation can help improve a word processor and solve problems within it.
Amazon’s Mechanical Turk is the commercial example of human computation and crowdsourcing. A requester can spend a little money to get human help, such as collecting ground truth or correcting results produced by a computer.
One problem is that AI is not yet powerful enough to match human knowledge, so human computation is aimed at very simple questions. The second problem is that asking the public for help amounts to disclosing the idea or patent, which in many situations is not allowed or not wanted.
Suryaveer Singh Lodha - 9/26/2011 1:40:24
Designing games with a purpose: The author talks about channeling the time and energy people put into playing games towards solving computational problems and training AI algorithms. The key property of “games” is that people want to play them and feel entertained while doing so. The author presents three game-structure templates that generalize successful instances of human computation games: output-agreement games, inversion-problem games and input-agreement games. One problem I find with inversion-problem games is how to keep only the relevant, helpful and correct hints given by the describer. The author does talk about allowing the describer to rate a guess “hot” or “cold”; it would be interesting to give the same control to the guesser. These templates work best in scenarios where the game can be won when all its players think alike. Tasks that require diverse viewpoints and creativity on the player's end may not work well with such templates.
Human Computation – A survey and taxonomy of a growing field: Human computation is a paradigm for utilizing human processing power to solve problems that computers cannot yet solve. The paper discusses in detail the various ways we can classify currently available human computation systems. It helped me understand the nuances of human computation systems and gave a good overview of the current systems.
Soylent: A word processor with a crowd inside – This seems like a really great idea to me personally. Solving really tough AI problems through cheap human labor to get better results is definitely useful. The only issue with the approach, as the paper aptly mentions, is its very high dependence on the response time of the crowd, which ultimately determines its efficiency. One interesting thing I learned about was Lazy Turker and Eager Beaver behavior on cheap labor market platforms. I also found the Find-Fix-Verify design pattern particularly interesting in the way it structures the solution to this problem. Shortn, Crowdproof and The Human Macro seem like good starting points, but they can definitely be developed further to give the user more flexibility and to take on harder tasks such as designing page layout or appropriately positioning graphs, charts and images in a document (which may or may not be a research paper). One more interesting topic the paper touches on is the importance of clearly defining legal rights to documents edited via such a platform.
Allie - 9/26/2011 1:46:21
Von Ahn and Dabbish discuss the implications of GWAPs (games with a purpose) in detail in their paper. They describe 3 types of games in which players in distributed locations can solve computational problems. The assumption is that large numbers of networked players can accomplish tasks that would otherwise be too time-consuming or impossible for a small group. Open Mind Initiative and interactive machine learning are also possible via collective participation. The HCI community holds that user interfaces are best when designed to be enjoyable to the user, which increases the user's motivation for the activity at hand. A measure of a GWAP’s success is whether enough human-hours are spent playing it. In output-agreement games, both players must produce the same result, albeit at different times. In inversion-problem games, the guesser wins by coming up with the input the describer intends. Inversion-problem games are complicated because the guesser must dynamically respond to the other player’s actions. In input-agreement games, players must determine whether they were given the same or different inputs. Timing is important in these games, and studies have shown that well-defined goals lead to higher levels of player performance than easy or vaguely defined goals. Furthermore, in the case of ESP, players continued to play just to reach a higher rank.
In measuring the efficiency of GWAPs, games with higher throughput (the number of problems solved by players per hour) are preferred over those with lower throughput. Combined with Average Lifetime Play (ALP), the expected contribution is just the throughput times ALP. Von Ahn and Dabbish believe GWAPs contribute to the field of artificial intelligence: by leveraging human players, game developers are able to capture large sets of data. I agree with most of the ideas set forth in this article, which sets up a premise for evaluating the next paper.
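The expected-contribution metric described above is simple arithmetic, and a minimal Python sketch may make it concrete. The function name and the sample figures are hypothetical, chosen only for illustration, and the sketch assumes ALP is measured in hours per player.

```python
def expected_contribution(throughput_per_hour: float, alp_hours: float) -> float:
    """Return throughput times average lifetime play (ALP).

    throughput_per_hour: problems solved per human-hour of play.
    alp_hours: average lifetime play per player, in hours (an assumption
               of this sketch; the unit only has to match the throughput).
    """
    if throughput_per_hour < 0 or alp_hours < 0:
        raise ValueError("throughput and ALP must be non-negative")
    return throughput_per_hour * alp_hours


if __name__ == "__main__":
    # Hypothetical game: 200 problems per human-hour, 1.5 hours of lifetime play.
    print(expected_contribution(200, 1.5))
```

Under this metric a game can score well either by solving problems quickly or by keeping players around longer, which is why both factors appear in the product.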
In Human Computation: A Survey and Taxonomy of a Growing Field, Quinn and Bederson further explore the idea of “human computation.” Collective intelligence, crowdsourcing, and social computing all fall under human computation. Human computation, in contrast to artificial intelligence, is defined as “a paradigm for utilizing human processing power to solve problems that computers cannot yet solve.” Wikipedia is not a computational platform, for by definition a computer with free will to choose its tasks would no longer be a computer.
Quinn and Bederson talk about GWAPs as being characterized by motivation, human skill, and aggregation, followed by quality control, process order, and task-request cardinality. It is assumed that humans acquire a great deal of common sense, and the collection of such knowledge is an important endeavor. Task-request cardinality describes how end-users relate to a service built on human computation: depending on the structure of the problem, one-to-one, many-to-many, many-to-one, or few-to-one cardinalities may be adopted. This paper further breaks down the metrics that define GWAPs, without going too in-depth into which types of GWAPs are more successful than others.
In Soylent: A Word Processor with a Crowd Inside, Soylent is a word processing interface that allows writers to call on Mechanical Turk workers to shorten, proofread, and edit parts of documents. Soylent is composed of Shortn, Crowdproof, and The Human Macro. The assumption is that crowd workers do tasks the computer cannot do automatically. Find-Fix-Verify is a quality control pattern that splits tasks into a series of generation and review stages to produce reliable results. Shortn, or text shortening, cuts about 15-30% of a paragraph, and up to ~50% with multiple iterations, with a focus on unnecessarily wordy phrases. Crowdproof aims to catch spelling, style, and grammar errors that AI algorithms cannot find, including easy and obvious mistakes. The Human Macro is Soylent’s natural language command interface.
Since Soylent adopts human computation, the types of problems it encounters tend to originate with its Turkers: the Lazy Turker and the Eager Beaver. The Lazy Turker does as little as possible to get paid. Eager Beavers are helpful, but create further work for the user. Find-Fix-Verify seeks to control both the Lazy Turker and the Eager Beaver so that each worker can make a clear contribution. Another concern with Turkers is that anonymous workers may not have the necessary knowledge to edit the text.
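The staged structure of Find-Fix-Verify can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Soylent implementation: workers are modeled as plain functions, and the agreement threshold (`min_agreement`) and the strict-majority vote in the Verify stage are hypothetical choices made for this sketch.

```python
from collections import Counter
from typing import Callable, Dict, List


def find_fix_verify(
    text: str,
    finders: List[Callable[[str], List[str]]],
    fixers: List[Callable[[str], str]],
    verifiers: List[Callable[[str, str], bool]],
    min_agreement: int = 2,  # hypothetical threshold, not taken from the paper
) -> Dict[str, List[str]]:
    # Find: each worker independently flags patches of the text; keep only
    # patches flagged by at least `min_agreement` workers.
    flagged = Counter()
    for finder in finders:
        for patch in set(finder(text)):
            flagged[patch] += 1
    patches = [p for p, n in flagged.items() if n >= min_agreement]

    results: Dict[str, List[str]] = {}
    for patch in patches:
        # Fix: a second group of workers proposes rewrites for the patch.
        candidates = {fixer(patch) for fixer in fixers}
        # Verify: a third group votes on each candidate; keep candidates
        # approved by a strict majority of verifiers.
        results[patch] = [
            c for c in sorted(candidates)
            if sum(v(patch, c) for v in verifiers) * 2 > len(verifiers)
        ]
    return results
```

Because the Find stage only keeps patches that several workers agree on, a single Lazy Turker flagging the easiest spot, or a single Eager Beaver flagging everything, has limited influence on what gets rewritten.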
While Soylent can be evaluated with some of the criteria set forth by the first two papers, there are many problems with employing crowdsourcing to edit papers. Nonetheless, the results produced by Shortn, Crowdproof, and The Human Macro are impressive, and a useful application of human social intelligence where artificial intelligence still has room to advance.
Ali Sinan Koksal - 9/26/2011 3:09:39
The first paper, by Quinn and Bederson, gives a definition of human computation, presents a number of dimensions in which it can be characterized, and proposes a way of conducting future research by considering these dimensions carefully. What best characterizes human computation is that problems that are dealt with can be framed as computational ones, and that the process is directed by a computational system (rather than users' free will to choose what goals they want to accomplish).
Human computation is important in solving problems that are not yet efficiently solvable by computer systems, but easily solvable by humans. Motivating users to contribute to the problem solving is a key aspect in this domain. One successful class of systems uses "enjoyment" as a motivating force, and this is the topic of the paper titled "Designing Games With A Purpose", which I chose to read as the second paper today (this was indeed a hard decision!).
In this paper, the proposed approach is to integrate computational work and games seamlessly to achieve tasks in the form of "side effects" of game playing. Three game templates are presented, as well as general guidelines for designing enjoyable and efficient games. It is important to give users a challenging, entertaining game while also ensuring that the output is accurate.
The discussion on motivation reminds me of one of the challenges of groupware that we studied recently: that users may be discouraged by tasks that do not directly benefit them. While paying "workers" is also a valid approach in some cases (e.g. by leveraging Mechanical Turk), I was mainly impressed by the ingenuity of transforming a computational problem into a game that a large number of people are willing to play. I am personally interested in possibly adopting such an approach in my future research, by seeking ways of embedding computationally expensive parts of software synthesis tasks into games/puzzles that might be easily solvable by humans.
It would have been helpful to see at least one example of human computation systems needing no aggregation at all, in the survey paper. Perhaps this lack is related to the absence of well-developed examples outside of collective intelligence that the authors mention.
Sally Ahn - 9/26/2011 3:32:21
Quinn and Bederson provide a classification framework for comparing systems that leverage human skills to overcome computational challenges. The key contribution that they make, which the paper also emphasizes, is the importance of forming a concrete identification system for advancing research. The six dimensions of human computation that they outline, despite being somewhat arbitrarily chosen, do provide a common background against which to compare the many instances of systems that leverage human skills.
They distinguish "human computation" from similar terms like "collective intelligence" and "crowdsourcing" with the features that: 1) the problem is such that it may someday be solvable by computers and 2) the humans' role is directed by the computational system. I found both of these definitions rather vague, but the circularity of the first bothers me in particular; according to their definition, human computation must perform tasks that we know computers will be able to solve in the future. However, this excludes problems which we do not *yet* know how to formulate into a computational model. This does not necessarily mean the problem can never be solved by a computer, but we may still be able to leverage humans' innate abilities along with computers for a partial solution. An example of such problems is automating semantic interpretation of language or images. I think the question of whether and how a problem can be formulated into a "paradigm of computation" is in itself a huge part of "computational challenges". One might even argue that once we know the answer to that question, the value of "human computation" diminishes; it becomes just a temporary shortcut.
"Designing games with a purpose" is a more specific discussion on a particular type of human computation that uses entertainment for motivation. The authors provide general design principles that are supported by the success of past and currently existing GWAPs. As noted in Quinn and Bederson's high-level view, motivation and quality control are key aspects of GWAP design, and the authors of this paper explain how these two aspects can be integrated through proven models: output-agreement games, inversion-problem games, and input-agreement games. Although these examples were helpful for exemplifying the power of GWAPs, there is still difficulty in extending these examples to new problems and applications. Nevertheless, the authors make an important contribution by defining measures for throughput, ALP, and expected contribution for GWAP evaluation.
Donghyuk Jung - 9/26/2011 3:52:41
Human Computation: A Survey and Taxonomy of a Growing Field
This paper presents a classification system for human computation systems and highlights the distinctions and similarities among human-involved computation systems. The purpose of this paper is to reveal “the structure of those systems”, thus “helping new researchers understand the landscape and discover unexplored or under-explored areas of opportunity.” I don’t agree that this paper helps researchers discover unexplored or under-explored areas. However, it definitely helped me understand the landscape of this area. I already knew the CS areas mentioned in this paper, but I couldn’t explain them clearly myself because I didn't understand the criteria for classification. Below are a few definitions and distinctions about human computation.
- Human Computation: “A paradigm for utilizing human processing power to solve problems that computers cannot yet solve.” By von Ahn.
- Crowd-sourcing: “Crowd-sourcing is the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call.” By Howe.
- Difference between Human Computation and Crowd-sourcing: “Whereas human computation replaces computers with humans, crowd-sourcing replaces traditional human workers with members of the public.”
- Intersection of crowd-sourcing with human computation: “Applications that could reasonably be considered as replacements for either traditional human roles or computer roles.” (e.g. translation)
- The key distinction between Human Computation and Social Computing: Social computing facilitates relatively natural human behavior that happens to be mediated by technology, whereas participation in a human computation is directed primarily by the human computation system.
Soylent: A Word Processor with a Crowd Inside
This paper introduces architectural and interaction patterns for integrating crowdsourced human contributions directly into user interfaces. They present Soylent, a word processing interface that utilizes crowd contributions to aid complex writing tasks ranging from error prevention (Crowdproof) and paragraph shortening (Shortn) to automation of tasks (The Human Macro) like citation searches and tense changes.
I agree that Soylent can improve the quality of writing at lower cost or in less time in general. This is largely because traditional writing processes require a fairly serious amount of resources (Draft -> Revise -> Proofread). Although there are some issues with crowd-powered word processing (wait time, cost, legal ownership, privacy, and domain knowledge), these problems can be solved by tweaking the system.
However, I think crowd-sourced word processing has a fundamental problem. Readers cannot fully understand a writer’s intention or purpose by looking at a few pages or paragraphs of an article. Likewise, Turkers might work on this platform without understanding the writer’s intention. For instance, an “Eager Beaver” can add extra information that is not relevant to the story, distorting the writer's own literary style as well as the main topics. Within a crowd-sourcing system, maintaining the quality of articles is very hard because it is very difficult to keep Turkers’ performance at a certain level.
Peggy Chi - 9/26/2011 6:46:40
Computing power can go up rapidly, but it might not be the only key to solve hard problems. What if we can make use of human efforts? Quinn and Bederson gave a great overview on human computation, including the related fields, elements, issues, and examples. In the other paper, Soylent was a smart example that showed how crowdsourced contribution can be integrated to specific tasks.
It is really a clever idea to ask online users for help. Moving from running programs on mainframes, personal computers, and multiple personal computers (pooling computing power across the Web), through expert input (paying and training people to perform tasks), to the Web 2.0 idea of online help, researchers have demonstrated many successful examples of human computation. Because of my research interests, one example I like for such a comparison is knowledge collection. Cyc is an AI project that has run for more than 25 years and aims at collecting and representing human commonsense knowledge in a structured way, based on a defined ontology and input by trained experts [1], to avoid the garbage-in-garbage-out problem. The results showed that the knowledge base (KB) was relatively clean, but the constraints made it difficult to spread, and thus it was used mostly for expert systems. By contrast, the OpenMind project goes to the other extreme: it invites the online public to contribute with almost no constraints, learning from a straightforward structure [2], supported by mechanisms such as voting and data analysis for quality control. Who wins? It's hard to judge from a single perspective, but the efforts (whether in time or money) are significantly different.
From the survey paper we learned how to motivate the crowd and control the quality of the results. However, I'm still not too clear about when and for what it is best to apply such human power. It seems especially successful on recognition problems, such as image or semantic processing, where even noisy information is better than no data or processing from scratch. The (unfortunate) example of searching for Jim Gray (also see [3] for technical details and lessons learned) and the Red Balloon Challenge (a competition to identify several targets in the States as fast as possible [4]) particularly showed how effective it can be to invite people to put effort into finding and reporting. However, when we have a hard problem to solve (a related question is: how hard should it be?), under what conditions should we consider crowdsourcing solutions? Is asking for qualified human help a problem just as difficult as the target problem?
Reference: [1] Cyc http://www.cyc.com/ [2] OpenMind http://openmind.media.mit.edu/ [3] Hellerstein, J. (2011). Searching for Jim Gray: a technical overview. Communications of the ACM, 54(7), 77–87. doi:10.1145/1965724.1965744 http://cacm.acm.org/magazines/2011/7/109892-searching-for-jim-gray/fulltext [4] Tang, J., Cebrian, M., & Giacobe, N. (2011). Reflecting on the DARPA Red Balloon Challenge. Communications of the ACM.
Jason Toy - 9/26/2011 7:21:10
The purpose of this paper was to develop a classification system in order to categorize current types of human computation, so that they could be compared against other forms.
This paper presents a new framework for categorizing human computation systems by defining key characteristics of their use. One example is the motivation of users to join a new project and how it could affect the project's dynamics. A project that works on the basis of altruism may require less error checking than one based on monetary reward, because users may be invested in the goal and less likely to attempt to cheat the system. There are many real-world systems that can be classified by this paper, one of which is Foldit, a fun-for-purpose game which allows gamers to unfold chains of amino acids in competing groups. This program allowed users to build an accurate model of a monomeric protease enzyme, a part of HIV, in three weeks, a problem that was previously unsolved, and resulted in the acknowledgement of the crowd in a research paper about the enzyme. Understanding the motivations and successes of previous iterations allows for more research into building better human computation systems.
The paper does a good job with its classification system, using dimensions that are definitive of each system. An example given was VizWiz, where a quick glance at its classification yields a motivation, cardinality, and process order that one could use to create possible new applications. One thing the paper lacks is the application of its system to problems such as the treatment of users as resources. For example, Amazon's Mechanical Turk offers so little money that it appears impossible to make a living off it. In addition, though this is reflective of the systems in use today, many of the verification procedures mentioned depend on the assumption that users know what they are doing. The paper mentions personal bias and lack of knowledge, but mechanisms such as redundancy, input/output agreement, and reputation all assume the majority of people are actually correct.
Soylent is a word processor that uses crowdsourced human contributions for editing. This paper goes on to discuss the problems of working with a crowd and how the authors of this system mitigate these effects.
Soylent is a new system for editing papers that allows an author to use other people to edit papers within their word processing interface. It uses two of the design principles detailed in last week's papers: "Groupware and Social Dynamics: Eight Challenges for Developers" by Jonathan Grudin and "Beyond Being There". First, editing a paper requires the collaboration of other users, but in the end, most of the writing process is an individual task. Soylent acknowledges this by building its editing system into Microsoft Word, an interface people know how to use. Second, Soylent uses batch responses to papers to keep editors' opinions independent and avoid compounding of errors. An interesting use of this system is to utilize unused manpower in companies, thus reducing cost and protecting the privacy of the company. An interesting research avenue is delving more into the differences between human grammatical editing and Microsoft Word's editing and seeing whether the two can be combined or either can be improved, for example by improving Microsoft Word's algorithms from user results, or by teaching people how to edit using either other users' results or Microsoft Word's.
The paper does a good job of following good groupware design principles, as defined by papers we read last week. One concern of mine is the interference of Soylent with the ability to critique and teach English. A large amount of writing at the high school and undergraduate level is meant to be completely individual. This allows professors to point out flaws or areas that require improvements in the students' papers. While it is true that in many cases papers are proofread by others, such as in the case of college admission essays, it is usually not easy to access a large number of people who would make significant changes to a paper. Soylent allows for easy access to a wide range of people, and as its ability to succeed in editing increases, it feels that the motivation for individuals to learn and improve may decrease.
Designing Games With a Purpose
"Designing Games With a Purpose" is about games with a purpose, a type of game created that results in solved problems, not because the participants are interested in doing so, but because they are interested in being entertained. The paper describes three templates for creating such games in addition to metrics of effectiveness and how to improve on them.
This paper presents a new system for creating new games with a purpose by proposing three different templates: output-agreement, input-agreement, and inversion-problem games. It addresses the problem of judging the success of these games by proposing that expected contribution (success) is equal to throughput times average lifetime play. This metric allows for both the effectiveness of the game to get the job done and its ability to interest people into playing longer to factor into the equation. Future research can be done on how the data gathered can be useful in training AI systems. In certain cases, the training set of data might be unique but should not require a new medium or game to be built. A template that builds a general game for specific situations (that do not involve things like tagging photographs) might be something to look at in the future.
The paper does a good job of outlining gameplay factors that might interest gamers in continuing to play. I feel that the idea that a game with a purpose must be entertaining like a game, rather than just an interface to a system, is the most important point of the paper. However, all three templates proposed by the paper are very similar in that they require a paired opponent or a simulation of one. Restricting the types of games that can be used to rules such as "rounds" or "having one opponent" limits the possible interesting systems that could be used. Many games that are successful today, such as Zynga's Farmville, allow users to do repetitive tasks in short spurts of time, without the need to really compete with others. These games allow you to interact with your friends, which increases users' interest in the game. By promoting a competitive method, the paper's templates do not allow for this factor due to the possibility of collusion. I think the templates of the paper, while making a good effort to build games that are entertaining, still just mask the underlying system, which is exactly what they are trying to avoid. The result is an average lifetime play of just 91 minutes rather than the dozens of hours and a lasting relationship that other games can obtain.
Shiry Ginosar - 9/26/2011 8:31:19
The authors of this paper attempt to bring order to the field of Human Computation in several ways. First, the authors define the term Human Computation. Second, they differentiate it from similar fields such as Crowdsourcing and Social Computing. Third, they review the body of work in Human Computation to produce a set of dimensions along which different approaches to Human Computation can be differentiated and compared.
While the authors set out to create this taxonomy in order to encourage diversity in future work, I find that the mere existence of such a taxonomy in an emerging, amorphous field may in fact have the opposite effect and hinder researchers' creativity. Classifying the different mechanisms used in research, rather than starting from first principles, encourages advancing in tiny steps rather than huge leaps. Such a course seems to be set by the authors themselves when they advise researchers to "pick two dimensions and list all combinations of values" in order to come up with unexplored methods to try out. This type of mechanism-first approach seems contrary to the goal of human computation, which is to find solutions to hard computational problems, and it was definitely not how the leap that started the field of Human Computation was made. One would hope that research time and money are spent primarily on solving difficult problems and not merely on testing mechanism combinations.
Vinson Chuong - 9/26/2011 8:33:41
Quinn and Bederson's "Human Computation" identifies six salient attributes under which existing and theoretical human computation, crowdsourcing, social computing, and data mining systems can be classified and compared. Bernstein, Little, Miller, Hartmann, Ackerman, Karger, Crowell, and Panovich's "Soylent" explores the viability of using crowdsourcing to perform routine tasks which are easily described but hard to compute in the context of a real-time user interface.
Drawing on current research and papers within and around the field of human computation, Quinn and Bederson offer definitions for commonly used terminology and extract six "classification dimensions": motivation, quality control, aggregation, human skill, process order, and task-request cardinality, under which existing systems can be classified and new systems can be described. Their definitions and classifications serve to ground and unify the ideas and research of this new and growing field--they offer a way to compare and expand upon seemingly unrelated areas such as data collection for machine learning algorithms, groupware systems, crowdsourcing, and protein-folding games.
Soylent is a crowdsourcing system for identifying errors and possible revisions in written text and for performing free-form tasks related to it. Under Quinn and Bederson's dimensions, it can be classified as:
Motivation - pay
Quality Control - output agreement, defensive task design, statistical filtering, automatic check
Aggregation - wisdom of crowds, iterative improvement
Human skill - language understanding, human communication
Process order - requester->computer->worker
Task-Request Cardinality - many-to-many
It draws quality and speed comparisons between word processor functionality and crowdsourcing, and reveals that crowdsourcing can offer higher quality results at the cost of time and money (which presumably will decrease as more people become available to do work). This paper shows that crowdsourcing is indeed viable in the context of "real-time" tasks and motivates the use of its approach in other applications: bug fixing, software QA, simple user interface testing, and many others.
Manas Mittal - 9/26/2011 8:41:49
The Soylent paper presents designs to integrate crowdsourcing inside traditional UIs. The authors introduce the Find-Fix-Verify pattern, which splits complex tasks into smaller intelligence tasks that use independent agreement to produce reliable results. They also introduce the terms "The Lazy Turker" and "Eager Beaver". One key question about the paper is how general the 'find-fix-verify' pattern is. Note that verification is a commonly accepted activity [in general] for such crowdsourced tasks. But what, for example, is the 'find' element in image labeling? Where does this pattern work other than text editing?
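To make the generality question concrete, here is a minimal sketch of how a Find-Fix-Verify pipeline might be wired up. It is not the Soylent implementation: the crowd workers are simulated by plain Python functions, and the names `find_fix_verify`, `finders`, `fixers`, `verifiers`, and `min_agreement` are all hypothetical.

```python
from collections import Counter

def find_fix_verify(items, finders, fixers, verifiers, min_agreement=2):
    """Illustrative three-stage pipeline; all worker callables are stand-ins."""
    # Find: each finder returns the indices it thinks need work.
    # Only indices flagged independently by enough finders move on.
    votes = Counter()
    for find in finders:
        votes.update(set(find(items)))
    flagged = sorted(i for i, n in votes.items() if n >= min_agreement)

    result = list(items)
    for i in flagged:
        # Fix: a separate group of workers proposes candidate rewrites.
        candidates = []
        for fix in fixers:
            candidate = fix(items[i])
            if candidate not in candidates:
                candidates.append(candidate)
        if not candidates:
            continue
        # Verify: a third group votes; the most-voted candidate wins.
        tally = Counter(verify(items[i], candidates) for verify in verifiers)
        if tally:
            result[i] = tally.most_common(1)[0][0]
    return result

# Hypothetical usage with simulated workers.
sentences = ["this is fine", "this sentense is bad", "ok"]
finders = [lambda s: [1], lambda s: [1, 2], lambda s: [1]]
fixers = [lambda t: t.replace("sentense", "sentence"), lambda t: t.upper()]
verifiers = [lambda o, c: c[0], lambda o, c: c[0], lambda o, c: c[-1]]
revised = find_fix_verify(sentences, finders, fixers, verifiers)
```

In this sketch the 'find' stage only makes sense when the input breaks into items that can be flagged independently, which is exactly what is unclear for a task like image labeling.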
The Games with a Purpose paper attempts to provide a framework for constructing games where people end up performing (a specific kind of) useful work while 'having fun'. The authors present 3 templates for GWAP - Output Agreement, Input Agreement and Inversion games. They also define a metric for evaluating these games - Throughput * ALP. This is an interesting paper. The metric should account for stickiness based on how many times people come back, and also perhaps whether they share the game with their friends or would recommend it positively (this is called the Net Promoter Score).
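As a rough illustration of the Throughput * ALP metric, the sketch below multiplies the two quantities. It assumes throughput is measured per human-hour of play and ALP (average lifetime play) in minutes. The function name is hypothetical, and the throughput value is made up for illustration; only the 91-minute ALP is a figure quoted elsewhere on this page.

```python
def expected_contribution(throughput_per_hour: float, alp_minutes: float) -> float:
    """Throughput * ALP, with ALP converted from minutes to hours."""
    return throughput_per_hour * (alp_minutes / 60.0)

# Hypothetical throughput of 200 solved instances per human-hour,
# combined with an average lifetime play of 91 minutes.
contribution = expected_contribution(200, 91)
```

Because the metric is a plain product, a game that players return to often helps only insofar as the return visits raise ALP; it says nothing about recommendations or sharing.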
One discussion point around GWAP is to contrast these with 'edutainment' where participants have 'fun while playing the game' as opposed to 'fun in playing the game'. GWAP seems to fall in the latter, which is a key contribution. I wonder if we can design games that benefit the participant in a plausible way and they have fun doing it. The mind-training games on the iPhone (think Lumos Labs) come to mind [Super popular!].
The Quinn and Bederson paper attempts to devise a taxonomy for the crowdsourcing space. One thing I would like researchers to think about is where such collective intelligence works or fails. For example, in things that require deep thinking, the submission of individual thinking isn't always effective in producing better answers.
Rohan Nagesh - 9/26/2011 8:59:50
The first paper, "Human Computation: A Survey and Taxonomy of a Growing Field", defines the term human computation, discusses the various categories of human computation, and delves into some classification dimensions to categorize various human computation methods. The second paper, "Soylent: A Word Processor with a Crowd Inside", discusses the authors' development of a new word processing application that has embedded API calls to Amazon's Mechanical Turk to help shorten text, catch nuanced spelling and grammar errors, and let users formulate custom requests. The last paper, "Designing Games with a Purpose", discusses adding gaming elements to make human computation tasks more appealing and fun.
As the first paper focused more on a classification scheme than on presenting novel human computation methods, I didn't have any major disagreements and found the framework the authors presented to be logical and coherent, albeit a bit too exhaustive to absorb all at once. I liked how the authors started by discussing motivation, since I do think motivation is a key differentiator for human computation systems.
As for the Soylent paper, I believe the latency and cost of the application are not acceptable for most users. Taking stock of my own situation, if I so desperately need help on a paper that I am willing to pay others money, I am probably working on a tight deadline. Additionally, I would never risk introducing new and potentially worse inaccuracies into my paper through Amazon Mechanical Turk. All in all, while the authors of the paper present a technology that I believe is futuristic and has potential, I just don't see the current market being ready for the product.
In response to the last paper, I absolutely agree that turning mundane tasks into games will work wonders. The authors state that, if properly designed, the games will be challenging yet doable, creative yet not completely weird, and will overall meet the gaming needs of many people. People will perform these mundane tasks out of love for the game, and I absolutely agree with this approach.