MALE SPEAKER: Today we're very pleased, very happy, to have Luis Von Ahn here today, from Carnegie Mellon University. His talk is on human computation. Luis is a very new assistant professor in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in 2005, and I'm told he was the hottest new graduate on the market, with offers from just about every university out there, including corporate offers, too. He received his B.S. from Duke University. He received a Microsoft Research Fellowship Award. His research interests include encouraging people to work for free, as well as catching and thwarting cheaters in online environments. His work has appeared in over a hundred news publications around the world: the New York Times, CNN, USA Today, the BBC, and the Discovery Channel. Luis holds four patent applications and has licensed technology to major internet companies. Please join me in welcoming Luis Von Ahn. [APPLAUSE]

LUIS VON AHN: Can you hear me now? OK. So, I want to start by asking a question to the people in the audience. How many of you have had to fill out a registration form for something? Like Yahoo, Hotmail, or Gmail, or some sort of web form where you've been asked to read a distorted sequence of characters or a distorted word such as this one? How many of you found it annoying? Awesome. OK, well, that was part of my thesis. That thing is called a CAPTCHA, and the reason it's there is to make sure that you, the entity filling out the web form, are actually a human, and not some sort of computer program that was written to submit the form millions and millions of times. The reason it works is because humans-- at least non-visually impaired humans-- have no trouble reading distorted characters, whereas computer programs simply can't do it as well yet.

More generally, a CAPTCHA is just a program that can tell whether its user is a human or a computer. OK, let me say that another way. A CAPTCHA is a program that can generate and grade tests that most humans can pass, but current computer programs cannot. Notice the paradox here. A CAPTCHA is a program that can generate and grade tests that it itself cannot pass. So in that way, CAPTCHAs are a lot like some professors. [LAUGHTER] Just to make things crystal clear, let me give you an example of one of these programs that can generate and grade tests that most humans can pass, but current computer programs cannot. Here's how the program works. First, the program picks a random string of letters. O-A-M-G, in this case. Then the program renders the string into a randomly distorted image, and then the program generates a test, which consists of the randomly distorted image and the question, "What are the characters in this image?"

CAPTCHAs are used all over the place, for all kinds of things, and I could spend the next hour talking about all the different applications of CAPTCHAs. But since I don't want to do that, I want to illustrate one of the applications through a little story. So a few years ago, Slashdot-- which is a very popular website-- put up this poll on their site, asking which is the best computer science graduate school in the United States? This is a very dangerous question to ask over the web. As with most online polls, IP addresses of voters were recorded to make sure that each person could only vote, at most, once. However, as soon as the poll went up, students at CMU wrote a program that voted for CMU thousands and thousands of times.
The next day, students at MIT wrote their own program. And a few days later, the poll had to be taken down with CMU and MIT having, like, a gazillion votes and every other school having less than 1,000. I guess the poll worked in this case. [LAUGHTER] I'm just kidding. But in general, this is a huge problem. You simply cannot trust the results of an online poll, because anybody could just write a program to vote for their favorite option thousands and thousands of times. One solution is to use a CAPTCHA to make sure that only humans can vote.

CAPTCHAs have many, many other applications. Another one is in free email services. For instance, there are several companies that offer free email services-- Yahoo, Microsoft, Google-- and up until a few years ago, all of them were suffering from a very specific type of attack. It was people who wrote programs to obtain millions of email accounts every day, and the people who wrote these programs were usually spammers. So if you're a spammer and you want to send spam from, say, Yahoo, you run into the problem that each Yahoo account only allows you to send, like, 100 messages a day. So if you want to send millions of messages a day from Yahoo accounts, you have to own millions of Yahoo accounts. And this is why spammers wrote programs to obtain millions of Yahoo accounts. And the solution-- or one solution-- and this is what we originally suggested to Yahoo-- was to use a CAPTCHA to make sure that only humans can obtain free email accounts.

Now, since CAPTCHAs are used all over the place to stop spammers from doing bad things, spammers have started coming up with all kinds of dirty hacks to get around the CAPTCHAs that are being used in practice. So let me explain a couple of them. Here's one. I'm sure a lot of you have heard of this. CAPTCHA sweatshops. Spam companies actually are hiring people to solve CAPTCHAs all day long. And they are usually being hired in other countries where the minimum wage is a lot lower, and this is currently happening. But there are at least two consolations. First, it's at least costing them some. So whereas before, they could get the accounts for free, now it costs them a fraction of a cent per account, so they can't get that many. Second, CAPTCHAs are actually generating jobs in underdeveloped countries. [LAUGHTER]

So this is one dirty hack. There's an even dirtier hack, and I'm sure a lot of you have heard of it, and this is what some porn companies are allegedly doing. And I'm going to emphasize the word "allegedly." So, porn companies also want to send spam. They also want to break CAPTCHAs, and here's how they are allegedly doing it. They write a program that fills out the entire registration form, say, at Yahoo. And whenever the program gets to the CAPTCHA, it can't solve it. So what it does is it copies the CAPTCHA back to the porn page. Now, back at the porn page, there's a lot of people looking at porn. And suddenly, one of them gets this screen saying, "If you want to see the next picture, you got to tell me what word is in the box below." And you know what people do? They type the word as fast as possible. [LAUGHTER] And by doing so, they are effectively solving the CAPTCHA for the porn company bot. That is, they're effectively obtaining a free email account for them. So pornographers, they're really, really smart.

So CAPTCHAs take advantage of human processing power in order to differentiate humans from computers, and it turns out that being able to do so has some very, very nice applications in practice.
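To make the generate-and-grade idea concrete, here is a minimal sketch in Python. Only the structure comes from the talk-- pick a random string, render it as a distorted image, and grade the response against the stored answer. The rendering step is stubbed out with a hypothetical render_distorted helper, and all function names here are illustrative, not from the talk.

```python
import random
import string

def render_distorted(text):
    # Hypothetical placeholder: a real CAPTCHA would warp the glyphs and add
    # clutter to produce an actual image here.
    return f"<distorted image of '{text}'>"

def generate_captcha(length=4):
    """Pick a random string of letters and render it as a distorted image."""
    answer = "".join(random.choices(string.ascii_uppercase, k=length))
    return render_distorted(answer), answer

def grade_captcha(response, answer):
    """Grade the test against the stored answer."""
    return response.strip().upper() == answer

if __name__ == "__main__":
    image, answer = generate_captcha()
    print(image, "What are the characters in this image?")
    print(grade_captcha(answer.lower(), answer))  # a correct response passes -> True
```

The paradox from the talk is visible in the sketch: grade_captcha never has to read the distorted image, it only compares strings, so the program can grade a test that it could not itself pass.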
Now that I've told you about CAPTCHAs, now I can tell you what this talk really is about. This talk is not about CAPTCHAs. This talk is about human computation. Sort of the flip side of CAPTCHAs. The idea is there's a lot of things that humans can easily do that computers cannot yet do. I want to show you how we can solve some of these problems by just making good use of human processing power.

And I think the best way to introduce the rest of the talk is with a little statistic, and the statistic is that over 9 billion human hours of Solitaire were played in 2003. 9 billion. Now, some people talk about wasted computer cycles. What about wasted human cycles? Just to give you an idea of how large this number really is, let me give you two other numbers. First is the number of human hours that it took to build the Empire State Building. Turns out it took 7 million human hours to build the entire Empire State Building. That's equivalent to about 6.8 hours of people playing Solitaire around the world. Now, in case you don't think the Empire State Building is a monumental enough task, let me give you another number. The Panama Canal. It turns out it took 20 million human hours to build the entire Panama Canal, and that's equivalent to a little less than a day of people playing Solitaire around the world.

I want to show how we can make good use of these wasted human cycles. And that is what I mean by human computation. In this talk, we're going to consider the human brain as an extremely advanced processing unit that can solve problems that computers cannot yet solve. Even more, we're going to consider all of humanity as an extremely advanced and large scale distributed processing unit that can solve large scale problems that computers cannot yet solve. I claim that the current relationship between humans and computers is extremely parasitic. We're parasites of computers. What I want to advocate for in this talk is more of a symbiotic relationship, a symbiosis. One in which humans solve some problems, computers solve some other problems, and together we work to create a better world. [LAUGHTER] OK, I'm getting freaky. But more seriously, I want to talk about some problems that computers cannot yet solve, and I want to show you how we can easily solve a lot of these problems by just making good use of human processing power.

The first problem that I'm going to talk about is that of labeling images with words. So the problem is as follows. On input an arbitrary image, we want to output a set of key words that properly and correctly describe this image. [LAUGHTER] As you should all probably know, this is still a completely open problem in computer vision and artificial intelligence, in the sense that computer programs simply can't do this. However, a method that could accurately label images with words would have several applications, one of which you've probably already seen, and that is image search on the web. So Google, for instance, has Google Images. You can go there, type a word like "dog," and get back a lot of images related to the word "dog." Now, it is the case that there's no computer program out there that can tell you whether an arbitrary image from the web contains a dog or not, so the way Google Images works-- and image search on the web works, roughly-- is by using file names and HTML text. So if you search for "dog," you get back a lot of images named dog.jpg or dog.gif, or that have the word "dog" very near them.
Of course, the problem with this method is that it doesn't always work very well. For instance, this is not any more, but it used to be the first page of results for the query "dog" on Google Images. There is an image of a rabbit, there. There's a guy in a blue suit. What the hell? But if we had a method that, for every image on the web, could give us accurate textual descriptions of those images, we could potentially improve the accuracy of image search on the web.

Such a method would have many other applications. Another one is accessibility. So it turns out that the majority of the web is not fully accessible to visually impaired individuals, and one of the biggest reasons is images. So blind people actually surf the web. The way they do it is they use screen readers, programs that read the entire screen to them out loud. But whenever a screen reader reaches an image, it can't do anything other than read the caption of that image. Of course, the majority of images on the web don't have proper captions associated with them. So again, if we had a method that, for every image on the web, could give us accurate, textual descriptions of those images, we could improve the accessibility of the web.

Such a method would have many other applications, and so what we want-- and what I'm going to tell you right now-- is a method that can label all images on the web. Not only that, it's a method that can label all images on the web in a way that's fast and cheap. How are we going to do it? Well, we're going to use humans, but we're going to use them cleverly. So normally, if you ask people to label images for you, you'd have to pay them to do so. And if you wanted to label all images on the web by paying people, you'd have to pay a lot of money. And even if you had a lot of money, if you wanted to label all images on the web by paying people fast, you'd have to find a lot of people who were willing to label images for a living. Good luck with that.

My approach is much better. Rather than paying people to label images for me, I get them to want to label the images for free. And in fact, they want to label the images so much that in some cases they're even willing to pay me to label the images for me. How do I do that? Well, I have an extremely, extremely enjoyable, multiplayer online game called the ESP game that people really, really like to play, and as people play, sort of as a side effect, they actually label images for me. Now, the ESP game has two very nice properties. First, as people play the game, the labels that they generate for images are accurate even if the players don't want them to be so. Second, as people play the game, they actually label images very, very fast. And in fact, using a conservative estimate, I'm going to show you later in the talk that if the ESP game is put on a popular gaming site, we could actually label all images on Google Image Search in just a few weeks.

So how does the game work? Well, first and foremost, the ESP game is a two player online game. So there's a website. You can go there to try to play the game. Whenever you go to the website, you get randomly paired with somebody else wanting to play the game. That's your partner. Now, you're not allowed to communicate with them, and you're not told who they are. It's just a complete stranger from the web. And the goal of the game is for both you and your partner to type the exact same word, given that the only thing you two have in common is an image. So you can both see the same image.
You know you can both see the same image, and now you're told to type whatever the other guy's typing. Turns out that what people do, the best strategy, is just to type a lot of words related to the common image. So basically, both players are going to be typing a lot of words related to the common image until one of player one's words is equal to one of player two's words. They agree, they get points, and then they get happy. That's the basic idea of the game. Now, this word that the two players agree on is usually a very, very good label for the image, because it comes from two independent sources.

Let me give you a better idea of the basic move of the game. Imagine you have two players, player one and player two. And they're both paired, so they can both see the same image. And now they're told, "Type whatever the other guy's typing." Notice, the players are not told, "Label the image," or even what labeling an image might mean. They're just told, type whatever the other guy's typing. So say at first, player one types "car," player two types "boy." It's not the same word, so the game still goes on. Say then, player one types "hat" and then "kid." Still none of player one's words is equal to one of player two's words, so the game's still going on. By the way, player one cannot see any of player two's guesses, and vice versa. So they're just typing words completely independently, until, say, player two types a word that player one had already entered. They agree and then they get a lot of points. This is the basic move of the game.

The actual game looks a little more like this. Basically, both players have a certain amount of time to agree on as many images as they can. So in 2 and 1/2 minutes, they have to agree on as many images as they can. That's basically the game. Each time they agree on an image, they get a certain number of points. There's also a thermometer at the bottom that measures how many images the two players have agreed on, and if you fill the thermometer, you get like a gazillion points. There's also a pass button, so players can agree to pass on difficult images.

And another really important component of the game is this thing we call "taboo words." If you've ever played the game Taboo, you should be able to guess what these are. Taboo words are words related to the image that the players cannot use when trying to agree on that image. So in this case, for instance, you can't use "hat" or "sunglasses," or any plural or singular of these words. Now, where do taboo words come from? They come from the game itself. The taboo words are words that two other players have already agreed on for this particular image. So the nice thing about taboo words is that they guarantee that each time an image passes through the game, it gets a brand new, different label. The other nice thing about taboo words is they make the game more difficult, and therefore more fun.

Now, talking about fun-- is this game fun? Well, amazingly, it really is a lot of fun. So far, we've gotten over 15 million agreements-- that's over 15 million labels-- with about 75,000 players. Let me say that another way. 75,000 players have given us over 15 million agreements. That means that on average, each player is playing a lot. We have many people that play over 20 hours a week. That's like a full time job. We've had playing streaks that are longer than 15 hours straight. [LAUGHTER] I feel a little bad about this.
So by now, the game has a mechanism that if you've been playing for longer than 15 hours, it will cut you off. And as a promise to my department head, it's 10 hours if you're from a .edu domain. [LAUGHTER]

So, so far, over 15 million agreements. What if you wanted to label the entire web? Well, 5,000 people playing the game simultaneously could label all images on Google Images in about two months. The striking thing here is that 5,000 is not a very large number. In fact, individual games on popular gaming sites, such as Yahoo, Pogo.com, or MSN, average over 5,000 players at a time. So if you put the ESP game on a popular gaming site, you could potentially label a lot of the images on the web in just a few months.

A few more things about the game. There's also a single player version of the game. It's important to have a single player version of the game for several reasons. For one, the number of people playing the game is not always even. But also, whenever a player drops, it's important to just basically have them keep on playing the single player version of the game. And how do you get a single player game? Well, you can simply pair up a single person with a prerecorded set of moves. The idea is as follows. Whenever you have two people playing, you record everything that they do and when they do it. So you record all the words they enter, along with timing information. And whenever we want to have a single player play, we simply pair them up with a prerecorded set of moves. So that single player is playing with somebody else, just not at the same time. One nice thing about this: notice that this actually doesn't stop the labeling process. That single player is playing with somebody else, just not at the same time, so everything that I've said about labeling remains true. In fact, we can even go one step further. We can do the zero player game. We can also pair up prerecorded games with each other to get more labels, and if you count all the extra labels that the ESP game has collected this way, you get that so far the ESP game has collected over 39 million labels for images on the web.

Now, one thing that some of you may be wondering about is what about cheating? So for instance, could you try to cheat to screw up the labels? Something like, my officemate and I could try to log in to the game at exactly the same time. Maybe we'll get paired with each other, and if we get paired with each other, we can agree on any word we want for any image. Or even worse, somebody could go to Slashdot and type, "Hey, everybody, let's all play the ESP game, and let's all agree on the word 'A' for every image." Could happen. Fortunately I've thought about this, and the ESP game has several mechanisms that fully prevent cheating.

Let me tell you a few of the things that we do to prevent cheating. Here's one. At random, we actually give players test images. These are images that are just there to test whether the players are playing honestly or not. And what they are is they are images for which we know all the most common things that people enter for them. And we only store a player's guesses and the words they agree on if they successfully label the test images. So if you think about it, in a way, this sort of gives a probabilistic guarantee that a given label is not corrupt. What's the probability that a given label is corrupt, given that the players successfully label all of their test images?
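Here is a minimal sketch of that test-image check, assuming a hypothetical session record with the pair's agreements and their answers on the test images. The talk doesn't give the actual data structures, how many test images are mixed in, or what exactly counts as labeling one successfully, so the names below are illustrative only.

```python
def passes_test_images(test_answers, known_labels):
    """Return True if the pair labeled every test image with a word that is
    already known to be common for that image.

    test_answers: hypothetical dict mapping test-image id -> word the pair agreed on
    known_labels: dict mapping test-image id -> set of known common labels
    """
    return all(word in known_labels[image_id]
               for image_id, word in test_answers.items())

def store_session(agreements, test_answers, known_labels, database):
    """Only keep a session's agreements if its test images were labeled honestly."""
    if not passes_test_images(test_answers, known_labels):
        return  # discard everything from a session that failed its test images
    for image_id, word in agreements.items():
        database.add_label(image_id, word)  # hypothetical storage call
```

This is what gives the probabilistic guarantee mentioned above: an agreement is only stored when it comes from a session that also got the known answers right.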
And this probability can be boosted by using the next strategy, which is repetition. So we only store a label after n pairs of players have agreed on it, where n is a parameter that can be tweaked. So every now and then, we actually delete all the taboo lists for the images, and we put the image back into the game afresh. And we only store a label after n pairs of players have agreed on it. So if we let x be the probability of a label being corrupt given that players successfully labeled all of their test images, then after n repetitions, the probability of corruption is x to the n. This is assuming that the n repetitions are independent of each other, but if x is very small, x to the n is really, really small. I will say that so far, we've collected lots and lots of labels, and we have not seen cheating be able to screw up our labels. In fact, the quality of the labels that the ESP game has collected so far is very high.

Let me now show you some search results. Let me show you what happens when we search for the word "dog," for instance. Here's some dogs. More dogs. More dogs. More dogs. And I could go on forever. Here's what happens when you search for "Britney Spears." You got to show this whenever you show search results. Here's what happens when you search for "Google." I prepared this for this talk. You get the founders. And one really nice thing about this is that this slide constitutes a proof that the word "Google" and the word "search" really are synonyms. On that input, people agreed on Google.

OK. So let me now show you some sample labels. So what I'm going to show you right now are some images, along with the labels that the ESP game has collected for them so far. So here's an image, and here are the labels that the ESP game has collected for it so far. By the way, these could be ordered in terms of frequency. They're not. This is just the list of all the words that the ESP game has collected for this image so far. You should notice two things about this list of words. First of all, it's extremely accurate, meaning all of these words actually make sense with respect to the image. Second, it's extremely complete, meaning almost anything that you can imagine to describe something in this image is in this list. Not everything, but a lot of the things. And in fact, this is true in general of the word lists generated by the ESP game. They are as accurate and as complete as those generated by participants who are just paid to label images.

Let me show you more sample labels. Here's another image. Anybody know who this is? Walter Matthau. He's an actor. And just to prep you for one of the labels, Walter Matthau was in the movie Dennis The Menace, and he played the character of Mr. Wilson. Some of the labels that the ESP game has collected for this image so far are-- [LAUGHTER] So that first one seems a little wrong, but actually, if you look carefully, you realize it's really not that bad. I like to tell people the ESP game has uncovered a major conspiracy. Now that we're on this topic, here's another image. By the way, I have no political affiliations whatsoever. I'm not a US citizen, and what I'm about to show you are simply the scientific results of what happens when you put this image on the ESP game. So some of the labels that the ESP game has collected for this image so far are-- [LAUGHTER] That last one, can you imagine how awesome the two players must have felt when they agreed on that last one? It must have felt great.
And in fact, this brings us to one of the reasons why people really like the ESP game. It's because they can feel a special connection with their partner, especially when they agree on an off the wall word like "yuck" for an image of President Bush. In fact, it gets even better. A lot of the emails we get actually suggest that players feel a very, very, very special connection with their partner. Players like playing with partners of the opposite sex better. They want to know whether their partner's of the opposite sex, and a lot of the emails say things like, "My partner and I, we look at the world in exactly the same way. Can you tell me their email address?" This is great, because I'm going to be rich soon.

More seriously, this brings us to the question of why do people like the ESP game? I mean, it's true that the game was designed to be enjoyable, but what are the reasons that people like the ESP game so much? And to address that question, let me show you some of the most common things that people have said about why they like the game. Here's what one person said. By the way, I'm just going to let you read. So this is the sense of connection with your partner. Here are some of the other most common things that people said. [LAUGHTER] That last one, if you think about it, it makes perfect sense. Although that was not expected, it makes perfect sense. The ESP game helps people learn English, because you've got an image, you've got to say what it is in English. And that brings up the question, could you have the ESP game in multiple languages? The answer is sure, but I don't want to talk about that.

So that's some of the most common things that people have told us about why they like the game. In addition, let me show you some of the things that people have said about the game in blogs. It was, at some point, in literally hundreds of blogs. Here's a couple of them. Here's what one guy said. Sense of achievement. But the best is the way this guy ends. [LAUGHTER] Here's another one. So this guy actually likes the concept of the game, but again, the best is the way this guy ends. So not everybody likes their partner, and it completely depends on whether you do well with them or not. If you do well with them, you fall in love with them. If you do badly, you think they're an idiot. Of course, you're not the idiot. They're an idiot. Even though the game is symmetric.

But in addition to all those things that people have told us, we continually do measurements to try to figure out what are the things that make people play longer. So let me explain one of these measurements to you. At some point in the history of the game, I added this very small message in the corner of the screen alerting you whether your partner has already entered a guess or not. It's a very tiny message. This is just a magnification of it. It just tells you whether your partner has already entered a guess or not. When this was added to the game, it wasn't added to all the players. It was just added to a small, random subset of the players. And then we measured whether the players who had this feature played longer than those who didn't. And it turns out that those who had this feature played a whopping 4% longer than those who didn't. Now, you might not think that 4% is very large, but actually, it's a statistically significant difference, and if you think about it, just a very tiny message in the corner of the screen makes people play 4% longer.

Now, in a way, the ESP game is kind of like an algorithm.
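To make that algorithmic view concrete, here is a minimal sketch of one round in Python, assuming two hypothetical streams of guesses and the image's taboo list; the real game adds the timer, scoring, passing, and the anti-cheating checks described earlier, all of which are omitted here.

```python
def play_round(guesses_p1, guesses_p2, taboo_words):
    """Run one ESP-game round on a single image.

    guesses_p1 and guesses_p2 are hypothetical iterables of the words each
    player types, in order. The round ends at the first word that both
    players have entered and that is not on the image's taboo list; that
    word becomes a new label for the image.
    """
    seen_p1, seen_p2 = set(), set()
    for word_p1, word_p2 in zip(guesses_p1, guesses_p2):
        for word, seen_own, seen_other in (
            (word_p1, seen_p1, seen_p2),
            (word_p2, seen_p2, seen_p1),
        ):
            if word in taboo_words:
                continue              # taboo words can't be used to agree
            seen_own.add(word)
            if word in seen_other:
                return word           # agreement: a brand new label
    return None                       # no agreement (players would pass)

# Roughly the example from the talk: here the players end up agreeing on "kid".
print(play_round(["car", "hat", "kid"], ["boy", "kid", "hat"], taboo_words=set()))
```

The real game wraps this loop in the timer, the scoring, and the taboo-list refresh described above.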
Much like an algorithm, it has an input/output behavior. Its input is an image. Its output is a set of key words that properly describe the image. Much like an algorithm, you can analyze its efficiency. You can prove that its output is not corrupt with high probability, et cetera. So what I want to do now is I want to refer to all games-- like the ESP game, that are kind of like algorithms-- I want to refer to them as games with a purpose. And the idea that I want you to have in your mind is that games with a purpose are like running a computation in people's brains instead of in silicon processors. And what I want to do now is give you other examples of games with a purpose.

So the next problem that I'm going to talk about is that of locating objects in images. On input an arbitrary image, the ESP game tells us what objects are in the image, but it does not tell us where in the image each object is located. So what we would like to know is, we would like to know, yes, there's a man in the image, but the man is right there. There's a plant in the image, but the plant is right there. And not only that. We would like to know precisely which pixels belong to the man, which pixels belong to the plant, et cetera. And we would like to have this information for a large percentage of the images on the web. If we could have this information, we could do a lot of really cool things. For instance, we could have an image search engine where the results are highlighted. It tells you this is where the man is in each one of your images. That would be pretty cool. But even better, if we have this information for a lot of images, we could use this for training computer vision algorithms. So computer vision has advanced significantly over the last 20 or 40 years, but so far, it hasn't been able to create a program that can, with high probability, figure out where in the image each object is located. And one of the major stumbling blocks is the lack of training data. But if we had this data for a lot of images on the web, we could use it to train better computer vision algorithms.

So this is what the next game is going to do, and the next game is called Peekaboom. And here's how it works. It's a two player game. Much like the ESP game, both players don't know anything about each other, and they can't communicate with each other. At the beginning of every round-- oh, by the way, the player on the left, we're going to call him "Peek." The player on the right, we're going to call him "Boom." So, Peek and Boom. At the beginning of every round, Boom gets an image along with a word. In this case, it's the image of a butterfly and the word is "butterfly." That image-word pair comes directly from the ESP game. Peek, at the beginning of every round, gets nothing. Just a completely blank screen. And the goal of the game is for Boom to get Peek to guess the word "butterfly." And the only thing that Boom can do to help Peek guess the word "butterfly" is he can take his mouse, put it somewhere in the image, and click. And whenever Boom clicks, a circular area around that click is revealed to Peek. The actual circular area is a lot smaller than the one I revealed. I just didn't want to go through all the clicks. But basically, when Boom clicks, a circular area around the click is revealed to Peek. And then Peek, given only the circular areas, has to guess what word Boom is trying to make them guess. Whenever Peek guesses the correct word, both players get a lot of points, and then they switch roles.
Peek becomes Boom, and Boom becomes Peek. Now notice, in this case the word was "butterfly," so Boom clicked on the butterfly. But had the word been "flower," Boom would have clicked on the flower. So by just watching where Boom clicks, we get information about where each object is located in every image. By the way, I'm brushing over a lot of details. For instance, there's also hints. So Boom can give hints to Peek about whether the word is a noun, is it a verb, is it text in the image, et cetera.

Now, just to make things more clear, let's play a couple rounds of Peekaboom. So you guys have to guess what I'm trying to make you guess. Here we go. So now? "Bush," awesome. OK, you got it. Bush. Here's another one. It's a verb. "Pick." OK, very nice. I love this image. So, imagine we were back here, and I gave you a different hint. I told you it was a noun, and not only that, I started pointing there. What would you say this is? Hair, exactly. So this is another mechanism of Peekaboom, and it's something called pings. So not only can Boom reveal part of the image. After something has been revealed, he can also point to somewhere, saying it's this, it's this. This gives us extra information about where each object is located in the image. So this is the basic idea of Peekaboom. This is what the Peekaboom screen looks like for one of the players. This is for the Boom player.

And now the first question is, is this game fun? Well, it turns out it really is a lot of fun. By the way, the statistics I'm going to show you right now are a little outdated. This is just for the first four months of gameplay. So in the first four months of gameplay, 27,000 players gave us 2.1 million pieces of data. By a piece of data, I mean an image along with a word correctly analyzed by a pair of players. In the first 10 days after release, actually many people played over 120 hours. That's an average of over 12 hours a day. So it's a lot of fun. Here's the top scores list of Peekaboom, just to put things in perspective. This is for the first four months of gameplay. Each time you play Peekaboom, you get on average 800 points. So the top player there has 3.3 million points. Even the lowest player in this list, in the first four months, has played at least 270 hours of gameplay. So people really love this game.

Now what about the data that it produces? Is it any good, or how do we get good data out of Peekaboom? So let me explain how we get good data out of Peekaboom. This is an image of Ronald. The word is "Ronald." By the way, I love this image. And the last three images were collected by searching for the word "funny" using the ESP game. So we get an image of Ronald, a word "Ronald," and here's what we do to get good data out of this. We give the same image-word pair to a bunch of pairs of players. And from each pair of players, we get a region of the image that is related to the word. Now we take all of these regions and intelligently combine them, and get a really good idea of where the object is located in the image. And on top of that, we can add sort of where the pings are to get more information about what the most salient parts are. And we can go even one step further. We can take this information and combine it with image segmentation algorithms to get pretty much the precise outline of where the object is in the image. Now, I'll say, this doesn't work for all objects in the images. It works like that perfectly for about 50% of the objects in the images that we have data for.
For the rest, it works mostly, but it can miss like a foot or something. But even without using segmentation, we could just use the Peekaboom data in a really, really boneheaded way to come up with a search engine in which the results are sort of highlighted. And we've done this. We have a search engine where you can search for "man," "dog." And for each image, it tells you here's the man, here's the dog, here's the man, here's the dog. And more man, more dog.

OK. Forget about Peekaboom. Brand new game, Verbosity. So this next game that I'm going to talk about, by the way, has not yet been released, so I'm not going to be able to show you any statistics. But I'm just going to quickly explain what the idea is. So what does Verbosity do? The idea is it collects common-sense facts. So what's a common-sense fact? Here's an example of a common-sense fact. Water quenches thirst. It's a true fact that everybody knows. Here's another common-sense fact. Cars usually have four wheels. Now, the thing about common-sense facts is that it is estimated that each one of us has literally hundreds of millions of them in our head. And these are what allow us to act normal and navigate our world successfully. The other thing about common-sense facts is that computers don't yet have them. But if we could somehow put common-sense facts into computers, we could potentially make them more intelligent. And I'm not even talking about making computers as intelligent as humans. Just a little more intelligent. Like for instance, transforming our search query into something better, that works better, or something like that. So if we could somehow collect a lot of common-sense facts and put them into a computer, we could potentially use this to make computers more intelligent. And in fact, there's been a lot of projects that have tried to do this, including one at MIT, and so far they haven't been able to collect enough common-sense facts in order to really make a difference, because the process of entering common-sense facts into a computer is extremely tedious. So we're going to turn this into a game.

So for the next game that I'm going to talk about, the input-output behavior of this game is as follows. On input a word, this game is going to output a set of common-sense facts about that word. By the way, I'm oversimplifying here. These common-sense facts are not just going to be common-sense facts in English. They're going to have some structure to them. So there's going to be logical operators inside them, et cetera. So this is the input-output behavior. On input a word, it's going to give common-sense facts about that word.

And the way the game is going to work-- the game is called Verbosity-- is as follows. It's a two-player word guessing game. There's two players, a Narrator and a Guesser. Same idea as the ESP game. Basically, both players can't communicate with each other. They don't know anything about each other. At the beginning of every round, the Narrator gets a word and has to get the Guesser to guess that word. And what the Narrator can do to get the Guesser to guess that word is he can pick one among many sentence templates that they have. The sentence templates that are available at any given time vary depending on the word. So he can pick one among many sentence templates, and fill it in with an appropriate word. An appropriate word is a word that's not the secret word itself-- in this example, not "milk"-- and one that fits in grammatically with the sentence template.
Whenever the sentence template is filled in, it's sent to the Guesser. Then the Narrator can pick another sentence template, fill it with an appropriate keyword, and send it to the Guesser. And the Guesser, given enough hints, eventually has to guess what word it is, and whenever the Guesser guesses the correct word, both players get points. The way we get common-sense facts out of this game is by just watching what the Narrator says for each word.

By the way, I'm brushing over a lot of details for this game. This is just the basic idea, so the high-level idea is this: it's a two-player game. Player one and player two. At the beginning of every round, player one gets a word, and because of the rules of the game, has to give some common-sense facts about the word. Then those common-sense facts are sent to player two, and player two, given only the common-sense facts, has to guess what word player one got as input. And if player two can guess the correct word, both players get points. This is the core mechanism of Verbosity.

Now, I want you to notice two things about this core mechanism. First, it's fun. This is very similar to the core mechanism of a lot of popular party games. Basically, just word guessing games. Second, this core mechanism actually gives output that is already, in a way, verified. Notice, we're getting all the common-sense facts from player one. But what's player two doing? In a way, player two is verifying the output. Because if player two can guess the word given only the common-sense facts, then those common-sense facts must have something to do with the word. So in a way, it's giving output that is already verified.

And this core mechanism is exactly the same core mechanism that was used in Peekaboom. So in the case of Peekaboom, it's a two-player game. Player one and player two. At the beginning of every round, player one gets an image along with a word, and then has to give a region of the image that is related to the word. Then that region is sent to player two, and player two, given only the region, has to guess what word player one got as input. The same mechanism as Verbosity. And again, it's fun, and it also gives output that is, in a way, verified.

We're going to call all games that satisfy this mechanism asymmetric verification games. So this is a general mechanism for building games with a purpose. So in general, for an arbitrary input-output behavior, we could define a game as follows. It's a two-player game. We give the input to player one and have them give an output. Then we send the output to player two, and given only the output, player two has to guess what input player one got. If player two can guess the correct input, both players get points. This mechanism has two very nice properties: for a lot of input-output behaviors, it's fun, and also, it gives output that is, in a way, verified. Of course, this doesn't work for all input-output behaviors, but it works for a large class of them. And these are asymmetric verification games, and it's asymmetric because both players are doing something slightly different than each other. And it's also asymmetric as opposed to symmetric verification games, and you've already seen an example of a symmetric verification game: the ESP game. So this is another general mechanism for creating games with a purpose. So for an arbitrary input-output behavior, you can give both players the same input, and ask them to guess what output the other player is going to give.
So if they both give the same output, they get points. Again, this mechanism is fun for a lot of input-output behaviors, and also has the property that the output it gives is, in a way, verified, because it comes from two independent sources. And now, we can start looking at the differences between symmetric and asymmetric verification games. So for instance, symmetric verification games, I claim, put a constraint on the number of inputs per output. The number of outputs per input, sorry. If a given input has too many outputs, then a symmetric verification game is never going to work, because both players are never going to agree on the same output. Asymmetric verification games put a constraint on the number of inputs that yield the same output. If there's too many inputs that yield the same output, then given only the output, you'll never be able to guess what input it came from.

I'm going to finish now. Hopefully, I've been able to convince you that there's a lot of power in looking for clever ways of utilizing human cycles. In fact, if you think about it, this talk hints at a paradigm for dealing with open problems in artificial intelligence. If you have something that you really can't solve in artificial intelligence, then maybe you can turn the problem into a test that distinguishes humans from computers. Turns out that being able to do so has some very nice applications in practice. Or alternatively, maybe you can turn the problem into a game, in which case you don't even need to solve your problem anymore. People will solve it for you.

One nice thing about this whole research agenda is that it provides a much better motivation for the movie The Matrix. If you think about it, the motivation for the movie was that in the future, computers become a lot more intelligent than humans. But rather than killing us, they actually have to keep us around, because we generate power. That makes no sense. A much better motivation would be in the future, computers become a lot more intelligent than humans. But rather than killing us, they actually have to keep us around, because there's a couple of problems that we can solve that they cannot yet solve. My ultimate research goal is to transform our human existence to just eating, sleeping, drinking, playing-- never mind. [LAUGHTER] www.captcha.net. www.espgame.org. peekaboom.org, and that's it. Thank you. [APPLAUSE] Yes?

AUDIENCE: Does it concern you at all that the fact that you're using a game will automatically give you a very biased population of people that are giving answers to the problems we want answers to? And this population of people are the people that have way more time on their hands, and are not motivated to maybe get a job or do something [UNINTELLIGIBLE]? [LAUGHTER]

LUIS VON AHN: Very good question. It's true that the population is biased. There's no question about that. But for a lot of really simple things, I mean, anybody can do it. But it's true that the population is biased. That's definitely true.

AUDIENCE: Have you seen any results?

LUIS VON AHN: I can tell you that the population is biased, but I have not seen anything that can really tell me that, because we're using gamers, this is happening instead of what would happen with the general population. I have not seen that. Yes?

AUDIENCE: I have a concern with asymmetric games where the input is very similar to the [UNINTELLIGIBLE]. For example, when you said milk is close to cereal. It's like a fraud question. What if someone types in milk and I come up with pail-- P-A-I-L.
I think it would be very obvious for his partner to guess which question to ask.

LUIS VON AHN: Sure. I didn't mention a lot of the mechanisms that we use to stop that sort of cheating, but there's a lot of mechanisms. For instance, we don't let them type anything that's not a dictionary word. Second, that word has to fit in with the template, grammatically. But still, I mean, there's a lot of mechanisms that try to prevent that. But you're right, that's a concern.

AUDIENCE: So I think the popularity of a lot of games and these games in particular are [UNINTELLIGIBLE]. They're novel and different. This is a new thing, let's try it out. We might spend 100, 120 hours on this. There was a site I remembered called Am I Hot or Not? a few years ago. Maybe I'm confessing something I shouldn't confess. But you spend a few hours. And 5 years from now the game won't exist. The question is, if you view this as a strategic shift in how we use human cycles, you're kind of hindered by the fact that this will probably die out within a few months.

LUIS VON AHN: The answer to that is yes and no. So, there are games whose popularity lasts for thousands of years. And there's a lot of these gaming sites that have games whose popularity only lasts like six months or a year. And what they do is very simple. They have the same game concept and just redress it with another name and something else, and all the people come back. This is also well known to nightclub designers. Just change the name. But it is true that popularity does die, but that is completely game-dependent. Some games, the popularity lasts longer than others. So the ESP game has been running for well over two years now, and the popularity has not died. I mean, there was definitely an initial surge, but it has not died. So the amount of time that it works varies, and hopefully we can find games that last for thousands of years. Yes?

AUDIENCE: Towards the beginning of the talk, you talked about accessibility and how vision impaired people [UNINTELLIGIBLE] screen reader. But I don't really see how these games close the loop on that.

LUIS VON AHN: That's a very good question. The way I explained it, the ESP game only gives you keywords. That's not quite enough for accessibility. It's better than nothing, but it's not quite enough.

AUDIENCE: But you're not putting them back into the websites.

LUIS VON AHN: Very good point. I see what you're talking about. But that's an engineering problem. You could actually do that just with a server that they connect to.

FEMALE SPEAKER: Plug an extension into the browser. Something like that.

LUIS VON AHN: Sure.

FEMALE SPEAKER: Some people are better than others in any game, so can you take people from the opposite ends of the spectrum, and the really good people, you try to dissect their brain. And the really bad people, you try to see how they improve.

LUIS VON AHN: I don't understand. Say that again?

AUDIENCE: There's a spectrum of ability in any game. So we can look at either end of the spectrum, and find the really good people, and study what algorithm they use. The really bad people, when they improve, see if [UNINTELLIGIBLE] to your algorithm.

LUIS VON AHN: Right. You can do that. Yes, I agree. Yes?

AUDIENCE: So you said you've been running this game for two years now, which means you must have an obscene amount of data.

LUIS VON AHN: Yes and no. I do have an obscene amount of data, but I recycle the images, because I just don't want to have that many images.
So there are 39 million labels, but it's not that many images.

AUDIENCE: You said the facts you get out of Verbosity are not simply English sentences, but have some more logical structure to them. I wonder if you could say a few more words about that.

LUIS VON AHN: The reason is because of the templates. We have templates. We don't just let people write free-flowing English. We have templates. So out of those templates, we know things like, well, this one is about purpose. So things like that.

AUDIENCE: Let's say I have a really boring job, like I'm looking for defects in a manufacturing process. How do I turn that into a fun game? [LAUGHTER]

LUIS VON AHN: That's a very good question. That would be really cool if we could figure out how to do that for everything. I don't know how to do that for everything. I don't even know if it's possible to do it for everything. But that'd be really cool if we could figure out how to do it.

AUDIENCE: From an ethical point of view, is there any problem that people would probably be spending their work hours playing the game, rather than their free time hours? And so you're not really gaining any productivity in society as a whole.

LUIS VON AHN: Well, depending. I mean, you're right about that. But imagine we could turn everybody's work into something fun. That'd be really cool. So, depending. But one thing I should say about the ethics is all these games, they don't try to trick you into doing anything. I mean, everybody knows what the purpose is.

AUDIENCE: Maybe "ethical" is the wrong word.

LUIS VON AHN: Right. Yes?

AUDIENCE: A lot of these games are very good at getting basic facts out of people. Have you thought about how to get stuff that's a little bit more nuanced? Like if you leave milk in the fridge for three weeks, it's going to go bad?

LUIS VON AHN: That's a very good question. I mean, it depends a lot on the particular domain. I don't know how to do it in general, but for instance, for images, I can tell you. The ESP game for images, most of the stuff that you get out of the general ESP game is very general stuff. I mean, the first word is going to be like "dog." Then once "dog" becomes a taboo word, it's probably going to be the breed of the dog, or something. But very generally, usually things that everybody knows. If you want to start getting things that only a few people know, then you can do a few things. So for instance, you can have people tell you what they want to see images of. So for instance, I like cars. Can I see images of cars? Then I'll be an expert on that sort of thing, and then you can do that better. Or you can use collaborative filtering to try to give people-- you figure out what they're good at, and you give them more images like that. And so you can start getting better things like that. But yeah, that's a very good question. Yes?

AUDIENCE: It seems like you could also use these games to solve problems that computers are already good at solving, like you could have people add up numbers, or things like that. But those games likely would not be very fun.

LUIS VON AHN: They might be. So like Sudoku.

AUDIENCE: I guess my question is-- right, but that's like a constraint propagation problem that is a little bit harder to solve.

LUIS VON AHN: Sure. But given that Sudoku is human-solved, computers are a lot better at those.

AUDIENCE: Have you or anyone thought about the cognitive aspects of games that are fun in this model, versus that aren't?
And the computational models that are associated with it? It seems like there's a lot of human cognition interest there.

LUIS VON AHN: Yeah, definitely. So part of the problem-- I should say two things. There's a lot of research on trying to define and figure out how to make things more fun. Not general computational things, but just how to make games more fun. But nobody really knows the answer to this. I mean, this is an open problem.

AUDIENCE: In the future, if the market becomes competitive, do you think you'll have to start paying people money?

LUIS VON AHN: I don't know.

AUDIENCE: It depends on how much you're making out of that.

LUIS VON AHN: Yeah, I don't know. Yeah?

AUDIENCE: In the asymmetric verification games, how do you eliminate the case where the first person makes a mistake, and that mistaken output is sent to the second player, and then, after more output from the first guy is sent to the second guy, they get the right answer? How do you know which--

LUIS VON AHN: The way to do that is by using the single-player game. We take all these facts that we get and treat them all separately, and sometimes you're just playing with a computer and we're giving you certain facts that we want you to verify. And eventually, you just try to intersect which ones are good and which ones are not, and you try to figure out. Yeah, all the way in back.

AUDIENCE: How and when is the concept of the game presented to the players?

LUIS VON AHN: How and when?

AUDIENCE: Yeah. Like before they start playing or after they finish, you just say, oh, by the way--

LUIS VON AHN: Oh, no. Before. Beforehand. Beforehand. Yeah. Yes?

AUDIENCE: With the security issues now, with things like Flickr or Picasa Web, where the images are controlled by a specific entity?

LUIS VON AHN: What do you mean?

AUDIENCE: I could see implications coming from this such that, you know, well, we want to provide all these results, but we're going to basically try and have a monopoly on the best images available. Which seems kind of anti-- I don't know. It just seems like it would be more-- people trying to get more control over content.

LUIS VON AHN: I don't understand what you mean. There are problems with copyright. I mean, Google knows about that. So, yeah, there are problems with that. But I don't know what else. I mean-- Yes?

AUDIENCE: So, this talk was about generalizing past ESP games to all kinds of things with human computation. So you've looked at a bunch of these now. From what I can tell so far, it's opportunistic. Given a new task where you know people are better than computers, is there some procedure for figuring out what the right game is to get at that?

LUIS VON AHN: That would be great. But I mean, in the same way that, for instance, if I give you a new task and you have to come up with an efficient algorithm for it, there's no procedure for coming up with an efficient algorithm to solve something. I don't think there will be a procedure that, given a problem, says here's a game for it. I think it's going to be an art, much like coming up with efficient algorithms.

AUDIENCE: What does that mean for your research strategy and research agenda? Is it to just continue to find, opportunistically, more of these?

LUIS VON AHN: Yes. So basically, it's similar to what happens in algorithm design. I mean, people try to come up with general things. So there's things like dynamic programming that works for a lot of things. So that's the best you can hope for, and that's sort of what I'm trying to do.
But yeah, I don't think there'll ever be-- well, I don't know. I'm not holding my breath that there'll ever be a method that will just, given a problem, output a game.

MALE SPEAKER: OK, I'll ask if nobody else wants to. So are you worried about the interface between these two things? Like, does the existence of these games and their popularity reduce the value of the CAPTCHAs?

LUIS VON AHN: Oh, the CAPTCHA. Yeah, yeah. You can use these games to break the CAPTCHAs. Yeah, definitely. [LAUGHTER] It's good to do research that breaks each other. [LAUGHTER]

MALE SPEAKER: Thanks, Luis. [APPLAUSE]