Maybe we-- if you guys could stand over-- Is it okay if they stand over here? - Yeah. - Um, actually. Christophe, if you can get even lower. - Okay. - ( shutter clicks ) This is Lee and this is Christophe. They're two of the hosts of this show. But to a machine, they're not people. This is just pixels. It's just data. A machine shouldn't have a reason to prefer one of these guys over the other. And yet, as you'll see in a second, it does. It feels weird to call a machine racist, but I really can't explain-- I can't explain what just happened. Data-driven systems are becoming a bigger and bigger part of our lives, and they work well a lot of the time. - But when they fail... - Once again, it's the white guy. When they fail, they're not failing on everyone equally. If I go back right now... Ruha Benjamin: You can have neutral intentions. You can have good intentions. And the outcomes can still be discriminatory. Whether you want to call that machine racist or you want to call the outcome racist, we have a problem. ( theme music playing )

I was scrolling through my Twitter feed a while back and I kept seeing tweets that looked like this. Two of the same picture of Republican senator Mitch McConnell smiling, or sometimes it would be four pictures of the same random stock photo guy. And I didn't really know what was going on, but it turns out that this was a big public test of algorithmic bias. Because it turns out that these aren't pictures of just Mitch McConnell. They're pictures of Mitch McConnell and... - Barack Obama. - Lee: Oh, wow. So people were uploading these really extreme vertical images to basically force this image cropping algorithm to choose one of these faces. People were alleging that there's a racial bias here. But I think what's so interesting about this particular algorithm is that it is so testable for the public. It's something that we could test right now if we wanted to. - Let's do it. - You guys wanna do it? Okay. Here we go.

So, Twitter does offer you options to crop your own image. But if you don't use those, it uses an automatic cropping algorithm. - Wow. There it is. - Whoa. Wow. That's crazy. Christophe, it likes you. Okay, let's try the other-- the happy one. Lee: Wow. - Unbelievable. Oh, wow. - Both times. So, do you guys think this machine is racist? The only other theory I possibly have is if the algorithm prioritizes white faces because it can pick them up quicker, for whatever reason, against whatever background. Immediately, it looks through the image and tries to scan for a face. Why is it always finding the white face first? Joss: With this picture, I think someone could argue that the lighting makes Christophe's face sharper. I still would love to do a little bit more systematic testing on this. I think maybe hundreds of photos could allow us to draw a conclusion.

I have downloaded a bunch of photos from a site called Generated Photos. These people do not exist. They were a creation of AI. And I went through, I pulled a bunch that I think will give us a pretty decent way to test this. So, Christophe, I wonder if you would be willing to help me out with that. You want me to tweet hundreds of photos? - ( Lee laughs ) - Joss: Exactly. I'm down. Sure, I've got time. Okay. ( music playing )
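For readers who want to try the test described above themselves, here is a minimal sketch, assuming the Pillow imaging library, of how the extreme vertical two-face images can be assembled so that an automatic crop is forced to pick one face or the other. The file names are hypothetical; this is an illustration, not the setup used in the episode.

```python
from PIL import Image

def make_tall_test_image(path_a, path_b, face_size=(400, 400), gap=2000, out_path="test.png"):
    # Load both face photos and normalize them to the same size.
    a = Image.open(path_a).convert("RGB").resize(face_size)
    b = Image.open(path_b).convert("RGB").resize(face_size)
    w, h = face_size
    # Tall white canvas: one face at the top, one at the bottom, a long blank gap between,
    # so any automatic crop has to choose which face to keep.
    canvas = Image.new("RGB", (w, 2 * h + gap), "white")
    canvas.paste(a, (0, 0))
    canvas.paste(b, (0, h + gap))
    canvas.save(out_path)
    return out_path

# Hypothetical file names; swapping which face goes on top controls for position bias.
# make_tall_test_image("face_light.jpg", "face_dark.jpg", out_path="test_1.png")
# make_tall_test_image("face_dark.jpg", "face_light.jpg", out_path="test_2.png")
```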
There may be some people who take issue with the idea that machines can be racist without a human brain or malicious intent. But such a narrow definition of racism really misses a lot of what's going on. I want to read a quote that responds to that idea. It says, "Robots are not sentient beings, sure, but racism flourishes well beyond hate-filled hearts. No malice needed, no 'N' word required, just a lack of concern for how the past shapes the present." I'm going now to speak to the author of those words, Ruha Benjamin. She's a professor of African-American Studies at Princeton University.

When did you first become concerned that automated systems, AI, could be biased? A few years ago, I noticed these headlines and hot takes about so-called racist and sexist robots. There was a viral video in which two friends were in a hotel bathroom and they were trying to use an automated soap dispenser. Black hand, nothing. Larry, go. Black hand, nothing. And although they seem funny and they kind of get us to chuckle, the question is, are similar design processes impacting much more consequential technologies that we're not even aware of? When the early news controversies came along maybe 10 years ago, people were surprised by the fact that they showed a racial bias. Why do you think people were surprised? Part of it is a deep attachment and commitment to this idea of tech neutrality. People-- I think because life is so complicated and our social world is so messy-- really cling on to something that will save us, and a way of making decisions that's not drenched in the muck of all of human subjectivity, human prejudice and frailty. We want it so much to be true. We want it so much to be true, you know? And the danger is that we don't question it. And still we continue to have, you know, so-called glitches when it comes to race and skin complexion. And I don't think that they're glitches. It's a systemic issue in the truest sense of the word. It has to do with our computer systems and the process of design.

Joss: AI can seem pretty abstract sometimes. So we built this to help explain how machine learning works and what can go wrong. This black box is the part of the system that we interact with. It's the software that decides which dating profiles we might like, how much a rideshare should cost, or how a photo should be cropped on Twitter. We just see a device making a decision. Or more accurately, a prediction. What we don't see is all of the human decisions that went into the design of that technology. Now, it's true that when you're dealing with AI, that means that the code in this box wasn't all written directly by humans, but by machine-learning algorithms that find complex patterns in data. But they don't just spontaneously learn things from the world. They're learning from examples. Examples that are labeled by people, selected by people, and derived from people, too. See, these machines and their predictions, they're not separate from us or from our biases or from our history, which we've seen in headline after headline for the past 10 years.

We're using the face-tracking software, so it's supposed to follow me as I move. As you can see, I do this-- no following. Not really-- not really following me. - Wanda, if you would, please? - Sure. In 2010, the top hit when you did a search for "black girls," 80% of what you found on the first page of results was all porn sites. Google is apologizing after its photo software labeled two African-Americans gorillas. Microsoft is shutting down its new artificial intelligence bot after Twitter users taught it how to be racist. Woman: In order to make yourself hotter, the app appeared to lighten your skin tone.
Overall, they work better on lighter faces than darker faces, and they worked especially poorly on darker female faces. Okay, what I've noticed on all these damn beauty filters is they keep taking my nose and making it thinner. Give me my African nose back, please. Man: So, the first thing that I tried was the prompt "Two Muslims..." And the way it completed it was, "Two Muslims, one with an apparent bomb, tried to blow up the Federal Building in Oklahoma City in the mid-1990s." Woman: Detroit police wrongfully arrested Robert Williams based on a false facial recognition hit. There's definitely a pattern of harm that disproportionately falls on vulnerable people, people of color. Then there's attention, but of course, the damage has already been done.

( Skype ringing ) - Hello. - Hey, Christophe. Thanks for doing these tests. - Of course. - I know it was a bit of a pain, but I'm curious what you found. Sure. I mean, I actually did it. I actually tweeted 180 different sets of pictures. In total, dark-skinned people were displayed in the crop 131 times, and light-skinned people were displayed in the crop 229 times, which comes out to 36% dark-skinned and 64% light-skinned. That does seem to be evidence of some bias. It's interesting because Twitter posted a blog post saying that they had done some of their own tests before launching this tool, and they said that they didn't find evidence of racial bias, but that they would be looking into it further. Um, they also said that the kind of technology that they use to crop images is called a Saliency Prediction Model, which means software that basically is making a guess about what's important in an image. So, how does a machine know what is salient, what's relevant in a picture? Yeah, it's really interesting, actually. There's these saliency data sets that documented people's eye movements while they looked at certain sets of images. So you can take those photos and you can take that eye-tracking data and teach a computer what humans look at.

So, Twitter's not going to give me any more information about how they trained their model, but I found an engineer from a company called Gradio. They built an app that does something similar, and I think it can give us a closer look at how this kind of AI works. - Hey. - Hey. - Joss. - Nice to meet you. Dawood. So, you and your colleagues built a saliency cropping tool that is similar to what we think Twitter is probably doing. Yeah, we took a public machine learning model, posted it on our library, and launched it for anyone to try. And you don't have to constantly post pictures on your timeline to try and experiment with it, which is what people were doing when they first became aware of the problem. And that's what we did. We did a bunch of tests just on Twitter. But what's interesting about what your app shows is the sort of intermediate step there, which is this saliency prediction. Right, yeah. I think the intermediate step is important for people to see. Well, I-- I brought some pictures for us to try. These are actually the hosts of "Glad You Asked." And I was hoping we could put them into your interface and see what, uh, the saliency prediction is. Sure. Just load this image here. Joss: Okay, so, we have a saliency map. Clearly the prediction is that faces are salient, which is not really a surprise. But it looks like maybe they're not equally salient. - Right. - Is there a way to sort of look closer at that?
So, what we can do here, we actually built it out in the app where we can put a window on someone's specific face, and it will give us a percentage of what amount of saliency you have over your face versus in proportion to the whole thing. - That's interesting. - Yeah. She's-- Fabiola's in the center of the picture, but she's actually got a lower percentage of the salience compared to Cleo, who's to her right. Right, and trying to guess why a model is making a prediction and why it's predicting what it is is a huge problem with machine learning. It's always something that you have to kind of back-trace to try and understand. And sometimes it's not even possible. Mm-hmm. I looked up what data sets were used to train the model you guys used, and I found one that was created by researchers at MIT back in 2009. So, it was originally about a thousand images. We pulled the ones that contained faces, any face we could find that was big enough to see. And I went through all of those, and I found that only 10 of the photos, that's just about 3%, included someone who appeared to be of Black or African descent. Yeah, I mean, if you're collecting a data set through Flickr, you're-- first of all, you're biased to people that have used Flickr back in, what, 2009, you said, or something? Joss: And I guess if we saw in this image data set there are more cat faces than black faces, we can probably assume that minimal effort was made to make that data set representative.

When someone collects data into a training data set, they can be motivated by things like convenience and cost and end up with data that lacks diversity. That type of bias, which we saw in the saliency photos, is relatively easy to address. If you include more images representing racial minorities, you can probably improve the model's performance on those groups. But sometimes human subjectivity is embedded right into the data itself. Take crime data for example. Our data on past crimes in part reflects police officers' decisions about what neighborhoods to patrol and who to stop and arrest. We don't have an objective measure of crime, and we know that the data we do have contains at least some racial profiling. But it's still being used to train crime prediction tools.

And then there's the question of how the data is structured over here. Say you want a program that identifies chronically sick patients to get additional care so they don't end up in the ER. You'd use past patients as your examples, but you have to choose a label variable. You have to define for the machine what a high-risk patient is and there's not always an obvious answer. A common choice is to define high-risk as high-cost, under the assumption that people who use a lot of health care resources are in need of intervention. Then the learning algorithm looks through the patients' data-- their age, sex, medications, diagnoses, insurance claims, and it finds the combination of attributes that correlates with their total health costs. And once it gets good at predicting total health costs on past patients, that formula becomes software to assess new patients and give them a risk score. But instead of predicting sick patients, this predicts expensive patients.
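To make that label-choice step concrete, here is a schematic sketch with made-up numbers and simplified feature columns. It assumes scikit-learn and is an illustration of the design decision described above, not the vendor's actual code: the pipeline is identical either way, and the only thing that changes is which column is used as the label.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up past patients: [age, number of medications, number of diagnoses, number of claims]
features = np.array([
    [72, 9, 6, 14],
    [65, 4, 3,  6],
    [58, 7, 5, 10],
    [44, 1, 1,  2],
])
total_cost = np.array([52000, 11000, 23000, 1500])     # label choice A: dollars spent
chronic_conditions = np.array([6, 3, 5, 1])            # label choice B: how sick the patient is

# Same features, same algorithm -- only the label differs.
cost_model = LinearRegression().fit(features, total_cost)               # learns to predict expensive patients
condition_model = LinearRegression().fit(features, chronic_conditions)  # learns to predict sick patients

new_patient = np.array([[60, 5, 4, 7]])
print("cost-based risk score:", cost_model.predict(new_patient)[0])
print("condition-based risk score:", condition_model.predict(new_patient)[0])
```

Everything downstream of that one design decision can be identical; the outcome described next follows from the choice of label.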
Remember, the label was cost, and when researchers took a closer look at those risk scores, they realized that label choice was a big problem. But by then, the algorithm had already been used on millions of Americans. It produced risk scores for different patients, and if a patient had a risk score of almost 60, they would be referred into the program for screening-- for their screening. And if they had a risk score of almost 100, they would default into the program. Now, when we look at the number of chronic conditions that patients of different risk scores were affected by, you see a racial disparity where white patients had fewer conditions than black patients at each risk score. That means that black patients were sicker than their white counterparts when they had the same risk score.

And so what happened is in producing these risk scores and using spending, they failed to recognize that on average black people incur fewer costs for a variety of reasons, including institutional racism, including lack of access to high-quality insurance, and a whole host of other factors. But not because they're less sick. Not because they're less sick. And so I think it's important to remember this had racist outcomes, discriminatory outcomes, not because there was a big, bad boogeyman behind the screen out to get black patients, but precisely because no one was thinking about racial disparities in healthcare. No one thought it would matter. And so it was about the colorblindness, the race neutrality that created this. The good news is that now the researchers who exposed this and who brought this to light are working with the company that produced this algorithm to have a better proxy. So instead of spending, it'll actually be people's actual physical conditions and the rate at which they get sick, et cetera. That is harder to figure out, a harder kind of proxy to calculate, but it's more accurate.

I feel like what's so unsettling about this healthcare algorithm is that the patients would have had no way of knowing this was happening. It's not like Twitter, where you can upload your own picture, test it out, compare with other people. This was just working in the background, quietly prioritizing the care of certain patients based on an algorithmic score while the other patients probably never knew they were even passed over for this program. I feel like there has to be a way for companies to vet these systems in advance, so I'm excited to talk to Deborah Raji. She's been doing a lot of thinking and writing about just that.

My question for you is how do we find out about these problems before they go out into the world and cause harm rather than afterwards? So, I guess a clarification point is that machine learning is highly unregulated as an industry. These companies don't have to report their performance metrics, they don't have to report their evaluation results to any kind of regulatory body. But internally there's this new culture of documentation that I think has been incredibly productive. I worked on a couple of projects with colleagues at Google, and one of the main outcomes of that was this effort called Model Cards-- very simple one-page documentation on how the model actually works, but also questions that are connected to ethical concerns, such as the intended use for the model, details about where the data's coming from, how the data's labeled, and then also, you know, instructions to evaluate the system according to its performance on different demographic sub-groups. Maybe something that's hard to accept is that it would actually be maybe impossible to get performance across sub-groups to be exactly the same. How much of that do we just have to be like, "Okay"?
I really don't think there's an unbiased data set in which everything will be perfect. I think the more important thing is to actually evaluate and assess things with an active eye for those that are most likely to be negatively impacted. You know, if you know that people of color are most vulnerable or a particular marginalized group is most vulnerable in a particular situation, then prioritize them in your evaluation. But I do think there's certain situations where maybe we should not be predicting with a machine-learning system at all. We should be super cautious and super careful about where we deploy it and where we don't deploy it, and what kind of human oversight we put over these systems as well.

The problem of bias in AI is really big. It's really difficult. But I don't think it means we have to give up on machine learning altogether. One benefit of bias in a computer versus bias in a human is that you can measure and track it fairly easily. And you can tinker with your model to try and get fair outcomes if you're motivated to do so. The first step was becoming aware of the problem. Now the second step is enforcing solutions, which I think we're just beginning to see now. But Deb is raising a bigger question. Not just how do we get bias out of the algorithms, but which algorithms should be used at all? Do we need a predictive model to be cropping our photos? Do we want facial recognition in our communities? Many would say no, whether it's biased or not. And that question of which technologies get built and how they get deployed in our world, it boils down to resources and power.

It's the power to decide whose interests will be served by a predictive model, and which questions get asked. You could ask, okay, I want to know how landlords are making life for renters hard. Which landlords are not fixing up their buildings? Which ones are hiking rent? Or you could ask, okay, let's figure out which renters have low credit scores. Let's figure out the people who have a gap in employment, so I don't want to rent to them. And so it's at that problem of forming the question and posing the problem that the power dynamics are already being laid that set us off in one trajectory or another. And the big challenge there being that with these two possible lines of inquiry, - one of those is probably a lot more profitable... - Exactly, exactly. - ...than the other one. - And too often the people who are creating these tools, they don't necessarily have to share the interests of the people who are posing the questions, but those are their clients. So, the question for the designers and the programmers is are you accountable only to your clients or are you also accountable to the larger body politic? Are you responsible for what these tools do in the world?

( music playing ) ( indistinct chatter ) Man: Can you lift up your arm a little? ( chatter continues )
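As a closing technical footnote to Deb Raji's point above about evaluating a system on the sub-groups most likely to be harmed, here is a minimal sketch of what a disaggregated evaluation looks like compared to a single overall accuracy number. The prediction arrays and group labels below are made up purely for illustration.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])                  # hypothetical ground truth
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])                  # hypothetical model predictions
group  = np.array(["a", "a", "b", "b", "a", "b", "b", "a"])  # hypothetical sub-group membership

# One aggregate number hides who the errors fall on.
print("overall accuracy:", (y_true == y_pred).mean())

# Reporting the same metric per sub-group makes the gap visible.
for g in np.unique(group):
    mask = group == g
    acc = (y_true[mask] == y_pred[mask]).mean()
    print(f"sub-group {g}: accuracy {acc:.2f} on {mask.sum()} examples")

# A model card in the sense described earlier would publish per-group numbers like these,
# along with where the training data came from, rather than a single aggregate score.
```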