Algorithms are everywhere. They sort and separate the winners from the losers. The winners get the job or a good credit card offer. The losers don't even get an interview, or they pay more for insurance. We're being scored with secret formulas that we don't understand and that often don't have systems of appeal. That raises the question: What if the algorithms are wrong?

To build an algorithm you need two things: you need data, what happened in the past, and a definition of success, the thing you're looking for and often hoping for. You train an algorithm by having it look at that data and figure out what is associated with success. What situation leads to success?
you need data, what happened in the past,
要建立一個演算法,需要兩樣東西:
and a definition of success,
需要資料,資料是過去發生的事,
the thing you're looking for and often hoping for.
還需要對成功的定義,
You train an algorithm by looking, figuring out.
也就是你在找的東西、 你想要的東西。
The algorithm figures out what is associated with success.
你透過尋找和計算的方式 來訓練一個演算法。
What situation leads to success?
演算法會算出什麼和成功有相關性。
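To make those two ingredients concrete, here is a minimal sketch, not from the talk, with made-up numbers: the historical data is a feature matrix, and the definition of success is nothing more than the labels we chose to assign.

```python
# A minimal sketch of "data + a definition of success", with made-up numbers.
from sklearn.linear_model import LogisticRegression

# The data: what happened in the past. Hypothetical feature rows, e.g.
# [hours_of_prep, used_fresh_ingredients].
X = [[0, 0], [1, 0], [2, 1], [3, 1], [1, 1], [4, 1]]

# The definition of success: an opinion, encoded as labels we picked.
y = [0, 0, 1, 1, 1, 1]

# Training: the model figures out what is associated with "success".
model = LogisticRegression().fit(X, y)

# Scoring a new situation against that chosen definition of success.
print(model.predict([[2, 1]]))
```

Nothing in the fitting step questions the labels; whatever opinion `y` encodes is what the model learns to reproduce.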
Actually, everyone uses algorithms. They just don't formalize them in written code. Let me give you an example. I use an algorithm every day to make a meal for my family. The data I use is the ingredients in my kitchen, the time I have, the ambition I have, and I curate that data. I don't count those little packages of ramen noodles as food.

(Laughter)

My definition of success is: a meal is successful if my kids eat vegetables. It's very different from if my youngest son were in charge. He'd say success is if he gets to eat lots of Nutella. But I get to choose success. I am in charge. My opinion matters. That's the first rule of algorithms: algorithms are opinions embedded in code.

That's really different from what most people think of algorithms. They think algorithms are objective and true and scientific. That's a marketing trick. It's also a marketing trick to intimidate you with algorithms, to make you trust and fear algorithms because you trust and fear mathematics. A lot can go wrong when we put blind faith in big data.
This is Kiri Soares. She's a high school principal in Brooklyn. In 2011, she told me her teachers were being scored with a complex, secret algorithm called the "value-added model." I told her, "Well, figure out what the formula is, show it to me. I'm going to explain it to you." She said, "Well, I tried to get the formula, but my Department of Education contact told me it was math and I wouldn't understand it."

It gets worse. The New York Post filed a Freedom of Information Act request, got all the teachers' names and all their scores, and they published them as an act of teacher-shaming. When I tried to get the formulas, the source code, through the same means, I was told I couldn't. I was denied. I later found out that nobody in New York City had access to that formula. No one understood it.

Then someone really smart got involved, Gary Rubinstein. He found 665 teachers from that New York Post data that actually had two scores. That could happen if they were teaching seventh grade math and eighth grade math. He decided to plot them. Each dot represents a teacher.

(Laughter)

What is that?

(Laughter)

That should never have been used for individual assessment. It's almost a random number generator.

(Applause)

But it was.
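Rubinstein's check is easy to reproduce in spirit. Here is a sketch with simulated scores (his actual numbers came from the Post records): if the model measured something real, a teacher's two scores should line up, while a shapeless cloud and a near-zero correlation mean it behaves like a random number generator.

```python
# Simulated version of the two-scores check; the real data came from the Post.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
score_7th = rng.uniform(0, 100, 665)  # each teacher's 7th-grade math score
score_8th = rng.uniform(0, 100, 665)  # the same teacher's 8th-grade score

print(np.corrcoef(score_7th, score_8th)[0, 1])  # near 0: the scores don't agree

plt.scatter(score_7th, score_8th, s=8)          # each dot is one teacher
plt.xlabel("Value-added score, 7th grade")
plt.ylabel("Value-added score, 8th grade")
plt.show()
```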
This is Sarah Wysocki. She got fired, along with 205 other teachers, from the Washington, DC school district, even though she had great recommendations from her principal and the parents of her kids.

I know what a lot of you guys are thinking, especially the data scientists, the AI experts here. You're thinking, "Well, I would never make an algorithm that inconsistent." But algorithms can go wrong, even have deeply destructive effects with good intentions. And whereas an airplane that's designed badly crashes to the earth and everyone sees it, an algorithm designed badly can go on for a long time, silently wreaking havoc.
This is Roger Ailes.

(Laughter)

He founded Fox News in 1996. More than 20 women complained about sexual harassment. They said they weren't allowed to succeed at Fox News. He was ousted last year, but we've seen recently that the problems have persisted. That raises the question: What should Fox News do to turn over a new leaf?

Well, what if they replaced their hiring process with a machine-learning algorithm? That sounds good, right? Think about it. The data, what would the data be? A reasonable choice would be the last 21 years of applications to Fox News. Reasonable. What about the definition of success? A reasonable choice would be, well, who is successful at Fox News? I guess someone who, say, stayed there for four years and was promoted at least once. Sounds reasonable. And then the algorithm would be trained. It would be trained to look for people, to learn what led to success -- what kind of applications historically led to success by that definition.

Now think about what would happen if we applied that to a current pool of applicants. It would filter out women, because they do not look like people who were successful in the past.
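A toy version of that failure, with invented data (nothing here is any real company's pipeline): if the historical "success" labels reflect a culture where women weren't allowed to succeed, the trained model learns gender as the deciding feature.

```python
# Invented data: application features are [is_woman, years_experience].
from sklearn.tree import DecisionTreeClassifier

X_past = [[0, 3], [0, 5], [1, 3], [1, 5], [0, 4], [1, 4]]
# "Success" = stayed four years and was promoted at least once. In a biased
# workplace, that history makes the label track gender, not ability.
y_past = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier(random_state=0).fit(X_past, y_past)

# Two identical applicants who differ only in gender:
print(model.predict([[0, 4], [1, 4]]))  # [1 0] -- the woman is filtered out
```

The model is perfectly "accurate" on its training data; the unfairness lives in the labels, not in the math.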
Algorithms don't make things fair if you just blithely, blindly apply algorithms. They don't make things fair. They repeat our past practices, our patterns. They automate the status quo. That would be great if we had a perfect world, but we don't. And I'll add that most companies don't have embarrassing lawsuits, but the data scientists in those companies are told to follow the data, to focus on accuracy. Think about what that means. Because we all have bias, it means they could be codifying sexism or any other kind of bigotry.

Thought experiment, because I like them: an entirely segregated society -- racially segregated, all towns, all neighborhoods -- where we send the police only to the minority neighborhoods to look for crime. The arrest data would be very biased. What if, on top of that, we found the data scientists and paid the data scientists to predict where the next crime would occur? Minority neighborhood. Or to predict who the next criminal would be? A minority. The data scientists would brag about how great and how accurate their model would be, and they'd be right.
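A sketch of why they'd be "right," with invented numbers: if arrests can only happen where patrols are sent, a model trained on arrest counts is really predicting deployment, and its predictions then justify more of the same deployment.

```python
# Invented numbers: two neighborhoods with identical underlying crime.
true_crime_rate = {"A": 0.05, "B": 0.05}
patrols = {"A": 10, "B": 90}            # police are sent mostly to B

for _ in range(3):  # each round, next deployment follows last round's arrests
    arrests = {n: true_crime_rate[n] * patrols[n] for n in patrols}
    total = sum(arrests.values())
    patrols = {n: 100 * arrests[n] / total for n in arrests}
    print(patrols)  # stays ~{'A': 10, 'B': 90}: the bias confirms itself
```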
Now, reality isn't that drastic, but we do have severe segregation in many cities and towns, and we have plenty of evidence of biased policing and justice system data. And we actually do predict hotspots, places where crimes will occur. And we do predict, in fact, the individual criminality, the criminality of individuals.

The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges.
Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

What's going on? Data laundering.
It's a process by which technologists hide ugly truths inside black box algorithms and call them objective; call them meritocratic. When they're secret, important and destructive, I've coined a term for these algorithms: "weapons of math destruction."

(Laughter)

(Applause)

They're everywhere, and it's not a mistake. These are private companies building private algorithms for private ends. Even the ones I talked about for teachers and the public police, those were built by private companies and sold to the government institutions. They call it their "secret sauce" -- that's why they can't tell us about it. It's also private power. They are profiting from wielding the authority of the inscrutable.

Now you might think, since all this stuff is private and there's competition, maybe the free market will solve this problem. It won't.
There's a lot of money to be made in unfairness. Also, we're not rational economic agents. We all are biased. We're all racist and bigoted in ways that we wish we weren't, in ways that we don't even know. We know this, though, in aggregate, because sociologists have consistently demonstrated this with the experiments they build, where they send out a bunch of job applications, equally qualified but some with white-sounding names and some with black-sounding names, and it's always disappointing, the results -- always.

So we are the ones that are biased, and we are injecting those biases into the algorithms by choosing what data to collect, like I chose not to think about ramen noodles -- I decided it was irrelevant. But by trusting data that's actually picking up on past practices, and by choosing the definition of success, how can we expect the algorithms to emerge unscathed? We can't. We have to check them.
We have to check them for fairness. The good news is, we can check them for fairness. Algorithms can be interrogated, and they will tell us the truth every time. And we can fix them. We can make them better. I call this an algorithmic audit, and I'll walk you through it.
First, data integrity check. For the recidivism risk algorithm I talked about, a data integrity check would mean we'd have to come to terms with the fact that in the US, whites and blacks smoke pot at the same rate but blacks are far more likely to be arrested -- four or five times more likely, depending on the area. What does that bias look like in other crime categories, and how do we account for it?
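As a sketch of that check, using the disparity she cites (the exact rates below are illustrative, not sourced): divide arrests by actual underlying behavior, and the data's bias becomes a number you have to account for.

```python
# Illustrative rates: equal pot use, unequal arrests (she cites 4-5x).
usage_rate = {"white": 0.10, "black": 0.10}      # roughly the same rate
arrests_per_1000 = {"white": 2.0, "black": 9.0}  # hypothetical counts

for group in usage_rate:
    # Arrests per unit of actual behavior; fair data would give equal ratios.
    print(group, arrests_per_1000[group] / usage_rate[group])
# white 20.0 vs black 90.0: arrest data measures policing, not pot smoking.
```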
Second, we should think about the definition of success, audit that. Remember the hiring algorithm we talked about? Someone who stays for four years and is promoted once? Well, that is a successful employee, but it's also an employee that is supported by their culture. That said, it can also be quite biased. We need to separate those two things.

We should look to the blind orchestra audition as an example. That's where the people auditioning are behind a sheet. What I want to think about there is that the people who are listening have decided what's important and they've decided what's not important, and they're not getting distracted by that. When the blind orchestra auditions started, the number of women in orchestras went up by a factor of five.
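In code, the sheet is just a decision about which columns the evaluator is allowed to see. A sketch with hypothetical columns:

```python
# Hypothetical applicant table; the "sheet" removes what the listeners
# decided is not part of merit, before anyone judges.
import pandas as pd

auditions = pd.DataFrame({
    "name":        ["Ana", "Ben", "Chloe"],
    "gender":      ["F", "M", "F"],
    "performance": [94, 87, 91],
})

behind_the_sheet = auditions.drop(columns=["name", "gender"])
print(behind_the_sheet)  # only the playing is left to judge
```

One caveat worth hedging: dropping a column is not a complete fix, since other features can act as proxies for the removed one. The point of the example is who decides what counts, not that deletion alone guarantees fairness.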
Next, we have to consider accuracy. This is where the value-added model for teachers would fail immediately. No algorithm is perfect, of course, so we have to consider the errors of every algorithm. How often are there errors, and for whom does this model fail? What is the cost of that failure?
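Those questions translate directly into an audit you can run. A sketch with toy predictions (this mirrors the kind of per-group error check ProPublica published, not their actual code):

```python
# Toy audit: overall accuracy hides who the model fails for.
def false_positive_rate(predicted, actual):
    """Share of truly low-risk people wrongly flagged as high risk."""
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    negatives = sum(a == 0 for a in actual)
    return fp / negatives

# Hypothetical predictions and outcomes, split by group:
pred_a, actual_a = [1, 1, 0, 1], [0, 1, 0, 0]
pred_b, actual_b = [0, 1, 0, 0], [0, 1, 0, 0]

print(false_positive_rate(pred_a, actual_a))  # ~0.67: group A absorbs the errors
print(false_positive_rate(pred_b, actual_b))  # 0.0
```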
And finally, we have to consider the long-term effects of algorithms, the feedback loops they engender. That sounds abstract, but imagine if Facebook engineers had considered that before they decided to show us only things that our friends had posted.

I have two more messages, one for the data scientists out there. Data scientists: we should not be the arbiters of truth. We should be translators of ethical discussions that happen in larger society.

(Applause)

And the rest of you, the non-data scientists: this is not a math test. This is a political fight. We need to demand accountability from our algorithmic overlords.

(Applause)

The era of blind faith in big data must end. Thank you very much.

(Applause)