If you remember that first decade of the web, it was really a static place.
還記得網路剛發跡時是個多麼平靜的地方。
You could go online, you could look at pages, and they were put up either by organizations who had teams to do it
你可以上網瀏覽網頁,當時的網站可能是由機構中的專業團隊,
or by individuals who were really tech-savvy for the time.
或精通科技的人所架設。
And with the rise of social media and social networks in the early 2000s,
21 世紀初,隨著社群媒體及社交網絡的普及,
the web was completely changed to a place where now the vast majority of content we interact with is put up by average users,
網際網路已徹底改變,現在我們互相分享的內容大多由一般使用者發佈,
either in YouTube videos or blog posts or product reviews or social media postings.
不論是 YouTube 影片、部落格文章、產品心得、或社群網站 PO 文。
And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.
網路也變得更具互動性,人們在當中彼此交流,評論與分享,不再是單純地瀏覽。
So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers.
Facebook 不是唯一具備這些功能的網站,但它的使用人數最多,且有數據可證。
Facebook has 1.2 billion users per month.
Facebook 每月使用人數有 12 億。
So half the Earth's Internet population is using Facebook.
等於全球一半的網路人口都在使用 Facebook。
They are a site, along with others, that has allowed people to create an online persona with very little technical skill,
這類型的網站讓大眾創造個人的網路形象,而且幾乎不需電腦技能,
and people responded by putting huge amounts of personal data online.
民眾因此大量上傳個人資料。
So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history.
如此一來,我們掌握了數億人的行為、喜好與人口統計資料,這是史無前例的。
And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you
身為一個電腦科學家,這代表我能藉此建立模型用來推測各種潛在行為,
that you don't even know you're sharing information about.
而你們卻完全不知道自己分享的資訊透露這些訊息。
As scientists, we use that to help the way people interact online, but there's less altruistic applications,
科學家運用此機制改善線上互動的方式,但也有些對他人益處不大的應用,
and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it.
這會產生問題,因為使用者不了解這些技術及運作方式,就算了解他們也無從控制。
So what I want to talk to you about today is some of these things that we're able to do,
所以今天我要跟大家談談我們所能採取的行動,
and then give us some ideas of how we might go forward to move some control back into the hands of users.
然後進一步思考如何將控制權交回使用者手中。
So this is Target, the company.
好,這是 Target 公司。
I didn't just put that logo on this poor, pregnant woman's belly.
我不是無聊把他們的 Logo 放在這位可憐孕婦的肚子上。
You may have seen this anecdote that was printed in Forbes magazine
你可能在富比士雜誌讀過這則故事,
where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant.
Target 把傳單寄給一位 15 歲女孩,內容是嬰兒用品的廣告和折價券,還是在女孩告知父母自己懷孕的兩週前。
Yeah, the dad was really upset.
還用說,她爸爸氣炸了。
He said, "How did Target figure out that this high school girl was pregnant before she told her parents?"
他說:「這高中女生都還沒告訴父母,Target 是怎麼知道她已經懷孕的?」
It turns out that they have the purchase history for hundreds of thousands of customers
原來 Target 記錄了數十萬名顧客的購買資料,
and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is.
運用這些資料計算出「懷孕指數」,不但能推測女性顧客是否懷孕,還能推估預產期。
And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes,
系統計算出這些結果不是靠顯而易見的線索,例如買嬰兒床或寶寶的衣服
but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers.
而是她比平常買更多維他命、或她買了一個包包,這個包包大得放得進尿布。
And by themselves, those purchases don't seem like they might reveal a lot,
其實商品本身所透露的訊息並不多,
but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights.
但套用在廣大群眾的生活中便歸納出一種行為模式,開始透露出某些訊息。
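The idea that individually unremarkable purchases become informative across many shoppers can be sketched in a few lines of Python. This is a toy illustration only, not Target's actual method: the purchase history, the item names, and the helpers `item_weights` and `score` are all made up for the example, and the sketch does not attempt the due-date estimate mentioned in the talk.

```python
import math
from collections import Counter

def item_weights(labeled_baskets):
    """Log-ratio weight per item: how much more common the item is
    among shoppers in the positive group than among everyone else."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for items, is_positive in labeled_baskets:
        if is_positive:
            n_pos += 1
            pos.update(set(items))
        else:
            n_neg += 1
            neg.update(set(items))
    vocab = set(pos) | set(neg)
    # Add-one smoothing keeps the ratio finite for items seen in only one group.
    return {
        item: math.log(((pos[item] + 1) / (n_pos + 2)) / ((neg[item] + 1) / (n_neg + 2)))
        for item in vocab
    }

def score(basket, weights):
    """Sum the weights of the distinct items in a basket; unknown items count as 0."""
    return sum(weights.get(item, 0.0) for item in set(basket))

# Hypothetical labeled history: (items bought, shopper belongs to the positive group).
history = [
    (["vitamins", "large handbag", "bread"], True),
    (["vitamins", "lotion", "milk"], True),
    (["large handbag", "lotion", "bread"], True),
    (["bread", "milk", "batteries"], False),
    (["batteries", "vitamins", "milk"], False),
    (["bread", "batteries", "soda"], False),
]

weights = item_weights(history)
likely = score(["vitamins", "large handbag"], weights)
unlikely = score(["batteries", "soda"], weights)
print(f"vitamins + large handbag: {likely:.2f}")
print(f"batteries + soda:         {unlikely:.2f}")
```

No single item decides the score; it is the combination, weighted by what other shoppers bought, that moves it up or down.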
So that's the kind of thing that we do when we're predicting stuff about you on social media.
這就是我們正在進行的事,我們在社群網站上推測關於你的一切。
We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.
我們從廣大群眾中尋找細微的行為模式,讓我們歸納出所有相關連的事。
So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things
我跟實驗室的同事們已發展出這種機制,讓我們能精準地預測事情,
like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence,
像是你的政治傾向、人格特質、性別、性向、宗教信仰、年齡、智商
along with things like how much you trust the people you know and how strong those relationships are.
其中也包含你對他人的信任程度,以及你們之間的親密程度。
We can do all of this really well.
我們能準確地預測這些事。
And again, it doesn't come from what you might think of as obvious information.
但如我言,這些結論並非來自於明顯的線索。
So my favorite example is from this study that was published this year in the Proceedings of the National Academies.
以下是我最喜歡的例子,這項研究被刊在今年《美國國家科學院院刊》中。
If you Google this, you'll find it.
Google 一下就找得到。
It's four pages, easy to read.
篇幅只有 4 頁,淺顯易讀。
And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones.
他們針對 Facebook 上的「讚」做研究,也就是你按讚的內容,藉此預測你的屬性,還有關於你的種種。
And in their paper they listed the five likes that were most indicative of high intelligence.
這份報告指出 5 種按讚的內容是高智商的象徵。
And among those was liking a page for curly fries.
其中之一是幫炸薯條的專頁按讚。
Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person.
炸薯條很好吃,但喜歡炸薯條可不一定表示你比一般人聰明。
So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted?
那為什麼「讚」會變成判斷智商的重要指標呢?特別是網站內容與預測特質毫不相干?
And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this.
事實上我們必須參考許多背後的理論,以了解為何得出這樣的結果。
One of them is a sociological theory called homophily, which basically says people are friends with people like them.
其中之一是社會學理論中的「同質性」,該理論指出人們傾向與自己相似的人交往。
So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people,
所以如果你聰明,你會和聰明的人交朋友,如果你年輕,你會和年輕人交朋友,
and this is well established for hundreds of years.
這個理論數百年來廣為人知。
We also know a lot about how information spreads through networks.
我們也相當清楚資訊如何在網路上傳播。
It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks.
不論是爆紅影片、Facebook 上的讚或其它資訊,全都像傳染病一樣迅速在社群網路上散播。
So this is something we've studied for a long time. We have good models of it.
我們研究這種現象已久,並建立出可靠的模型。
And so you can put those things together and start seeing why things like this happen.
當我們將所有模型拼湊起來,就能了解這些現象發生的原因。
So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test.
如果要我來假設這是怎麼一回事,應該是有個聰明的人建立這個頁面,或是剛開始按讚的一群人當中有人的智商比較高。
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends,
接著他的朋友會看見他按讚,依據同質性理論,他的朋友也可能是聰明人,
and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them,
看見訊息的某些人按讚,而這些人也有聰明的朋友,所以訊息又被分享出去給聰明的人,
and so it propagated through the network to kind of a host of smart people,
相同的訊息在網路上不斷複製,在聰明的人之間流傳,
so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content,
到最後幫炸薯條按讚這件事變成高智商的指標,不是因為內容,
but because the actual action of liking reflects back the common attributes of other people who have done it.
而是因為按讚的動作反映出那些按讚的人的共同特質。
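The hypothesis above (homophily plus disease-like spread) can be illustrated with a small deterministic simulation. This is a sketch under strong assumptions, not the model from the study: the scores are invented, friendships are formed purely between people with neighboring scores, and every friend of a liker also likes the page. The names `build_homophilous_network` and `spread_like` are hypothetical.

```python
from statistics import mean

def build_homophilous_network(scores, radius=2):
    """Befriend each person with the `radius` people just above them in
    score; friendships are mutual, so everyone ends up linked to their
    nearest neighbors in score on both sides."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    friends = {i: set() for i in range(len(scores))}
    for pos, person in enumerate(order):
        for offset in range(1, radius + 1):
            if pos + offset < len(order):
                other = order[pos + offset]
                friends[person].add(other)
                friends[other].add(person)
    return friends

def spread_like(friends, seed, rounds):
    """Each round, every friend of a new liker also likes the page."""
    liked = {seed}
    frontier = {seed}
    for _ in range(rounds):
        frontier = {f for p in frontier for f in friends[p]} - liked
        liked |= frontier
    return liked

# Made-up test scores for 61 people, from 70 to 130.
scores = list(range(70, 131))
friends = build_homophilous_network(scores)

# The page is started by the highest-scoring person.
seed = scores.index(130)
likers = spread_like(friends, seed, rounds=5)

avg_all = mean(scores)
avg_likers = mean(scores[p] for p in likers)
print(f"average score, everyone: {avg_all}")
print(f"average score, likers:   {avg_likers}")
```

The page content never enters the computation. The likers score higher than average only because of who started the cascade and who their friends are.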
So this is pretty complicated stuff, right?
很複雜對吧?
It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it?
這件事難以跟一般使用者解釋清楚,即便如此,他們又能怎麼樣呢?
How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked?
你怎麼知道你按過的讚反映出你某種特質,而這個特質卻與按讚的內容毫無關連?
There's a lot of power that users don't have to control how this data is used.
使用者所知太少以致無法掌控資訊流向。
And I see that as a real problem going forward.
我認為這將造成更大的問題。
So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used,
所以有幾個面向我們必須好好考慮,當使用者想主宰他們的資料如何被使用,
because it's not always going to be used for their benefit.
因為現在資料使用的方式並不一定對他們有利。
An example I often give is that, if I ever get bored being a professor,
我常常舉一個例子,如果我厭倦了當教授,
I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic.
我會成立一間公司來推測不同特質,像是你在團體中的工作表現、你是否吸毒、是否酗酒。
We know how to predict all that.
我們知道如何預測這些事情。
And I'm going to sell reports to H.R. companies and big businesses that want to hire you.
接著我會把預測結果賣給人力資源公司以及想要雇用你的大企業。
We totally can do that now.
這些事對我們來說輕而易舉。
I could start that business tomorrow, and you would have absolutely no control over me using your data like that.
我立刻就可以成立這樣的公司,對於我怎樣使用你的資料,你完全沒轍。
That seems to me to be a problem.
在我看來這是個問題。
So one of the paths we can go down is the policy and law path.
所以解決的途徑之一是訂定制度及法律。
And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it.
某個程度上,我認為這是最有效的方式,前提是我們要有辦法實現。
Observing our political process in action makes me think it's highly unlikely
但縱觀政治歷程,我發現這多麼遙不可及,
that we're going to get a bunch of representatives to sit down, learn about this,
我們得聚集眾多議會代表讓他們了解來龍去脈,
and then enact sweeping changes to intellectual property law in the U.S., so users control their data.
並對美國智慧財產法規進行全面性改革,使用者才有權掌控資料。
We could go the policy route, where social media companies say, "You know what? You own your data."
我們可以走訂定制度這條路,像社群媒體所宣稱「資料是你的」。
You have total control over how it's used.
你能完全控制資料如何被使用。
The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way.
問題是大部分社群媒體的經營模式多少仰賴於分享及利用使用者的資料。
It's sometimes said of Facebook that the users aren't the customer, they're the product.
有人說對 Facebook 而言,使用者不是顧客,而是產品。
And so how do you get a company to cede control of their main asset back to the users?
所以怎麼可能讓一間公司把他的主要資產還給用戶呢?
It's possible, but I don't think it's something that we're going to see change quickly.
這是有可能的,但我認為不是一蹴可幾。
So I think the other path that we can go down that's going to be more effective is one of more science.
所以,我想我們能試試另一條路,這個方法將更有效也更科學。
It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place.
當初正是科學研究,讓我們得以發展出這些運算個人資料的機制。
And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took."
事實上,若要發展出能提醒使用者的機制,我們要做的研究其實非常類似,這種機制會告訴使用者:「你剛才的行動有這樣的風險。」
By liking that Facebook page, or by sharing this piece of personal information,
藉由你在 Facebook 頁面按的讚,或你分享出去的個人資訊,
you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace.
你將增進我的能力去預測你是否吸食毒品,或你的工作順不順利。
And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether.
如此一來,我認為會影響人們分享資訊的意願,可能不會公開分享,或乾脆不在線上分享了。
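One way such a "here's the risk of that action" message could be computed is with Bayes' rule: compare how confident a predictor is about a trait before and after it sees a single like. The sketch below is an assumption about how this might look, not a mechanism described in the talk, and all three probabilities are invented for illustration.

```python
def updated_probability(prior, p_like_given_trait, p_like_given_no_trait):
    """Bayes' rule: probability of the trait after observing the like."""
    numerator = p_like_given_trait * prior
    evidence = numerator + p_like_given_no_trait * (1 - prior)
    return numerator / evidence

# Hypothetical numbers: 10% of users have the trait; 30% of users with the
# trait like the page, compared with 10% of users without it.
prior = 0.10
posterior = updated_probability(prior, 0.30, 0.10)
risk_increase = posterior - prior

print(f"before the like: {prior:.0%}")
print(f"after the like:  {posterior:.0%}")
```

A tool built on this idea could show the before-and-after difference to the user at the moment they are about to like or share something.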
We can also look at things like allowing people to encrypt data that they upload,
我們也可以試試讓使用者加密他們上傳的資料,
so it's kind of invisible and worthless to sites like Facebook or third party services that access it,
所以這些資料對 Facebook 或有權限的第三方網站而言,既不存在也沒有價值。
but that selected users who the person who posted it want to see it have access to see it.
但張貼者指定的特定使用者仍然有權限看到這些內容。
This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it.
從知識的角度而言,這是一個令人興奮的研究計劃,所以科學家有意願投入。
So that gives us an advantage over the law side.
比起法律途徑,這給我們更多優勢。
One of the problems that people bring up when I talk about this is, they say,
當我談及這項計畫大家會問我一個問題,他們說:
"You know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail."
「如果人們開始不公開他們的資料,所有你發展出來用來預測人們特質的研究方法將功虧一簣。」
And I say, "Absolutely!", and for me, that's success,
我說:「那當然!」但對我來說這是成功,
because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online.
因為身為科學家,我的目的不是推測使用者的資訊,而是改善人們線上互動的方式。
And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that.
有時這包含推測使用者的資訊,但如果使用者不同意我利用他們的資料,我認為他們有權利這麼做。
I want users to be informed and consenting users of the tools that we develop.
我希望使用者知道且認同我們研發的使用者工具。
And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies
我認為鼓勵這種技術、支持想將控制權還給用戶並遠離社群媒體的研究人員,
means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base,
意味著隨著這些工具不斷演進,我們將擁有具備知識、也更有自主權的使用者群體,
and I think all of us can agree that that's a pretty ideal way to go forward.
我想我們都同意這是往前邁進最理想的方式。
Thank you.
謝謝各位。