  • America's favorite pie is?

  • Audience: Apple. Kenneth Cukier: Apple. Of course it is.

  • How do we know it?

  • Because of data.

  • You look at supermarket sales.

  • You look at supermarket sales of 30-centimeter pies

  • that are frozen, and apple wins, no contest.

  • The majority of the sales are apple.

  • But then supermarkets started selling

  • smaller, 11-centimeter pies,

  • and suddenly, apple fell to fourth or fifth place.

  • Why? What happened?

  • Okay, think about it.

  • When you buy a 30-centimeter pie,

  • the whole family has to agree,

  • and apple is everyone's second favorite.

  • (Laughter)

  • But when you buy an individual 11-centimeter pie,

  • you can buy the one that you want.

  • You can get your first choice.

  • You have more data.

  • You can see something

  • that you couldn't see

  • when you only had smaller amounts of it.

  • Now, the point here is that more data

  • doesn't just let us see more,

  • more of the same thing we were looking at.

  • More data allows us to see new.

  • It allows us to see better.

  • It allows us to see different.

  • In this case, it allows us to see

  • what America's favorite pie is:

  • not apple.

  • Now, you probably all have heard the term big data.

  • In fact, you're probably sick of hearing the term

  • big data.

  • It is true that there is a lot of hype around the term,

  • and that is very unfortunate,

  • because big data is an extremely important tool

  • by which society is going to advance.

  • In the past, we used to look at small data

  • and think about what it would mean

  • to try to understand the world,

  • and now we have a lot more of it,

  • more than we ever could before.

  • What we find is that when we have

  • a large body of data, we can fundamentally do things

  • that we couldn't do when we only had smaller amounts.

  • Big data is important, and big data is new,

  • and when you think about it,

  • the only way this planet is going to deal

  • with its global challenges

  • to feed people, supply them with medical care,

  • supply them with energy, electricity,

  • and to make sure they're not burnt to a crisp

  • because of global warming

  • is because of the effective use of data.

  • So what is new about big data? What is the big deal?

  • Well, to answer that question, let's think about

  • what information looked like,

  • physically looked like in the past.

  • In 1908, on the island of Crete,

  • archaeologists discovered a clay disc.

  • They dated it from 2000 B.C., so it's 4,000 years old.

  • Now, there's inscriptions on this disc,

  • but we actually don't know what it means.

  • It's a complete mystery, but the point is that

  • this is what information used to look like

  • 4,000 years ago.

  • This is how society stored

  • and transmitted information.

  • Now, society hasn't advanced all that much.

  • We still store information on discs,

  • but now we can store a lot more information,

  • more than ever before.

  • Searching it is easier. Copying it is easier.

  • Sharing it is easier. Processing it is easier.

  • And what we can do is we can reuse this information

  • for uses that we never even imagined

  • when we first collected the data.

  • In this respect, the data has gone

  • from a stock to a flow,

  • from something that is stationary and static

  • to something that is fluid and dynamic.

  • There is, if you will, a liquidity to information.

  • The disc that was discovered off of Crete

  • that's 4,000 years old, is heavy,

  • it doesn't store a lot of information,

  • and that information is unchangeable.

  • By contrast, all of the files

  • that Edward Snowden took

  • from the National Security Agency in the United States

  • fits on a memory stick

  • the size of a fingernail,

  • and it can be shared at the speed of light.

  • More data. More.

  • Now, one reason why we have so much data in the world today

  • is we are collecting things

  • that we've always collected information on,

  • but another reason why is we're taking things

  • that have always been informational

  • but have never been rendered into a data format

  • and we are putting it into data.

  • Think, for example, the question of location.

  • Take, for example, Martin Luther.

  • If we wanted to know in the 1500s

  • where Martin Luther was,

  • we would have to follow him at all times,

  • maybe with a feathery quill and an inkwell,

  • and record it,

  • but now think about what it looks like today.

  • You know that somewhere,

  • probably in a telecommunications carrier's database,

  • there is a spreadsheet or at least a database entry

  • that records your information

  • of where you've been at all times.

  • If you have a cell phone,

  • and that cell phone has GPS, but even if it doesn't have GPS,

  • it can record your information.

  • In this respect, location has been datafied.

  • Now think, for example, of the issue of posture,

  • the way that you are all sitting right now,

  • the way that you sit,

  • the way that you sit, the way that you sit.

  • It's all different, and it's a function of your leg length

  • and your back and the contours of your back,

  • and if I were to put sensors, maybe 100 sensors

  • into all of your chairs right now,

  • I could create an index that's fairly unique to you,

  • sort of like a fingerprint, but it's not your finger.

  • So what could we do with this?

  • Researchers in Tokyo are using it

  • as a potential anti-theft device in cars.

  • The idea is that the carjacker sits behind the wheel,

  • tries to stream off, but the car recognizes

  • that a non-approved driver is behind the wheel,

  • and maybe the engine just stops, unless you

  • type in a password into the dashboard

  • to say, "Hey, I have authorization to drive." Great.

  • What if every single car in Europe

  • had this technology in it?

  • What could we do then?

  • Maybe, if we aggregated the data,

  • maybe we could identify telltale signs

  • that best predict that a car accident

  • is going to take place in the next five seconds.

  • And then what we will have datafied

  • is driver fatigue,

  • and the service would be when the car senses

  • that the person slumps into that position,

  • automatically knows, hey, set an internal alarm

  • that would vibrate the steering wheel, honk inside

  • to say, "Hey, wake up,

  • pay more attention to the road."

  • These are the sorts of things we can do

  • when we datafy more aspects of our lives.

  • So what is the value of big data?

  • Well, think about it.

  • You have more information.

  • You can do things that you couldn't do before.

  • One of the most impressive areas

  • where this concept is taking place

  • is in the area of machine learning.

  • Machine learning is a branch of artificial intelligence,

  • which itself is a branch of computer science.

  • The general idea is that instead of

  • instructing a computer what to do,

  • we are going to simply throw data at the problem

  • and tell the computer to figure it out for itself.

  • And it will help you understand it

  • by seeing its origins.

  • In the 1950s, a computer scientist

  • at IBM named Arthur Samuel liked to play checkers,

  • so he wrote a computer program

  • so he could play against the computer.

  • He played. He won.

  • He played. He won.

  • He played. He won,

  • because the computer only knew

  • what a legal move was.

  • Arthur Samuel knew something else.

  • Arthur Samuel knew strategy.

  • So he wrote a small sub-program alongside it

  • operating in the background, and all it did

  • was score the probability

  • that a given board configuration would likely lead

  • to a winning board versus a losing board

  • after every move.

  • He plays the computer. He wins.

  • He plays the computer. He wins.

  • He plays the computer. He wins.

  • And then Arthur Samuel leaves the computer

  • to play itself.

  • It plays itself. It collects more data.

  • It collects more data. It increases the accuracy of its prediction.

  • And then Arthur Samuel goes back to the computer

  • and he plays it, and he loses,

  • and he plays it, and he loses,

  • and he plays it, and he loses,

  • and Arthur Samuel has created a machine

  • that surpasses his ability in a task that he taught it.

  • And this idea of machine learning

  • is going everywhere.

  • How do you think we have self-driving cars?

  • Are we any better off as a society

  • enshrining all the rules of the road into software?

  • No. Memory is cheaper. No.

  • Algorithms are faster. No. Processors are better. No.

  • All of those things matter, but that's not why.

  • It's because we changed the nature of the problem.

  • We changed the nature of the problem from one

  • in which we tried to overtly and explicitly

  • explain to the computer how to drive

  • to one in which we say,

  • "Here's a lot of data around the vehicle.

  • You figure it out.

  • You figure it out that that is a traffic light,

  • that that traffic light is red and not green,

  • that that means that you need to stop

  • and not go forward."

  • Machine learning is at the basis

  • of many of the things that we do online:

  • search engines,

  • Amazon's personalization algorithm,

  • computer translation,

  • voice recognition systems.

  • Researchers recently have looked at

  • the question of biopsies,

  • cancerous biopsies,

  • and they've asked the computer to identify

  • by looking at the data and survival rates

  • to determine whether cells are actually

  • cancerous or not,

  • and sure enough, when you throw the data at it,

  • through a machine-learning algorithm,

  • the machine was able to identify

  • the 12 telltale signs that best predict

  • that this biopsy of the breast cancer cells

  • are indeed cancerous.

  • The problem: The medical literature

  • only knew nine of them.

  • Three of the traits were ones

  • that people didn't need to look for,

  • but that the machine spotted.

  • Now, there are dark sides to big data as well.

  • It will improve our lives, but there are problems

  • that we need to be conscious of,

  • and the first one is the idea

  • that we may be punished for predictions,

  • that the police may use big data for their purposes,

  • a little bit like "Minority Report."

  • Now, it's a term called predictive policing,

  • or algorithmic criminology,

  • and the idea is that if we take a lot of data,

  • for example where past crimes have been,

  • we know where to send the patrols.

  • That makes sense, but the problem, of course,

  • is that it's not simply going to stop on location data,

  • it's going to go down to the level of the individual.

  • Why don't we use data about the person's

  • high school transcript?

  • Maybe we should use the fact that

  • they're unemployed or not, their credit score,

  • their web-surfing behavior,

  • whether they're up late at night.

  • Their Fitbit, when it's able to identify biochemistries,

  • will show that they have aggressive thoughts.

  • We may have algorithms that are likely to predict

  • what we are about to do,

  • and we may be held accountable

  • before we've actually acted.

  • Privacy was the central challenge

  • in a small data era.

  • In the big data age,

  • the challenge will be safeguarding free will,

  • moral choice, human volition,

  • human agency.

  • There is another problem:

  • Big data is going to steal our jobs.

  • Big data and algorithms are going to challenge

  • white collar, professional knowledge work

  • in the 21st century

  • in the same way that factory automation

  • and the assembly line

  • challenged blue collar labor in the 20th century.

  • Think about a lab technician

  • who is looking through a microscope

  • at a cancer biopsy

  • and determining whether it's cancerous or not.

  • The person went to university.

  • The person buys property.

  • He or she votes.

  • He or she is a stakeholder in society.

  • And that person's job,

  • as well as an entire fleet

  • of professionals like that person,

  • is going to find that their jobs are radically changed

  • or actually completely eliminated.

  • Now, we like to think

  • that technology creates jobs over a period of time

  • after a short, temporary period of dislocation,

  • and that is true for the frame of reference

  • with which we all live, the Industrial Revolution,

  • because that's precisely what happened.

  • But we forget something in that analysis:

  • There are some categories of jobs

  • that simply get eliminated and never come back.

  • The Industrial Revolution wasn't very good

  • if you were a horse.

  • So we're going to need to be careful

  • and take big data and adjust it for our needs,

  • our very human needs.

  • We have to be the master of this technology,

  • not its servant.

  • We are just at the outset of the big data era,

  • and honestly, we are not very good

  • at handling all the data that we can now collect.

  • It's not just a problem for the National Security Agency.

  • Businesses collect lots of data, and they misuse it too,

  • and we need to get better at this, and this will take time.

  • It's a little bit like the challenge that was faced

  • by primitive man and fire.

  • This is a tool, but this is a tool that,

  • unless we're careful, will burn us.

  • Big data is going to transform how we live,

  • how we work and how we think.

  • It is going to help us manage our careers

  • and lead lives of satisfaction and hope

  • and happiness and health,

  • but in the past, we've often looked at information technology

  • and our eyes have only seen the T,

  • the technology, the hardware,

  • because that's what was physical.

  • We now need to recast our gaze at the I,

  • the information,

  • which is less apparent,

  • but in some ways a lot more important.

  • Humanity can finally learn from the information

  • that it can collect,

  • as part of our timeless quest

  • to understand the world and our place in it,

  • and that's why big data is a big deal.

  • (Applause)
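
The Arthur Samuel anecdote in the talk describes a program that scores how likely a given board configuration is to lead to a win, then sharpens that score by playing against itself. The sketch below illustrates that idea on a much smaller game than checkers: a single pile of stones, each player removes one to three per turn, and whoever takes the last stone wins. It is not Samuel's program, which was considerably more elaborate. The game, the pile size, the exploration rate, and all names (`WinRateTable`, `best_move`, `self_play`, `train`) are assumptions chosen for illustration.

```python
import random

MAX_TAKE = 3   # assumed rule: a player removes 1 to 3 stones per turn
START = 10     # assumed starting pile size


def legal_moves(stones):
    """Numbers of stones the player to move is allowed to take."""
    return range(1, min(MAX_TAKE, stones) + 1)


class WinRateTable:
    """Per position, how often the player to move went on to win."""

    def __init__(self):
        self.wins = {}
        self.visits = {}

    def estimate(self, stones):
        if stones == 0:
            return 0.0  # no stones left: the player to move has already lost
        seen = self.visits.get(stones, 0)
        # Unseen positions get a neutral 0.5 until data arrives.
        return self.wins.get(stones, 0) / seen if seen else 0.5

    def record(self, stones, won):
        self.visits[stones] = self.visits.get(stones, 0) + 1
        self.wins[stones] = self.wins.get(stones, 0) + (1 if won else 0)


def best_move(table, stones):
    """Leave the opponent in the position with the lowest estimated win rate."""
    return min(legal_moves(stones), key=lambda m: table.estimate(stones - m))


def self_play(table, rng, explore=0.2):
    """Play one game against itself and record the outcome for every position."""
    stones, history = START, []
    while stones > 0:
        history.append(stones)
        if rng.random() < explore:
            move = rng.choice(list(legal_moves(stones)))  # occasional random move
        else:
            move = best_move(table, stones)
        stones -= move
    # Whoever moved last took the final stone and won. Walking backwards
    # through the game, positions alternate between winner and loser.
    won = True
    for position in reversed(history):
        table.record(position, won)
        won = not won


def train(games=5000, seed=0):
    """More games means more data, which means better estimates."""
    table, rng = WinRateTable(), random.Random(seed)
    for _ in range(games):
        self_play(table, rng)
    return table


if __name__ == "__main__":
    table = train()
    for stones in range(1, START + 1):
        print(stones, round(table.estimate(stones), 2), best_move(table, stones))
```

Nothing in the code states a strategy for the game. The program is told only which moves are legal and who won each game, and its preference for leaving the opponent with a multiple of four stones emerges from the recorded outcomes.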