- I think one of the good new applications
of data science is in the medical field.
Like in drug delivery or cancer treatment.
- I think a very interesting one
is how now companies can use all the information
they're gathering from their customers
to actually develop new products
that respond to the needs
of the customers.
- A good new application of data science
was the high trending news of Pokémon Go.
They used data from Ingress,
the previous app from the same company,
and they chose the locations
for Pokémon and gyms
according to data from that app.
So they learned from their errors.
- Google Search is an application of data science.
The Google Search, whenever we want to search anything.
So I think it's all because of data science.
Whatever Google is now, it's all because of data science.
- Augmented reality is my favorite
new implementation of data science.
I think you can't look at a new technology
and not see data science in there
but augmented reality is the one
I'm just the most excited about.
The ability to walk around and see things on walls
or around us that aren't really there.
Pokémon's just the start.
- So what has happened is that now the tools are available
and datasets are available,
people are applying them with not much diligence
and I think one of the strange cases
which got reported in the newspapers is about the story
of a father walking into a Target store in the US
and complaining about the fact
that Target was sending mail to his teenage daughter
about diapers and baby formula.
He was angry with them.
He said, "Why would you want
my teenage daughter to have a baby?"
And he was obviously disturbed
by this mail or the ad campaign.
And they obviously apologized
but then the father returned two weeks later
and he apologized to them
saying he didn't know his daughter was pregnant.
Now the question is, how did Target know this
before the father knew?
And what has happened is that they would look
at the purchasing behavior of individuals.
So if you're buying some sort of supplements or vitamins
then you know that this is the first trimester of pregnancy.
So they know what products to send to you
assuming that the person
who bought those supplements was pregnant.
Now this is a great story about data science
and how data science can forecast and predict
these consumer behaviors
even before the family would find out.
And I find it disturbing and strange and odd
for a variety of reasons.
First of all, for every correct prediction,
you have hundreds of incorrect predictions
which we call the false positives
and no data scientist actually advertises
his or her false positives.
We only advertise and promote what we got right.
But when we got it wrong hundreds of times, we don't mention it.
Second thing is, that's an abuse of data.
That's basically not really giving you much insight.
You've just found a correlation
but someone could be purchasing the same material
for someone else.
So the odds of getting it wrong,
the odds of getting false positives, are much higher.
So I find it strange and I think it gives a false sense
of our ability to predict the future.
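To make the false-positive point concrete, here is a minimal sketch of the arithmetic. Every number in it (the share of shoppers who are pregnant, how often the purchase signal shows up) is a made-up assumption for illustration, not a figure from Target, and the function name is hypothetical.

```python
def precision(base_rate, true_positive_rate, false_positive_rate):
    """Share of flagged shoppers who really are pregnant (Bayes' rule)."""
    true_pos = base_rate * true_positive_rate
    false_pos = (1 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Hypothetical numbers, chosen only for illustration.
base_rate = 0.01            # 1% of shoppers are pregnant
true_positive_rate = 0.80   # 80% of pregnant shoppers buy the "signal" products
false_positive_rate = 0.10  # 10% of other shoppers buy them too, e.g. for someone else

p = precision(base_rate, true_positive_rate, false_positive_rate)
false_per_true = (1 - p) / p  # wrong guesses for every correct guess

print(f"precision: {p:.3f}")
print(f"wrong guesses per correct guess: {false_per_true:.1f}")
```

Under these assumed numbers, only about 7% of the flagged shoppers are pregnant, which is roughly 12 wrong guesses for every correct one. The rarer the condition, the worse that ratio gets.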
The reality about data science,
and the most important thing
for the budding data scientist to know,
is that all forecasts are wrong.
They're useful but they're wrong.
And so one should not put their faith
into the fact that now that we can do predictive analytics
that we can solve all problems.
I think a good example is the Google Search.
Google published a paper saying
they can predict flu epidemics
before the Centers for Disease Control.
And what they did was they were looking
at what people were searching for on Google, such as flu symptoms.
So Google saw the flu symptom searches
before anybody else and they were able to predict it.
The thing is these searches are good
and they are correlated with some outcomes
but not necessarily all the time.
So at that time, when Google announced it,
it was a big thing and everybody really liked it:
well, that's a new era of predictive analytics.
Only a few years later, they realized
that Google had started to predict false positives:
they were predicting things that were not really there,
or the predictions were not that accurate
for a variety of reasons.
They probably changed their algorithms
and the datasets were not really correlated
with the outcomes.
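A rough sketch of how a correlation that held once can mislead later. This is not Google's model: the weekly counts, the simple ratio fit, and the idea that searches rose because of news coverage are all assumptions invented for illustration.

```python
def fit_ratio(searches, cases):
    """Least-squares slope through the origin: cases ~ ratio * searches."""
    return sum(s * c for s, c in zip(searches, cases)) / sum(s * s for s in searches)

# Hypothetical weekly counts from an earlier period, invented for illustration.
train_searches = [100, 200, 300, 400]
train_cases = [10, 20, 30, 40]  # here searches track cases closely

ratio = fit_ratio(train_searches, train_cases)

# Hypothetical later period: people search more (say, after news coverage)
# without a matching rise in illness, so the old relationship no longer holds.
later_searches = [300, 600, 900]
later_cases = [15, 30, 45]

predicted = [ratio * s for s in later_searches]
overestimate = [pred / actual for pred, actual in zip(predicted, later_cases)]

print("predicted cases:", predicted)
print("predicted / actual:", overestimate)
```

In this toy setup the old ratio predicts about twice as many cases as actually occur in the later period, even though it fit the earlier data perfectly.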
So what's the lesson to learn here?
One has to avoid what we call data hubris:
you should not believe in your models too much
because they can lead you astray.
Data science has tremendous potential to bring change
in parts of the world, in parts of our society
that have been disenfranchised for years.
One sees great examples of data science
especially in the developing countries
where they are targeting relief efforts.
They're targeting food
and other aid to individuals,
to places that have not been targeted in the past.
And the reason it is happening now
is the greater availability of data and models and analytics
to be able to pinpoint where the greatest needs are.
The ability to design and conduct experiments
to see if one were to give micro-credits,
small loans to very poor households
in developing parts of the world,
to see how they affect
the individual household's ability to get out of poverty
and also the local community's ability
to collectively improve their economic well-being
by just very small infusions of cash or credit.
So these experiments happening all over the world
are a direct result
of our ability to analyze data
and be able to design experiments
and then roll out humongous efforts
in providing relief, providing credit,
providing an opportunity
to those who have been disenfranchised in the past
to join the rest of the world
in prosperity and happiness and health.