
  • [ MUSIC ]

  • [ APPLAUSE ]

  • KARASICK: You all know...now know why Dr. Bengio did not need much of an introduction.

  • Thank you.

  • Let me also introduce Fei-Fei Li from Stanford.

  • She's the head of the AI Lab at Stanford.

  • I know her as the ImageNet lady.

  • And I guess we'll talk a little bit about ImageNet challenges as we go through the discussion.

  • And John Smith is an IBM Fellow.

  • Long history of research in visual information retrieval, deep learning.

  • And I'm going to be a little selfish.

  • So, I'm a trained computer scientist, but I'm not an AI person

  • by training; lots of other things.

  • So, I thought this was a real opportunity to educate me.

  • So, I'm going to ask questions that I am curious about in order to get the discussion started.

  • And you know, the first thing I wanted to ask the three of you is what I would call kind

  • of the hype correction sort of a question.

  • Computer science has a kind of a shiny object property that we're all pretty familiar with.

  • And every so often, a group or individual comes up with a breakthrough.

  • And everybody kind of runs over and the teeter-totter tips to the side.

  • And so, I always worry, whenever I see that, it's oh, God, what aren't people working on?

  • So, I guess I was interested, Yoshua, your talk really got to this.

  • It was a discussion about what we know how to do and what we're still trying to learn how to do.

  • And I guess I'd like to ask the three of you, maybe starting with Fei-Fei,

  • what do we think the limits of deep learning technologies are?

  • What won't it be able to do?

  • You weren't here at the beginning, but Terry Sejnowski,

  • in the previous session, said don't trust anybody.

  • So, what do you think is sort of beyond this kind of technology?

  • LI: Okay. So, thank you, first of all, for the invitation.

  • And great talk, Yoshua.

  • I think Yoshua already alluded to this.

  • So, first of all, deep learning is a dynamic changing area of research.

  • So, it's very hard to pinpoint what exactly is deep learning.

  • In computer vision, what a lot of people refer to when they talk about deep learning

  • and the success of deep learning is really a specific convolutional neural network model,

  • architecture, that's supervised training with big data,

  • meaning ImageNet, mostly, that does object recognition.

  • And that is a very narrow definition of deep learning.

  • So, when you ask the limitation of deep learning, one way to answer,

  • there's no limitation if deep learning keeps evolving.

  • That's a little bit of an irresponsible answer, I recognize, and just to be brief,

  • but I also want to echo Yoshua: in my opinion, the quest towards AI, especially

  • in my own area of computer vision, goes from perception to cognition to reasoning.

  • And on that whole path, we have just begun to get a grasp.

  • We're doing very well with perception, thanks to data and these high capacity models,

  • but beyond the basic building blocks of perception such as speech

  • and object recognition, the next thing is really a slew of cognitive tasks

  • that we're not totally getting our hands on yet.

  • We begin to see question answering, or QA.

  • We begin to see image captions with grounding.

  • We're beginning, just beginning to see these budding areas of research

  • and down the road, how do we reason?

  • How do we reason in novel situations?

  • How do we learn to learn?

  • How do we incorporate intentions, predictions, emotions?

  • So, all those are still on the horizon.

  • KARASICK: Yoshua.

  • BENGIO: So, I will repeat what Terry said.

  • Like, until we have a mathematical proof, we don't know what isn't possible.

  • That being said, for sure, if you look at the current technology, there are challenges.

  • I don't know if there are impossibilities, but there are clearly challenges.

  • One of the challenges I've worked on for more

  • than two decades is the long-term dependencies challenges.

  • So, as soon as you start dealing with sequences, there are optimization challenges

  • and that makes it hard to learn, to train those neural nets to do their job.

  • Even for simple tasks.

  • And we've been studying this problem for 20 years.

  • We're making incredible progress, but it's still an obstacle.
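
A minimal sketch of the long-term dependency problem Bengio describes, assuming PyTorch (sizes and sequence length are illustrative): the gradient that reaches the early steps of a plain recurrent net is typically orders of magnitude smaller than at the late steps, which is what makes these models hard to train on long sequences.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=8, nonlinearity="tanh")

T = 100                               # a moderately long sequence
x = torch.randn(T, 1, 8, requires_grad=True)
out, _ = rnn(x)

# Make the loss depend only on the last output, then ask how much
# gradient flows all the way back to the first time step.
out[-1].sum().backward()
print("grad norm, last step: ", x.grad[-1].norm().item())
print("grad norm, first step:", x.grad[0].norm().item())  # usually vanishingly small
```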

  • And there are other challenges, like I mentioned some of the challenges come up in inference,

  • in [INAUDIBLE] learning that seem intractable.

  • But of course, at the same time, we know brains do a pretty good job of these tasks.

  • So, there's got to be some approximate methods, and we have already some,

  • that are doing very well on these things.

  • And that's what the research is really about.

  • KARASICK: John?

  • SMITH: Yes.

  • I think this is a good question to ask, is there hype and if there's hype, why is there hype?

  • I think the one thing that's clear is there's a lot of attention being given to deep learning.

  • But to some extent, it is warranted because performance is there.

  • And it's hard to argue against performance.

  • So, for many years, my colleague here, Professor Bengio, has worked on neural nets,

  • and it actually took a timing, I think, of many things at once,

  • of Fei-Fei's work on ImageNet sort of coming at the right time with computation

  • that actually let people realize that there's a class of problems

  • that were previously very difficult,

  • like classifying 1,000 object categories, that are now essentially solvable.

  • So, I think what we're seeing is, some of these tasks

  • which we thought were very difficult are now solved.

  • So, ImageNet, you know, 1,000 categories, is essentially solved.

  • Some other data sets, like Labeled Faces in the Wild,

  • which is face recognition, essentially solved.

  • So, I think it's hard to argue against that kind of performance.

  • And I think the question for us now is, what else should we do?

  • So, it is a shiny object.

  • But there's a lot more out there, at least in the vision sense.

  • I think we know very little about what the world looks like

  • or how to teach a computer what the world looks like.

  • But I think we're in a very good time now that we have this shiny object and we can think

  • about scaling it to a much larger set of tasks.

  • KARASICK: Thanks.

  • One of the...this is a good segue to something else I wanted to talk about.

  • One of the things that causes me to have one of the most fun jobs on the planet,

  • which is managing a bunch of researchers and developers building up Watson, is bridging

  • out between, at least frankly...what did you say?...

  • constantly changing technologies and sort of the pragmatics of those pesky customers

  • who want to use the system to do things.

  • One of the biggest challenges we have is this whole area we talk

  • about called real world evidence.

  • And it really is a discussion about reasoning in a particular domain.

  • So, if you are going to put a system like Watson in front of an oncologist, and we have,

  • and they're going to ask questions, they're going to get answers.

  • The first thing they're going to want to know is why.

  • Why did the linguistic inference engine decide that this particular passage, phrase, document,

  • was a better answer to the question than that one?

  • And I also get this when I ask my team about how much fun it is to debug these things,

  • and you actually are [hat on a hat] on a whatever that was is maybe a good illustration

  • of some of the challenges of really trying

  • to get underneath how these things work fundamentally.

  • So, how about this notion of why as opposed to what these things do?

  • Anybody? BENGIO: It's interesting you ask this question.

  • It's a very common question.

  • What if we had a human in front of us doing the job?

  • Sometimes a human is able to explain their choice.

  • And sometimes they're not really able to explain their choice.

  • And the way we trust that person is mostly because they do the right thing most of the time

  • or we have some reasons to believe that.

  • So, I think there will be progress in our technical abilities to figure out the why,

  • why is it taking those decisions, but it's always going

  • to be an approximation to the real thing.

  • The real thing is very complicated.

  • You have these millions of computations taking place.

  • The reason why it's making this decision is hidden in those millions of computations.

  • And it's going to be true essentially of any complex enough system.

  • So, the why is going to be an approximation, but still,

  • sometimes it can give you the cues that you need to figure it out.

  • But ultimately we can't really have a completely clear picture of why it's doing it.

  • One thing I want to add is I think there's going to be progress in that direction

  • as we advance on the natural language side.

  • For example, think of the example I gave with the images and the sentences.

  • So, maybe you can think of a task that was not

  • to actually describe the image but to do something with it.

  • But now you can ask the computer about what it sees in the image,

  • even though that was not the task, to get a sense of, you know,

  • why it's getting things wrong and even ask where it was seeing these things.

  • So, we can design the system so that we can have some answers,

  • and the machine can actually talk back in English about what's going on inside.

  • LI: And just to add, I think, you know, in most of our research,

  • interpretability is what you call the why.

  • And a lot of us are making effort into that.

  • In addition to the image captioning work that both of our labs have worked on in terms

  • of not only generating the sentence but grounding back the words

  • into the spatial region where the words make sense.

  • For example, we're recently working on videos and using a lot

  • of the attention-based LSTM models and there we're looking

  • at how we can actually explain using some of these attention models

  • where actions are taking place in the temporal spatial segment of a long video.

  • So, all these attempts are trying to understand the why question

  • or at least make the model interpretable.
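
A rough sketch of the attention idea Li describes, assuming PyTorch (module names and dimensions are illustrative, not her lab's actual models): the attention weights over frame features are a by-product you can inspect to see which frames the model relied on, which is one handle on interpretability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameAttention(nn.Module):
    """Additive attention over per-frame features; the weights are inspectable."""
    def __init__(self, feat_dim=512, query_dim=256):
        super().__init__()
        self.score = nn.Linear(feat_dim + query_dim, 1)

    def forward(self, frames, query):
        # frames: (T, feat_dim) per-frame CNN features; query: (query_dim,) e.g. an LSTM state
        q = query.expand(frames.size(0), -1)
        alpha = F.softmax(self.score(torch.cat([frames, q], dim=1)).squeeze(1), dim=0)
        context = (alpha.unsqueeze(1) * frames).sum(dim=0)
        return context, alpha

frames = torch.randn(30, 512)   # features for 30 frames of a clip
query = torch.randn(256)
ctx, alpha = FrameAttention()(frames, query)
print(alpha.topk(3).indices)    # the frames the model attended to most
```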

  • SMITH: Yes, I think the question of why is actually a very important one in applications.

  • I think particularly as we look to apply deep learning techniques in industry problems.

  • I'll give one example.

  • So, one of the areas that we're applying deep learning techniques is

  • around melanoma detection.

  • So, looking at skin cancer, looking at skin lesion images

  • and essentially training the computer based on those types of lesions.

  • And what we know is possible, actually it's...the essential value proposition

  • of deep learning is that we can learn a representation from those images,

  • from those pixels that can be very effective for then building discrimination and so on.

  • So, we can actually get the systems to be accurate using deep learning techniques.

  • But these representations are not easy for humans to understand.

  • They're actually very different

  • from how clinicians would look at the features of those images.

  • So, around melanoma, around skin lesions in particular,

  • doctors are trained to look at sort of ABCDE.

  • Asymmetry, border, color, diameter, evolution, those kinds of things.

  • And so when our system is making some decisions about these images,

  • it's not conveying that information in ABCDE.

  • So, it actually can get to a better result in the end,

  • but it's not something that's easily consumable by that clinician,

  • ultimately who needs to make the decision.

  • So, I think we have to...we do have to think about how we're going to design these systems

  • to convey not only final classifications, but a set of information,

  • a set of features in some cases that make sense to those humans who need...

  • BENGIO: You could just train to also output...

  • SMITH: You can do that, yes, absolutely.

  • Yes. Right.

  • So, I think there are things that can be done, but the applications may give these requirements

  • and it may influence how we use deep learning.
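
A hedged sketch of the fix Bengio and Smith are gesturing at ("train to also output..."): one shared backbone with two heads, one for the diagnosis and one for clinician-style ABCDE attributes. This is an illustrative toy in PyTorch, not IBM's melanoma system; all layer names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class LesionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(           # shared learned representation
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.diagnosis = nn.Linear(16, 2)        # e.g. benign vs. melanoma
        self.abcde = nn.Linear(16, 5)            # asymmetry, border, color, diameter, evolution

    def forward(self, x):
        h = self.backbone(x)
        return self.diagnosis(h), self.abcde(h)

model = LesionNet()
logits, attrs = model(torch.randn(4, 3, 224, 224))
# Training would combine both objectives, e.g. (given labels y and attribute
# annotations a): loss = cross_entropy(logits, y) + mse_loss(attrs, a)
```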

  • KARASICK: Yes.

  • I think there's going to be kind of a long interesting discussion as you look at the use

  • of these algorithms in regulated settings, how to characterize them in such a way

  • that the regulators are happy campers, whatever the technical term is.

  • So, let's continue on this discussion around, if you like, domains.

  • One of the things that I've seen about systems like this, you know,

  • the notion of what's an application is a function of who you are.

  • So, an application of deep learning, talk about image, speech,

  • question and answer, natural language processing.

  • When you climb up into a, if you like, an industrial domain, the things that people

  • who give IBM money understand, banks, governments, insurance companies,

  • now increasingly folks in the healthcare industry, there's really a lot of very,

  • very deep domain knowledge that we have used to train systems like Watson.

  • One of the things that's both a blessing and a curse

  • with deep learning is this...you get taken away from some of the more traditional things

  • like feature engineering that we've all seen.

  • But on the other hand, the feature engineering

  • that you see really embeds deep understanding and knowledge of the domain.

  • So, to me, and I'm pretty simple-minded about this stuff, we are going to have

  • to see how these two different worlds come together so that we can mix understanding

  • and knowledge and reasoning in a domain with the kinds of things that we're beginning to see,

  • you know, starting with classification and lifting up on deep learning.

  • So, research in this area?

  • What are people...

  • BENGIO: So, first of all, if you have features that you believe are good,

  • there's nothing that prevents you from using them as extra input.

  • KARASICK: Absolutely.

  • BENGIO: You can use the raw thing.

  • You can use your features.

  • You can use both.

  • That's perfectly fine.

  • But you have to sometimes think of it, where are you going to put them in the system.

  • But typically, there's nothing that prevents you from using them.
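
A minimal sketch of Bengio's point, assuming PyTorch (all dimensions are illustrative): hand-engineered features can simply be concatenated with a learned representation before the classifier.

```python
import torch
import torch.nn as nn

class HybridModel(nn.Module):
    def __init__(self, raw_dim=784, engineered_dim=32, hidden=128, classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(raw_dim, hidden), nn.ReLU())
        # The classifier sees both the learned representation and the prior features.
        self.head = nn.Linear(hidden + engineered_dim, classes)

    def forward(self, raw, engineered):
        return self.head(torch.cat([self.encoder(raw), engineered], dim=1))

model = HybridModel()
out = model(torch.randn(8, 784), torch.randn(8, 32))  # raw input + e.g. HOG features
```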

  • Also, researchers working on deep learning have been very creative in ways

  • of incorporating prior knowledge.

  • So, in computer vision, they could tell you

  • about the different approaches that people have used.

  • There are lots of things we know about images that we can use essentially

  • to provide more data, more examples.

  • Like transformations of images.
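
One concrete form of "transformations of images" is standard data augmentation; a short sketch assuming torchvision (the specific transforms and the data path are illustrative):

```python
from torchvision import transforms

# Label-preserving transformations: each epoch effectively sees new examples.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])
# Applied on the fly, e.g. (path is hypothetical):
# train_set = torchvision.datasets.ImageFolder("data/train", transform=augment)
```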

  • And of course, the architectures themselves we're using, like the convolutional nets,

  • they also incorporate prior knowledge.

  • And we can play with that if we have other kinds of knowledge,

  • we can sometimes change the architecture accordingly.

  • And one of the most powerful ways in which we can incorporate prior knowledge is

  • that we have these intermediate representations and we can preassign meaning

  • to some of these representations.

  • You could say, well, okay, so that part of the representation is supposed

  • to capture this aspect of the problem and that part is supposed to capture this aspect

  • of the problem and we're going to structure the architecture so that it really takes advantage

  • of this interpretation we're giving.

  • Even though we don't tell it precisely what the output should be for these hidden units,

  • we can wire the network in such a way that it takes advantage of this a priori notion.

  • And there's a lot of work in that kind of...and many other creative ways

  • to put in...just...it's just different from the ways it's been done before.

  • KARASICK: Absolutely.

  • BENGIO: But, there are many things that can be done to put in [INAUDIBLE].

  • In fact, a lot of the papers in machine learning are about exactly doing that.

  • LI: Yes, so, I also want to add that first of all, knowledge,

  • this word knowledge doesn't come in just one level.

  • If you think about Marr's description of the visual system, you know, there's many layers.

  • I think there is a common misperception, and it's a sexy, easy story to tell,

  • but it's kind of misguided, which is that a lot of people think that before the renaissance,

  • or the renewed success, of deep learning convolutional neural networks,

  • the entire field of computer vision was a bunch of us engineering features.

  • And it's true that feature engineering was a big chunk of computer vision research,

  • and that some of you might know those famous features called SIFT or HOG.

  • But really, as a field, not only were we looking at features,

  • we were also looking at other forms of knowledge.

  • For example, camera models.

  • To this day, the knowledge about perspective, about transformations, is important.

  • And as you look at other aspects of knowledge, there is relationships,

  • there is physical properties of the world, there's interactions, there is materials,

  • there is affordances and there's a lot of structure.

  • So, I agree with Yoshua, a lot of our current research, even past,

  • but now with the power of deep learning, is to think about how we continue

  • to make models expressive or capable of encoding interesting knowledge structures

  • or acquiring knowledge structures to serve the end task.

  • And one of the things I'm excited about is this blending

  • between more expressive generative models, or knowledge-based or relationship-based encodings,

  • with the power of deep learning and so on.

  • So, it's a really rich research area.

  • KARASICK: Thank you.

  • John? SMITH: Yes, so I would agree with certainly Yoshua and Fei-Fei.

  • I do think that deep learning has brought some degree

  • of soul-searching by computer vision scientists.

  • I think particularly, you know, because feature engineering, actually, that was the chase.

  • That was the race.

  • Until 2012, you know, it was who has the best feature.

  • And SIFT was getting a huge amount of attention.

  • So, I think the field was really hyper focused on creating the next best feature.

  • But, I think now we know with deep learning, the computer can come up with features better

  • than the best computer vision scientists.

  • And that's even on problems like ImageNet.

  • And yet there are many even more complex problems out there that deal with video

  • and motion and dynamic scenes where I think it's even harder for humans

  • to know what is the right feature in those situations.

  • I think there's even more potential here for deep learning to win out in the end.

  • But I think knowledge is actually coming into multiple places.

  • I think Yoshua talked about sort of transfer learning, you know, types of uses,

  • so when you train from data, those sets

  • of representations are often...they can become knowledge

  • that gets reused in other applications.

  • So, there are lots of examples of learning on ImageNet,

  • then taking those features essentially and going and doing a completely different task

  • and actually doing very well on that task.

  • So, I think there's a lot of knowledge there in these networks that get trained.

  • And that can be stored and shared and reused and so on.
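
A hedged sketch of the reuse Smith describes, assuming torchvision: take a network trained on ImageNet, freeze its learned features, and retrain only a small head on a completely different task (the 5-class task here is a made-up example).

```python
import torch.nn as nn
from torchvision import models

net = models.resnet18(pretrained=True)        # features learned on ImageNet
for p in net.parameters():
    p.requires_grad = False                   # keep that knowledge fixed
net.fc = nn.Linear(net.fc.in_features, 5)     # fresh head for a new 5-class task

# Now only net.fc is trained on the new data set; the ImageNet features transfer.
```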

  • And I think there's still a lot of work that has to get imposed from the outside

  • in around the semantic structures.

  • So, we seem to have a very good start on objects and how they should be organized.

  • There's work happening around places.

  • But there are other such categories related to people and actions and activities and events

  • and in domains, you know, food and fashion and how to organize all of that, I think,

  • takes some human effort, but if we do that, then the computer can come take over.

  • KARASICK: Okay.

  • One more question from me and then we'll open it up to the audience.

  • It would have...actually, I guess you've all talked about challenges.

  • So, John, with your background in image processing and, Fei-Fei with your work

  • on ImageNet, the same thing, Yoshua, you talked about the challenges recently.

  • Are we doing enough of those?

  • BENGIO: No.

  • KARASICK: So, throw one out.

  • BENGIO: So, so, I believe that scientific and technical progress has been much slower

  • than it could have been if we had been less obsessed by the short term objectives.

  • Since I started my [INAUDIBLE] research, I've seen this, that in my own work,

  • I've been spending too much time going for the low-hanging fruit, and the big ideas

  • that require more thinking and more time have sort of paid the price for that.

  • And I think as a community of researchers and people who explore that field,

  • I think we are still way too much into the short-term questions

  • and not enough spending time on understanding.

  • And when we understand something, we get much more powerful.

  • We can build better bridges, better AI, if we understand what we do.

  • Of course, understanding doesn't give you immediate benefits,

  • but it's really an important investment.

  • Of course you have to do both, right?

  • There's a right balance.

  • But I think especially with what's happening right now in deep learning, people are jumping

  • on it immediately to build products and that's useful but we shouldn't forget

  • that the science needs to continue because these challenges are still there.

  • KARASICK: So, Fei-Fei, ImageNet has been probably a pretty interesting journey.

  • I mean, if you think back to where you start and where it is now, you know,

  • John says, I think rightly, for a thousand categories, it's essentially solved.

  • So, you're thinking about what next.

  • But, you know, do you think this really helped push the state of the art forward quickly?

  • I know it's been kind of entertaining for spectators like me, you know,

  • notwithstanding companies cheating.

  • It's always been interesting to read about.

  • But where is it going?

  • LI: All right.

  • So, I agree with Yoshua.

  • I think that there's still a huge place for doing the right challenges.

  • Not for the sake, from standing where we are in that continuum,

  • not for the sake of immediate products,

  • but really pushing scientific advances to the next stage.

  • You know, I remember in 2007, when my student, now a professor at Michigan, [INAUDIBLE],

  • and I started ImageNet with Professor Kai Li at Princeton,

  • we were asking ourselves the question, why do we want to do this?

  • Is this really, you know, the bigger the better?

  • Because there was Caltech 101.

  • And I really remember that we had this conversation and convinced ourselves

  • that we were going to challenge ourselves and the field with a data set of that size

  • that would have to push for new machine learning.

  • That was what our goal was.

  • I even told my student my prediction is the first users,

  • successful users of ImageNet would be coming from the machine learning community rather

  • than our own core computer vision community.

  • So, following that same kind of hypothesis, I'm putting a lot of thought right now again

  • about what we, my own lab as well as our field can do with the next set of challenges.

  • And there's some really budding interesting challenges coming up.

  • I really, really like the Allen Institute's challenge.

  • That's very NLP-oriented, about, you know,

  • they're going through the different grades of high school.

  • I think right now it's eighth grade science.

  • I think it's an eighth grade science exam challenge where you get the Q

  • and A going, the reasoning going.

  • I think we need more of that.

  • In my own lab, probably within a month or two,

  • we'll roll out...I would not call it ImageNet 2.0.

  • We'll roll out a different data set, called the Visual Genome data set, which is really going

  • after deeper knowledge structure in the image world, focusing on grounding and relationships,

  • and we hope that the next kind of set of challenges surrounding pictures is

  • more about focusing on relations, affordances, attributes,

  • that kind of challenge, rather than categorization.

  • And one more thing probably John can comment on is another area of challenge

  • that I hope some people can take on the task of putting together, is dynamic scenes.

  • I think there is a huge space to be explored in videos, both the Internet kind of videos

  • as well as videos that are more situational, like robotic videos.

  • So, I believe that there's a huge opportunity still for different challenges.

  • KARASICK: So, John, I mean, you and I work for a company where throwing challenges in front

  • of researchers is part of our DNA.

  • So, you must think about this a lot.

  • SMITH: Absolutely.

  • I think we're talking about two things here.

  • One is evaluations and the other is challenges.

  • I think with evaluations, you know,

  • we've seen that they're absolutely essential to making progress in this area.

  • Everything from what TREC has done, NIST, you know, for information retrieval

  • and language translation and speech transcription

  • and image and video retrieval and so on.

  • These things have been the way to sort of gauge and measure progress

  • as these technologies develop, so we need those.

  • But, yes, I think we need also, you know, big ideas around grand challenges.

  • I think video is still ripe here.

  • I think we need a lot more progress on that.

  • And certainly I see opportunities to put things together like questioning and answering on video

  • and captioning of video and all of this.

  • I think we're starting to see.

  • I think multi-modal, multi-media data is actually going to be very important.

  • So, how do we combine the audio and the language and what we see

  • and all of these modalities together with sensors.

  • It could be inertial sensors on a mobile device to give an understanding of the world.
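
An illustrative sketch of that kind of multi-modal fusion, assuming PyTorch: embed each modality separately, then combine the embeddings for a joint prediction. The modality names, dimensions, and fusion scheme are all assumptions, not a description of any particular system.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, out=64, classes=10):
        super().__init__()
        dims = {"video": 512, "audio": 128, "text": 300, "inertial": 6}
        self.encoders = nn.ModuleDict({m: nn.Linear(d, out) for m, d in dims.items()})
        self.head = nn.Linear(out * len(dims), classes)

    def forward(self, inputs):
        # inputs: dict mapping modality name -> (batch, dim) feature tensor
        z = torch.cat([self.encoders[m](x) for m, x in sorted(inputs.items())], dim=1)
        return self.head(z)

model = LateFusion()
batch = {"video": torch.randn(2, 512), "audio": torch.randn(2, 128),
         "text": torch.randn(2, 300), "inertial": torch.randn(2, 6)}
print(model(batch).shape)  # torch.Size([2, 10])
```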

  • And I think where it's all heading is actually just there

  • which is how can we really completely understand the world.

  • So, not just know what objects may be where or what landmarks but really what happens where.

  • I mean, just really be able to sort of capture, sense, and understand,

  • and model what's going on in our daily lives.

  • So, lots of challenges, you know, I think the real challenge is getting enough time

  • and people and effort.

  • KARASICK: It's not boring.

  • SMITH: Yes.

  • KARASICK: So, with that, I'd like to throw it open.

  • I don't know where the microphones are.

  • Where are they?

  • They're coming.

  • In the front here.

  • >> Thank you.

  • Before asking my question, I want to echo one thing you said,

  • that the low-hanging fruit sometimes is the one that's starting to rot,

  • so you should not all be content with just helping yourselves to that.

  • My question is, I noticed something that I think is a good thing

  • and my question is whether the three of you agree or disagree or what,

  • is that the original...I don't know, purity or narrowness depending upon your point of view

  • of neural nets is starting to fade and you're incorporating all kind of ideas,

  • I think good ideas from other areas, for example in your image and caption description,

  • you get two different kinds of networks, a convolutional network

  • and another one that's more like an HMM or CRF type structure around the output side.

  • And now you're talking about multimedia where you have different kinds of inputs

  • and so you have to treat them somewhat different ways.

  • So, do you think that these sort of more hybrid approaches

  • that give power also have a drawback in terms of you have to train each part somewhat separately

  • or do you think that this is a great idea because it reflects maybe the fact

  • that different parts of a brain don't necessarily work in exactly the same way?

  • BENGIO: So, first of all, we can train these things together.

  • They don't have to be trained separately.

  • Second, they're called recurrent nets and they're very different from CRFs and HMMs.

  • Now, for the bigger question you're asking, of course it is important to play

  • with these architectures and combine the appropriate types of pieces for the task.

  • If you're dealing with images, you don't necessarily want the same kind

  • of complications as if we're dealing with words.

  • Now, it's all implemented with neurons.

  • It's just that they're wired in different ways and in the case of images,

  • we use these translation invariance properties of images

  • that save us training data and training time.
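
A tiny sketch of that prior knowledge in numbers, assuming PyTorch: a convolutional layer reuses the same small filter at every image location, so it has vastly fewer parameters than a dense layer over the same input, which is exactly what saves training data and time.

```python
import torch.nn as nn

dense = nn.Linear(32 * 32, 32 * 32)               # one weight per input-output pair
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # the same 3x3 filter everywhere

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))  # 1,049,600 parameters
print(count(conv))   # 10 parameters (9 weights + 1 bias)
```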

  • So, and we are adding more components,

  • so I mentioned the memory components, you heard a little bit about it.

  • That's something that's also present in the brain.

  • We have a lot of these things in the brain, and there are many more things in the brain.

  • As we were saying, it's incredibly complex, has many different kinds of components.

  • At the same time, what's interesting is that for all these components that we're currently using

  • in deep learning, they all use the same learning principle and this is probably true

  • to some extent, at least in cortex, that there's sort of one general recipe.

  • This is a hypothesis; we don't know.

  • And then that's really interesting because that means we can reuse

  • that principle for many types of jobs.

  • LI: I don't think I can add much.

  • I totally agree with you, Yoshua.

  • And, you know, if you think about the evolution of the brain,

  • brain not...the evolution is not that pristine, right?

  • Like, nature just patches up parts together, in a way.

  • So, I think there is some, you know, just some fact of life there that these kind of models,

  • whether it's the biological model here

  • or the actual computer algorithm grows in a dynamic way.

  • SMITH: Yes, I think this is in part where we're heading with Watson Developer Cloud, which is

  • to essentially put out all of these individual trainable components that address everything

  • from language to speech to vision to question and answering.

  • And then these become the building blocks for creating potentially much more complex systems

  • and solutions, adapting to industry problems and so on.

  • So, I think we definitely see all of these pieces coming together.

  • KARASICK: Next question.

  • Who's got a microphone.

  • Anybody? Go ahead.

  • >> Yes. [INAUDIBLE] theoretical challenges that lie in truly understanding these networks.

  • I was wondering if the [INAUDIBLE] can be expanded upon as well as what kinds

  • of new mathematics are going to be required to really solve these problems.

  • KARASICK: I didn't quite catch that.

  • LI: Can you repeat the question?

  • >> I didn't hear everything.

  • >> Yes. So, [INAUDIBLE] to truly understand,

  • we need to have a [INAUDIBLE] foundational mathematical understanding of the networks,

  • what kinds of new mathematics are [INAUDIBLE]

  • and [INAUDIBLE] specifically state some outstanding theoretical challenges.

  • BENGIO: I think the theoretical understanding of deep learning is really something interesting

  • and because of the success of deep learning, there are more and more mathematicians,

  • applied math people mostly, getting into this field.

  • And it's great.

  • There's something interesting with neural net research in general,

  • which is the theory tends to come after the discoveries.

  • And because they're very complex.

  • And mostly, we move forward using our intuitions and we play with things and they work

  • and then we start asking why and maybe a different set

  • of people are able to answer these questions.

  • Right now, we're using the same old mathematics.

  • Although there are people who are answering some of the questions with aspects of math

  • that I know much less, like topology and things like that.

  • So, there's a lot of potential for understanding,

  • there's a lot of complex questions that we don't understand.

  • But there's been a lot of progress recently in theory.

  • So, we understand a lot better the problem of local [INAUDIBLE].

  • We understand a lot better the expressive power of deep nets and we understand a lot better some

  • of the probabilistic properties of different algorithms.

  • So, by far, you know, we haven't answered all the questions.

  • But there's a lot of work and a lot of progress.

  • KARASICK: So, none of you mind being characterized as experimentalists.

  • It's something I've observed too.

  • There's a lot of intuition and then...

  • >> Yes. KARASICK: ...

  • [INAUDIBLE] exactly?

  • BENGIO: Yes.

  • Intuition is crucial in this because, right now, you know, mathematical analysis is limited

  • in the power of what it can predict.

  • So, a lot of it is intuitions and experiments to validate hypotheses.

  • So, there's a lot of the scientific sort of cycle of trying to propose hypotheses,

  • explain what we observe and then testing

  • with experiments rather than proving it mathematically.

  • LI: And just to add, at Stanford, you see this because first it's the people

  • at the CS department making these algorithms, and now there's a lot more interaction

  • between statisticians and the machine learning communities here at Stanford.

  • Some of our youngest hires, faculty hires, are actually paying the most attention from the side

  • of the statistics department or even applied math, looking

  • at potential theories behind these algorithms.

  • So, I predict we'll see more statisticians coming to explain some of what we do here.

  • KARASICK: Next question.

  • Over here.

  • >> [INAUDIBLE] from UC San Diego.

  • So, neural networks as the universal function approximators theoretically have been known

  • for several decades, perhaps even almost half a century by some accounts,

  • and even deep architectures have been explored perhaps even many, many decades ago.

  • And so, I guess, what I'm trying to say is it seems like the idea

  • of supervised learning theoretically being solved had been well known for a long time

  • and now, thanks to computing power and lots and lots of data, labeled data,

  • we're seeing amazing performances, right.

  • But I'm not sure whether there are similar guarantees in the unsupervised learning realm, right?

  • And I think that unsupervised learning is especially powerful in the case

  • of what we talked about earlier, this 90 percent of dark data.

  • Some...I guess we don't have that...question we don't have...has to be supervised learning.

  • So, is there any theory for the direction we need to go for unsupervised learning?

  • BENGIO: There is.

  • There is. So, first of all, regarding the universal approximation properties,

  • we have similar theorems for a number of [INAUDIBLE].

  • So, we can prove...so I worked on universal approximation theorems

  • for Boltzmann machines and DBNs and RBMs.

  • And you can reuse results from the supervised learning framework

  • for other unsupervised learning methods like [INAUDIBLE] encoders and things like that.

  • So, that question is pretty much settled.
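
As one concrete example of the unsupervised methods mentioned here, a minimal auto-encoder sketch, assuming PyTorch (all sizes illustrative): the training signal is reconstruction of the input itself, so no labels are required.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ae = nn.Sequential(                  # encoder compresses, decoder reconstructs
    nn.Linear(784, 32), nn.ReLU(),   # a learned 32-d representation
    nn.Linear(32, 784))

opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
x = torch.rand(64, 784)              # a batch of unlabeled data
opt.zero_grad()
loss = F.mse_loss(ae(x), x)          # reconstruction error is the training signal
loss.backward()
opt.step()
```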

  • That being said, there are other theoretical questions to which we don't have answers

  • in unsupervised learning that have to do with these intractabilities

  • that are inherent in solving this problem.

  • So, there's a lot more to be done there for sure.

  • LI: And just to add to that, there is some evidence from nature that this might also exist.

  • I mean, the paper by Olshausen and Field in 1996, you know, the unsupervised ICA model

  • for V1 receptive fields, is kind of very nice evidence to show that these kinds of features

  • emerge in just unsupervised training on the natural statistics of images.

  • KARASICK: Do we have time for one more, or are we done?

  • One more, okay.

  • Make it good.

  • Last question.

  • In the back.

  • >> Make it good.

  • I don't know if it's a good question.

  • I'm mostly an outsider.

  • I am...I heard a lot of machine learning applications to classification [INAUDIBLE]

  • and whatnot, but I'm wondering if deep networks are a good match for modeling complex systems.

  • For example, to predict the reading of a sensor in a complex piece of machinery

  • to predict the trajectory of something moving in a complex situation.

  • So, continuous problems rather than discrete problems.

  • KARASICK: Hell, yes.

  • Anybody? BENGIO: I could say something about this.

  • There's actually substantial work being done on continuous-valued signals.

  • Such as handwriting, the trajectories of handwriting, such as [INAUDIBLE] signals.

  • And so and now, there's a lot of work as well in control.

  • So, I don't see any [INAUDIBLE] reason why these kinds of applications would not be feasible.

  • There's lots of evidence that, you know, some work may be needed.

  • But I don't see any problem in particular.

  • SMITH: Yes.

  • And we're definitely seeing that also; looking at objects

  • and their trajectories in video is one example.

  • I think there's a lot of work around manifolds and real-valued data

  • and so on where there's applications.

  • So, I don't know that there's any fundamental limitation there.

  • KARASICK: Thank you.

  • So, everybody please join me in thanking the panel.

  • [APPLAUSE]

[ MUSIC ]
