  • [ MUSIC ]

  • [ APPLAUSE ]

  • BENGIO: Thank you.

  • All right.

  • Thank you for being here and participating in this colloquium.

  • So, I'll tell you about some of the things that are happening in deep learning,

  • but I only have 30 minutes so I'll be kind of quickly going through some subjects

  • and some challenges for scaling up deep learning towards AI.

  • Hopefully you'll have chances to ask me some questions during the panel that follows.

  • One thing I want to mention is I'm writing a book.

  • It's called Deep Learning, and you can already download most of the chapters.

  • These are draft versions of the chapters from my web page.

  • It's going to be an MIT Press book hopefully next year.

  • So, what is deep learning and why is everybody excited about it?

  • First of all, deep learning is just an approach to machine learning.

  • And what's particular about it, as Terry was saying, it's inspired by brains.

  • Inspired, we're trying to understand some of the principles, computational

  • and mathematical principles that could explain the kind of intelligence based

  • on learning that we see in brains.

  • But from a computer science perspective,

  • the idea is that these algorithms learn representations.

  • So, representations is a central concept in deep learning, and, of course,

  • the idea of learning representations is not new.

  • It was part of the deal of the original neural nets,

  • like the Boltzmann machine and the back prop from the '80s.

  • But what's new here and what happened about ten years ago is a breakthrough that allowed us

  • to train deeper neural networks, meaning that have multiple levels of representation.

  • And why is that interesting?

  • So already I mentioned that there are some theoretical results showing

  • that you can represent some complicated functions that are the result of the many levels

  • of compositions efficiently with these deep networks, whereas you might --

  • or in general, you won't be able to represent these kinds of functions

  • with a shallow network that doesn't have enough levels.

  • What does it mean to have more depth?

  • It means that you're able to represent more abstract concepts,

  • and these more abstract concepts allow these machines to generalize better.

  • So, that's the essence of what's going on here.

  • All right.

  • So, the breakthrough happened in 2006 where, for the first time,

  • we were able to train these deeper networks and we used unsupervised learning for that,

  • but it took a few years before these advances made their way

  • to industry and to large scale applications.

  • So, it started around 2010 with speech recognition.

  • By 2012, if you had an Android phone, like this one, well,

  • you had neural nets doing speech recognition in them.

  • And now, of course, it's everywhere.

  • For speech, it's changed the field of speech recognition.

  • Everything uses it, essentially.

  • Then about two years later, 2012, there was another breakthrough using convolutional networks,

  • which are a particular kind of deep networks that had been around for a long time

  • but that have been improved using some

  • of the techniques we discovered along these -- in recent years.

  • Really allowed us to make big impact in the field of computer vision

  • and object recognition, in particular.

  • So, I'm sure Fei-Fei will say a few words later about that event and then the role

  • of the ImageNet dataset in this.

  • But what's going on now is that neural nets are going beyond their traditional realm

  • of perception and people are exploring how to use them for understanding language.

  • Of course, we haven't yet solved that problem.

  • This is where a lot of the action is now and, of course,

  • there continues to be a lot of research and R&D in computer vision.

  • Now, for example, expanding to video and many other areas.

  • But I'm particularly interested in the extension of this field in natural language.

  • There are other areas.

  • You've heard about reinforcement learning.

  • There is a lot of action there, robotics, control.

  • So, many areas of AI are now more and more seeing the potential gain coming

  • from using these more abstract systems.

  • So, today, I'm going to go through three of the main challenges that I see

  • for bringing deep learning, as we know it today, closer to AI.

  • One of them is computational.

  • Of course, for a company like IBM and other companies

  • that build machines, this is an important challenge.

  • It's an important challenge because what we've observed is

  • that the bigger the models we are able to train,

  • given the amount of data we currently have, the better they are.

  • So, you know, we just keep building bigger models

  • and hopefully we're going to continue improving.

  • Now, that being said, I think it's not going to be enough so there are other challenges.

  • One of them I mentioned has to do with understanding language.

  • But understanding language actually requires something more.

  • It requires a form of reasoning.

  • So, people are starting to use these recurrent nets you heard about, recurrent networks

  • that can be very deep, in some sense, when you consider time in order

  • to combine different pieces of evidence, in order to provide answers to questions.

  • And essentially, display different forms of reasoning.

  • So, I'll say a few words about that challenge.

  • And finally, maybe one of the most important challenges that's maybe more fundamental even is

  • the unsupervised learning challenge.

  • Up to now, all of the industrial applications of deep learning have exploited supervised learning

  • where we have labeled the data: we've said, in that image, it's a cat.

  • In that image, there's a desk, and so on.

  • But there's a lot more data we could take advantage of that's unlabeled,

  • and that's going to be important because all of the information we need to build these AIs has

  • to come from somewhere, and we need enough data, and most of it is not going to be labeled.

  • Right. So, as I mentioned, and I guess as my colleague,

  • Ilya Sutskever from Google keeps saying, bigger is better.

  • At least up to now, we haven't seen the limitations.

  • I do believe that there are obstacles, and bigger is not going to be enough.

  • But clearly, there's an easy path forward with the current algorithms just

  • by making our neural nets a hundred times faster and bigger.

  • So, why is that?

  • Basically, what I see in many experiments with neural nets right now is that they --

  • I'm going to use some jargon here.

  • They underfit, meaning that they're not big enough or we don't train them long enough

  • for them to exploit all of the information that there is in the data.

  • And so they're not even able to learn the data by heart, right,

  • which is the thing we usually want to avoid in machine learning.

  • But that comes almost for free with these networks, and so we just have to press

  • on the pedal of more capacity and we're almost sure to get an improvement here.

  • All right.

  • To just illustrate graphically that we have some room to approach the size of human brains,

  • this picture was made up by my former student, Ian Goodfellow, where we see the sizes

  • of different organisms and neural nets over the years so the DBN here was from 2006.

  • The AlexNet is the breakthrough network of 2012 for computer vision,

  • and the AdamNet is maybe a couple of years old.

  • So, we see that the current technology is maybe between a bee and a frog in terms of size

  • of the networks for about the same number of synapses.

  • So, we've almost reached the kind of average number of synapses you see in natural brains,

  • between a thousand and ten thousand.

  • In terms of number of neurons, we're several orders of magnitude away.

  • So, I'm going to tell you a little bit about a stream of research we've been pushing in my lab,

  • which is more connected to the computing challenge and potentially part

  • of hardware implementation, which is: can we train neural nets that have very low precision?

  • So, we had a first paper at ICLR.

  • By the way, ICLR is the deep learning conference, and it happens every year now.

  • Yann LeCun and I started it in 2013 and it's been an amazing success

  • that year and every year since then.

  • We're going to have a third version next May.

  • And so we wanted to know how many bits do you actually require.

  • Of course, people have been asking these kinds of questions for decades.

  • But using the current state-of-the-art neural nets, we found 12 bits,

  • and I can show you some pictures how we got these numbers on different data sets

  • and comparing different ways of representing numbers with fixed point or dynamic fixed point.

  • And also, depending on where I use those bits, you actually need less bits

  • in the activations than in the weights.

  • So, you need more precision in the weights.
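
As a rough illustration of what limiting the number of bits means, here is a minimal Python sketch of plain fixed-point rounding. The function name `to_fixed_point` and the split into total bits and fractional bits are assumptions made for this example; they are not taken from the paper mentioned in the talk, and dynamic fixed point (where the scale is adjusted during training) is not shown.

```python
def to_fixed_point(x, total_bits=12, frac_bits=8):
    """Round x to the nearest value representable in signed fixed point.

    Hypothetical helper: total_bits includes the sign bit and frac_bits
    is the number of bits after the binary point.
    """
    scale = 1 << frac_bits                    # 2 ** frac_bits steps per unit
    max_code = (1 << (total_bits - 1)) - 1    # largest signed integer code
    min_code = -(1 << (total_bits - 1))       # smallest signed integer code
    code = round(x * scale)                   # nearest integer code
    code = max(min_code, min(max_code, code)) # saturate instead of wrapping
    return code / scale


if __name__ == "__main__":
    for value in (0.1, -1.2345, 100.0):
        print(value, "->", to_fixed_point(value))
```

With fewer fractional bits the rounding error grows, and with fewer integer bits large values saturate, which is the trade-off the bit-width experiments probe.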

  • So, that was the first investigation.

  • But then we thought -- so that's the --

  • for the weights, that's the number of bits you actually need to keep the information

  • that you are accumulating from many examples.

  • But when you actually run your system during training, especially,

  • maybe you don't need all those bits.

  • Maybe you can get the same effect by introducing noise

  • and discretizing randomly those weights to plus one or minus one.

  • So, that's exactly what we did.

  • The idea is -- the cute idea here is that we can replace a real number by a binary number

  • that has the same expected value by, you know, sampling those two values with a probability

  • such that the expected value is the correct one.

  • And now, instead of having a real number to multiply,

  • we have a bit to multiply, which is easy.

  • It's just an addition.
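
The idea of replacing a real weight by a random plus or minus one with the same expected value can be sketched in a few lines of Python. This is an illustration only, assuming weights are clipped to the range [-1, 1]; the helper names `stochastic_binarize` and `binary_dot` are made up for this example and are not from the paper.

```python
import random


def stochastic_binarize(w, rng=random):
    """Return +1 or -1 so that the expected value equals w (for w in [-1, 1])."""
    w = max(-1.0, min(1.0, w))     # clip so the probability below is valid
    p_plus = (w + 1.0) / 2.0       # P(+1); E = p_plus - (1 - p_plus) = w
    return 1 if rng.random() < p_plus else -1


def binary_dot(signs, activations):
    """Dot product with +/-1 weights: only additions and subtractions."""
    total = 0.0
    for s, a in zip(signs, activations):
        total = total + a if s > 0 else total - a
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    w = 0.3
    samples = [stochastic_binarize(w, rng) for _ in range(100_000)]
    print("sample mean:", sum(samples) / len(samples))  # should be close to 0.3
```

Because each sampled weight is only a sign, the weighted sum in `binary_dot` needs no multiplier at all.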

  • And why would we do that?

  • Because we want to get rid of multiplications.

  • Multiplications is what takes up most of the surface area on chips for doing neural nets.

  • So, we had a first try at this, and this is going to be presented at the next NIPS

  • in the next few weeks in Montreal.

  • And it allows us to get rid of the multiplications in the feed forward computation

  • and in the backward computation where we compute gradients.

  • But we remained with the multiplication -- even if you discretize the weights,

  • there is another multiplication at the end of the back prop

  • where you multiply -- you don't multiply weights.

  • You multiply activations and gradients.

  • So, if those two things are real valued, you still need regular multiplication.

  • So, we -- yes, so that's going to be in the NIPS paper.

  • But the new thing we did is to get rid of that last multiplication that we need for the update

  • of the weight, so the delta W is a change in the weights,

  • DC DA is the gradient that's propagated back, and H is the activations.

  • It's some jargon.

  • But anyway, we have to do this multiplication, and so, well, the only thing we need

  • to do is take one of these two numbers and replace it again by a stochastic quantity

  • that is not going to require multiplication.

  • So, instead of binarizing it, we quantize it stochastically to its mantissa.

  • In other words, we get rid of -- to its exponent.

  • We get rid of the mantissa.

  • In other words, we represent it, we -- we represent it in a log scale.

  • So, if you do that, again, you can map the activations

  • to some values that are just powers of two.

  • And now multiplication is just addition.

  • This is an old trick.

  • I mean, the trick of using powers of two is an old trick.

  • The new trick is to do this stochastically so that you actually get the right things

  • on average and stochastic gradient works perfectly fine.
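
A minimal Python sketch of stochastic rounding to a power of two follows. It assumes positive inputs and uses the standard library's `math.frexp` and `math.ldexp`; the function names are hypothetical and the details of the actual experiments may differ. The point it shows is that the rounded value equals the original on average, and that multiplying by it only changes an exponent.

```python
import math
import random


def stochastic_pow2_exponent(x, rng=random):
    """Pick an integer e so that 2**e equals x in expectation (x > 0)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    m, exp = math.frexp(x)         # x = m * 2**exp with 0.5 <= m < 1
    e = exp - 1                    # so 2**e <= x < 2**(e + 1)
    p_up = 2.0 * m - 1.0           # (x - 2**e) / 2**e, a value in [0, 1)
    return e + 1 if rng.random() < p_up else e


def multiply_by_pow2(g, e):
    """Compute g * 2**e by adjusting the exponent of g (a shift in hardware)."""
    return math.ldexp(g, e)


if __name__ == "__main__":
    rng = random.Random(0)
    h, grad = 3.0, 0.25            # an activation and a back-propagated gradient
    updates = [multiply_by_pow2(grad, stochastic_pow2_exponent(h, rng))
               for _ in range(100_000)]
    print("mean update:", sum(updates) / len(updates))  # close to grad * h = 0.75
```

Here the activation 3.0 is rounded to 2 or 4 with equal probability, so the average update matches the exact product even though no real multiplication is performed.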

  • And so we're running some experiments on a few data sets showing that you get a bit

  • of a slowdown because of the extra noise.

  • But the green and yellow curves here are with this trick, with binarized weights

  • and stochastically quantized activations.

  • And the good news is, well, it learns even better, actually,

  • because this noise acts as a regularizer.

  • Now, this -- yes, this is pretty good news.

  • Now, why is this interesting?

  • It's interesting because we can probably -- for two reasons.

  • One is for hardware implementations, this could be useful.

  • The other reason is that it connects with what the brain -- with spikes, right.

  • So the idea with -- you can think of, if I go back here, when you replace activations

  • by some stochastic binary values that have the right expected value, you're introducing noise.

  • But you're actually not changing that much the computation of the gradient.

  • And so it would be reasonable for brains to use the same trick

  • if they could save on the hardware side.

  • Okay. So now let me move on to my second challenge, which has to do with language and,

  • in particular, language understanding.

  • There's a lot of work to do in this direction,

  • but the progress in the last few years is pretty impressive.

  • Actually, I was part of the beginning of that process of extending the realm

  • of application of neural networks to language.

  • So, in 2000, we had a NIPS paper where we introduced the idea of learning

  • to represent probability distributions over sequences of words.

  • In other words, being able to generate sequences of words that look like English

  • by decomposing the problem in two parts.

  • That's a kind of a central element that you find in neural nets and especially in deep learning,

  • which is think of the problem not as going directly from inputs to outputs,

  • but breaking the problem into two parts.

  • One is the representation part.

  • So, learning to represent words here by mapping each word to a fixed size, real valued vector.

  • And then taking those representations and mapping them to the answers you care about.

  • And here, that's predicting the next word.
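
The two-part decomposition described here can be sketched in plain Python. This is an untrained toy, not the model from the paper: the vocabulary, the vector size, the random weights, and the choice to feed the concatenated vectors straight into a linear layer (with no hidden layer) are all assumptions made to keep the example short.

```python
import math
import random

rng = random.Random(0)
vocab = ["the", "cat", "sat", "on", "mat"]
dim = 3
context_size = 2

# Part 1: representation. Each word maps to a fixed-size real-valued vector.
embedding = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}

# Part 2: prediction. An (untrained) linear layer maps the concatenated
# vectors of the previous words to one score per vocabulary word.
out_weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim * context_size)]
               for _ in vocab]


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(prev_words):
    """Probability of each vocabulary word given the previous words."""
    features = [v for w in prev_words for v in embedding[w]]
    scores = [sum(wi * fi for wi, fi in zip(row, features))
              for row in out_weights]
    return dict(zip(vocab, softmax(scores)))


if __name__ == "__main__":
    print(next_word_distribution(["the", "cat"]))
```

In a trained model both the embedding table and the output weights would be learned together by gradient descent on the next-word prediction task.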

  • It turned out that those representations of words

  • that we learned have incredibly nice properties and they capture a lot

  • of the semantic aspects of words.

  • And there's been tons and tons of papers

  • to analyze these things, to use them in applications.

  • So, these are called word vectors, word embeddings, and they're used all over the place

  • and becoming like commonplace in natural language processing.

  • In the last couple of years, there's been a kind of an exciting observation

  • about these word embeddings, which is that they capture analogies,

  • even though they were not programmed for that.

  • So, what do I mean?

  • What I mean is that if you take the vector which is for each word and you do operations on them,

  • like subtract and add them, you can get interesting things coming up.

  • So, for example, if you take the vector for queen and you subtract the vector for king,

  • you get a new vector, and that vector is pretty much aligned with the vector that you get

  • from subtracting the representation for man from the representation for woman.

  • So, that means that you could do something like woman minus man, plus king and get queen, right.

  • So, it can answer the question, you know,

  • what is to king what woman is to man, and it would find queen.

  • So, that's interesting, and there is some nice explanations that we're starting

  • to understand why this is happening.

  • Basically, directions in that space of representations correspond to attributes

  • that have been discovered by the machine.

  • So, here, the difference between man and woman, they have all the same attributes somehow,

  • in some semantic space, except for gender.

  • The same is true for queen and king.

  • They have lots of different attributes, but they essentially have all the same except for gender.

  • So, when you subtract them, the only thing you get in your hand is the direction for gender.
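
The vector arithmetic can be illustrated with hand-made toy vectors in Python. The three dimensions and all the numbers below are invented for this example so that one axis plays the role of gender; real word embeddings are learned and have far more dimensions.

```python
import math

# Toy vectors (assumed for illustration, not learned embeddings).
# Dimensions: [royalty, gender, person]
vectors = {
    "king":  [1.0,  1.0,  1.0],
    "queen": [1.0, -1.0,  1.0],
    "man":   [0.0,  1.0,  1.0],
    "woman": [0.0, -1.0,  1.0],
    "apple": [0.0,  0.0, -1.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(analogy("man", "woman", "king"))  # queen
```

In this toy space, queen minus king and woman minus man are the same vector, which is the "direction for gender" described above.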

  • Okay. So the progress with representing the meaning of words has been really amazing.

  • But, of course, this is by no means sufficient to understand language.

  • So, the next stage has been, well, can we represent the meaning of sentences or phrases.

  • And in my group, we worked on machine translation as a case study to see

  • if we could bring up that power of representation that we've seen

  • in those language models to a task

  • that was a bit more challenging from a semantic point of view.

  • And I guess the thing we're doing now, and many other groups are also doing,

  • is pushing that to an even harder semantic task, which is question answering.

  • In other words, read a sentence or read a paragraph or a document and then read a question

  • and then generate a natural language answer.

  • So, it's a bit more challenging, but you can see that it's a kind of translation as well.

  • You have a sequence in input and you produce a sequence in output.

  • In fact, we used very similar techniques.

  • So, now let me tell you about that machine translation approach

  • that we created about a year and a half ago.

  • And it uses these recurrent networks that you've heard about,

  • because as soon as you start dealing with sequences, it's kind of the natural thing to do.

  • It uses something fairly new that has been incredibly successful in the field

  • in the last year, which is the idea

  • of introducing attention mechanisms within the computation.

  • So, sometimes we think of attention as, like, visual attention, so deciding where to look.

  • But here we're talking about a different kind of attention.

  • It's a kind of internal attention.

  • So, choosing which parts of your neural network are you going to be paying attention to.

  • And here, let me go through this architecture a little bit.

  • What's going on is -- do I have a pointer?

  • All right.

  • You have an input sentence in English, say, and there's a recurrent net that reads it,

  • meaning that it sees one word at a time.

  • As it goes through it, it builds a representation of the words that it has seen.

  • Actually, there are two recurrent nets, one reading from left to right

  • and the other from right to left.

  • Then at each position, you have a representation of what's going on around that word.

  • So, that's the reading network.

  • Then there is a writing -- an output network, which is going to produce a sequence of words.

  • More precisely, it's going to produce a probability distribution for each word

  • in the vocabulary at each stage and then we're going to pick, according to the distribution,

  • we're going to pick the next word.

  • The choice of that word is going to condition the computation for the next stage.

  • The state of the network is going to be different,

  • depending on what words you've said before.

  • And that whole output sequence is going to be influenced by what we have read, of course,

  • because we want to translate the input sequence.

  • Now, the way that that input sequence and that output sequence are related is important.

  • That's where the attention mechanism comes in.

  • Because when you're doing translation, for example,

  • the input sequence has a different length from the output sequence.

  • So, which word or which part of the sequence here corresponds to which part

  • in the output sequence, that's the question

  • that the attention mechanism is helping us figure out.

  • And we found a way to do that using a mechanism that allows

  • us to train using normal techniques with back prop.

  • We can compute exact gradients through this process.

  • And the idea is that for each position in the output sequence,

  • our network looks in the input sequence at all possible positions and computes a weight.

  • And it's going to multiply the representation it's getting at each position by that weight

  • to form a linear combination which is going to be a context that's going

  • to drive the update at the next stage.

  • So, in a sense, you're choosing where to look at each stage

  • to decide what the next word is going to be.
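
A minimal Python sketch of this kind of soft attention is given below. It assumes a simple dot-product score between a query vector (standing in for the output network's state) and each input representation; that scoring rule is a simplification chosen for this example, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return (weights, context) for one output position."""
    # Assumed scoring rule: dot product between the query and each input state.
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context: linear combination of the input representations.
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # one vector per input word
    weights, context = attend([4.0, -4.0], states)
    print("weights:", weights)
    print("context:", context)
```

Every step is differentiable, which is why the whole system can be trained with ordinary back prop rather than requiring a hard, discrete choice of where to look.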

  • So, this has actually worked incredibly well.

  • And in the space of one year, we went

  • from dismal performance to state of the art performance.

  • And at the last WMT, 2015, we got the first place on two of the language pairs,

  • English to German and English to Czech.

  • And now there's like a bunch of groups around the world

  • that are pushing these kinds of systems.

  • So, this is kind of a new way of doing machine translation, which is very,

  • very different in nature from the state of the art that's been around for 20 years.

  • So, the next thing we did is use the same, almost the same code for translating not

  • from English to French but from -- or from French to English, but from image to English.

  • So, the idea is, it's almost the same architecture, except that instead

  • of having a recurrent network that reads the French sentence,

  • we have what's called a convolutional net that we've heard about that looks at the image

  • and computes for each location or for each block

  • of pixels a feature vector, again a representation.

  • Just as we had representations for words,

  • now we have representations for parts of the image.

  • And then the attention mechanism, as it generates the words in the sentence

  • that it's producing, at each stage chooses where to look in the image.

  • So, Terry showed you some pictures from my lab.

  • You've seen this.

  • And what we see with each pair of images is on the left,

  • the image that the system sees as input.

  • On the right, we see where it's putting its attention for a particular word.

  • That's the word that's underlined.

  • So, when it says little girl, when it says girl,

  • we see that it's putting attention around the face of the girl.

  • The other one, on top, for example, a woman is throwing a frisbee in the park.

  • So, the underlined word is frisbee, and we show the second image in the pair

  • where it's putting its attention in the image.

  • So, these are cases where it works quite well.

  • But it wouldn't be fair if I only showed you those cases.

  • I need to show you those where it fails.

  • So, here are examples where it fails.

  • That's where we learn the most.

  • First of all, you realize immediately that we haven't solved AI,

  • and that it's making mistakes both on the visual side and on the language side.

  • So, on the visual side, you see things like on the top left, it thinks that it's a bird.

  • It's two giraffes.

  • Maybe if you squint you can think it's a bird.

  • On the second one, it thinks that the round shape on the shirt is a clock, which, you know,

  • again, if you squint, you might think it's a clock.

  • Now, the third one is totally crazy.

  • A man wearing a hat and a hat on a skateboard.

  • So, it's wrong visually.

  • It's wrong, you know, linguistically.

  • You wouldn't do a hat on a hat, and so on.

  • So, it's fun and instructive to use these attention mechanisms

  • to understand what's going on inside the machine.

  • To see, you know, at each step of the computation, what was it paying attention to.

  • So, it's pretty interesting.

  • Now, it turns out that this attention mechanism is at the heart of another revolution that's going

  • on right now in deep learning that has to do with the notion of memory

  • that Terry also mentioned during the panel.

  • And neural nets up to recently have been considered as purely sort

  • of pattern recognition devices that go from input to output.

  • As soon as you start thinking about dealing with reasoning and sequential processing,

  • comes the idea that it would be nice to have a short-term memory or even a long-term memory

  • that is different from the straight sort of kind of representation building computation

  • that we have in those feed forward neural nets.

  • So, the idea is that in addition to the recurrent net

  • that does the usual computation, we have a memory.

  • So, here, each of the cells, think of it as a memory cell.

  • A memory needs simple concepts like where are you going to be reading and writing

  • and what are you going to be reading and writing.

  • So, we can generalize these concepts to neural nets that you can train by back prop by saying

  • that at each time step, you basically have a different probability of choosing where to read

  • and where to write and then you're going to put something there

  • with some weight that's proportional to that probability.
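
Soft reading and writing of this kind can be sketched in Python as follows. This is a deliberately simplified illustration: the interpolation rule used for writing and the function names are assumptions for this example, and the published memory systems use more elaborate addressing and update rules.

```python
def soft_read(memory, weights):
    """Read a weighted average of all memory cells."""
    dim = len(memory[0])
    return [sum(w * cell[i] for w, cell in zip(weights, memory))
            for i in range(dim)]


def soft_write(memory, weights, value):
    """Blend `value` into every cell in proportion to that cell's weight."""
    return [[(1.0 - w) * c + w * v for c, v in zip(cell, value)]
            for w, cell in zip(weights, memory)]


if __name__ == "__main__":
    memory = [[0.0, 0.0] for _ in range(3)]       # three empty cells
    where = [0.0, 1.0, 0.0]                       # probability of each location
    memory = soft_write(memory, where, [1.0, 2.0])
    print(soft_read(memory, where))               # [1.0, 2.0]
```

Because the location is a probability distribution rather than a single index, the read and write operations stay differentiable with respect to the weights.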

  • So, these kinds of systems, they started less than a year ago at about the same time

  • from a group in Facebook and a group at DeepMind using the same kind of attention mechanism

  • that we had proposed just a few months earlier.

  • And so they're able to do things like this, like read sentences like this and answer questions.

  • So, Joe went to the garden and Fred picked up the milk.

  • Joe moved to the bathroom and Fred dropped the milk and then Dan moved to the living room.

  • Where is Dan?

  • You're not supposed to read the answer.

  • Or other things like -- I have other examples down there, like Sam walks into the kitchen.

  • Sam picks up an apple.

  • Sam walks to the bedroom.

  • Sam drops the apple.

  • Where is the apple?

  • So, these are the kinds of things we're able to do now.

  • Of course, these are toy problems.

  • But it's not something we would have imagined just a few years ago

  • that neural nets would be able to do.

  • So, by using recurrence and by using new architectures that allow these recurrent nets

  • to keep information for a longer time,

  • so dealing with this challenge that's called long-term dependencies,

  • we're able to push the scope of applications

  • of deep learning well beyond what was thought possible just a few years ago.

  • So, in my lab, we're working on using these ideas for knowledge extraction.

  • So, the idea is to be able to read pages in Wikipedia and fill that memory

  • with representations, semantic representations for nuggets

  • of fact that can then be used to answer questions.

  • Of course, if we can do that, that would be extremely useful.

  • Yes. I'm going to skip that and just use a little bit of time for the last challenge,

  • which is maybe the most difficult one and has to do

  • with how computers could form these abstractions without being told ahead of time a lot

  • of the details of what they should be in the first place.

  • So, that's what unsupervised learning is about.

  • And I mentioned that unsupervised learning is important because we can take advantage of all

  • of the knowledge implicitly stored in lots and lots

  • of data that hasn't been tagged and labeled by humans.

  • But there are also reasons why it could be interesting

  • for other applications in machine learning.

  • For example, in the case of structured outputs where you want the machine to produce something

  • that is not a yes or a no, or it's not a category,

  • but it's something more complicated, like an image.

  • Maybe you want to transform an image or you want to produce a sentence like you've seen before.

  • It's also interesting because if you start thinking

  • about how machines could eventually reach the kind of level of performance of humans,

  • we have to admit that in terms of learning ability, we're very, very far from humans.

  • Humans are able to learn from very few examples, new tasks.

  • Right now, if you take a machine learning system out of the box, it's going to take --

  • it's going to need, depending on the task, maybe tens of thousands or hundreds of thousands

  • or millions of examples before you get a decent performance.

  • Humans can learn a new task with just a handful

  • or sometimes even a single example or even zero examples.

  • You don't even give them an example.

  • You give them the linguistic description of the task, right.

  • So, we're thinking, you know, what are plausible ways that we could address this,

  • and it all has to do with the notion of representation that's been central

  • to what I've been telling you about.

  • And now, we're thinking about how those representations become meaningful

  • as explanations for the data.

  • In other words, what are the explanatory factors that explain the variations we see in the data.

  • And that's what unsupervised learning is after.

  • It's trying to discover representations where each element of the representation you can think

  • of as a factor or a cause that could explain the things we're seeing.

  • So, in 2011, we participated in a couple of scientific challenges on transfer learning,

  • where the idea is you're seeing examples from some tasks.

  • Maybe they're labeled.

  • But the end goal is to actually use the representation that you've learned

  • to do a good job on new tasks for which you have very few labeled examples.

  • And basically, what we found is that when you use these unsupervised learning methods,

  • you're able to generalize much faster with very few labeled examples.

  • So, all these curves have on the X axis the log of the number of labeled examples.

  • And on the Y axis, accuracy.

  • As you build deeper systems that learn actually in an unsupervised way from all the other tasks,

  • but just looking at the input distribution, you're able on the new tasks

  • to extract information from the very few examples you have much faster.

  • Faster meaning you need less examples to get high accuracy.

  • That's what these curves tell us.

  • Now, there are really big challenges to why is it that unsupervised learning hasn't been

  • as successful as supervised learning.

  • At least as we look at the current industrial applications of deep learning.

  • I think it's because there are really hard fundamental challenges because you're trying

  • to model something that's much higher dimensional.

  • When you're doing supervised learning, usually the output is a small object.

  • It's in one category or something like that.

  • In unsupervised learning, you're trying to characterize a number of configurations

  • of these variables that's exponentially large.

  • And for a number of mathematical reasons, that makes the sort of more natural approaches based

  • on probabilities automatically intractable for reasons

  • that I won't have time to explain in detail.

  • But there has been a lot of research recently to try

  • to bypass these limitations, these intractabilities.

  • And what's amazing about the research currently in unsupervised learning is there's

  • like ten different ways of doing unsupervised learning.

  • There's not one way.

  • It's not like the supervised learning where we have basically back prop with small variations.

  • Here we have totally different learning principles that go and try to bypass

  • in different ways the problems with maximum likelihood and probabilistic modeling.

  • So, it's moving pretty fast.

  • Just a few years ago, we were not able to generate, for example, images of anything

  • but digits, images of digits, black and white.

  • So, just last year we were able to move to sort of more realistic digits.

  • These are images of street view house numbers that were generated

  • by some of these recent algorithms.

  • And these are more natural images that were generated

  • by a paper presented just a few months ago where the scientists who did this at Facebook

  • and NYU asked humans whether the images were natural or not.

  • So, is this coming from the machine or is this coming from real world?

  • And it turned out that 40 percent of the images generated

  • by the computer were fooling the humans.

  • So, you're kind of almost passing the Turing test here.

  • Now, these are, you know, particular class of images.

  • But still, that's, you know, there's a lot of progress and so it's very encouraging.

  • One thing I'm interested in, as a last bit here,

  • as we're exploring all these different approaches to unsupervised learning,

  • some of these look like they might also explain how brains do it, and I think

  • that is a very interesting source of inspiration for this research.

  • All right.

  • So, why is it interesting to do unsupervised learning?

  • As I mentioned, because it goes at the heart of what deep learning is about,

  • which is to allow the computer to discover good representations, more abstract representations.

  • So, what does it mean to be more abstract?

  • It means that we essentially go to the heart of the explanations

  • of what's going on behind the data.

  • Of course, that's the dream, right?

  • And we can measure that.

  • We can do experiments where we can see that the computer automatically discovers

  • in its hidden units features that we haven't programmed explicitly in

  • but that are perfectly capturing some of the factors that are present as we know them.

  • So, yes, I'm going to close there and show you pictures of the current state

  • of my lab, which is growing too fast.

  • Thank you.

  • [ APPLAUSE ]



Large Scale Machine Learning

  • 209 10
    eddy posted on 2021/01/14