
  • I wanted to talk a little bit more about deep learning and some of the slightly more

  • large and interesting architectures that have been coming along in the last few years.

  • So just a very brief recap, right? We've got videos on this.

  • I'm going to draw my network from the top down this time. So rather than there being a square input image

  • I'm just going to draw a line which is the image from the top

  • So you can work with your animation magic and sort this all out for me. Brilliant.

  • So I'm going to be talking about deep learning and convolutional neural networks.

  • So a convolutional neural network is one where you have some input like an image.

  • You filter it using a convolution operation.

  • And then you repeat that process a number of times.

  • To learn something interesting about that image.

  • Some interesting features.

  • And then you make a classification decision based on it.

  • That is usually what you do, right?

  • So you might decide, well, this one's got a cat in it, or this one's got a dog in it.

  • Or this one's got a cat and a dog in it and that's very exciting.
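
To make the recap concrete, here is a minimal sketch in PyTorch (one of the libraries mentioned later in the video). All of the layer sizes here are illustrative assumptions, not the network drawn in the video:

```python
import torch
import torch.nn as nn

# A toy CNN in the spirit of the recap: filter the image with
# convolutions, repeat, then make a classification decision.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),  # filter the RGB input
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),  # repeat the process
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 2),                  # classify: cat vs dog
)

image = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
logits = model(image)
print(logits.shape)                # torch.Size([1, 2])
```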

  • So, from the top down, right? Because otherwise

  • my pen's gonna run out of ink if I start trying to draw too many boxes.

  • You've got an input image, but it's quite large usually.

  • So here's an input image and I'm gonna draw it like this.

  • This is from the top.

  • So if this is my image, I'm gonna go to the top and look at it straight down.

  • Which, I realize, looks sort of like that. Does that work?

  • Now there's three input channels, because of course we usually have red, green and blue, right?

  • So in some sense, this is multi-dimensional. We're gonna have our little filters, so I'm going to draw a couple of kernels.

  • Let's maybe draw four. We're gonna do a convolution operation using this one on here

  • So it's going to look over all of these three channels,

  • it's going to scan along, and it's going to calculate some kind of feature, like an edge or something like this, and that's going to

  • produce another feature map, right? And now there's four kernels, and each is gonna do this. So we're gonna have four outputs. Don't worry,

  • I'm not going to do an 800-layer-deep network this way.

  • So each of these gets to look at all of the three channels. That's a bit of a sort of quirk of deep

  • learning that maybe isn't explained

  • often enough, but actually these will have an extra dimension that lets them look at these.

  • So the next layer along will look at all four of these ones, and so on. What we also then do, and I'm going to

  • sort of go, why not? Why not use multiple colors?
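
That "extra dimension" is easy to see in code. In PyTorch, for instance, a convolution layer's weight tensor has shape (output channels, input channels, height, width), so each of the four kernels really does look across all three input channels. A small sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# Four kernels, each spanning all three RGB channels.
conv1 = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
print(conv1.weight.shape)  # torch.Size([4, 3, 3, 3])

# The next layer along looks at all four of those feature maps.
conv2 = nn.Conv2d(in_channels=4, out_channels=2, kernel_size=3, padding=1)
print(conv2.weight.shape)  # torch.Size([2, 4, 3, 3])

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB image
print(conv1(x).shape)          # torch.Size([1, 4, 64, 64])
```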

  • We then sometimes also spatially downsample. So we take the maximum of a region of pixels.

  • So that we can make the whole thing smaller and fit it better on our graphics card.
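
A minimal sketch of that downsampling step, again assuming PyTorch and illustrative sizes: max pooling takes the maximum of each 2x2 region, halving the width and height:

```python
import torch
import torch.nn as nn

# Take the maximum of each 2x2 region of pixels so the whole
# thing gets smaller and fits better on the graphics card.
pool = nn.MaxPool2d(kernel_size=2)

features = torch.randn(1, 4, 64, 64)  # four feature maps
print(pool(features).shape)           # torch.Size([1, 4, 32, 32])
```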

  • We're gonna downsample this so it's gonna look like this and then okay, I'll just do a yellow one. Why not?

  • Can we see yellow on this? We'll soon find out. Yeah. Yeah

  • So let's say there's two kernels here and you can kind of see it.

  • I think we need to go pink here. Pink? Pink! Alright pink, forget yellow.

  • No yellow on white. That was what I was told when I first started using PowerPoint.

  • I like pink. Yeah, that kinda, that can work.

  • It kinda looks a bit like the red.

  • So that's going to look at all these four, and there's two of them, so there's going to be two outputs, right?

  • Just think of it in terms of four inputs, two outputs. So that's going to be sort of like this.

  • I'm just going to go back to my blue and forget the colors now, and you just repeat this process for quite a while,

  • right, depending on the network. There are more advanced architectures, like ResNets, that let this become very, very deep, you

  • know, hundreds of layers sometimes. But for the sake of argument,

  • let's just say it's usually into the dozens.
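
ResNets are a topic for another video, but for reference, the trick that lets them get that deep is a skip connection: each block adds its input back onto its output. A rough sketch of the idea, not the exact published architecture:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of a residual block: output = F(x) + x. The skip
    connection is what lets ResNets stack up hundreds of layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # add the input back on
```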

  • So we're gonna downsample a bit more and so on, and then we'll get some kind of

  • final feature vector

  • Hopefully a summary of everything that's in all these images, sort of summarized for us.

  • And that's where we do our classification

  • So we attach a little neural network to this here, and that all connects to all of these, and then this is our reading of

  • whether it's a cat or not. That's the idea. The problem with this is that the number of connections here is fixed.

  • This is the big drawback of this kind of network

  • You're using this to do this very interesting feature calculation, and then you've got this fixed number: it's always three here,

  • there's always one here.

  • So this always has to be the same size, which means that this input also always has to be the same size. Let's say

  • 256 pixels by 256 pixels, which is not actually very big.
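
You can see why the size gets locked in with a small sketch (PyTorch again, made-up sizes): the fully connected head flattens the final feature maps into one vector, so its weight matrix bakes in the input resolution:

```python
import torch
import torch.nn as nn

# The classifier head expects a fixed number of flattened values.
head = nn.Sequential(nn.Flatten(), nn.Linear(8 * 8 * 8, 2))

ok = torch.randn(1, 8, 8, 8)         # features from the expected input size
print(head(ok).shape)                # torch.Size([1, 2])

too_big = torch.randn(1, 8, 16, 16)  # features from a larger input
# head(too_big)  # RuntimeError: the layer's size is fixed
```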

  • So what tends to happen is that

  • we take the image that we were interested in and we shrink it to 256 by 256 and put that in, you know.

  • And so when we train our network,

  • we make a decision early on as to what appropriate size we should use. Now, of course, it doesn't really make much sense

  • currently, because we have lots of different sizes of image, obviously.

  • They can't be too big, because we'd run out of RAM,

  • but it would be nice if it was a little bit flexible.

  • The other issue is that this is actually taking our entire image and summarizing it in one value.

  • So all spatial information is lost right?

  • You can see that the spatial information is getting lower and lower as we go through this network, to the point where all we

  • care about is if it's a cat, not where the cat is. What if we wanted to find out where the cat was, or

  • segment the cat out, or segment a person, or count the number of people, right? To do that,

  • this isn't gonna work, because it always goes down to one. So that's kind of a yes or no, is it?

  • Yeah, yes or no. You could have multiple outputs

  • If it was "yes dog, no cat", you know, different outputs.

  • Sometimes instead of a classification you output an actual value like the amount of something
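
Both variants are just a different last layer; a minimal sketch, with a made-up feature vector size:

```python
import torch.nn as nn

features = 128  # size of the final feature vector (made-up)

# Multiple yes/no outputs: one score per class, e.g. dog and cat.
classifier = nn.Linear(features, 2)

# Or regress an actual value, like the amount of something.
regressor = nn.Linear(features, 1)
```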

  • But in this case, that's... let's not worry about it now.

  • You've told me that this is an amazing marker, so I'm gonna have a go at this.

  • Has anyone ever erased your marker in your videos? I mean, this is a first.

  • Okay, it works, it's just gonna take quite a while, because this rubber is tiny. You know, high-quality markers.

  • All right. There we go

  • All right. So the same input still produces this little feature vector

  • But now instead of a fixed size neural network on the end

  • We're just going to put another convolution of one pixel by one pixel

  • So it's just a tiny little filter

  • but it's just one by one, and that's going to scan over here and produce an image of

  • exactly the same size. But this, of course, will be looking at all of these and working out in detail what the object is,

  • so it will have much more information than these ones back here.

  • So, you know

  • this could be outputting a heat map of where the cats are or where the dogs are or

  • You know the areas of disease in sort of a medical image or something like this
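
A minimal sketch of that one-by-one convolution head, with illustrative channel counts: it scores every spatial position of the feature map, so the output is a heat map rather than a single decision:

```python
import torch
import torch.nn as nn

# A 1x1 convolution: a tiny filter that scans over the feature
# map and keeps the spatial layout in its output.
num_features, num_classes = 64, 2  # illustrative sizes
head = nn.Conv2d(num_features, num_classes, kernel_size=1)

features = torch.randn(1, 64, 16, 16)
print(head(features).shape)  # torch.Size([1, 2, 16, 16]), one map per class
```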

  • And so this is called a fully convolutional network, because there are no longer any

  • fully connected or fixed-size layers in this network. So normal deep learning, in some sense, or at least up until about 2014-2015,

  • predominantly just put a little neural network on the end of this that was a fixed size. Now

  • we don't do that.

  • And the nice thing is, if we double the size of this input image, I mean, we're using more RAM,

  • but this is going to double, this'll double, and in the end

  • this will also double, and we'll just get the exact same result, only bigger. So we can now put in different-size images.
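
Because every layer is a convolution or a pooling step, the same weights work at any resolution. A sketch with an illustrative fully convolutional stack:

```python
import torch
import torch.nn as nn

# A tiny fully convolutional network: no fixed-size layers anywhere.
fcn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 2, kernel_size=1),  # 1x1 head: two class heat maps
)

# Double the input and the output doubles too, with the same weights.
print(fcn(torch.randn(1, 3, 64, 64)).shape)    # torch.Size([1, 2, 32, 32])
print(fcn(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 2, 64, 64])
```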

  • The way this actually works in practice is that your deep learning library, like

  • Caffe2 or PyTorch or TensorFlow, will allocate memory as required.

  • So you put in an input image and it goes, well, okay, with that input image

  • we're going to need to allocate this much RAM to do all this. And so the nice thing is that this can now have information

  • on where the objects are as well as what they are, as a per-pixel output. So

  • we'll show a few examples of semantic segmentation on the screen so you can see the kind of thing

  • we're talking about. The obvious downside here, which is what I'm going to leave for

  • another video, is that this is very, very small, you know,

  • maybe this is only a few pixels by a few pixels or something like this, or

  • you haven't done that much downsampling, and so it's not a very deep network and you haven't learned a whole lot. If you are

  • looking for where the car is in this image, you'd kind of get: it's down in the bottom left. It would be very, very general,

  • so it would be, you know, a bit sort of area-based. Maybe there's something else going on over here.

  • It depends on the resolution of this image. It looks great with different colors and lines, but what are you actually using this stuff for?

  • Alright, so, I mean, we have to extend this slightly, which I'm, you know,

  • normally going to postpone for another video, because this is too small to be practical, right?

  • What we could do is just upsample this. We could use linear or bilinear interpolation

  • to just make this way bigger, like this, and have a bigger output image. And

  • it would still be very low resolution, you'd get the rough idea of where something was, but it wouldn't be great.
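
A sketch of that upsampling step, assuming PyTorch and an illustratively tiny heat map: bilinear interpolation stretches the coarse output back up to image size, smoothly but without adding real detail:

```python
import torch
import torch.nn.functional as F

heatmap = torch.randn(1, 2, 8, 8)  # tiny per-class output (illustrative)

# Bilinearly upsample back to the input resolution.
full = F.interpolate(heatmap, size=(256, 256),
                     mode='bilinear', align_corners=False)
print(full.shape)  # torch.Size([1, 2, 256, 256])
```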

  • Right, so you could use this to find

  • objects that you're looking for. So, for example, in our lab we're using this for things like analysis of plants:

  • so where the wheat is, how many there are. That can be useful in a field to try and work out

  • what the yield or disease problems are going to be. You can do it for medical images: where's the tumor in this image,

  • segmenting x-ray images. We're also doing it on human pose estimation and face

  • estimation. So, you know, where is the face in this image? Where are the eyes?

  • What shape is the face? This kind of thing. So you can use this for a huge number of things,

  • but we're going to need to extend it a little bit more to get the best out of it.

  • And the extension we'll call an encoder-decoder network.

  • Are you tidying it up now? What are you doing? It's not neat enough; there's little bits of un-rubbed-out bits.

  • Bear with me, I'll start on the next video in a minute. Yeah.

  • That's as good as it's getting.
