TensorFlow Lite (TensorFlow @ O'Reilly AI Conference, San Francisco '18)

  • ANDREW SELLE: So today I'm going to talk about TensorFlow Lite,

  • and I'll give you an intro to what that is.

  • My name's Andrew Selle.

  • I'm a software engineer at Google, down in Mountain View.

  • All right, so introduction.

  • We're going to be talking about machine learning on-device.

  • So how many of you are interested in running

  • your machine learning models on-device?

  • How many already do that?

  • OK, so a lot.

  • So that's great.

  • So you already know that machine learning on-device

  • is important.

  • So I'm going to give you a scenario that's

  • perhaps a little bit out of the ordinary.

  • Suppose I'm going camping.

  • I don't have any power.

  • I don't have any network connectivity.

  • I'm out in the mountains.

  • And I want to detect bears.

  • So I want to hang my cell phone on the outside of my tent

  • so that if, in the middle of the night,

  • a bear is coming for me, I'll know.

  • I don't know what to do if that happens,

  • but that's a great example of what you do with on-device ML.

  • So you basically want to have low latency.

  • You don't want to wait for the bear to already be upon you.

  • You want some early warning.

  • You want to make sure it works without a data connection.

  • The data can stay on the device and you have access

  • to the sensors.

  • So there are a lot more kind of practical and important use

  • cases than that.

  • But that kind of sets the stage for it.

  • And so people are, of course, doing a lot more ML on-device.

  • So that's why we started TensorFlow Lite.

  • So in short, TensorFlow Lite is our solution

  • for running on-device machine learning

  • with low latency and a small binary size

  • but on many platforms.

  • So here's an example of what TensorFlow Lite can do.

  • So let's play our video.

  • So, suppose we have these objects

  • and we want to recognize them.

  • Well, I know what they are, but I want my phone

  • to be able to do that.

  • So we use an image classification model.

  • So here we have a marker.

  • At the bottom you see kind of the confidence.

  • So this is image recognition.

  • It's happening in real time.

  • It can do kind of ordinary office objects,

  • but also important objects like TensorFlow logos.

  • And it also runs on multiple platforms.

  • So here we have it running on an Android phone and an iOS phone.

  • But we're not just limited to phones.

  • We can also do things like this Android Things Toolkit.

  • And there are going to be a couple more later,

  • which I'll show you.

  • So now that we've seen it in action,

  • let's talk a little bit more about what TensorFlow Lite is.

  • Do you want to see the video again?

  • I don't know.

  • OK, there we go.

  • The unfortunate thing about on-device

  • though is it's much harder.

  • It has tight memory constraints, you need to be low energy,

  • and you don't have as much computation as you do available

  • on the cloud.

  • So TensorFlow Lite has a few properties

  • that are going to sort of deal with these problems.

  • So the first one is it needs to be portable.

  • You know, normal PCs we run on, but we also

  • run on mobile phones, we run on Raspberry Pi, or other Linux

  • SoCs, IoT-type devices.

  • And we also want to go to much smaller devices

  • like microcontrollers.

  • So, next slide.

  • The internet doesn't like me today.

  • Well, we'll skip that slide, whatever that was.

  • But in any case, basically, this portability

  • is achieved through the TensorFlow Lite file format.

  • So once you have a trained TensorFlow model, like you

  • author, you could author it with Swift,

  • you could author it with Python, whatever you do,

  • you produce a saved model in a graph form.

  • The serialized form is then converted to TensorFlow Lite

  • format, which then is your gateway to running on all

  • these different platforms.

  • And we have one more special converter

  • which allows you to go to Core ML

  • if you want to target iOS in this special way.

  • The second property is that it's optimizable.

  • We have model compression, we have quantization,

  • we have CPU kernel fusion.

  • These are all optimization techniques

  • to ensure that we have the best performance, as well

  • as a small size.

  • And this is achieved through the architecture.

  • So we have a converter, which we already talked about,

  • and then we have an interpreter core.

  • The interpreter core delegates to kernels

  • that know how to do things, just like in TensorFlow.

  • But unlike in TensorFlow these are

  • optimized for mobile and small devices with NEON on ARM.

  • An additional thing that we have that TensorFlow

  • doesn't have is the notion of delegation.

  • So we can delegate to GPUs or Edge TPUs or accelerators.

  • And this basically gives TensorFlow Lite

  • the chance to give part of the graph

  • to a hardware accelerator that can do processing

  • in a special way.
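To make the delegation idea concrete, here is a minimal Python sketch of handing a graph to an accelerator. The delegate-loading hook and the library name (`libedgetpu.so.1`) are assumptions based on how the Edge TPU delegate later shipped, not something shown in the talk:

```python
import tensorflow as tf

# Assumed delegate shared library (Edge TPU on a Coral board);
# substitute whatever delegate your hardware provides.
delegate = tf.lite.experimental.load_delegate("libedgetpu.so.1")

# The interpreter hands the parts of the graph the delegate supports
# to the accelerator and runs the rest on the CPU kernels.
interpreter = tf.lite.Interpreter(
    model_path="/tmp/model_edgetpu.tflite",  # hypothetical path
    experimental_delegates=[delegate])
interpreter.allocate_tensors()
```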

  • So one of those we've talked about before is the NN API.

  • We're excited to announce that in 2018 Q4

  • we're looking to release our OpenGL-based GPU

  • delegate, which will give us better performance on GPUs,

  • and will also accelerate things like MobileNet

  • and other vision-based models.

  • So that's really exciting.

  • In addition, at Cloud Next there was an announcement

  • about Edge TPUs.

  • And Edge TPUs are also special as well

  • because they give us the ability to do

  • high performance per watt, and also

  • fit into a small footprint.

  • So for example, the device is there on that penny there,

  • but it's on a development board.

  • So you can put it in many different form factors as well.

  • Then the third property is that it's parametrically sized.

  • So we know that TensorFlow Lite needs

  • to fit on small devices, especially very small devices

  • like MCUs. And there, you might need to only include

  • the ops that you need.

  • So our base interpreter is about 80 kilobytes,

  • and with all built-in ops it's 750 kilobytes.

  • We're moving to a world where you can parameterize

  • what you put into the TensorFlow Lite interpreter,

  • so you can trade off the ability to handle

  • new models that use new ops, and the ability to only ship what

  • you need in your application.

  • So, we introduced TensorFlow Lite last year

  • and we've been asking users what they think of TensorFlow Lite.

  • And they really love the cross platform deployment

  • that they can deploy to iOS, to Android

  • with the same kind of format.

  • They like that they can decouple the distribution of the binary

  • from the distribution of the model.

  • And they like the inference speed increases.

  • And they're really excited about the hardware acceleration

  • roadmap.

  • But the biggest feedback that we got

  • is that we should focus on ease of use, we should add more ops,

  • and we should work on model optimization,

  • and we should provide more documentation.

  • And so we've listened.

  • So what I want to do in this talk

  • is focus on the user experience.

  • But before we do that, let's look

  • at some of the users that are already using it so far.

  • We have a lot of first party users

  • and a lot of third party users that are excited about it.

  • And I hope after this talk you'll be interested as well.

  • So, the user's experience.

  • I'm a new user and I want to use TensorFlow Lite.

  • How do I do it?

  • Well, I think of it as kind of like learning to swim.

  • You can think of two things you might do.

  • You might wade, where you don't really have to swim.

  • But it's really easy to get started

  • and you get to cool off.

  • The second thing is that you can swim

  • where you can do custom models.

  • So we're going to talk about both of those.

  • But before we get into that, there's

  • an easier thing and then a harder thing.

  • So the easier thing is to dip your toes, which are demos.

  • And the harder thing is to just dive full in

  • and have full kind of mastery of the whole water,

  • and that would be optimizing models.

  • And we'll talk about those as well.

  • So as far as using demo apps, you can go to our website,

  • you can download the demos and compile it and run it.

  • It'll give you a flavor of what can be done.

  • I showed you one of those demo apps.

  • You can try it for yourself.

  • The next step is to use a pre-trained model.

  • So the demo app uses a pre-trained model.

  • So you can use that model in your application.

  • So if you have something that could benefit from say ImageNet

  • style classification, you can just take that model

  • and include it.

  • Another thing that's really useful is retraining.

  • So let me show you a retraining workflow, which

  • is you take a pre-trained model and you kind of customize it.
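The retraining app he demos isn't public, but the workflow it wraps is ordinary transfer learning. A hedged Keras sketch of the same idea, with a hypothetical four-class head and hypothetical `collected_images`/`labels` arrays standing in for the captured examples:

```python
import tensorflow as tf

# Freeze a pre-trained MobileNetV2 feature extractor...
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg")
base.trainable = False

# ...and train only a small new head on your own object classes.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g. 4 objects
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(collected_images, labels, epochs=5)  # the captured examples
```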

  • Great, so we're running this video.

  • We're showing you the scissors and the Post-It Notes

  • as before, and here's an application

  • that I built on PC that allows me to do retraining.

  • But we're running inference with TensorFlow Lite.

  • So it knows the scissors, it knows the Post-It Notes,

  • but what if we got to a really important object, one

  • that we haven't seen before, perhaps

  • like something everybody has, a steel TensorFlow logo.

  • How is it going to do on that?

  • Well, not well is the unfortunate thing.

  • But the good thing about machine learning is we can fix it.

  • We just need some examples.

  • So here, this app allows us to collect examples.

  • And you could imagine putting this into a mobile app

  • where you just move your phone around

  • and it captures a bunch of examples.

  • I'm going to do the same thing, except on Linux.

  • It's a little bit more boring, but it does the job.

  • So we want to get a lot of examples

  • to get a lot of generalization.

  • So once we're happy with that, we'll hit the Train button

  • and that's going to do some machine learning.

  • And once it's converged, we're going to have a new model.

  • It's going to convert it to TensorFlow Lite, which

  • is going to be really great.

  • We can test it out.

  • See if it's now detecting this.

  • And indeed it is.

  • That's good.

  • The other cool thing about that is now

  • that we have this TF Lite FlatBuffer model,

  • we can now run it on our device and it works as well.

  • All right, great.

  • So, now that we've done pre-trained models and we've done--

  • let's get into full-on swimming.

  • There's basically four steps that we need to work on.

  • The first one is building and training

  • the model, which we've already talked about.

  • You could do that with, again, Swift would be a great way

  • to do that.

  • The second one is converting the model.

  • Third one is validating the model.

  • And the fourth is deploying the model.

  • Let's dive into them.

  • Well, we're not going to dive yet.

  • We'll swim into them.

  • OK.

  • Build and train the model.

  • So the first thing to do is to get a saved model of your model

  • and then use the converter.

  • This can be invoked in Python.

  • So you could have your training script

  • and you could have the last thing

  • you do is to just always convert it to TensorFlow Lite.

  • I in fact recommend that, because that

  • will allow you to make sure that it's

  • convertible right from the start.

  • So you give it the saved model in,

  • and you get the TF Lite FlatBuffer out.
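A minimal sketch of that last step of the training script, assuming the `tf.lite.TFLiteConverter` API as it shipped around TF 1.12 (paths are hypothetical):

```python
import tensorflow as tf

# Saved model in...
converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# ...TF Lite FlatBuffer out. If an op isn't supported,
# convert() fails here, right at the end of training.
tflite_model = converter.convert()
with open("/tmp/model.tflite", "wb") as f:
    f.write(tflite_model)
```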

  • Great.

  • And then when you're done with that, it will convert.

  • Except we don't have all the ops.

  • So sometimes it won't.

  • So sometimes you want to visualize the TensorFlow model.

  • So a lot of models do work, but some of them

  • are going to have missing ops.

  • So as we've said, we've listened to your feedback

  • and to address this we've provided these visualizers

  • so you can understand your models better.

  • They're kind of analogous to TensorBoard.

  • In addition, we've also added 75 built-in ops,

  • and we're announcing a new feature,

  • which will allow us to run TensorFlow

  • kernels in TensorFlow Lite.

  • So basically, this will allow you

  • to run normal TensorFlow kernels that we

  • don't have a built-in op for.
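A sketch of opting into that feature, assuming the select-ops flag as it later stabilized in the converter (in the earliest releases the attribute was spelled `converter.target_ops`):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# Prefer TF Lite builtins, but fall back to full TensorFlow
# kernels for any op that has no builtin implementation.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
```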

  • There is a trade-off to that, because it increases

  • your binary size considerably.

  • However, it's a great way to get started,

  • and it will kind of allow you to get into using TensorFlow Lite

  • and deploy your model if binary size is not

  • your primary constraint.

  • OK, great.

  • Once you have your model working and converted,

  • you will definitely want to validate it.

  • At every step of machine learning, it's

  • extremely important to make sure it's still

  • running the way you think.

  • So if you have it working in your Python test bench

  • and it's running, you need to make sure it's also

  • running in your app.

  • This is just good practice, that end-to-end things

  • are producing the right answer.
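One way to do that end-to-end check, sketched under the assumption of a TF 2.x eager model; `tf_model` and the `.tflite` path are hypothetical stand-ins for your own artifacts:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 224, 224, 3).astype(np.float32)
expected = tf_model(x).numpy()  # the original model's answer

interpreter = tf.lite.Interpreter(model_path="/tmp/model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
actual = interpreter.get_tensor(out["index"])  # the converted model's answer

# Both paths should produce the same answer, up to float tolerance.
assert np.allclose(expected, actual, atol=1e-5)
```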

  • In addition, you might want to do profiling

  • and you might want to look at what your size is.

  • Once you've done that, you want to convey this model

  • to the next phase, which is optimization.

  • We're going to talk about that later,

  • but that's what you would do with those results.

  • OK, so how do you deploy your model? We have several APIs.

  • In the previous time I talked about this,

  • we had C++ and Java.

  • Kind of in May or so we introduced a Python API.

  • And I'm excited to talk about our C API, which

  • is a way in which we're going to implement

  • all of our different APIs, similar to how TensorFlow does

  • it.

  • In addition, we're also introducing an experimental

  • C# API, which allows you to use it in a lot of toolkits

  • that are C#-based.

  • The most notable of which--

  • which is a feature request--

  • was Unity.

  • So if you want to integrate it into say a game.

  • OK, and then third, Objective-C, to get a more idiomatic

  • traditional iOS experience.

  • Great.

  • Let me just give you an example of some code here.

  • So the basic idea is you give it the file

  • name of the FlatBuffer model, you fill in the inputs,

  • and you call invoke.

  • Then you read out the outputs.

  • That's how all the APIs work, no matter

  • what language they are.

  • So this was Python.
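A minimal sketch of that Python flow with `tf.lite.Interpreter` (the model path and the random input are hypothetical):

```python
import numpy as np
import tensorflow as tf

# Give it the file name of the FlatBuffer model.
interpreter = tf.lite.Interpreter(model_path="/tmp/model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Fill in the inputs, call invoke, read out the outputs.
interpreter.set_tensor(inp["index"],
                       np.random.rand(*inp["shape"]).astype(np.float32))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```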

  • The same is true in Java.

  • The same is true in C++.

  • Perhaps the C one is a little bit more verbose,

  • but it should be pretty intuitive.

  • OK, now that we know how to swim,

  • let's go into diving into models.

  • How do we optimize a model?

  • So once you have your model working,

  • you might want to get the best performance possible,

  • and you might want to leverage custom hardware.

  • This traditionally implies modifying the model.

  • So the way in which we're going to do this is we're

  • going to put this into our [INAUDIBLE] loop.

  • We had our four steps before as part of swimming,

  • but now we have the additional diving model

  • where we do optimization.

  • And how does optimization work?

  • We're introducing this thing called the model optimization

  • toolkit, and the cool thing about it

  • is it allows you to optimize your model either post

  • training or during training.

  • So that means that you can do a lot of things

  • without retraining the model, but to get the--

  • let me just give an example, which is right now

  • we're doing quantization.

  • So there's two ways to do quantization.

  • One is, you take your model and look at what ranges it uses

  • and then just say we're going to quantize this model right now

  • by just setting those ranges.

  • So that's called post training quantization.

  • So all you need to do that is add a flag to the conversion.

  • I showed you the Python converter before.

  • There's also a command line one.

  • But both of them have this option to quantize the weights.
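The flag he's referring to, as it appeared in the converter around the time of this talk (newer releases spell it `converter.optimizations = [tf.lite.Optimize.DEFAULT]`):

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")

# Post-training quantization: quantize the weights at conversion
# time, with no retraining needed.
converter.post_training_quantize = True
tflite_quant_model = converter.convert()
```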

  • In addition, if you want to do a training time quantization,

  • we introduced the toolkit for doing this.

  • This is now kind of put under the model optimization toolkit,

  • and this will basically create you a new training graph

  • that when you run it, it will give you

  • the most optimal trained, quantized graph that you could get.

  • It kind of takes advantage-- it takes into account that it

  • is going to be quantized.

  • So basically, the loss function is aware of the quantization.
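A hedged sketch of that training-time rewrite, assuming the TF 1.x `tf.contrib.quantize` rewriter that the toolkit wrapped at the time; `build_model_and_loss` is a hypothetical placeholder for your own model code:

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    loss = build_model_and_loss()  # hypothetical: your model + loss

    # Rewrite the graph with fake-quantization ops so training,
    # and therefore the loss, is aware of the quantization.
    tf.contrib.quantize.create_training_graph(input_graph=g,
                                              quant_delay=2000)

    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
```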

  • So that's good.

  • OK.

  • So, just one more thing that I want to talk about,

  • which is roadmap.

  • We're looking actively into things like on-device training,

  • which will, of course, require us to investigate control flow.

  • We're adding more techniques to the optimization toolkit.

  • We'd also like to provide more hardware acceleration support.

  • And the last thing is for TensorFlow 2.0,

  • we're moving out of contrib and into TensorFlow.

  • So we'll be under tensorflow/lite.

  • OK, so a couple demos.

  • I wanted to show a couple of models that are using

  • TensorFlow Lite.

  • So for example, here's one model that

  • allows you to see the gaze.

  • So it's running in real time.

  • It basically puts boxes around people

  • and kind of gives a vector of which direction

  • they're looking.

  • And this is running in real time on top of TensorFlow

  • Lite on an Edge TPU.

  • Let me show you another one.

  • Oh, sorry.

  • OK.

  • It's very tricky.

  • There we go.

  • Here's another one that's kind of interesting.

  • Again, this is using a variant of SSD.

  • It's basically three autonomous agents,

  • or two autonomous agents and one human driven agent.

  • Two of the agents are trying to catch the other one

  • and the other one's trying to avoid.

  • And they're all input into this SSD.

  • Basically, the upshot of this is that it uses SSD that's

  • accelerated with Edge TPUs.

  • It's about 40% faster using two Edge TPUs and TF Lite.

  • And I have one more demo, which is an app that's

  • using TF Lite called Yummly.

  • And basically, this is able to give you

  • recipes based on what it sees.

  • So let's just see it in action.

  • So this was originally built on TF Mobile,

  • but then moved to TF Lite.

  • So, this is their demo.

  • So essentially, you point your phone at what's in your fridge

  • and it will tell you what to make with it.

  • This is good for me, because I don't

  • have any creativity on cooking and I have

  • a lot of random ingredients.

  • So we're really excited by what people are using TF Lite for.

  • I want to show you one more demo,

  • which I just made last week with some of my colleagues.

  • And it's basically running on a microcontroller.

  • So this is basically a microcontroller

  • with a touch screen that has only one

  • megabyte of flash memory and 340 kilobytes of RAM.

  • So this is sort of pretty small, and we're

  • doing speech recognition on it.

  • So I say, yes.

  • And it says, yes.

  • It says no.

  • It says no.

  • And if I say some random thing, it says unknown.

  • So this is pre-recorded.

  • Unfortunately, I don't have the sound on yet.

  • But this is just showing that we can run the same interpreter

  • code on these really small devices.

  • So we can go all the way to IoT, which I think is super exciting

  • and will introduce a whole new set of applications

  • that are possible.

  • So with that, I already told you this, which

  • we're moving out of contrib.

  • I'd like you guys to try out TensorFlow Lite.

  • Send us some information.

  • If you're interested in discussing it,

  • go on to our mailing list, tflite@tensorflow.org.

  • And we're really excited to hear about new use cases

  • and to hear feedback.

  • So thank you.

  • We're going to-- both of us are going to be at the booth

  • over in the grand ballroom.

  • So if you want to talk to us more about either Swift

  • or TensorFlow Lite, that would be a great time to do it.

  • And thank you.

  • [APPLAUSE]
