NICK KREEGER: How's it going, everybody?
I'm here to talk about TensorFlow and JavaScript
today.
My name is Nick, and this is my colleague Ping.
And we work on TensorFlow.js here in Mountain View.
So the traditional thinking is machine learning only
happens in Python, right?
That's kind of what everybody thinks about.
But is that always the case?
Has anybody seen this before?
This is something we host on our TensorFlow documentation.
This is the machine learning playground, the TensorFlow
playground.
And it was actually built by our colleagues on the East Coast.
And it was just a visual to put into some of our ML classes.
And it kind of shows how data flows
throughout a connected neural network
with different activation functions.
And this was a really popular project we built,
and it was a lot of fun to make.
And we've gained a lot of traction from it.
So we started to think maybe it makes sense
to do ML in the browser.
There's a lot of opportunities for doing ML directly
in the browser.
We don't need any drivers.
There's no CUDA installation or anything.
You could just run your code.
The browser has a lot of interactive features.
Especially with the last several years of development, there's access to things like sensors and cameras that you can easily hook up to that type of data stream.
And the other great part about doing ML directly in the browser is privacy. You don't have to send any user data over the wire, over an RPC, to do inference behind the scenes in your infrastructure-- you can just do that directly on the client.
So coming back to the TensorFlow playground, this is about 400 lines of JavaScript code, and it was written very specifically for this project.
So our team kind of took this prototype
and started to build a linear algebra
library for the browser.
This project was initially started all open source under the name deeplearn.js. And we took deeplearn.js, aligned it with what we're doing internally with TensorFlow and eager execution, and launched TensorFlow.js last April.
And once we launched it, we saw a lot of really great community-built and Google-built products.
And I want to highlight a couple.
This is one that we built at Google.
It's called the Teachable Machine.
This is all done in the browser.
There are like three labels you can give to whatever you're training with the webcam-- a green, purple, and red.
And it sort of highlights how a basic image recognition
model can run directly in the browser.
So this stuff all exists online.
You can still find it.
Another community project is a self-driving car built all in the browser, called Metacar.
And this is cool.
You can watch it train and see the inference behind how the car drives.
People built games.
So this is a web game that somebody trained with TensorFlow.js-- it's kind of a funny animation, but there's a little dude running back and forth, and he's hiding from those big balls. And the model is learning to avoid the balls, using TensorFlow.js, as it continues to play.
This one is really cool.
This is a Google project called Magenta, which does a lot of ML
with audio.
We have a large library called Magenta.js,
which is built on TensorFlow.js to do in-browser audio.
This is a cool demo somebody built.
It's a digital synthesizer that learns how to play music
and can drive with it.
Another cool example that just came out is--
this is all community-built open source.
It's called face-api.js.
So it's a library that sits on top of TensorFlow.js.
It has a few different types of image recognition. It can detect faces and facial features-- even toddlers work pretty well.
So I want to showcase how our library fits together.
There's sort of two main components to TensorFlow.js.
There's a Core API and then a Layers API.
And that is all powered in the browser by WebGL. That's how we do the linear algebra in the browser-- we bootstrap it all through WebGL textures. And on the server side, we actually ship the C library that powers TensorFlow in Python.
So you get the high-end CPU, GPU.
And then eventually, we're working on the TPU integration
story for server side.
And those who have used Keras, the Layers API
is almost the same as Keras, very similar syntax.
The Core API is op-level. And for anyone who's worked with TensorFlow SavedModels, that API will feel pretty similar.
OK, what can you do today with TensorFlow.js?
Well, you can actually just author small models directly
in the browser.
There's a limited amount of resources in the browser, so we'll get into that a little bit later.
But right now, you can do pure model training in the browser.
You can import pretrained models-- so this is a model that has been trained somewhere else, usually in the cloud or in some Python environment.
And we have a tool to serialize the model
and then run that inference in Node or on the browser.
And we have the ability to retrain models, so very
basic transfer learning.
We can bring in a model.
Anyone who's seen TensorFlow for Poets,
it's a very similar exercise.
So to get started with the Core API,
I want to do just a very simple, basic fitting a polynomial.
So this is a scatter of some data we have.
And we're going to write a really simple model
to try to find the best fit for this plot of data.
It's the classic f(x) = ax² + bx + c.
Excuse me.
So the first line-- this is all ES6-style JavaScript,
for those who are familiar.
So we're going to import--
@tensorflow/tfjs is the name of our package.
And we namespace it as tf.
Now, our first step is to declare three different variables-- a, b, and c. And we actually initialize those to 0.1.
This is going to be passed into our training sequence.
The next step is to declare our function. So this is all using the tfjs APIs to compute f(x) = ax² + bx + c.
And we have some sugar to make that a little bit more
readable using chainable APIs, which is a very
common pattern in JavaScript.
The next step is to declare a loss function-- we just use a mean squared loss.
And then we declare the SGD optimizer
with a default learning rate we've declared somewhere
in this code.
And then finally, we loop through our training. We pass through a number of epochs, and at every step, we minimize our loss through the SGD optimizer. This is very similar to eager-style Python, for those who have done that on the Python side.
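To make that concrete, here's a minimal sketch of that polynomial fit in TensorFlow.js. The learning rate, epoch count, and the synthetic xs/ys data are illustrative assumptions, not values from the talk:

```js
import * as tf from '@tensorflow/tfjs';

// Three trainable coefficients, initialized to 0.1 as described above.
const a = tf.variable(tf.scalar(0.1));
const b = tf.variable(tf.scalar(0.1));
const c = tf.variable(tf.scalar(0.1));

// f(x) = a*x^2 + b*x + c, written with the chainable ops.
const f = x => a.mul(x.square()).add(b.mul(x)).add(c);

// Mean squared error loss.
const loss = (preds, labels) => preds.sub(labels).square().mean();

// SGD optimizer with an assumed learning rate.
const optimizer = tf.train.sgd(0.5);

// Synthetic data standing in for the scattered points on the slide.
const xs = tf.randomUniform([100], -1, 1);
const ys = xs.square().mul(0.8).add(xs.mul(-0.5)).add(0.3)
  .add(tf.randomNormal([100], 0, 0.05));

// Training loop: every step, minimize the loss with the optimizer.
const EPOCHS = 200;
for (let i = 0; i < EPOCHS; i++) {
  optimizer.minimize(() => loss(f(xs), ys));
}
```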
The next thing I want to highlight is the next step up-- the Layers, Keras-style API.
And to do so, we've been working on doing audio recognition
directly in the browser.
And I want to highlight just simply how that kind of works.
So really simple spoken commands like up,
down, left, right can be run through FFT
to build a spectrogram.
So we take audio in, and we build
a spectrogram as an image, and we train our model on that.
And we can actually build that convolutional network pretty
simply with our Layers API.
And the first step is just the same as our fitting polynomial.
We'll include the package tfjs.
And then we're going to build a sequential model.
This is very Keras-style.
Excuse me.
Our first step is to add a conv2d with a couple of different filters and a kernelSize.
ReLU activation functions-- again,
this is very familiar for those who have used Keras.
Then we have a pooling layer.
And then we're going to go ahead and do some more conv2ds, another max-pooling layer, and so on. We repeat as we work our way down the funnel.
And finally, we flatten out our layers, add some dropout,
add a large dense layer at the very end, one more dropout
layer, and then finally, our softmax for audio labeling.
And finally, let's compile the model.
So this is again, very similar to Keras.
We're going to compile our model that we built.
We'll see any errors that we have as the model is constructed.
Give it an optimizer, and then we
call model.fit to start passing in our training data
with our labels.
And once the model has trained, we can save it to disk. We have options for saving directly in the browser and, under Node.js, to a file.
And then finally, we can use that model to do prediction.
So we model.predict, and we pass in our spectrogram.
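Here's a rough sketch of that Layers-style model in code. The spectrogram input shape, filter counts, layer sizes, number of labels, and the placeholder training tensors are assumptions for illustration; the structure (conv, pooling, flatten, dropout, dense, softmax, compile, fit, save, predict) follows the description above:

```js
import * as tf from '@tensorflow/tfjs';

const NUM_LABELS = 4;  // e.g. up, down, left, right (assumed)

const model = tf.sequential();
// Convolution over the spectrogram "image"; the input shape is an assumption.
model.add(tf.layers.conv2d({
  inputShape: [98, 40, 1], filters: 8, kernelSize: 3, activation: 'relu'
}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
// More conv2d + max-pooling as we work down the funnel.
model.add(tf.layers.conv2d({filters: 32, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
// Flatten, dropout, a large dense layer, one more dropout, then softmax.
model.add(tf.layers.flatten());
model.add(tf.layers.dropout({rate: 0.25}));
model.add(tf.layers.dense({units: 2000, activation: 'relu'}));
model.add(tf.layers.dropout({rate: 0.5}));
model.add(tf.layers.dense({units: NUM_LABELS, activation: 'softmax'}));

// Compile the model; construction errors surface here.
model.compile({
  optimizer: tf.train.sgd(0.01),
  loss: 'categoricalCrossentropy',
  metrics: ['accuracy']
});

// Placeholder spectrogram batches and one-hot labels.
const trainXs = tf.zeros([8, 98, 40, 1]);
const trainYs = tf.oneHot(
  tf.tensor1d([0, 1, 2, 3, 0, 1, 2, 3], 'int32'), NUM_LABELS).toFloat();

await model.fit(trainXs, trainYs, {epochs: 10});
await model.save('indexeddb://audio-model');  // or a 'file://...' path under Node.js
const scores = model.predict(tf.zeros([1, 98, 40, 1]));  // predict on one spectrogram
```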
OK, so those were two quick passes at some of the APIs we use-- the higher-level Layers API and the lower-level Core API.
But one of the cool parts of working in the browser is we can take models that have already been trained and build interactive demos.
And for that, I want to showcase a small video.
This is actually built by a collaboration
of the TensorFlow.js team and a Google internal design firm.
And we've built this game for mobile devices.
And it uses MobileNet to do an emoji scavenger hunt.
So in the game, the game will suggest an emoji.
And you have to run around the office
and find it with your webcam on your phone.
And this is all doing inference powered directly in the browser.
For this one, I'll play a quick video,
and it will kind of give you a better
highlight of what's going on.
[VIDEO PLAYBACK]
- This is an umbrella.
So is this.
This is a slice of pizza.
So is this.
Emojis have become a language of their own.
We use them every day to communicate in our texts
and emails, so much so that it's easy to forget
about the real-world objects they're based on,
which got us thinking.
Could we create a game that challenges people
to find the real-world versions of the emojis we use every day?
Introducing "Emoji Scavenger Hunt."
"Emoji Scavenger Hunt" uses TensorFlow.js.
Open source meets machine learning
meets JavaScript meets fun.
It works like this--
we show you an emoji.
Use your phone's camera to find it before the clock runs out.
- Hey, you found pen.
- Find it in time and you advance to the next emoji.
While you're searching, you'll hear our machine
learning system doing its thing.
- Do I see a toilet tissue?
- Hey, do you have a watch?
- Was that a tub?
Is that a glove?
Hey, you found a watch.
- See if you can find all the emojis before the timer
runs out.
- Do I spy a broom?
- "Emoji Scavenger Hunt"--
powered by machine learning.
Start your search at g.co/EmojiScavengerHunt.
- Do I see a URL?
[END PLAYBACK]
NICK KREEGER: Cool.
So that sort of showcases where someone has already
done the hard work of training a model.
Now, we can build that great interactive demo.
OK, so I want to highlight how we actually
do that behind the scenes.
So the first step is taking that pretrained model.
This is MobileNet.
It's been trained under Python TensorFlow.
Internally in MobileNet, those who have used MobileNet
will know there's an object detector that you can
tune for a specific labeling.
And then we import that into our JavaScript app--
the scavenger hunt.
The first step, once the model's been trained, is to save it.
There is a few different paths for doing this.
There's the traditional TensorFlow saved_model API.
And we also support Keras as well.
So there's a sequential MobileNet model for Keras.
Then we have a conversion step. So this is a tool that TensorFlow.js ships for Python. It's a pip install tensorflowjs, and then you can use the tensorflowjs_converter.
For interacting with a saved_model, there are a couple of different options for finding the output of the inference graph and for where we want to serialize our artifacts.
And then we also support the Keras-style converter
as well for HDF5 file format.
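For reference, the converter invocations looked roughly like this around the time of the talk; the paths are placeholders and the exact flags have shifted between releases, so treat it as a sketch rather than the current command line:

```sh
# Install the converter tooling.
pip install tensorflowjs

# Convert a TensorFlow SavedModel: pick the inference output node(s)
# and a directory for the web-friendly artifacts.
tensorflowjs_converter \
    --input_format=tf_saved_model \
    --output_node_names='MobilenetV1/Predictions/Reshape_1' \
    --saved_model_tags=serve \
    /path/to/saved_model \
    /path/to/web_model

# Or convert a Keras HDF5 model.
tensorflowjs_converter \
    --input_format=keras \
    /path/to/model.h5 \
    /path/to/web_model
```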
And finally, we would like to load those artifacts
into the browser.
So this is all JavaScript code.
For that saved_model, it's tf.loadSavedModel.
And we have two different artifacts
that our script creates.
There's a weights link and then a link to the JSON file,
which describes the inference graph.
And again, there's the Keras-style one. Keras actually ships as an all-in-one JSON file, which has the one downside that it misses some of the caching that we provide for saved_models.
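In code, loading those converted artifacts looks roughly like this. The URLs are placeholders, and the loader names have changed across releases (early versions used loadFrozenModel and loadModel; current ones use loadGraphModel and loadLayersModel), so this is a sketch against the newer names:

```js
import * as tf from '@tensorflow/tfjs';

// A converted SavedModel: a model.json describing the inference graph,
// with the sharded binary weight files sitting next to it.
const graphModel = await tf.loadGraphModel('https://example.com/web_model/model.json');

// A converted Keras model loads through the Layers loader instead.
const kerasModel = await tf.loadLayersModel('https://example.com/keras_model/model.json');

// Either model can then run inference directly in the page.
const result = graphModel.predict(tf.zeros([1, 224, 224, 3]));
```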
What happens in that model conversion step?
So the first thing we do-- a saved_model, especially, has a lot of different paths for the graph. There is an inference graph, which is the one we want. There are steps for training. And a lot of the time, if you're using the tf.data pipeline, there are actually graphs for all the data ops.
So we actually pull out the graph for inference,
and then collapse ops as needed, and run some optimization.
And the one other great thing we do for a saved_model
is sharding of weights into 4 megabyte chunks, which cache
nicely with modern browsers.
So it's only a one-time fetch for those larger models.
And we support about 120 plus of today's TensorFlow ops
in that conversion step, and we're always adding more.
And again, the TF/Keras Layers are supported
with this conversion step.
I also wanted to showcase one more demo.
This is a newer demo that we've just shipped this summer.
And it's using PoseNet, which is a human pose estimation model. And for this, I'm going to hand it over to Ping, who is going to highlight it.
PING YU: All right, guys.
So PoseNet is another example of converting a Python-trained model and loading it into the browser. So on the right side, you can see a lot of controls that can fine-tune the model. And on the left side is a live feed of a video. So in the video, you can see it detect my facial features as well as my body parts.
So this is a collaboration between a Google research team as well as an external contributor. So this model is in our model repository; you can check it out. On the left side, you can see it actually runs at about 15 FPS, frames per second. You can build some cool applications, like recognizing motions for sports, et cetera.
We also have other models in that repository, like the audio command model that Nick mentioned earlier. And we're also adding some others, like an object detection model.
So all of that is available for you to use in the browser.
Just go ahead, check them out, and let us know
if you build any cool apps.
Thanks.
NICK KREEGER: And the great part about that is it's
feeding directly off of the camera feed in real time.
And we're doing about 15 frames a second and presenting over USB-C, so it does pretty well.
OK, so I did mention earlier about training directly
in the browser.
This is the retraining-- the transfer learning step.
And for this, we have another cool demo
that we want to showcase.
Again, we're using MobileNet.
And Ping is going to pull this demo up while I'm talking.
So we built this demo where we have a baseline MobileNet
model that we've loaded in the browser.
And we're going to train Ping's face to play Pac-Man.
So he's going to start collecting samples from the webcam of what his up, down, left, and right are. And for this, he's going to use his face.
So as he's moving his face around,
he's collecting different samples
that we're going to pass into that retraining step.
So there's an up, down, left, and right.
And with this demo, as you hold down,
we're collecting more and more frames.
He's getting close, OK.
And then now that he's collected his frames,
he's going to click the Train Model button.
And we'll watch our loss shoot straight down.
It only takes a couple seconds.
And now, he's ready to play Pac-Man.
So go ahead and hit Play there, Ping.
PING YU: All right, let's go.
NICK KREEGER: All right.
There you go.
So the model is running directly in the browser.
We've retrained it to those pictures of his face.
And the controls are lighting up left, down, right based
on what the model is doing.
PING YU: Aw, come on.
NICK KREEGER: All right.
So this is a great use case of what you can do by taking advantage of some of the stuff the browser provides-- doing accessibility for machine learning and building cool apps.
OK, Ping, we can play Pac-Man all day.
PING YU: I got this.
NICK KREEGER: These demos are all
available on our site, which we'll showcase at the end.
You can actually just run this today.
No drivers, no anything to install.
OK.
PING YU: Let's go.
NICK KREEGER: Cool.
All right, so I've showed off a bunch of demos
using our Core API, using that Layers API,
bringing in pretrained models and doing
some basic retraining.
So where does performance stand for TensorFlow.js with our browser runtime, that WebGL-powered runtime? So these are some benchmarks we've done using Python and MobileNet.
So there's two computers that we use for these benchmarks.
The top one is a high-end workstation with a 1080
GTX, the high-end NVIDIA card.
So it's super fast, a little under 3 milliseconds
for our inference time.
And then we used a 13-inch MacBook with an integrated graphics card, not a standalone graphics card. And that was using the CPU build, so it's just the default AVX instruction set we ship with TensorFlow.
And we were doing a little under 60 milliseconds for inference time.
So where does the TensorFlow.js benchmark stand up?
Well, it kind of depends.
On that super beefy 1080 card, we're really close--
about 11 milliseconds per inference time.
The CPU it's running on in this laptop is a little bit slower.
It's a little under 100 milliseconds
per inference time, but that was still giving us that
15 to 20 frames per second, which allows you to still build
interactive demos.
So this discussion leads us to our next part, which is: where does TensorFlow.js on the server side come in?
We think there's a lot of great opportunities
for going with JavaScript ML on the server-side under Node.js.
The ecosystem for Node packages is really awesome.
There's tons of prebuilt libraries
off the shelf for npm.
You can build applications really quickly
and distribute them on all these different cloud services.
The default runtime for Node.js, V8, is super fast.
It's had tons of resources put into it
by companies like Google.
And we've seen benchmarks where the JS interpreter in Node is 10 times faster than Python's.
By enabling TensorFlow with Node.js,
we actually get access to that high-end hardware--
so those Cloud TPUs, the GPU, and so on.
So those are all exciting things.
I wanted to showcase one real simple use
case of Node.js and TensorFlow.
The code snippet I have up here on the screen
is actually a really simple Express app.
If anyone's used it, it's just a request response handler.
And we just handle the endpoint /model,
which has a request and a response that we'll write out
to.
So right now, we actually have a model that we've defined. And we're going to do some prediction on input that's been passed into this endpoint.
Now to turn on TensorFlow.js with Node,
it's one line of code.
It's just importing the binding.
So this is a binding we ship over npm.
And it gives you the high-end power
of TensorFlow C library, all executed under the Node.js
runtime.
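A minimal sketch of that kind of Express handler, assuming a toy model and a JSON body with a flat array of four input features (those specifics are illustrative, not from the talk):

```js
import express from 'express';
import * as tf from '@tensorflow/tfjs';
// This one import binds tfjs to the native TensorFlow C library under Node.
import '@tensorflow/tfjs-node';

const app = express();
app.use(express.json());

// A toy model standing in for whatever model the app has defined or loaded.
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [4]}));

app.post('/model', (req, res) => {
  // req.body.input is assumed to be an array of four numbers.
  const input = tf.tensor2d([req.body.input]);
  const output = model.predict(input);
  res.json({prediction: Array.from(output.dataSync())});
});

app.listen(3000);
```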
And what can you do today with server-side?
So all those demos we showed of running
the model in the browser, those actually
just run under Node as well.
You can use our conversion script.
We ship CPU builds for the three major platforms-- macOS, Linux, and Windows. And we also have GPU CUDA builds for Linux and Windows.
We just launched Windows late last week.
And all of the full library support--
so the Layers API and our Core API all work today right
out of the box with Node.js.
And to kind of highlight how we can bring all these components
of npm and TensorFlow.js and Node.js together,
we built a little interactive demo.
So I know not everybody is super familiar with baseball,
but Major League Baseball Advanced Media has this huge dataset where they record, using sensors at all the stadiums, the different types of pitches that players throw at games.
So there are pitches that are really fast, with high velocity and low movement. And then there are pitches that are a little slower and have more movement.
So we curated this dataset and built a model
all in TensorFlow.js that trains against this data
and detects, I think, seven or eight
different types of pitches.
And it renders it through a socket.
So don't get too hung up on the intricacies of baseball.
This is just really solving a bread-and-butter ML
problem of taking sensor data and drawing up
a classification.
So for this, I'll have Ping run through the demo.
PING YU: OK.
All right, so for web developers, you can really use TensorFlow.js to build a full-stack kind of ML application. So on the left side is a browser where I've started a client, which is trying to connect to the server through socket.io.
On the right side, I have my console.
I'm going to start my Node.js server.
Immediately, you see that it's bound to our TensorFlow CPU runtime.
And as it goes, the model is getting trained.
And the training stats are fed back to the client-side.
As the training progresses, you can see the accuracy increase
for all labels.
The curve ball has about 90% accuracy right now.
With a server-side implementation, it's easy to feed in new data, unlike inside a browser, where it's much harder.
Let me try and click this button.
What this will do is load live MLB pitch data into this application.
And we will try to run inference on that.
So let me click on that.
So immediately, you can see the orange bar
is the prediction accuracy for all of these labels.
For some of them, we actually did better with the live data-- it's 90% for the changeup. For some, we were a little bit less accurate-- the 2-seam fastball is only 68%.
So overall, I think it should demonstrate
that the model actually generalized pretty
well for the live data as well.
So, yeah, back to you.
Cool.
NICK KREEGER: All right.
I'm going to actually kill that demo or my laptop will die.
Great.
So just highlighting exactly what was going on there: there was the Node.js server, which was doing our training. There was a training dataset and an eval dataset, and we were reporting back over socket.io how good we were at each class through our evaluation.
And then we had the ability to just easily reach out to MLB Advanced Media through Node, parse through their data, and then send that in to the model, which produced the orange prediction.
So it's kind of a cool use case: train my model, see how it stacks up against real-world data, and do a quick visualization. And that was all plain JavaScript, plain HTML.
And all the source code we've shown you today--
all the examples we've shown you today are open source.
And we'll link to them at the end here.
So, performance-- I highlighted where the WebGL runtime stacked up against that Python benchmark. So let's step in and look at the Python benchmark against the Node.js runtime.
So again, these are those initial benchmarks
that I highlighted.
The Node runtime itself is just as fast as the Python runtime for inference of MobileNet. This is because we're using the same library that Python uses, and there's not much code to get through before we're running all that high-end code.
OK, I've highlighted a lot of stuff we've built basically just this year-- we launched in April.
The Node stuff has been out since the end of May.
So what's next?
What's the direction that TensorFlow.js
is looking to go in?
We have some high-level bets that we're doing.
There's a project that's going on
that we're going to release here very soon in the next month
or so.
It's our visualization library.
So it's the ability to pull it in through the browser and do quick visualizations of your model and the data you have.
So look for that coming soon.
We also have a full data API coming, very similar to tf.data. It'll have browser- and Node-specific pieces, so there will be convenience functions for things like, I just want to read data off of my webcam and not convert it to tensors myself. This API will provide that for you.
And on the server-side, it will be giving highly-optimized data
pipelines for doing Node.js training.
And so those are our two high-level things.
Those are the big projects that kind of
cross both of our runtimes.
Looking forward for the browser, we're working on performance.
So in those benchmarks that I showed with WebGL, a lot of the bottlenecks are limitations of WebGL. We use 2D textures to hold the tensor data.
There's some bottlenecks for downloading those textures,
reusing those textures.
So we're working on WebGL optimization.
We're also adding more and more ops.
Lately, the focus has been audio and text-based models.
So we're adding a lot more ops to help with that.
We have a great stable library of image recognition ops
and the audio stuff is coming.
And the other thing we're looking at
is helping push that spec.
So the WebGL runtime was really interesting,
and it kind of helped bootstrap ML in the browser.
But WebGL isn't the best fit for this use case.
And we're looking at a few different options.
One is compute shaders, which are much more CUDA-like, where I can allocate the right amount of GPU memory I need to use.
And we're also following closely the WebGPU spec.
So there's a bunch of different offerings from Chrome,
and Internet Explorer, and the browser
vendors for what we want to do.
We're sort of helping watch that space
and provide guidance as needed.
And on the Node.js side, cloud integration
is a thing we're looking at.
This includes the serverless-type integration
points, integration with our TPUs, and so on.
We're actually working on generating op code to provide all the core TensorFlow ops in Node.js. In the Python version of TensorFlow, most of the code is actually generated from our op registry internally, so we're writing that generation for TypeScript and JavaScript users too.
And we're providing better async support with libuv. So libuv is the underpinning in Node.js for asynchronous programming. We're working to make that scheduler behave much more nicely, so we're not blocking the main application thread as much.
OK, wrapping up-- we've shown you a lot of stuff.
I kind of want to step back and highlight a couple of things.
First one is our Core API.
That's the bread and butter of the TensorFlow.js suite.
It's our op library.
It allows you to interact with tensors.
And we also have our Layers API, which
is our Keras-style API for training.
And we also support SavedModel and Keras model conversion
today through our converter script.
And the newest runtime we have is Node.js.
We've just got done talking about a bunch of that.
And with that, I want to thank you guys for attending.
Everything I've shown you is on js.tensorflow.org.
We have quite a bit of stuff up there.
There's all those demos that I showed.
They are linked in that page, so you can find them as well as
the source code.
We also have a variety of GitHub repos.
Everything we do is on GitHub.
tensorflow/tfjs is our root one.
That's our union package.
We keep track of all of our GitHub issues there.
It also links out to a variety of things we have now.
We have an examples repository, which
has maybe 10 to 15 examples you can just run.
There's also a link to our models view.
So this is models that we've pretrained, packaged up
for JavaScript use, and published over npm.
A lot of them actually have wrapper APIs, where you don't even have to take data, convert it to tensors, and then pass it in for inference. You just say, here's an image or HTML canvas-- can you do a prediction?
So those are really cool.
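For example, with the packaged MobileNet model the whole flow is roughly this (the element id is a placeholder):

```js
import * as mobilenet from '@tensorflow-models/mobilenet';

// Load the packaged, pretrained model (weights are fetched for you).
const model = await mobilenet.load();

// Hand it a DOM element -- an img, canvas, or video -- no manual tensors.
const img = document.getElementById('snapshot');
const predictions = await model.classify(img);
console.log(predictions);  // [{className, probability}, ...]
```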
All that stuff is linked on tfjs.
We also have a gallery of community-built stuff, and it's always growing. There's our community mailing list, which you can also find on our website. There's a lot of good discussion there of how do I do X, Y, and Z, or I need this feature-- can you please help? The gallery repo I just mentioned is where all of our community-built examples live, alongside the models repo.
And that's all.
[APPLAUSE]