TensorFlow Extended (TF Dev Summit '20)

[MUSIC PLAYING]
EWA MATEJSKA: Hi, everyone.
Thank you for joining us.
I'm Ewa Matejska, and I'm technical program manager
on the TensorFlow team.
ZHITAO LI: Hi, my name is Zhitao.
I'm a software engineer from Google's TensorFlow Extended
team, TFX.
EWA MATEJSKA: Today, we'll be talking
about a brand new feature: the addition of native Keras
model support to TFX pipelines.
So could you tell me what's TFX, what are TFX pipelines,
and what is native Keras model support?
ZHITAO LI: Happy to do that.
TFX is Google's production-ready machine learning platform.
TFX pipelines is something we released last year
to bring the pipeline experience to open source users,
as well as the Google Cloud users.
And the native Keras support is something
we started working on last October
to make sure our TensorFlow 2 users
can use the native Keras API inside TFX
to train their machine learning models.
EWA MATEJSKA: What can I do with TFX pipelines?
ZHITAO LI: So you can ingest data into TFX,
do data processing and data understanding,
do feature engineering on top of your data,
train the TensorFlow model, do model analysis
and model validation on your model,
and then finally, when everything is ready,
push the model onto production-ready [INAUDIBLE]
solutions.
EWA MATEJSKA: Awesome.
I'm excited to see the native Keras support.
So what do I do?
ZHITAO LI: Let me show that in this notebook.
So this is a public notebook from TFX team
to demonstrate how to use various components in TFX.
This notebook also uses native Keras.
I'm going to show how to do it that way.
So to do that, we first
need to install TFX and various software packages,
including TensorFlow and TensorBoard.
We're making sure all the packages are preloaded
and then making sure the version of software is correct.
After that, we set our pipeline path
to make sure we can correctly access all the data we need.
EWA MATEJSKA: OK, and what kind of model will you be using?
What kind of data?
ZHITAO LI: So the data set here is the
public taxi data set from the city of Chicago.
And the problem we're going to solve
is to predict whether the driver will receive
a tip of more than 20% of the fare, which we call
[INAUDIBLE].
So we are going to download the example data to the path,
making sure the data here is loadable.
Check the first couple of lines.
Then we create the interactive context,
helping us to be able to run each component of TFX pipelines
in the notebook.
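
The labeling rule described above, a tip of more than 20% of the fare, can be sketched in plain Python. This is only an illustration: the column names `fare` and `tips`, the sample rows, and the helper names are assumptions, not taken from the notebook shown in the talk.

```python
import csv
import io

# Hypothetical sample rows; the real data set has many more columns.
SAMPLE_CSV = """fare,tips
10.00,2.50
8.00,1.00
12.50,0.00
"""


def is_big_tipper(fare, tips):
    """Return 1 if the tip is more than 20% of the fare, else 0."""
    return int(tips > 0.2 * fare)


def label_rows(csv_text):
    """Parse CSV text and compute the binary label for every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [is_big_tipper(float(row["fare"]), float(row["tips"])) for row in reader]


labels = label_rows(SAMPLE_CSV)
print(labels)
```

Only the first sample row gets a positive label, because 2.50 is more than 20% of 10.00.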
EWA MATEJSKA: Is InteractiveContext a new API?
ZHITAO LI: InteractiveContext is an API from last October.
This can help us to run each component of the TFX pipeline
in a notebook.
So we first start with the ExampleGen.
This ingests the data into the pipeline
and transforms it into [INAUDIBLE] examples.
We can check the first couple of examples,
making sure they're correct.
Then we can use the StatisticsGen component
to generate some statistics for the data.
EWA MATEJSKA: Can you tell me a little more
about the statistics?
ZHITAO LI: Sure.
The statistics tell us, for each of the features in the data
set, what's the distribution?
How many [INAUDIBLE] records are there?
Minimum value, maximum value, median value, et cetera,
et cetera.
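
As a rough illustration of the per-feature statistics just described, here is a minimal plain-Python sketch. It is not the StatisticsGen implementation; the `summarize` helper, the feature names, and the sample rows are hypothetical, and the missing-value count is an addition of this sketch.

```python
import statistics


def summarize(rows, feature):
    """Compute simple summary statistics for one numeric feature."""
    values = [row[feature] for row in rows if row.get(feature) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Hypothetical records; None marks a missing value.
rows = [
    {"trip_miles": 1.2, "fare": 6.5},
    {"trip_miles": None, "fare": 12.0},
    {"trip_miles": 3.4, "fare": 9.25},
    {"trip_miles": 0.8, "fare": 5.0},
]

miles_stats = summarize(rows, "trip_miles")
fare_stats = summarize(rows, "fare")
print(miles_stats)
print(fare_stats)
```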
EWA MATEJSKA: OK, cool.
ZHITAO LI: And we can also generate a schema out
of the data, which will tell us, on the aggregated view,
what the data looks like.
And we can list out all the schemas from here.
We can also use the ExampleValidator
to make sure the data is correct.
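
The idea of inferring a schema from data and then validating new data against it can be sketched as follows. This is a simplified stand-in, not the TFX components themselves; `infer_schema`, `validate`, and the sample rows are hypothetical.

```python
def infer_schema(rows):
    """Infer each feature's type name and whether it is present in every row."""
    names = sorted({name for row in rows for name in row})
    schema = {}
    for name in names:
        present = [row[name] for row in rows if row.get(name) is not None]
        type_names = {type(value).__name__ for value in present}
        schema[name] = {
            "type": type_names.pop() if len(type_names) == 1 else "mixed",
            "required": len(present) == len(rows),
        }
    return schema


def validate(rows, schema):
    """Return a list of human-readable anomalies found in `rows`."""
    anomalies = []
    for index, row in enumerate(rows):
        for name in sorted(set(row) - set(schema)):
            anomalies.append(f"row {index}: unexpected feature '{name}'")
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                if spec["required"]:
                    anomalies.append(f"row {index}: missing required feature '{name}'")
            elif spec["type"] != "mixed" and type(value).__name__ != spec["type"]:
                anomalies.append(
                    f"row {index}: feature '{name}' is {type(value).__name__}, "
                    f"expected {spec['type']}"
                )
    return anomalies


train_rows = [
    {"fare": 6.5, "payment_type": "Cash"},
    {"fare": 12.0, "payment_type": "Credit Card"},
]
new_rows = [
    {"fare": 9.0, "payment_type": "Cash"},
    {"fare": "n/a", "payment_type": None},
]

schema = infer_schema(train_rows)
anomalies = validate(new_rows, schema)
print(schema)
print(anomalies)
```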
Now, we can use Transform to do feature engineering
on top of our existing data.
To do that, people simply write a pre-processing function,
which takes the original inputs and then
uses Python functions to define the transforms on them.
And we can easily capture all these transforms in the result.
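
To make the pre-processing idea concrete, here is a minimal plain-Python sketch of a function that maps raw inputs to model-ready features. It is a stand-in for the concept, not the Transform API: the two-step `analyze` / `preprocessing_fn` split, the feature names, and the sample rows are all assumptions made for illustration.

```python
import statistics


def analyze(rows):
    """Full pass over the data to compute the statistics the transform needs."""
    fares = [row["fare"] for row in rows]
    vocab = sorted({row["payment_type"] for row in rows})
    return {
        "fare_mean": statistics.mean(fares),
        # Fall back to 1.0 so a constant column does not divide by zero.
        "fare_std": statistics.pstdev(fares) or 1.0,
        "payment_vocab": {token: index for index, token in enumerate(vocab)},
    }


def preprocessing_fn(inputs, stats):
    """Map one raw record to transformed features."""
    return {
        # Scale the fare to zero mean and unit variance.
        "fare_scaled": (inputs["fare"] - stats["fare_mean"]) / stats["fare_std"],
        # Replace the string category with an integer id; -1 means unknown.
        "payment_type_id": stats["payment_vocab"].get(inputs["payment_type"], -1),
    }


raw_rows = [
    {"fare": 4.0, "payment_type": "Cash"},
    {"fare": 8.0, "payment_type": "Credit Card"},
    {"fare": 12.0, "payment_type": "Cash"},
]

stats = analyze(raw_rows)
transformed = [preprocessing_fn(row, stats) for row in raw_rows]
unseen = preprocessing_fn({"fare": 8.0, "payment_type": "Voucher"}, stats)
print(transformed)
```

Keeping the computed statistics separate from the per-record function means the same transform can be replayed on new records at serving time.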
Now, to support native Keras, we
ask users to write their TensorFlow training code as
if they're just writing Keras [? space ?]
code in the normal environment.
The model type we are solving here
is a wide and deep model.
We simply ask people to write their training code.
This is a--
EWA MATEJSKA: Wide and deep model, you said?
ZHITAO LI: Yes.
Build a Keras model.
People can build a wide and deep classifier.
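
For readers unfamiliar with the term, a wide and deep classifier adds the output of a linear ("wide") part to the output of a multi-layer ("deep") part before applying a sigmoid. The forward pass below is a minimal plain-Python sketch with hand-picked, untrained weights; it is not the Keras code from the notebook, and all names and values are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def wide_and_deep(wide_x, deep_x, params):
    """Return the predicted probability for one example."""
    # Wide part: a single linear layer over sparse (one-hot) features.
    wide_logit = dense(wide_x, params["wide_w"], params["wide_b"])[0]
    # Deep part: a hidden layer with ReLU over dense numeric features.
    hidden = relu(dense(deep_x, params["hidden_w"], params["hidden_b"]))
    deep_logit = dense(hidden, params["out_w"], params["out_b"])[0]
    # The two logits are summed and squashed into a probability.
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit)))


# Hand-picked weights for illustration only.
params = {
    "wide_w": [[0.5, -0.5]],
    "wide_b": [0.0],
    "hidden_w": [[1.0, 0.0], [0.0, -1.0]],
    "hidden_b": [0.0, 0.0],
    "out_w": [[-0.5, 1.0]],
    "out_b": [0.0],
}

probability = wide_and_deep(wide_x=[1.0, 0.0], deep_x=[1.0, 2.0], params=params)
print(probability)
```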
And once this classifier is defined using the native Keras
API, they can wrap that in the run function.
The run function will then be fed
into the TFX trainer executor.
And we expect the function to export a SavedModel.
After that, we kick off the training component.
And we can see the training happens.
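
The contract described above, a single entry-point function that builds the model, trains it, and exports it to a directory the trainer provides, can be sketched in plain Python. Everything here is a stand-in: `FakeFnArgs`, its fields, the toy one-weight model, and the JSON export are hypothetical and only mimic the shape of the pattern, not the TFX or Keras APIs.

```python
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FakeFnArgs:
    """Hypothetical stand-in for the arguments a trainer passes to the function."""
    train_rows: list
    train_steps: int
    serving_model_dir: str


def build_model():
    """Stand-in for model construction: a single-weight linear model."""
    return {"weight": 0.0}


def run_fn(fn_args):
    """Build, train, and export the model to `fn_args.serving_model_dir`."""
    model = build_model()
    learning_rate = 0.01
    for step in range(fn_args.train_steps):
        x, y = fn_args.train_rows[step % len(fn_args.train_rows)]
        error = model["weight"] * x - y
        model["weight"] -= learning_rate * error * x  # one SGD step
    os.makedirs(fn_args.serving_model_dir, exist_ok=True)
    with open(os.path.join(fn_args.serving_model_dir, "model.json"), "w") as f:
        json.dump(model, f)


with tempfile.TemporaryDirectory() as tmp:
    args = FakeFnArgs(
        train_rows=[(1.0, 2.0), (2.0, 4.0)],  # samples of y = 2x
        train_steps=1000,
        serving_model_dir=os.path.join(tmp, "serving_model"),
    )
    run_fn(args)
    with open(os.path.join(args.serving_model_dir, "model.json")) as f:
        exported = json.load(f)

print(exported)
```

The caller never touches the model directly; it only supplies arguments and reads what the function exported.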
EWA MATEJSKA: OK, awesome.
ZHITAO LI: The training really happened
in the Jupyter Notebook.
We see these are the features we are using.
These are the advanced features.
These are the layers we used in the model.
And we train them for 10,000 steps.
And then we exported the model [INAUDIBLE].
EWA MATEJSKA: So this is a lot of meaty content.
How can I follow along at home?
ZHITAO LI: Sure.
So feel free to check out the TensorFlow.org/TFX page.
That is our home page.
We have all the tutorials, API docs,
as well as component guides available there.
And feel free to reach out to us on either GitHub or the TFX
Google Group.
EWA MATEJSKA: And I have one last question
for you, a high-level question.
How do I take this out to production?
ZHITAO LI: Oh, sure.
Happy to do that.
So to do that, you can simply use the Pusher component
to push the model onto various types of production-ready
serving solutions, including TensorFlow Serving,
or mobile devices using TensorFlow Lite, or TensorFlow
Hub.
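
As a rough picture of what pushing a model involves, the sketch below copies an exported model directory into a versioned subdirectory of a serving location, which is the directory layout TensorFlow Serving watches. It is not the Pusher component: `push_model`, the `blessed` flag (standing in for a passed validation step), and all paths are hypothetical.

```python
import os
import shutil
import tempfile
import time


def push_model(model_dir, serving_root, blessed, version=None):
    """Copy `model_dir` to `serving_root/<version>` if the model was validated."""
    if not blessed:
        return None  # Models that failed validation are never pushed.
    if version is None:
        version = int(time.time())  # A timestamp as a simple increasing version.
    destination = os.path.join(serving_root, str(version))
    shutil.copytree(model_dir, destination)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    # Create a fake exported model to push.
    exported_dir = os.path.join(tmp, "exported_model")
    os.makedirs(exported_dir)
    with open(os.path.join(exported_dir, "model.json"), "w") as f:
        f.write("{}")

    serving_root = os.path.join(tmp, "serving", "taxi_model")
    skipped = push_model(exported_dir, serving_root, blessed=False)
    pushed = push_model(exported_dir, serving_root, blessed=True, version=1)
    pushed_version = os.path.basename(pushed)
    pushed_files = os.listdir(pushed)

print(skipped, pushed_version, pushed_files)
```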
EWA MATEJSKA: Thank you so much for showing me
a little bit about the native Keras model support.
And thank you for joining us.
ZHITAO LI: Thank you.
[MUSIC PLAYING]