
  • CLEMENS MEWALD: Hi, everyone.

  • My name is Clemens.

  • I'm a product manager in Google Research.

  • And today I'm going to talk about TensorFlow Extended,

  • which is a machine learning platform that we built

  • around TensorFlow at Google.

  • And I'd like to start this talk with a block

  • diagram and the small yellow box, or orange box.

  • And that box basically represents

  • what most people care about and talk about when they

  • talk about machine learning.

  • It's the machine learning algorithm.

  • It's the structure of the network

  • that you're training, how you choose

  • what type of machine learning problem you're solving.

  • And that's what you talk about when you talk about TensorFlow

  • and using TensorFlow.

  • However, in addition to the actual machine learning,

  • and to TensorFlow itself, you have

  • to care about so much more.

  • And these are all of these other things

  • around the actual machine learning algorithm

  • that you have to have in place, and that you actually

  • have to nail and get right in order

  • to actually do machine learning in a production setting.

  • So you have to care about where you get your data from,

  • that your data are clean, how you transform them,

  • how you train your model, how to validate your model,

  • how to push it out into a production setting,

  • and deploy it at scale.

  • Now, some of you may be thinking, well,

  • I don't really need all of this.

  • I only have my small machine learning problem.

  • I can live within that small orange box.

  • And I don't really have these production worries as of today.

  • But I'm going to propose that all of you

  • will have that problem at some point in time.

  • Because what I've seen time and time

  • again is that research and experimentation

  • today is production tomorrow.

  • It's like research and experimentation never

  • just ends there.

  • Eventually it will become a production model.

  • And at that point, you actually have to care about all

  • of these things.

  • Another side of this coin is scale.

  • So some of you may say, well, I do

  • all of my machine learning on a local machine, in a notebook.

  • Everything fits into memory.

  • I don't need all of these heavy tools to get started.

  • But similarly, small scale today is large scale tomorrow.

  • At Google we have this problem all the time.

  • That's why we always design for scale from day one,

  • because we always have product teams that say, well,

  • we have only a small amount of data.

  • It's fine.

  • But then a week later the product picks up.

  • And suddenly they need to distribute the workload

  • to hundreds of machines.

  • And then they have all of these concerns.

  • Now, the good news is that we built something for this.

  • And TFX is the solution to this problem.

  • So this is a block diagram that we published

  • in one of our papers that is a very simplistic view

  • of the platform.

  • But it gives you a broad sense of what

  • the different components are.

  • Now, TFX is a very large platform.

  • And it contains a lot of components

  • and a lot of services.

  • So the paper that we published, and also

  • what I'm going to discuss today, is only a small subset of this.

  • But building TFX and deploying it at Google

  • has had a profound impact on how fast product teams at Google

  • can train machine learning models

  • and deploy them in production, and how ubiquitous machine

  • learning has become at Google.

  • You'll see later I have a slide to give you some sense of how

  • widely TFX is being used.

  • And it really has accelerated all of our efforts

  • to be an AI-first company and to use machine learning

  • in all of our products.

  • Now, we use TFX broadly at Google.

  • And we are very committed to making

  • all of this available to you by open sourcing it.

  • So the boxes that are just highlighted in blue

  • are the components that we've already open sourced.

  • Now, I want to highlight an important thing.

  • TFX is a real solution for real problems.

  • Sometimes people ask me, well, is this the same code that you

  • use at Google for production?

  • Or did you just build something on the side and open source it?

  • And all of these components are the same code base

  • that we use internally for our production pipelines.

  • Of course, there are some things that

  • are Google-specific for our deployments.

  • But all of the code that we open source

  • is the same code that we actually

  • run in our production systems.

  • So it's really code that solves real problems for Google.

  • The second point to highlight is that so far

  • we've only open sourced libraries,

  • so each one of these is a library that you can use on its own.

  • But you still have to glue them together.

  • You still have to write some code

  • to make them work in a joint manner.

  • That's just because we haven't open

  • sourced the full platform yet.

  • We're actively working on this.

  • But I would say so far we're about 50% there.

  • So these blue components are the ones

  • that I'm going to talk about today.

  • But first, let me talk about some of the principles

  • that we followed when we developed TFX.

  • Because I think it's very informative

  • to see how we think about these platforms,

  • and how we think about having impact at Google.

  • The first principle is flexibility.

  • And there's some history behind this.

  • And the short version of that history

  • is that at Google, and I'm sure at other companies as well, there used

  • to be problem-specific machine learning platforms.

  • And just to be concrete, so we had a platform

  • that was specifically built for large scale linear models.

  • So if you had a linear model that you

  • wanted to train at large scale, you

  • used this piece of infrastructure.

  • We had a different piece of infrastructure

  • for large scale neural networks.

  • But product teams usually don't have one kind of a problem.

  • And they usually want to train multiple types of models.

  • So if they wanted to train linear and deep models,

  • they had to use two entirely different technology stacks.

  • Now, with TensorFlow, as I'm sure you know,

  • we can actually express any kind of machine learning algorithm.

  • So we can train TensorFlow models

  • that are linear, that are deep, unsupervised and supervised.

  • We can train tree models.

  • And any single algorithm that you can think of either

  • has already been implemented in TensorFlow,

  • or can be implemented in TensorFlow.

  • So building on top of that flexibility,

  • we have one platform that supports

  • all of these different use cases from all of our users.

  • And they don't have to switch between platforms just

  • because they want to implement different types of algorithms.

  • Another aspect of this is the input data.

  • Of course, product teams also don't only have image data,

  • or only have text data.

  • In some cases, they may even have both.

  • Right.

  • So they have models that take in both images and text,

  • and make a prediction.

  • So we needed to make sure that the platform that we built

  • supports all of these input modalities,

  • and can deal with images, text, sparse data

  • that you will find in logs, videos even.

  • And with a platform as flexible as this,

  • you can ensure that all of the users

  • can represent all of their use cases on the same platform,

  • and don't have to adopt different technologies.

  • The next aspect of flexibility is

  • how you actually run these pipelines

  • and how you train models.

  • So one very basic use case is you have all of your data

  • available.

  • You train your model once, and you're done.

  • This works really well for stationary problems.

  • A good example is always: you want

  • to train a model that classifies whether there's

  • a cat or a dog in an image.

  • Cats and dogs have looked the same for quite a while.

  • And they will look the same in 10 years,

  • or very much the same as today.

  • So that same model will probably work well in a couple of years.

  • So you don't need to keep that model fresh.

  • However, if you have a non-stationary problem where

  • the data change over time (recommendation systems

  • have new types of products that you want to recommend,

  • new types of videos get uploaded all the time), you

  • actually have to retrain these models to keep them fresh.

  • So one way of doing this is to train a model

  • on a subset of your data.

  • Once you get new data, you throw that model away.

  • You train a new model, either on the superset, so on the old

  • and the new data, or only on the fresh data, and so on.

  • Now, that has a couple of disadvantages.

  • One of them is that you throw away

  • the learning from previous models.

  • In some cases, you're wasting resources,

  • because you actually have to retrain over the same data over

  • and over again.

  • And because a lot of these models

  • are actually not deterministic, you

  • may end up with vastly different models every time.

  • Because of the way that they're being initialized,

  • you may end up in a different optimum

  • every time you train these models.

  • So a more advanced way of doing this

  • is to start training with your new data,

  • initialize your model from the weights

  • of the previous model, and continue training.

  • We call that warm starting of models. It may seem trivial

  • if you just say, well, this is just

  • a continuation of your training run.

  • You just added more data and you continue.

  • But depending on your model architecture,

  • it's actually non-trivial.

  • Because in some cases, you may only

  • want to warm start the embeddings.

  • So you may only want to transfer the weights of the embeddings

  • to a new model and initialize the rest of your network

  • randomly.

  • So there's a lot of different setups

  • that you can achieve with this.

  • But with this you can continuously

  • update your models.

  • You retain the learning from previous versions.

  • You can even, depending on how you set it up,

  • bias your model more toward the more recent data.

  • But you're still not throwing away the old data.

  • And you always have a fresh model that's updated for production.
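As an illustration of the warm-starting idea (not necessarily the mechanism TFX uses internally), the Estimator API exposes it through tf.estimator.WarmStartSettings; the checkpoint path, feature column, and variable regex below are hypothetical.

```python
import tensorflow as tf

# Hypothetical feature columns for the sketch.
feature_columns = [tf.feature_column.numeric_column("x")]

# Warm start only the embedding variables from a previous model directory
# (hypothetical path); the rest of the network is initialized randomly.
warm_start = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from="/tmp/previous_model",
    vars_to_warm_start=".*embedding.*")  # regex over variable names

estimator = tf.estimator.DNNClassifier(
    hidden_units=[64, 32],
    feature_columns=feature_columns,
    warm_start_from=warm_start)

# Continue training on the fresh data (train_input_fn is assumed to exist):
# estimator.train(input_fn=train_input_fn)
```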

  • The second principle is portability.

  • And there's a few aspects to this.

  • The first one is obvious.

  • So because we rely on TensorFlow,

  • we inherit the properties of TensorFlow,

  • which means you can already train your TensorFlow

  • models in different environments and on different machines.

  • So you can train a TensorFlow model locally.

  • You can distribute it in a cloud environment.

  • And by cloud, I mean any setup of multiple clusters.

  • It doesn't have to be a managed cloud.

  • You can train or perform inferences with your TensorFlow

  • models on the devices that you care about today.

  • And you can also train and deploy them on devices

  • that you may care about in the future.

  • Next is Apache Beam.

  • So when we open sourced a lot of our components

  • we faced the challenge that internally we

  • use a data processing engine that

  • allows us to run these large scale

  • data processing pipelines.

  • But in the open source world and in all of your companies,

  • you may use different data processing systems.

  • So we were looking for a portability layer.

  • And Apache Beam provides us with that portability layer.

  • It allows us to express a data graph once with the Python SDK.

  • And then you can use different runners

  • to run those same data graphs in different environments.

  • The first one is a direct runner.

  • So that allows you to run these data graphs

  • on a single machine.

  • It's also the one that's being used in notebooks.

  • So I'll come back to that later, but we

  • want to make sure that all of our tools

  • work in notebook environments, because we know that that's

  • where data scientists start.

  • Then there's a Dataflow runner, with which

  • you can run these same pipelines at scale,

  • on Cloud Dataflow in this case.

  • There's a Flink runner that's being developed

  • right now by the community.

  • There's a [INAUDIBLE] ticket that you can follow

  • for the status updates on this.

  • I'm being told it's going to be ready at some point

  • later this year.

  • And the community is also working on more runners

  • so that these pipelines are becoming more portable

  • and can be run in more different environments.
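To make the runner choice concrete, here is a minimal Beam sketch; the pipeline itself is a toy, and the Dataflow options shown in the comment (project, bucket, region) are placeholders.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The same pipeline definition runs on different runners; only the options
# change. DirectRunner executes locally (e.g., in a notebook), while
# DataflowRunner executes the same graph at scale on Cloud Dataflow.
options = PipelineOptions(runner="DirectRunner")
# For Cloud Dataflow you would instead pass something like:
# options = PipelineOptions(runner="DataflowRunner", project="my-project",
#                           temp_location="gs://my-bucket/tmp",
#                           region="us-central1")

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create([1, 2, 3, 4])
     | "Square" >> beam.Map(lambda x: x * x)
     | "Print" >> beam.Map(print))
```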

  • In terms of cluster management and managing your resources,

  • we work very well together with Kubernetes and the Kubeflow

  • project, which actually is the next talk right after mine.

  • And if you're familiar with Kubernetes,

  • there's something called Minikube,

  • with which you can deploy your Kubernetes

  • setup on a single machine.

  • Of course, there's managed Kubernetes solutions

  • such as GKE.

  • You can run your own Kubernetes cluster

  • if you want to, on prem.

  • And, again, we inherit the portability aspects

  • of Kubernetes.

  • Another extremely important aspect is scalability.

  • And I've alluded to it before.

  • I'm sure many of you know the problem.

  • There's different roles in companies.

  • And very commonly, data scientists work on what is sometimes

  • a down-sampled set of data on their local machines,

  • maybe on their laptops, in a notebook environment.

  • And then there are data engineers or product software engineers

  • who either take the models that

  • were developed by data scientists

  • and deploy them in production,

  • or try to replicate what

  • the data scientists did with different frameworks,

  • because they work with a different toolkit.

  • And there's this almost impenetrable wall

  • between those two.

  • Because they use different toolsets.

  • And there is a lot of friction in terms

  • of translating from one toolset to the other,

  • or actually deploying these things from the data science

  • process to the production process.

  • And if you've heard the term, throwing it

  • over the wall, that usually does not have good connotations.

  • But that's exactly what's happening.

  • So when we built TFX we paid particular attention

  • to make sure that all of the toolsets we build

  • are usable at a small scale.

  • So you will see from my demos, all of our tools

  • work in a notebook environment.

  • And they work on a single machine with small datasets.

  • And in many cases, or actually in all cases,

  • the same code that you run on a single machine

  • scales up to large workloads in a distributed cluster.

  • And the reason why this is extremely important

  • is there's no friction to go from experimentation

  • on a small machine to a large cluster.

  • And you can actually bring those different functions together,

  • and have data scientists and data engineers work together

  • with the same tools on the same problems,

  • and not have that wall in between them.

  • The next principle is interactivity.

  • So the machine learning process is not a straight line.

  • At many points in this process you actually

  • have to interact with your data, understand your data,

  • and make changes.

  • So this visualization is called Facets.

  • And it allows you to investigate your data, and understand it.

  • And, again, this works at scale.

  • So sometimes when I show these screenshots,

  • they may seem trivial when you think

  • about small amounts of data that fit into a single machine.

  • But if you have terabytes of data,

  • and you want to understand them, it's less trivial.

  • And on the other side--

  • I'm going to talk about this in more detail later--

  • this is a visualization we have to actually understand how

  • your models perform at scale.

  • This is a screen capture from TensorFlow Model Analysis.

  • And by following these principles,

  • we've built a platform that has had a profound impact on Google

  • and the products that we build.

  • And it's really being used across many of our Alphabet

  • companies.

  • So Google, of course, is only one company

  • under the Alphabet umbrella.

  • And within Google, all of our major products

  • are using TensorFlow Extended to actually deploy machine

  • learning in their products.

  • So with this, let's look at a quick overview.

  • I'm going to take questions later, if it's possible.

  • Let's look at a quick overview of the things

  • that we've open sourced so far.

  • So this is the familiar graph that you've seen before.

  • And I'm just going to turn all of these boxes blue

  • and talk about each one of those.

  • So data transformation we have open sourced

  • as TensorFlow Transform.

  • TensorFlow Transform allows you to express your data

  • transformation as a TensorFlow graph,

  • and actually apply these transformations at training

  • and at serving time.

  • Now, again, this may sound trivial,

  • because you can already express your transformations

  • with a TensorFlow graph.

  • However, if your transformations require

  • an analyze phase of your data, it's less trivial.

  • And the easiest example for this is mean normalization.

  • So if you want to mean normalize a feature,

  • you have to compute the mean and the standard deviation

  • over your data.

  • And then you need to subtract the mean and divide

  • by standard deviation.

  • Right.

  • If you work on a laptop with a dataset that's a few gigabytes,

  • you can do that with NumPy and everything is great.

  • However, if you have terabytes of data,

  • and you actually want to replicate these

  • transformations in serving time, it's less trivial.

  • So Transform provides you with utility functions.

  • And for mean normalization there's

  • one that's called scale_to_z_score, which is a one-liner.

  • So you can say, I want to scale this feature such

  • that it has a mean of zero and a standard deviation of one.

  • And then Transform actually creates a Beam graph for you

  • that computes these metrics over your data.

  • And then Beam handles computing those metrics

  • over your entire dataset.

  • And then Transform injects the results of this analyze phase

  • as a constant in your TensorFlow graph,

  • and creates a TensorFlow graph that

  • does the computation needed.

  • And the benefit of this is that this TensorFlow graph that

  • expresses this transformation can now

  • be carried forward to training.

  • So at training time, you apply those transformations

  • to your training data.

  • And the exact same graph is also applied to the inference graph,

  • such that at inference time the exact same transformations

  • are being done.

  • Now, that basically eliminates training/serving skew,

  • because now you can be entirely sure

  • that the exact same transformations are

  • being applied.

  • It eliminates the need for you to have code

  • in your serving system that tries to replicate this

  • transformation, because usually the code paths that you

  • use in your training pipelines are different from the ones

  • that you use in your serving system,

  • because that one needs to be very low latency.

  • Here's just a code snippet of what such a preprocessing

  • function can look like.

  • I just spoke about scaling to the Z-score.

  • So that's mean normalization.

  • string_to_int is another very common transformation

  • that does string-to-integer mapping by creating a vocabulary.

  • And bucketizing a feature, again,

  • is also a very common transformation

  • that requires an analyze phase over your data.
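As a rough sketch of what such a preprocessing function can look like (the feature names are hypothetical, taxi-style examples; note that tft.string_to_int was later renamed tft.compute_and_apply_vocabulary in newer tf.Transform releases):

```python
import tensorflow as tf
import tensorflow_transform as tft


def preprocessing_fn(inputs):
    """Sketch of a tf.Transform preprocessing function (hypothetical features)."""
    outputs = {}
    # Mean-normalize a numeric feature: (x - mean) / stddev. The analyze
    # phase computes mean and stddev over the full dataset.
    outputs["fare_scaled"] = tft.scale_to_z_score(inputs["fare"])
    # Map strings to integers by building a vocabulary over the data.
    outputs["payment_type_id"] = tft.string_to_int(inputs["payment_type"])
    # Bucketize a numeric feature into quantile buckets (also needs analysis).
    outputs["trip_miles_bucket"] = tft.bucketize(inputs["trip_miles"],
                                                 num_buckets=10)
    return outputs
```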

  • And all of these examples are relatively simple.

  • But just think about one of the more advanced use cases

  • where you can actually chain together transforms.

  • You can do a transform of your already transformed feature.

  • And Transform actually handles all of this for you.

  • So there's a few common use cases.

  • I've talked about scaling and bucketization.

  • Text transformations are very common.

  • So if you want to compute ngrams,

  • you can do that as well.

  • And the particularly interesting one

  • is actually applying a SavedModel.

  • And applying a SavedModel in Transform

  • takes an already trained or created TensorFlow model

  • and applies it as a transformation.

  • So you can imagine if one of your inputs is an image,

  • and you want to apply an inception model to that image

  • to create an input for your model,

  • you can do that with that function.

  • So you can actually embed other TensorFlow models

  • as transformations in your TensorFlow model.

  • And all of this is available in the tensorflow/transform repository on GitHub.

  • Next, we talk about the trainer.

  • And the trainer is really just TensorFlow.

  • We're going to talk about the Estimator API and the Keras API.

  • This is just a code snippet that shows you how

  • to train a wide and deep model.

  • A wide and deep model combines a deep [INAUDIBLE],

  • just a [INAUDIBLE] of a network, and a linear part together.

  • And in the case of this estimator,

  • it's a matter of instantiating this estimator.

  • And then the Estimator API is relatively straightforward.

  • There's a train method that you can call to train the model.

  • And the estimators that are up here

  • are the ones that are in core TensorFlow.

  • So if you just install TensorFlow,

  • you get DNNs, Linear, DNN and Linear combined, and boosted

  • trees, which is a great [INAUDIBLE]

  • tree implementation.

  • But if you want to do some searching

  • in TensorFlow Contrib, or in other repositories

  • under the TensorFlow [INAUDIBLE] on GitHub,

  • you will find many, many more implementations

  • of very common architectures with the estimator framework.

  • Now, the estimator has a method

  • that's currently in contrib.

  • But it will move to the Estimator API with 2.0.

  • It's a method for exporting saved models.

  • And that actually exports a TensorFlow graph

  • as a SavedModel, such that it can

  • be used by TensorFlow Model Analysis and TensorFlow Serving.

  • This is just a code snippet from one

  • of our examples of how this looks.

  • For an actual example, in this case,

  • it's the Chicago taxi dataset.

  • We just instantiated the DNN and linear combined classifier,

  • called train, and exported it for use

  • by downstream components.
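A rough TF 1.x-style sketch of that flow, with hypothetical feature columns and paths (the exported SavedModel is what the downstream components consume):

```python
import tensorflow as tf

# Hypothetical wide (linear) and deep feature columns.
wide_columns = [tf.feature_column.categorical_column_with_hash_bucket(
    "payment_type", hash_bucket_size=100)]
deep_columns = [tf.feature_column.numeric_column("trip_miles"),
                tf.feature_column.numeric_column("fare")]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])

# train_input_fn is assumed to yield (features, labels) batches.
# estimator.train(input_fn=train_input_fn, max_steps=10000)

# Export a SavedModel so TFMA and TensorFlow Serving can consume the graph.
serving_input_fn = tf.estimator.export.build_raw_serving_input_receiver_fn({
    "payment_type": tf.placeholder(tf.string, [None]),
    "trip_miles": tf.placeholder(tf.float32, [None]),
    "fare": tf.placeholder(tf.float32, [None]),
})
# estimator.export_savedmodel("/tmp/serving_model", serving_input_fn)
```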

  • Using tf.Keras, it looks very similar.

  • So in this case, we used the Keras sequential API,

  • where you can configure the layers of your network.

  • And the Keras API is also getting

  • a method called save_keras_model that

  • exports the same format, which is the SavedModel, such

  • that it can be used again by downstream components.
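A rough sketch of the Keras variant (the layer sizes and input width are hypothetical, and the exact export call has moved between releases, as noted in the comment):

```python
import tensorflow as tf

# A minimal sequential model; layer sizes and the feature width are hypothetical.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit(train_features, train_labels, epochs=5)

# Export as a SavedModel for downstream components. The exact call has moved
# between releases: tf.contrib.saved_model.save_keras_model(model, export_dir)
# in TF 1.x, or tf.keras.models.save_model(model, export_dir, save_format="tf")
# in TF 2.x.
```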

  • Model evaluation and validation are open sourced as TensorFlow

  • Model Analysis.

  • And that takes that graph as an input.

  • So the graph that we just exported

  • from our estimator or Keras model flows as an input

  • into TFMA.

  • And TFMA computes evaluation statistics

  • at scale in a sliced manner.

  • So now, this is another one of those examples where

  • you may say, well, I already get my metrics from TensorBoard.

  • TensorBoard metrics are computed in a streaming manner

  • during training on mini-batches.

  • TFMA uses Beam pipelines to compute

  • metrics in an exact manner with one pass over all of your data.

  • So if you want to compute your metrics over a terabyte of data

  • with exactly one pass, you can use TFMA.

  • Now, in this case, you run TFMA for that model

  • and some dataset.

  • And if you just call this method called render_slicing_metrics

  • with the result by itself, the visualization looks like this.

  • And I pulled this up for one reason.

  • And that reason is just to highlight

  • what we mean by sliced metrics.

  • This is the metric that you may be

  • used to when someone trains a model and tells you, well,

  • my model has a 0.94 accuracy, or a 0.92 AUC.

  • That's an overall metric.

  • Over all of your data, it's the aggregate

  • of those metrics for your entire model.

  • That may tell you that the model is doing well on average,

  • but it will not tell you how the model is

  • doing on specific slices of your data.

  • So if you, instead, render those slices for a specific feature--

  • in this case we actually sliced these metrics

  • by trip start hour--

  • so, again, this is from the Chicago taxicab dataset.

  • You actually get a visualization in which you can now--

  • in this case, we look at a histogram and [INAUDIBLE]

  • metric.

  • We filter for buckets that have at least 100 examples so

  • that we don't get low-count buckets.

  • And then you can actually see here

  • how the model performs on different slices of feature

  • values for a specific trip start hour.
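A rough sketch of what that looks like in code, assuming hypothetical paths and the TFMA 0.x-era API from around the time of this talk (exact signatures have shifted between releases):

```python
import tensorflow_model_analysis as tfma

# The EvalSavedModel is the graph exported by the trainer; the paths are
# hypothetical.
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path="/tmp/eval_model")

# Compute metrics in one exact pass over the eval data, both overall and
# sliced by trip_start_hour.
result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location="/tmp/eval_data.tfrecord",
    slice_spec=[
        tfma.slicer.SingleSliceSpec(),                            # overall
        tfma.slicer.SingleSliceSpec(columns=["trip_start_hour"]),  # per slice
    ])

# In a notebook, render the interactive slicing visualization.
tfma.view.render_slicing_metrics(result, slicing_column="trip_start_hour")
```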

  • So this particular model is trained to predict

  • whether a tip is more or less than 20%.

  • And you've seen overall it has a very high accuracy, and very

  • high AUC.

  • But it turns out that on some of these slices,

  • it actually performs poorly.

  • So if the trip start hour is seven, for some reason

  • the model doesn't really have a lot of predictive power

  • whether the tip is going to be good or bad.

  • Now, that's informative to know.

  • Because maybe that's just because there's

  • more variability at that time.

  • Maybe we don't have enough data during that time.

  • So that's really a very powerful tool

  • to help you understand how your model performs.

  • Some other visualizations that are available in TFMA

  • are shown here.

  • We haven't shown that in the past.

  • So the calibration plot, which is the first one,

  • shows you how your model predictions

  • behave against the label.

  • And you would want your model to be well calibrated,

  • and not to be over or under predicting in a specific area.

  • The prediction distribution just shows you

  • that distribution. Precision-recall and ROC

  • curves are commonly known.

  • And, again, this is the plot for overall.

  • So this is the entire model and the entire eval dataset.

  • And, again, if you specify a slice here,

  • you can actually get the same visualization only

  • for a specific slice of your features.

  • And another really nice feature is

  • that if you have multiple models or multiple eval sets

  • over time, you can visualize them in a time series.

  • So in this case, we have three models.

  • And for all of these three models,

  • we show accuracy and AUC.

  • And you can imagine, if you have long-running training jobs,

  • and, as I mentioned earlier, in some cases you want to refresh

  • your model regularly.

  • If you train a new model every day for a year,

  • you end up with 365 models, and you can

  • see how they perform over time.

  • So this product is called TensorFlow Model Analysis.

  • And it's also available on GitHub.

  • And everything that I've just shown you

  • is already open sourced.

  • So next is serving, which is called TensorFlow Serving.

  • So serving is one of those other areas where

  • it's relatively easy to set something up

  • that performs inference with your machine learning models.

  • But it's harder to do this at scale.

  • So one of the most important features of TensorFlow Serving

  • is that it's able to deal with multiple models.

  • And this is mostly used for actually upgrading a model

  • version.

  • So if you are serving a model, and you

  • want to update that model to a new version,

  • that server needs to load a new version at the same time,

  • and then switch requests over to that new version.

  • That's also where isolation comes in.

  • You don't want that process of loading a new model to actually

  • impact the current model serving requests, because that

  • would hurt performance.

  • There's batching implementations in TensorFlow Serving

  • that make sure that throughput is optimized.

  • In most cases when you have a high requests

  • per second service, you actually don't want to perform inference

  • on a batch of size one.

  • You can actually do dynamic batching.

  • And TensorFlow Serving is adopted, of course,

  • widely within Google, and also outside of Google.

  • There's a lot of companies that have started

  • using TensorFlow Serving.

  • What does this look like?

  • Again, the same graph that we've exported

  • from either our estimator or our Keras model

  • goes into the TensorFlow model server.

  • TensorFlow Serving comes as a library.

  • So you can build your own server if you want,

  • or you can use the libraries to perform inference.

  • We also ship a binary.

  • And this is the command of how you would just

  • run the binary, tell it what port to listen to,

  • and what model to load.

  • And in this case, it will load that model

  • and bring up that server.

  • And this is a code snippet again from our Chicago taxi

  • example of how you put together a request

  • and make, in this case, a gRPC call to that server.
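A generic sketch of such a gRPC Predict request (the host, port, model name, signature, and feature are hypothetical):

```python
import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Hypothetical host/port and model name.
channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Serialize a tf.Example with a hypothetical feature.
example = tf.train.Example(features=tf.train.Features(feature={
    "trip_miles": tf.train.Feature(float_list=tf.train.FloatList(value=[3.2])),
}))

request = predict_pb2.PredictRequest()
request.model_spec.name = "chicago_taxi"
request.model_spec.signature_name = "serving_default"
# Assumes the exported serving signature takes serialized tf.Examples
# under the input key "examples".
request.inputs["examples"].CopyFrom(
    tf.make_tensor_proto([example.SerializeToString()], dtype=tf.string))

response = stub.Predict(request, 10.0)  # 10 second timeout
print(response.outputs)
```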

  • Now, not everyone is using gRPC, for whatever reason.

  • So we built a REST API.

  • That was the top request on GitHub for a while.

  • And we built it such that the TensorFlow model

  • server binary ships with both the gRPC and the REST API.

  • And it supports the same APIs as the gRPC one.

  • So this is what the API looks like.

  • So you specify the model name.

  • And, as I just mentioned, it also supports classify,

  • regress, and predict.

  • And here are just two examples: an [? iris ?]

  • model with the classify API, or an [INAUDIBLE] model

  • with a particular API.

  • Now, one of the things that this enables

  • is that instead of Proto3 JSON, which

  • is a little more verbose than most people would like,

  • you can actually now use idiomatic JSON.

  • That seems more intuitive to a lot of developers

  • who are more used to this.

  • And as I just mentioned, the model server ships

  • with this by default. So when you bring up

  • the TensorFlow model server, you just specify the REST API port.

  • And then, in this case, this is just

  • an example of how you can make a request to this model

  • from the command line.
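A rough Python equivalent of such a request (the model name and host are hypothetical, and 8501 stands in for whatever value you passed as the REST API port):

```python
import json
import requests

# Hypothetical host, REST port, and model name; the REST endpoint follows
# the /v1/models/<name>:predict pattern of the TensorFlow Serving REST API.
url = "http://localhost:8501/v1/models/chicago_taxi:predict"
payload = {"instances": [{"trip_miles": 3.2, "trip_start_hour": 7}]}

response = requests.post(url, data=json.dumps(payload))
print(response.json())
```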

  • Last time I spoke about this was earlier this year.

  • And back then I could only announce

  • that it would be available.

  • But since then, we've made it available.

  • So all of this is now in our GitHub repository

  • for you to use.

  • Now, what does that look like if we put all of this together?

  • It's relatively straightforward.

  • So in this case, you start with the training data.

  • You use TensorFlow Transform to express your transform graph

  • that will actually deal with the analyze phase

  • to compute the metrics.

  • It will output the transform graph itself.

  • And, in some cases, you can also materialize the transformed data.

  • Now, why would you want to do that?

  • You pay the cost of materializing your data again.

  • In some cases, where throughput for the model at training time

  • is extremely important, namely when

  • you use hardware accelerators, you

  • may actually want to materialize expensive transformations.

  • So if you use GPUs or TPUs, you may

  • want to materialize all of your transforms

  • such that at training time, you can feed the model

  • as fast as you can.

  • Now, from there you can use an estimator or Keras model,

  • as I just showed you, to export your eval graph

  • and your inference graph.

  • And that's the API that connects the trainer with TensorFlow

  • Model Analysis and TensorFlow Serving.

  • So all of this works today.

  • I'll have a link for you in a minute that

  • has an end to end example of how you use all of these products

  • together.

  • As I just mentioned earlier, for us

  • it's extremely important that these products

  • work in a notebook environment, because we really

  • think that that barrier between data scientists and product

  • engineers, or data engineers, should not be there.

  • So you can use all of this in a notebook,

  • and then use the same code to go deploy

  • it in a distributed manner on a cluster.

  • For the Beam runner, as I mentioned,

  • you can run it on a local machine in a notebook

  • and on Cloud Dataflow.

  • The Flink runner is in progress.

  • And there's also plans to develop a Spark

  • runner so that you can deploy these pipelines on Spark as

  • well.

  • This is the link to the end to end example.

  • It currently lives in the TensorFlow Model

  • Analysis repo.

  • So you will find it on GitHub there,

  • or you can use that short link that takes you directly to it.

  • But then I hear some people saying, wait.

  • Actually, we want more.

  • And I totally understand why you would want more, because maybe

  • you've read that paper.

  • And you've certainly seen that graph,

  • because it was in a lot of the slides that I just showed you.

  • And we just talked about four of these things.

  • Right.

  • But what about the rest?

  • And as I mentioned earlier, it's extremely important

  • to highlight that these are just some of the libraries

  • that we use.

  • This is far from actually being an integrated platform.

  • And as a result, if you actually use these together,

  • you will see in the end to end example it works really well.

  • But it can be much, much easier once they're integrated.

  • And actually there is a layer that

  • pulls all of these components together and makes it

  • a good end to end experience.

  • So I've announced before that we will

  • release next the components for data analysis and validation.

  • There's not much more I can say about this today other

  • than these will be available really, really soon.

  • And I'll leave it at that.

  • And then after that, the next phase

  • is actually the framework that pulls all of these components

  • together.

  • That actually will make it much, much easier

  • to configure these pipelines, because then there's

  • going to be a shared configuration

  • layer to configure all of these components

  • and actually pull all of them together, such

  • that they work as a pipeline, and not

  • as individual components.

  • And I think you get the idea.

  • So we are really committed to making

  • all of this available to the community, because we've

  • seen the profound impact that it has had

  • at Google and for our products.

  • And we are really excited to see what you

  • can do with them in your space.

  • So these are just the GitHub links of the products

  • that I just discussed.

  • And, again, all of the things that I showed you today

  • are already available.

  • Now, because we have some time, I

  • can also talk about TensorFlow Hub.

  • And TensorFlow Hub is a library that

  • enables you to publish, consume, and discover

  • what we call modules.

  • And I'm going to come to what we mean by modules,

  • but it's really reusable parts of machine learning models.

  • And I'm going to start with some history.

  • And I think a lot of you can relate to this.

  • I actually heard a talk today that mentioned

  • some of these aspects.

  • In some ways, machine learning and machine learning tools

  • are 10, 15 years behind the tools

  • that we use for software engineering.

  • Software engineering has seen rapid growth

  • in the last decade.

  • And as there was a lot of growth,

  • and as more and more developers started working together,

  • we built tools and systems that made collaboration much more

  • efficient.

  • We built version control.

  • We built continuous integration.

  • We built code repositories.

  • Right.

  • And machine learning is now going through that same growth.

  • And more and more people want to deploy machine learning.

  • But we are now rediscovering some of these challenges

  • that we've seen with software engineering.

  • What is the version control equivalent for these machine

  • learning pipelines?

  • And what is the code repository equivalent?

  • Well, the code repository equivalent is the one

  • that I'm going to talk to you about right now: TensorFlow

  • Hub.

  • So code repositories are an amazing thing,

  • because they enable a few really good practices.

  • The first one is, if, as an engineer, I want to write code,

  • and I know that there's a shared repository,

  • usually I would look first if it has already been implemented.

  • So I would search on GitHub or somewhere else

  • to actually see if someone has already implemented the thing

  • that I'm going to build.

  • Secondly, if I know that I'm going

  • to publish my code on a code repository,

  • I may make different design decisions.

  • I may build it in such a way that it's more reusable

  • and that's more modular.

  • Right.

  • And that usually leads to better software in general.

  • And in general, it also increases velocity

  • of the entire community.

  • Right.

  • Whether it's a private repository within a company,

  • or a public repository and open source, such as GitHub,

  • code sharing is usually a good thing.

  • Now, TensorFlow Hub is the equivalent

  • for machine learning.

  • In machine learning, you also have code.

  • You have data.

  • You have models.

  • And you would want a central repository

  • that allows you to share these reusable parts of machine

  • learning between developers, and between teams.

  • And if you think about it, in machine learning

  • it's even more important than in software engineering.

  • Because machine learning models are much,

  • much more than just code.

  • Right.

  • So there's the algorithm that goes into these models.

  • There's the data.

  • There's the compute power that was used to train these models.

  • And then there's the expertise of people

  • that built these models that is scarce today.

  • And I just want to reiterate this point.

  • If you share a machine learning model,

  • what you're really sharing is a combination of all of these.

  • If I spent 50,000 GPU hours to train an embedding,

  • and share it with TensorFlow Hub,

  • everyone who uses that embedding can benefit

  • from that compute power.

  • They don't have to go recompute that same model

  • and those same data.

  • Right.

  • So all of these four ingredients come together

  • in what we call a module.

  • And a module is the unit that we care

  • about: it can be published on TensorFlow Hub,

  • and it can now be reused by different people

  • in different models.

  • And those modules are TensorFlow graphs.

  • And they can also contain weights.

  • So what that means is they give you

  • a reusable piece of TensorFlow graph

  • that has the trained knowledge of the data

  • and the algorithm embedded in it.

  • And those modules are designed to be composable so they have

  • common signatures such that they can be

  • attached to different models.

  • They're reusable.

  • So they come with the graph and the weights.

  • And importantly, they're also retrainable.

  • So you can actually back propagate

  • through these modules.

  • And once you attach them to your model,

  • you can customize them to your own data and

  • to your own use case.

  • So let's go through a quick example

  • for text classification.

  • Let's say I'm a startup and I want

  • to build a new model that takes restaurant reviews

  • and tries to predict whether they are positive or negative.

  • So in this case, we have a sentence.

  • And if you've ever tried to train some of these text

  • models, you know that you need a lot of data

  • to actually learn a good representation of text.

  • So in this case we would just want to put in a sentence.

  • And we want to see if it's positive or negative.

  • And we want to reuse the code in the graph.

  • We want to reuse the trained weights

  • from someone else who's done the work before us.

  • And we also want to do this with less data

  • than is usually needed.

  • An example of the text modules that are already published

  • on TensorFlow Hub is the Universal Sentence Encoder.

  • There's language models.

  • And we've actually added more languages to these.

  • Word2vec is a very popular type of model as well.

  • And the key idea behind TensorFlow Hub,

  • similarly to code repositories, is that the latest research

  • can be shared with you as fast as possible, and as

  • easily as possible.

  • So the Universal Sentence Encoder paper

  • was published by some researchers at Google.

  • And in that paper, the authors actually

  • included a link to TensorFlow Hub

  • with the embedding for that Universal Sentence Encoder.

  • That link is like a handle that you can use.

  • So in your code now, you actually

  • want to train a model that uses this embedding.

  • In this case, we train a DNN classifier.

  • It's one line to say, I want to pull from TensorFlow Hub

  • a text embedding column with this module.

  • And let's take a quick look of what that handle looks like.

  • So the first part is just the TF Hub domain.

  • All of the modules that we publish,

  • that Google and some of our partners publish,

  • will show up on tfhub.dev.

  • The second part is the author.

  • So in this case, Google published this embedding.

  • Universal Sentence Encoder is the name of this embedding.

  • And then the last piece is the version.

  • Because TensorFlow Hub modules are immutable.

  • So once they're uploaded, they can't change,

  • because you wouldn't want a module

  • to change underneath you.

  • If a module changed when you retrain a model, that's

  • not really good for reproducibility.

  • So if and when we upload a new version of the Universal

  • Sentence Encoder, this version will increment.

  • And then you can change to the new code as well.

  • But just to reiterate this point,

  • this is one line to pull this embedding column

  • from TensorFlow Hub, and use it as an input to your DNN

  • classifier.

  • And now you've just basically benefited

  • from the expertise and the research that

  • was published by the Google Research

  • team for text embeddings.

  • I just mentioned earlier that these modules are retrainable.

  • So if you set trainable to true, now the model

  • will actually back propagate through this embedding

  • and update it as you train with your own data.
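A minimal sketch putting those pieces together (the input key, network sizes, and input function are hypothetical; the module handle is the one discussed above):

```python
import tensorflow as tf
import tensorflow_hub as hub

# One line pulls the published embedding as a feature column. Setting
# trainable=True lets the classifier backpropagate through the module
# and adapt it to your own data.
embedding_column = hub.text_embedding_column(
    key="sentence",  # hypothetical input feature name
    module_spec="https://tfhub.dev/google/universal-sentence-encoder/1",
    trainable=True)

estimator = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    feature_columns=[embedding_column],
    n_classes=2)

# train_input_fn is assumed to yield {"sentence": ...} features and labels.
# estimator.train(input_fn=train_input_fn)
```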

  • Because in many cases, of course,

  • you still have some small amount of data

  • that you want to train on, such that the model adapts

  • to your specific use case.

  • And if you take the same URL, the same handle, and type it

  • in your browser, you end up on the TensorFlow website,

  • and see the documentation for that same module.

  • So that same handle that you saw in the paper,

  • you can use in your code as a one liner

  • to use this embedding, and you can

  • put in your browser to see documentation

  • for this embedding.

  • So the short version of the story

  • is that TensorFlow Hub really is the repository

  • for reusable machine learning models and modules.

  • We have already published a large number of these modules.

  • So the text modules are just one example that I just showed you.

  • We have a large number of image embeddings,

  • some of which are cutting edge.

  • So there's a [? neural ?] architecture search module

  • that's available.

  • There's also some modules available for image

  • classification that are optimized for devices so

  • that you can use them on a small device.

  • And we are also working hard to keep publishing

  • more and more of these modules.

  • So in addition to Google, we now have

  • some modules that have been published by DeepMind.

  • And we are also working with the community

  • to get more and more modules up there.

  • And, again, this is available on GitHub.

  • You can use this today.

  • And a particularly interesting aspect

  • that we haven't highlighted so far,

  • but that is extremely important, is that you can use the TensorFlow

  • Hub libraries also to store and consume your own modules.

  • So you don't have to rely on the TensorFlow Hub platform

  • and use the modules that we have published.

  • You can internally enable your developers

  • to write out modules to disk on some shared storage.

  • And other developers can consume those modules.

  • And in that case, instead of that

  • handle that I just showed you, you

  • would just use the path to those modules.
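For example, a rough TF 1.x-style sketch of consuming a module from a shared path instead of a tfhub.dev handle (the path, and the assumption that it is a text embedding module, are hypothetical):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Instead of a tfhub.dev handle, point at a path on shared storage
# (hypothetical path to a module exported by another team).
embed = hub.Module("/mnt/shared/modules/my_text_embedding")
embeddings = embed(["a restaurant review to embed"])

with tf.Session() as sess:
    # Modules can carry variables and lookup tables; initialize both.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(embeddings))
```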

  • And that concludes my talk.

  • I will go up to the TensorFlow booth

  • to answer any of your questions.

  • Thanks.

  • [CLAPPING]
