CLEMENS MEWALD: Hi, everyone. My name is Clemens. I'm a product manager in Google Research. And today I'm going to talk about TensorFlow Extended, which is a machine learning platform that we built around TensorFlow at Google.

And I'd like to start this talk with a block diagram and the small yellow box, or orange box. And that box basically represents what most people care about and talk about when they talk about machine learning. It's the machine learning algorithm. It's the structure of the network that you're training, how you choose what type of machine learning problem you're solving. And that's what you talk about when you talk about TensorFlow and using TensorFlow. However, in addition to the actual machine learning, and to TensorFlow itself, you have to care about so much more. And these are all of these other things around the actual machine learning algorithm that you have to have in place, and that you actually have to nail and get right in order to do machine learning in a production setting. So you have to care about where you get your data from, that your data are clean, how you transform them, how you train your model, how you validate your model, how you push it out into a production setting, and how you deploy it at scale.

Now, some of you may be thinking, well, I don't really need all of this. I only have my small machine learning problem. I can live within that small orange box. And I don't really have these production worries as of today. But I'm going to propose that all of you will have that problem at some point in time. Because what I've seen time and time again is that research and experimentation today is production tomorrow. Research and experimentation never ends just there. Eventually it will become a production model. And at that point, you actually have to care about all of these things.

Another side of this coin is scale. So some of you may say, well, I do all of my machine learning on a local machine, in a notebook. Everything fits into memory. I don't need all of these heavy tools to get started. But similarly, small scale today is large scale tomorrow. At Google we have this problem all the time. That's why we always design for scale from day one, because we always have product teams that say, well, we only have a small amount of data. It's fine. But then a week later the product picks up. And suddenly they need to distribute the workload to hundreds of machines. And then they have all of these concerns.

Now, the good news is that we built something for this. And TFX is the solution to this problem. So this is a block diagram that we published in one of our papers that is a very simplistic view of the platform. But it gives you a broad sense of what the different components are. Now, TFX is a very large platform. And it contains a lot of components and a lot of services. So the paper that we published, and also what I'm going to discuss today, covers only a small subset of this. But building TFX and deploying it at Google has had a profound impact on how fast product teams at Google can train machine learning models and deploy them in production, and on how ubiquitous machine learning has become at Google. You'll see later I have a slide to give you some sense of how widely TFX is being used. And it really has accelerated all of our efforts toward being an AI-first company and using machine learning in all of our products. Now, we use TFX broadly at Google.
And we are very committed to making all of this available to you through open sourcing it. So the boxes that are highlighted in blue are the components that we've already open sourced.

Now, I want to highlight an important thing. TFX is a real solution for real problems. Sometimes people ask me, well, is this the same code that you use at Google for production? Or did you just build something on the side and open source it? And all of these components are the same code base that we use internally for our production pipelines. Of course, there are some things that are Google specific for our deployments. But all of the code that we open source is the same code that we actually run in our production systems. So it's really code that solves real problems for Google. The second part to highlight is that so far we've only open sourced libraries, so each one of these libraries you can use. But you still have to glue them together. You still have to write some code to make them work in a joint manner. That's just because we haven't open sourced the full platform yet. We're actively working on this. But I would say so far we're about 50% there. So these blue components are the ones that I'm going to talk about today.

But first, let me talk about some of the principles that we followed when we developed TFX. Because I think it's very informative to see how we think about these platforms, and how we think about having impact at Google.

The first principle is flexibility. And there's some history behind this. The short version of that history is that at Google, and I'm sure at other companies as well, there used to be problem-specific machine learning platforms. And just to be concrete, we had a platform that was specifically built for large scale linear models. So if you had a linear model that you wanted to train at large scale, you used this piece of infrastructure. We had a different piece of infrastructure for large scale neural networks. But product teams usually don't have one kind of a problem. And they usually want to train multiple types of models. So if they wanted to train linear and deep models, they had to use two entirely different technology stacks. Now, with TensorFlow, as I'm sure you know, we can actually express any kind of machine learning algorithm. So we can train TensorFlow models that are linear, that are deep, unsupervised and supervised. We can train tree models. And any single algorithm that you can think of either has already been implemented in TensorFlow, or is possible to implement in TensorFlow. So building on top of that flexibility, we have one platform that supports all of these different use cases from all of our users. And they don't have to switch between platforms just because they want to implement different types of algorithms.

Another aspect of this is the input data. Of course, product teams also don't only have image data, or only have text data. In some cases, they may even have both. Right. So they have models that take in both images and text, and make a prediction. So we needed to make sure that the platform that we built supports all of these input modalities, and can deal with images, text, sparse data that you will find in logs, and even videos. And with a platform as flexible as this, you can ensure that all of the users can represent all of their use cases on the same platform, and don't have to adopt different technologies.

The next aspect of flexibility is how you actually run these pipelines and how you train models.
So one very basic use case is you have all of your data available. You train your model once, and you're done. This works really well for stationary problems. A good example is, you want to train a model that classifies whether there's a cat or a dog in an image. Cats and dogs have looked the same for quite a while. And they will look the same in 10 years, or very much the same as today. So that same model will probably still work well in a couple of years. So you don't need to keep that model fresh. However, if you have a non-stationary problem where data changes over time (recommendation systems have new types of products that you want to recommend, new types of videos that get uploaded all the time), you actually have to retrain these models, or keep them fresh.

So one way of doing this is to train a model on a subset of your data. Once you get new data, you throw that model away. You train a new model either on the superset, so on the old and on the new data, or only on the fresh data, and so on. Now, that has a couple of disadvantages. One of them being that you throw away the learning from previous models. In some cases, you're wasting resources, because you actually have to retrain over the same data over and over again. And because a lot of these models are actually not deterministic, you may end up with vastly different models every time. Because of the way that they're being initialized, you may end up in a different optimum every time you train these models.

So a more advanced way of doing this is to start training with your data, and then initialize your model from the previous weights of these models and continue training. We call that warm starting of models. That may seem trivial if you just say, well, this is just a continuation of your training run. You just added more data and you continue. But depending on your model architecture, it's actually non-trivial. In some cases, you may only want to warm start embeddings. So you may only want to transfer the weights of the embeddings to a new model and initialize the rest of your network randomly. So there's a lot of different setups that you can achieve with this. But with this you can continuously update your models. You retain the learning from previous versions. You can even, depending on how you set it up, bias your model more towards the more recent data. But you're still not throwing away the old data. And you always have a fresh model that's updated for production.

The second principle is portability. And there are a few aspects to this. The first one is obvious. Because we rely on TensorFlow, we inherit the properties of TensorFlow, which means you can already train your TensorFlow models in different environments and on different machines. So you can train a TensorFlow model locally. You can distribute it in a cloud environment. And by cloud, I mean any setup of multiple clusters. It doesn't have to be a managed cloud. You can train or perform inference with your TensorFlow models on the devices that you care about today. And you can also train and deploy them on devices that you may care about in the future.

Next is Apache Beam. When we open sourced a lot of our components we faced the challenge that internally we use a data processing engine that allows us to run these large scale data processing pipelines. But in the open source world and in all of your companies, you may use different data processing systems. So we were looking for a portability layer. And Apache Beam provides us with that portability layer.
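To give a flavor of what that looks like, here is a minimal, illustrative Beam pipeline written with the Python SDK; the word-count logic and the file paths are just placeholders I've chosen for the sketch, not anything from TFX itself.

```python
import apache_beam as beam

# A toy data graph expressed once with the Beam Python SDK. Swapping
# 'DirectRunner' for 'DataflowRunner' (plus the appropriate pipeline options)
# runs the same graph on a single machine or at scale on Cloud Dataflow.
with beam.Pipeline(runner='DirectRunner') as pipeline:
    (pipeline
     | 'ReadLines' >> beam.io.ReadFromText('/tmp/input.txt')      # path assumed
     | 'SplitWords' >> beam.FlatMap(lambda line: line.split())
     | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
     | 'CountPerWord' >> beam.CombinePerKey(sum)
     | 'WriteCounts' >> beam.io.WriteToText('/tmp/word_counts'))
```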
Beam allows us to express a data graph once with the Python SDK. And then you can use different runners to run those same data graphs in different environments. The first one is the direct runner. That allows you to run these data graphs on a single machine. It's also the one that's being used in notebooks. I'll come back to that later, but we want to make sure that all of our tools work in notebook environments, because we know that that's where data scientists start. Then there's the Dataflow runner, with which you can run these same pipelines at scale, on Cloud Dataflow in this case. There's a Flink runner that's being developed right now by the community. There's a ticket that you can follow for status updates on this. I'm being told it's going to be ready at some point later this year. And the community is also working on more runners so that these pipelines become more portable and can be run in more environments.

In terms of cluster management and managing your resources, we work very well together with Kubernetes and the Kubeflow project, which actually is the next talk right after mine. And if you're familiar with Kubernetes, there's something called Minikube, with which you can deploy your Kubernetes setup on a single machine. Of course, there are managed Kubernetes solutions such as GKE. You can run your own Kubernetes cluster if you want to, on prem. And, again, we inherit the portability aspects of Kubernetes.

Another extremely important aspect is scalability. And I've alluded to it before. I'm sure many of you know the problem. There are different roles in companies. Very commonly, data scientists work on a sometimes down-sampled set of data on their local machines, maybe on their laptop, in a notebook environment. And then there are data engineers or product software engineers who either take the models that were developed by data scientists and deploy them in production, or try to replicate what the data scientists did with different frameworks, because they work with a different toolkit. And there's this almost impenetrable wall between those two, because they use different toolsets. And there is a lot of friction in terms of translating from one toolset to the other, or actually deploying these things from the data science process to the production process. And if you've heard the term, throw over the wall, that usually does not have good connotations. But that's exactly what's happening.

So when we built TFX we paid particular attention to make sure that all of the toolsets we build are usable at a small scale. You will see from my demos, all of our tools work in a notebook environment. And they work on a single machine with small datasets. And in many cases, or actually in all cases, the same code that you run on a single machine scales up to large workloads in a distributed cluster. And the reason why this is extremely important is that there's no friction to go from experimentation on a small machine to a large cluster. And you can actually bring those different functions together, and have data scientists and data engineers work together with the same tools on the same problems, without that wall in between them.

The next principle is interactivity. The machine learning process is not a straight line. At many points in this process you actually have to interact with your data, understand your data, and make changes. So this visualization is called Facets.
And it allows you to investigate your data, and understand it. And, again, this works at scale. Sometimes when I show these screenshots, they may seem trivial when you think about small amounts of data that fit on a single machine. But if you have terabytes of data, and you want to understand them, it's less trivial. And on the other side (I'm going to talk about this in more detail later), this is a visualization we have to actually understand how your models perform at scale. This is a screen capture from TensorFlow Model Analysis.

And by following these principles, we've built a platform that has had a profound impact on Google and the products that we build. And it's really being used across many of our Alphabet companies. So Google, of course, is only one company under the Alphabet umbrella. And within Google, all of our major products are using TensorFlow Extended to actually deploy machine learning in their products.

So with this, let's look at a quick overview of the things that we've open sourced so far. I'm going to take questions later, if it's possible. This is the familiar graph that you've seen before. And I'm just going to turn all of these boxes blue and talk about each one of them.

So data transformation we have open sourced as TensorFlow Transform. TensorFlow Transform allows you to express your data transformations as a TensorFlow graph, and actually apply those transformations at training and at serving time. Now, again, this may sound trivial, because you can already express your transformations with a TensorFlow graph. However, if your transformations require an analyze phase over your data, it's less trivial. And the easiest example for this is mean normalization. If you want to mean normalize a feature, you have to compute the mean and the standard deviation over your data. And then you need to subtract the mean and divide by the standard deviation. Right. If you work on a laptop with a dataset that's a few gigabytes, you can do that with NumPy and everything is great. However, if you have terabytes of data, and you actually want to replicate these transformations at serving time, it's less trivial.

So Transform provides you with utility functions. And for mean normalization there's one called scale_to_z_score that is a one-liner. So you can say, I want to scale this feature such that it has a mean of zero and a standard deviation of one. And then Transform actually creates a Beam graph for you that computes these metrics over your data. And Beam handles computing those metrics over your entire dataset. And then Transform injects the results of this analyze phase as constants into your TensorFlow graph, and creates a TensorFlow graph that does the computation needed. And the benefit of this is that this TensorFlow graph that expresses the transformation can now be carried forward to training. So at training time, you apply those transformations to your training data. And the exact same graph is also applied to the inference graph, such that at inference time the exact same transformations are being done. Now, that basically eliminates training-serving skew, because now you can be entirely sure that the exact same transformations are being applied.
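As a rough illustration of what that looks like in code, here is a minimal sketch of a preprocessing_fn using TensorFlow Transform; the feature names are made up for the sketch, and exact function names can vary a bit between tf.Transform releases.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Illustrative preprocessing_fn; the feature names here are hypothetical."""
    outputs = {}
    # Mean normalization as a one-liner: Transform analyzes the full dataset
    # for mean and standard deviation, then scales to mean 0 / stddev 1.
    outputs['trip_miles_scaled'] = tft.scale_to_z_score(inputs['trip_miles'])
    # String-to-integer mapping by building a vocabulary over the data.
    outputs['company_id'] = tft.compute_and_apply_vocabulary(inputs['company'])
    # Bucketize a numeric feature into quantile buckets.
    outputs['fare_bucket'] = tft.bucketize(inputs['fare'], num_buckets=10)
    return outputs
```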
This also eliminates the need for you to have code in your serving system that tries to replicate the transformations, because usually the code paths that you use in your training pipelines are different from the ones that you use in your serving system, because that one is very low latency.

Here's a code snippet of what such a preprocessing function can look like. I just spoke about scaling to the z-score, so that's mean normalization. string_to_int is another very common transformation that does string-to-integer mapping by creating a vocabulary. And bucketizing a feature, again, is also a very common transformation that requires an analyze phase over your data. All of these examples are relatively simple. But just think about one of the more advanced use cases where you can actually chain together transforms. You can do a transform of your already transformed feature. And Transform actually handles all of this for you.

So there are a few common use cases. I've talked about scaling and bucketization. Text transformations are very common. So if you want to compute n-grams, you can do that as well. And a particularly interesting one is actually applying a saved model. Applying a saved model in Transform takes an already trained or created TensorFlow model and applies it as a transformation. So you can imagine if one of your inputs is an image, and you want to apply an Inception model to that image to create an input for your model, you can do that with that function. So you can actually embed other TensorFlow models as transformations in your TensorFlow model. And all of this is available at tensorflow/transform on GitHub.

Next, let's talk about the trainer. And the trainer is really just TensorFlow. We're going to talk about the Estimator API and the Keras API. This is a code snippet that shows you how to train a wide and deep model. A wide and deep model combines a deep part, so a DNN, and a linear part together. And in the case of this estimator, it's a matter of instantiating the estimator. And then the Estimator API is relatively straightforward. There's a train method that you can call to train the model. And the estimators that are up here are the ones that are in core TensorFlow. So if you just install TensorFlow, you get DNNs, linear models, DNN and linear combined, and boosted trees, which is a great gradient boosted tree implementation. But if you do some searching in TensorFlow Contrib, or in other repositories under the TensorFlow organization on GitHub, you will find many, many more implementations of very common architectures with the estimator framework.

Now, for the estimator, there's a method that's currently in Contrib, but it will move to the Estimator API with 2.0, for exporting saved models. And that actually exports a TensorFlow graph as a SavedModel, such that it can be used by TensorFlow Model Analysis and TensorFlow Serving. This is a code snippet from one of our examples of how this looks. In this case, it's the Chicago taxi dataset. We just instantiate the DNN and linear combined classifier, call train, and export the model for use by downstream components.

Using tf.Keras, it looks very similar. In this case, we use the Keras sequential API, where you can configure the layers of your network. And the Keras API is also getting a method called save_keras_model that exports the same format, which is the SavedModel, such that it can again be used by downstream components.
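Since the slides themselves aren't in the transcript, here is a rough sketch of what the estimator version of that code can look like; the feature columns, the toy input function, and the export path are placeholders I've made up, not the actual Chicago taxi example code.

```python
import tensorflow as tf

# Hypothetical feature columns; the real Chicago taxi example defines many more.
wide_columns = [tf.feature_column.categorical_column_with_identity(
    'trip_start_hour', num_buckets=24)]
deep_columns = [tf.feature_column.numeric_column('trip_miles')]

def train_input_fn():
    # Tiny in-memory stand-in for the real (transformed) training data.
    features = {'trip_start_hour': [7, 18, 23], 'trip_miles': [1.2, 5.0, 0.4]}
    labels = [0, 1, 0]  # e.g. whether the tip was above 20%
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(2)

# Wide and deep: a linear part over the wide columns, a DNN over the deep ones.
estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[64, 32])

estimator.train(input_fn=train_input_fn, max_steps=100)

# Export a SavedModel for downstream use by TFMA and TensorFlow Serving.
feature_spec = tf.feature_column.make_parse_example_spec(wide_columns + deep_columns)
serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
estimator.export_savedmodel('/tmp/serving_model_dir', serving_input_fn)
```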
Model evaluation and validation is open sourced as TensorFlow Model Analysis. And that takes that graph as an input. So the graph that we just exported from our estimator or Keras model flows as an input into TFMA. And TFMA computes evaluation statistics at scale, in a sliced manner.

Now, this is another one of those examples where you may say, well, I already get my metrics from TensorBoard. TensorBoard metrics are computed in a streaming manner during training, on mini-batches. TFMA uses Beam pipelines to compute metrics in an exact manner with one pass over all of your data. So if you want to compute your metrics over a terabyte of data with exactly one pass, you can use TFMA.

Now, in this case, you run TFMA for that model and some dataset. And if you just call the method render_slicing_metrics with the result by itself, the visualization looks like this. And I pulled this up for one reason. And that reason is just to highlight what we mean by sliced metrics. This is the metric that you may be used to when someone trains a model and tells you, well, my model has a 0.94 accuracy, or a 0.92 AUC. That's an overall metric. Over all of your data, it's the aggregate of those metrics for your entire model. That may tell you that the model is doing well on average, but it will not tell you how the model is doing on specific slices of your data.

So if you, instead, render those slices for a specific feature (in this case we actually sliced these metrics by trip start hour; again, this is from the Chicago taxi dataset), you get a visualization in which, in this case, we look at a histogram of a metric. We filter for buckets that have at least 100 examples so that we don't get sparse buckets. And then you can actually see here how the model performs on different slices of feature values, for a specific trip start hour. So this particular model is trained to predict whether a tip is more or less than 20%. And you've seen that overall it has a very high accuracy, and a very high AUC. But it turns out that on some of these slices, it actually performs poorly. So if the trip start hour is seven, for some reason the model doesn't really have a lot of predictive power over whether the tip is going to be good or bad. Now, that's informative to know. Because maybe that's just because there's more variability at that time. Maybe we don't have enough data during that time. So this is really a very powerful tool to help you understand how your model performs.

Some other visualizations that are available in TFMA are shown here. We haven't shown these in the past. The calibration plot, which is the first one, shows you how your model predictions behave against the label. And you would want your model to be well calibrated, and not to be over or under predicting in a specific area. The prediction distribution just shows you that distribution. Precision-recall and ROC curves are commonly known. And, again, this is the plot for the overall model, so the entire model and the entire eval dataset. And, again, if you specify a slice here, you can actually get the same visualizations only for a specific slice of your features.

And another really nice feature is that if you have multiple models or multiple eval sets over time, you can visualize them in a time series. So in this case, we have three models. And for all of these three models, we show accuracy and AUC.
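Before moving on, here is a rough sketch of what running TFMA and rendering sliced metrics can look like in a notebook; the paths are placeholders, and the exact API has shifted across TFMA releases, so treat this as indicative rather than exact.

```python
import tensorflow_model_analysis as tfma

# Point TFMA at the eval SavedModel exported by the trainer (path assumed).
eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='/tmp/eval_model_dir')

# One exact pass over the eval data, computing metrics overall and
# sliced by trip_start_hour.
result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location='/tmp/eval_data.tfrecord',
    slice_spec=[
        tfma.slicer.SingleSliceSpec(),                           # overall
        tfma.slicer.SingleSliceSpec(columns=['trip_start_hour'])])

# In a notebook, render the interactive sliced-metrics browser.
tfma.view.render_slicing_metrics(result, slicing_column='trip_start_hour')
```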
Coming back to the time series view, you can imagine that if you have long-running training jobs and, as I mentioned earlier, you want to refresh your model regularly, and you train a new model every day for a year, you end up with 365 models, and you can see how the model performs over time. So this product is called TensorFlow Model Analysis. And it's also available on GitHub. And everything that I've just shown you is already open sourced.

Next is serving, which is called TensorFlow Serving. Serving is one of those other areas where it's relatively easy to set something up that performs inference with your machine learning models, but it's harder to do this at scale. So one of the most important features of TensorFlow Serving is that it's able to deal with multiple models. And this is mostly used for upgrading a model version. So if you are serving a model, and you want to update that model to a new version, the server needs to load the new version at the same time, and then switch over requests to that new version. That's also where isolation comes in. You don't want the process of loading a new model to impact the current model serving requests, because that would hurt performance. There are batching implementations in TensorFlow Serving that make sure that throughput is optimized. In most cases when you have a high requests-per-second service, you don't want to perform inference on a batch of size one. You can actually do dynamic batching. And TensorFlow Serving is, of course, adopted widely within Google, and also outside of Google. There are a lot of companies that have started using TensorFlow Serving.

What does this look like? Again, the same graph that we've exported from either our estimator or our Keras model goes into the TensorFlow model server. TensorFlow Serving comes as a library. So you can build your own server if you want, or you can use the libraries to perform inference. We also ship a binary. And this is the command for how you would just run the binary, tell it what port to listen on, and what model to load. And in this case, it will load that model and bring up the server. And this is a code snippet, again from our Chicago taxi example, of how you put together a request and make, in this case, a gRPC call to that server.

Now, not everyone is using gRPC, for whatever reason. So we built a REST API. That was the top request on GitHub for a while. And we built it such that the TensorFlow model server binary ships with both the gRPC and the REST API. And it supports the same APIs as the gRPC one. So this is what the API looks like. You specify the model name. And, as I just mentioned, it also supports classify, regress, and predict. And here are just two examples, an iris model with the classify API, and another model with the predict API. Now, one of the things that this enables is that instead of proto3 JSON, which is a little more verbose than most people would like, you can actually now use idiomatic JSON. That seems more intuitive to a lot of developers who are used to it. And as I just mentioned, the model server ships with this by default. So when you bring up the TensorFlow model server, you just specify the REST API port. And then, in this case, this is just an example of how you can make a request to this model from the command line. The last time I spoke about this was earlier this year, and at that point I could only announce that it would be available. But now we've made it available.
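As a rough illustration of that REST path, here is a sketch of querying a running model server from Python; the port, model name, and feature values are hypothetical, and it assumes the requests library is installed.

```python
import json
import requests

# Assumes a model server was started with something like:
#   tensorflow_model_server --port=8500 --rest_api_port=8501 \
#       --model_name=chicago_taxi --model_base_path=/models/chicago_taxi
# (the flags are standard model server flags, but the model name and
#  path here are just placeholders).

url = 'http://localhost:8501/v1/models/chicago_taxi:classify'
payload = {'examples': [{'trip_start_hour': 7, 'trip_miles': 2.5}]}

response = requests.post(url, data=json.dumps(payload))
print(response.json())  # e.g. classification scores for this example
```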
So all of this is now in our GitHub repository for you to use.

Now, what does it look like if we put all of this together? It's relatively straightforward. In this case, you start with the training data. You use TensorFlow Transform to express your transform graph, and it will actually deal with the analyze phase to compute the metrics. It will output the transform graph itself. And, in some cases, you can also materialize the transformed data. Now, why would you want to do that? You pay the cost of materializing your data again. In some cases, where throughput for the model at training time is extremely important, namely when you use hardware accelerators, you may actually want to materialize expensive transformations. So if you use GPUs or TPUs, you may want to materialize all of your transforms such that at training time, you can feed the model as fast as you can. From there you can use an estimator or Keras model, as I just showed you, to export your eval graph and your inference graph. And that's the API that connects the trainer with TensorFlow Model Analysis and TensorFlow Serving. So all of this works today. I'll have a link for you in a minute that has an end-to-end example of how you use all of these products together.

As I mentioned earlier, for us it's extremely important that these products work in a notebook environment, because we really think that that barrier between data scientists and product engineers, or data engineers, should not be there. So you can use all of this in a notebook, and then use the same code to deploy it in a distributed manner on a cluster. For the Beam runner, as I mentioned, you can run it on a local machine, in a notebook, and on Cloud Dataflow. The Flink runner is in progress. And there are also plans to develop a Spark runner so that you can deploy these pipelines on Spark as well. This is the link to the end-to-end example. It currently lives in the TensorFlow Model Analysis repo. So you will find it on GitHub there, or you can use the short link that takes you directly to it.

But then I hear some people saying, wait, actually, we want more. And I totally understand why you would want more, because maybe you've read that paper. And you've certainly seen that graph, because it was in a lot of the slides that I just showed you. And we just talked about four of these things. Right. But what about the rest? As I mentioned earlier, it's extremely important to highlight that these are just some of the libraries that we use. This is far from being an integrated platform. And as a result, if you actually use these together, you will see in the end-to-end example that it works really well. But it can be much, much easier once they're integrated, once there is a layer that pulls all of these components together and makes it a good end-to-end experience.

So I've announced before that we will release next the components for data analysis and validation. There's not much more I can say about this today other than that these will be available really, really soon. And I'll leave it at that. And then after that, the next phase is actually the framework that pulls all of these components together. That will make it much, much easier to configure these pipelines, because then there's going to be a shared configuration layer to configure all of these components and actually pull all of them together, such that they work as a pipeline, and not as individual components. And I think you get the idea.
So we are really committed to making all of this available to the community, because we've seen the profound impact that it has had at Google and for our products. And we are really excited to see what you can do with it in your space. So these are just the GitHub links for the products that I just discussed. And, again, all of the things that I showed you today are already available.

Now, because we have some time, I can also talk about TensorFlow Hub. TensorFlow Hub is a library that enables you to publish, consume, and discover what we call modules. And I'm going to come to what we mean by modules, but they're really reusable parts of machine learning models.

I'm going to start with some history. And I think a lot of you can relate to this. I've actually heard a talk today that mentioned some of these aspects. In some ways, machine learning and machine learning tools are 10, 15 years behind the tools that we use for software engineering. Software engineering has seen rapid growth in the last decade. And as there was a lot of growth, and as more and more developers started working together, we built tools and systems that made collaboration much more efficient. We built version control. We built continuous integration. We built code repositories. Right. And machine learning is now going through that same growth. More and more people want to deploy machine learning. But we are now rediscovering some of the challenges that we've seen with software engineering. What is the version control equivalent for these machine learning pipelines? And what is the code repository equivalent? Well, the code repository equivalent is the one that I'm going to talk to you about right now: TensorFlow Hub.

Code repositories are an amazing thing, because they enable a few really good practices. The first one is, if, as an engineer, I want to write code, and I know that there's a shared repository, usually I would look first to see if it has already been implemented. So I would search on GitHub or somewhere else to see if someone has already implemented the thing that I'm going to build. Secondly, if I know that I'm going to publish my code in a code repository, I may make different design decisions. I may build it in such a way that it's more reusable and more modular. Right. And that usually leads to better software in general. And it also increases the velocity of the entire community. Whether it's a private repository within a company or a public open source repository such as GitHub, code sharing is usually a good thing.

Now, TensorFlow Hub is the equivalent for machine learning. In machine learning, you also have code. You have data. You have models. And you would want a central repository that allows you to share these reusable parts of machine learning between developers, and between teams. And if you think about it, in machine learning it's even more important than in software engineering. Because machine learning models are much, much more than just code. Right. There's the algorithm that goes into these models. There's the data. There's the compute power that was used to train these models. And then there's the expertise of the people who built these models, which is scarce today. And I just want to reiterate this point. If you share a machine learning model, what you're really sharing is a combination of all of these.
If I spent 50,000 GPU hours to train an embedding, and share it on TensorFlow Hub, everyone who uses that embedding can benefit from that compute power. They don't have to go recompute that same model over those same data. Right. So all of these four ingredients come together in what we call a module. And the module is the unit that we care about that can be published on TensorFlow Hub, and that can then be reused by different people in different models.

Those modules are TensorFlow graphs. And they can also contain weights. So what that means is they give you a reusable piece of TensorFlow graph that has the trained knowledge of the data and the algorithm embedded in it. And those modules are designed to be composable, so they have common signatures such that they can be attached to different models. They're reusable, so they come with the graph and the weights. And importantly, they're also retrainable. So you can actually back propagate through these modules. And once you attach them to your model, you can customize them to your own data and to your own use case.

So let's go through a quick example for text classification. Let's say I'm a startup and I want to build a new model that takes restaurant reviews and tries to predict whether they are positive or negative. So in this case, we have a sentence. And if you've ever tried to train some of these text models, you know that you need a lot of data to actually learn a good representation of text. So in this case we just want to put in a sentence. And we want to see if it's positive or negative. And we want to reuse the code in the graph. We want to reuse the trained weights from someone else who's done the work before us. And we want to do this with less data than is usually needed. Examples of the text modules that are already published on TensorFlow Hub are the Universal Sentence Encoder and language models. And we've actually added more languages to these. Word2vec is a very popular type of model as well.

And the key idea behind TensorFlow Hub, similarly to code repositories, is that the latest research can be shared with you as fast as possible, and as easily as possible. So the Universal Sentence Encoder paper was published by some researchers at Google. And in that paper, the authors actually included a link to TensorFlow Hub with the embedding for that Universal Sentence Encoder. That link is like a handle that you can use. So in your code now, you actually want to train a model that uses this embedding. In this case, we train a DNN classifier. It's one line to say, I want to pull from TensorFlow Hub a text embedding column with this module.

And let's take a quick look at what that handle looks like. The first part is just the TF Hub domain. All of the modules that we publish, Google and some of our partners, will show up on tfhub.dev. The second part is the author. So in this case, Google published this embedding. Universal Sentence Encoder is the name of this embedding. And then the last piece is the version. Because TensorFlow Hub modules are immutable. So once they're uploaded, they can't change, because you wouldn't want a module to change underneath you. If you want to retrain a model, that's not really good for reproducibility. So if and when we upload a new version of the Universal Sentence Encoder, this version will increment. And then you can switch to the new version as well.
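Here is a rough sketch of what that looks like in code; the handle follows the pattern described in the talk, but the version number, classifier settings, and the tiny input function are illustrative placeholders.

```python
import tensorflow as tf
import tensorflow_hub as hub

# One line to pull a text embedding column from TensorFlow Hub.
# Handle pattern: https://tfhub.dev/<author>/<module-name>/<version>
review_embedding = hub.text_embedding_column(
    key='sentence',
    module_spec='https://tfhub.dev/google/universal-sentence-encoder/2',
    trainable=False)  # set trainable=True to fine-tune the embedding on your data

# A DNN classifier for positive vs. negative restaurant reviews.
classifier = tf.estimator.DNNClassifier(
    feature_columns=[review_embedding],
    hidden_units=[128, 64],
    n_classes=2)

def input_fn():
    # Tiny in-memory stand-in for a real labeled review dataset.
    features = {'sentence': ['great food and service', 'cold and bland']}
    labels = [1, 0]
    return tf.data.Dataset.from_tensor_slices((features, labels)).repeat().batch(2)

classifier.train(input_fn=input_fn, max_steps=100)
```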
But just to reiterate this point, this is one line to pull this embedding column from TensorFlow Hub, and use it as an input to your DNN classifier. And now you've basically benefited from the expertise and the research that was published by the Google Research team for text embeddings. I just mentioned earlier that these modules are retrainable. So if you set trainable to true, the model will actually back propagate through this embedding and update it as you train with your own data. Because in many cases, of course, you still have some small amount of data that you want to train on, such that the model adapts to your specific use case.

And if you take the same URL, the same handle, and type it into your browser, you end up on the TensorFlow website, and see the documentation for that same module. So that same handle that you saw in the paper, you can use in your code as a one-liner to use this embedding, and you can put it in your browser to see the documentation for this embedding.

So the short version of the story is that TensorFlow Hub really is the repository for reusable machine learning models and modules. We have already published a large number of these modules. The text modules are just one example that I just showed you. We have a large number of image modules, some of which are cutting edge. So there's a neural architecture search module that's available. There are also some modules available for image classification that are optimized for devices, so that you can use them on a small device. And we are also working hard to keep publishing more and more of these modules. So in addition to Google, we now have some modules that have been published by DeepMind. And we are also working with the community to get more and more modules up there. And, again, this is available on GitHub. You can use this today.

And a particularly interesting aspect that we haven't highlighted so far, but that's extremely important, is that you can use the TensorFlow Hub libraries also to store and consume your own modules. So you don't have to rely on the TensorFlow Hub platform and use the modules that we have published. You can internally enable your developers to write out modules to disk on some shared storage. And other developers can consume those modules. And in that case, instead of the handle that I just showed you, you would just use the path to those modules.

And that concludes my talk. I will go up to the TensorFlow booth to answer any of your questions. Thanks. [CLAPPING]