♪ (music) ♪ Hello, everyone. First, thanks everyone for coming to attend the Dev Summit. And second, thanks for staying around this long. I know it's been a very long day, and there has been a lot of information that we've been throwing at you. But we've got much, much more and many more announcements to come, so please stick with me. My name is Clemens, and this is Raz. We're going to talk about TensorFlow Extended today. But before we do this, I'm going to do a quick survey. Can I get a quick show of hands? How many of you do machine learning in a research or academic setting? Okay, quite a big number. Now how many of you do machine learning in a production setting? Okay, that looks about half and half, obviously also with a lot of overlap. So for those of you who do machine learning in a production setting, how many of you agree with this statement? Yeah? Some? Okay, I see a lot of hands coming up. Everyone that I speak with who's doing machine learning in production agrees with this statement: "Doing machine learning in production is hard," and it's too hard, because after all we actually want to democratize machine learning and allow more and more people to deploy machine learning in their products. One of the main reasons why it's still hard is that, in addition to the actual machine learning (the small orange box where you actually use TensorFlow, where you may use Keras to put together your layers and train your model), you need to worry about so much more. There are all of these other things that you have to worry about to actually deploy machine learning in a production setting and serve it within your product.

Now the good news is that this is exactly what TensorFlow Extended is about. TFX in [inaudible] Google is an [inaudible] machine learning platform that allows our developers to go all the way from data to production and serving machine learning models as fast as possible. Before we introduced TFX, we saw that going through this process of writing some of these components (some of them didn't exist before), gluing them together, and actually getting to a launch took anywhere between six and nine months, sometimes even a year. Once we deployed TFX and allowed developers to use it, in many cases people can get up and running with the platform in a day and actually get to a deployable model in production on the order of weeks, or in just a month.

Now, TFX is a very large system and platform that consists of a lot of components and a lot of services, so unfortunately I can't talk about all of it in the next 25 minutes. We're only going to be able to cover a small part of it, but we're going to talk about the things that we've already open sourced and made available to you. First, we're going to talk about TensorFlow Transform and show you how to apply transformations on your data consistently between training and serving. Next, Raz is going to introduce you to a new product that we're open sourcing called TensorFlow Model Analysis. We're going to give a demo of how all of this works together end to end and then make a broader announcement of our plans for TensorFlow Extended and sharing it with the community.

Let's jump into TensorFlow Transform first. In a typical ML pipeline that you may see in the wild, during training you usually have a distributed data pipeline that applies transformations to your data.
Because you usually train on a large amount of data, this needs to be distributed, and you run this pipeline and sometimes materialize the output before you actually put it into your trainer. Now at serving time, we need to find a way to somehow replay those exact transformations online, as each new request comes in and needs to be sent to your model. There are a couple of challenges with this. The first one is that usually those two things are very different code paths. The distributed data processing systems that you would use for batch processing are very different from the libraries and tools that you would use to transform data in real time to make a request to your model. So now we have two different code paths. Second, in many cases it's very hard to keep those two in sync. I'm sure a lot of you have seen this: you change your batch processing pipeline, introduce a new feature or change how it behaves, and you somehow need to make sure that the code you actually use in your production system is changed at the same time and kept in sync. The third problem is that sometimes you actually want to deploy your TensorFlow machine learning model in many different environments. You want to deploy it on a mobile device; you want to deploy it on a server; maybe you want to put it in a car. Now suddenly you have three different environments where you have to apply these transformations, maybe with different languages for each, and it's also very hard to keep those in sync. This introduces something that we call training/serving skew, where the transformations that you do at training time may be different from the ones at serving time, which usually leads to bad quality of your serving model.

TensorFlow Transform addresses this by helping you write your data processing job at training time, so it actually helps you create those data pipelines to do those transformations, and at the same time it emits a TensorFlow graph that can be in-lined with your training model and also your serving model. What this does is hermetically seal the model: your model takes a raw data request as input, and all of the transformations actually happen within the TensorFlow graph. This has a lot of advantages. One of them is that you no longer have any code in your serving environment that does these transformations, because they're all being done in the TensorFlow graph. Another one is that wherever you deploy this TensorFlow model, all of those transformations are applied in a consistent way, no matter where the graph is being evaluated.

Let's see what that looks like. This is a code snippet of a preprocessing function that you would write with TF Transform. I'm just going to walk you through what happens here and what we need to do for this. The first thing we do is normalize this feature. As all of you know, in order to normalize a feature we need to compute the mean and the standard deviation, and to actually apply the transformation we need to subtract the mean and divide by the standard deviation. So what has to happen is that for the input feature x, we have to compute these statistics, which is a trivial task if the data fits on a single machine (you can do it easily), but a non-trivial task if you have a gigantic training data set and actually have to compute these metrics effectively. Once we have these metrics, we can actually apply this transformation to the feature.
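To make that concrete, here is a minimal sketch of such a preprocessing function, assuming the public tensorflow_transform API; the feature name 'x' is a placeholder rather than the actual column from the slide.

```python
import tensorflow_transform as tft


def preprocessing_fn(inputs):
    """Preprocesses raw feature tensors into transformed features."""
    # `inputs` is a dict of raw feature tensors, keyed by feature name.
    x = inputs['x']  # hypothetical continuous feature

    # The analyzer behind scale_to_z_score computes the mean and standard
    # deviation over the entire dataset; the transform then subtracts the
    # mean and divides by the standard deviation for every example.
    x_normalized = tft.scale_to_z_score(x)

    return {'x_normalized': x_normalized}
```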
This is to show you that the output of this transformation can then be, again, multiplied by another tensor, which is just a regular TensorFlow transformation. Then, in order to bucketize a feature, you again need to compute the bucket boundaries to actually apply this transformation. And again, this is a distributed data job that computes those metrics, this time over the result of an already transformed feature, which is another benefit, before then actually applying the transformation. The next examples just show you that in the same function you can apply any other tensor-in, tensor-out function, and there are also some of what we call mappers in TF Transform that don't require this analyze phase. So N-grams, for example, don't require us to actually run a data pipeline to compute anything.

Now, what happens here is that these orange boxes are what we call analyzers. We realize those as actual data pipelines that compute those metrics over your data. They're implemented using Apache Beam (a small sketch of such a pipeline follows below), and we're going to talk about this more later. What this allows us to do is run this distributed data pipeline in different environments, because there are different runners for Apache Beam. And all of the transforms are just simple instance-to-instance transformations using pure TensorFlow code. What happens when you run TensorFlow Transform is that we actually run these analyze phases, compute their results, and then inject the results as constants into the TensorFlow graph (this is on the right). The result is a hermetic TensorFlow graph that applies all the transformations, and it can be in-lined in your serving graph. So now your serving graph has the transform graph as part of it and can replay all of these transforms wherever you want to deploy this TensorFlow model.

What can be done with TensorFlow Transform? At training time, for the batch processing, really anything that you can do with a distributed data pipeline. So there's a lot of flexibility in the types of statistics you can compute. We provide a lot of utility functions for you, but you can also write custom data pipelines. At serving time, because we generate a TensorFlow graph that applies these transformations, we're limited to what you can do in a TensorFlow graph; but for all of you who know TensorFlow, there's a lot of flexibility there as well. Anything that you can do in a TensorFlow graph, you can do with your transformations. Some of the common use cases that we've seen, the ones on the left, I just spoke about: you can scale a continuous value to its z-score, which is mean normalization, or to a value between 0 and 1. You can bucketize a continuous value. If you have text features, you can apply bag of words or N-grams, and for feature crosses, you can actually cross those strings and then generate vocabularies from the results of those crosses. As mentioned before, TF Transform is extremely powerful in being able to chain these transforms together, so you can apply a transform on the result of a transform, and so on. Another particularly interesting transform is applying another TensorFlow model. You've heard about the SavedModel before: if you have a saved model, you can apply it as a transformation inside TF Transform. Let's say you have an image and you want to apply an Inception model as a transform, and then use the output of that Inception model, maybe to combine it with some other feature, or use it as an input feature to your model.
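As a rough sketch of how these analyze and transform phases are typically driven, this is what running the preprocessing function sketched above with Apache Beam's direct runner can look like. This is not the speaker's pipeline: the in-memory example data, the schema, and the paths are placeholder assumptions, and the metadata helpers follow the tensorflow_transform packaging.

```python
import apache_beam as beam
import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata, schema_utils

# Placeholder schema describing the raw data.
RAW_METADATA = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec(
        {'x': tf.io.FixedLenFeature([], tf.float32)}))

with beam.Pipeline() as pipeline:  # the direct runner is used by default
    with tft_beam.Context(temp_dir='/tmp/tft_tmp'):
        raw_data = pipeline | beam.Create([{'x': 1.0}, {'x': 2.0}, {'x': 3.0}])

        # Analyze phase: the analyzers (mean, variance, quantiles, ...) run as
        # a distributed Beam job over the whole dataset. Transform phase: each
        # example is mapped through pure TensorFlow ops, with the analyzer
        # results injected into the graph as constants.
        (transformed_data, _), transform_fn = (
            (raw_data, RAW_METADATA)
            | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))

        # transform_fn is a hermetic TensorFlow graph that can later be
        # in-lined into the serving graph.
        _ = transform_fn | tft_beam.WriteTransformFn('/tmp/transform_output')
```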
So, coming back to applying a model as a transform: you can use any other TensorFlow model, and it ends up being in-lined in your transform graph and also in-lined in your serving graph. All of this is available today, and you can go check it out at github.com/tensorflow/transform. With this, I'm going to hand it over to Raz, who's going to talk about TensorFlow Model Analysis.

Alright, thanks Clemens. Hi, everyone. I'm really excited to talk about TensorFlow Model Analysis today. We're going to talk a little bit about metrics. Let's see, next slide. Alright, so we can already get metrics today, right? We use TensorBoard. TensorBoard's awesome. You saw an earlier presentation today about TensorBoard. It's a great tool: while you're training, you can watch your metrics, right? If your training isn't going well, you can save yourself a couple of hours of your life. Terminate the training, fix some things... Let's say you have your trained model already. Are we done with metrics? Is that it? Is there any more to be said about metrics after we're done training? Well, of course there is. We want to know how well our trained model actually does for our target population, and I would argue that we want to do this in a distributed fashion over the entire data set. Why wouldn't we just sample? Why wouldn't we just save more hours of our lives and just sample, make things fast and easy? Let's say you start with a large data set. Now you're going to slice that data set. You're going to say, "I'm going to look at people at noon time." Right? That's a feature. From Chicago, my hometown. Running on this particular device. Each of these slices reduces the size of your evaluation dataset by a factor; this is an exponential decline. By the time you're looking at the experience for a particular set of users, you're not left with very much data, and the error bars on your performance measures are huge. How do you know that the noise doesn't exceed your signal at that point? So really you want to start with your larger dataset before you start slicing.

Let's talk about a particular metric. I'm not sure-- who's heard of the ROC curve? It's kind of an unknown thing in machine learning these days. Okay. We have our ROC curve, and I'm going to talk about a concept that you may or may not be familiar with, which is ML Fairness. So what is fairness? Fairness is a complicated topic. Fairness is basically how well our machine learning model does for different segments of our population. You don't just have one ROC curve, you have an ROC curve for every segment; you have an ROC curve for every group of users. Who here would run their business based on their top-line metrics? No one! Right? That's crazy. You have to slice your metrics; you have to dive in and find out how things are going. That lucky user, the black curve on the top: great experience. That unlucky user, the blue curve? Not such a great experience. When can our models be unfair to various users? One instance is if you simply don't have a lot of data from which to draw your inferences. We use stochastic optimizers, and if we re-train the model, it does something slightly different every time, so you're going to get high variance for some users just because you don't have a lot of data there. We may be incorporating data from multiple data sources, and some data sources are more biased than others, so some users just get the short end of the deal while other users get the ideal experience. Our labels could be wrong.
All of these things can happen. So here's TensorFlow Model Analysis. You're looking at the UI hosted within a Jupyter Notebook. On the x-axis we have our loss. You can see there's some natural variance in the metrics; we're not always going to get spot on the same precision and recall for every segment of the population. But sometimes you'll see... what about those guys at the top there, experiencing the highest amount of loss? Do they have something in common? We want to know this. The users that get the poorest experience are sometimes our most vocal users, right? We all know this. I'd like to invite you to come visit ml-fairness.com. There's a deep literature about the mathematical side of ML Fairness, and once you've figured out how to measure fairness, there's a deep literature about what to do about it.

How does TensorFlow Model Analysis actually give you these sliced metrics? How do you go about getting these metrics? Today you export a saved model for serving; it's kind of a familiar thing. TensorFlow Model Analysis is simple, and as it's simple, it's similar: you export a saved model for evaluation. Why are these models different? Why export two? Well, the eval graph that we serialize as a saved model has some additional annotations that allow our evaluation batch job to find the features, to find the prediction, and to find the label. We don't want those things mixed in with our serving graphs, so you export a second one (sketched below).

So this is the GitHub repository. We just opened it, I think, last night at 4:30 pm. Check it out. We've been using it internally for quite some time now, and now it's available externally as well. The GitHub repository has an example that puts it all together so that you can try all these components that we're talking about from your local machine. You don't have to get an account anywhere; you just git clone it, run the scripts, and run the codelab. This is the Chicago Taxi example. We're using publicly available data to determine which riders will tip their driver and which riders, shall we say, don't have enough money to tip today. What does fairness mean in this context? Our model is going to make some predictions, and we may want to slice these predictions by time of day. During rush hour we're going to have a lot of data, so hopefully our model is going to be fair if that data is not biased; at the very least it's not going to have a lot of variance. But how's it going to do at 4 a.m.? Maybe not so well. How's it going to do when the bars close? An interesting question. I don't know yet, but I challenge you to find out.

So this is what you can run using your local scripts. We start with our raw data. We run TF Transform; TF Transform emits a transform function and our transformed examples. We train our model. Our model, again, emits two saved models, as we talked about: one for serving and one for eval. And you can try this all locally, just run the scripts and play with the stuff. Clemens talked a little bit about Transform. Here we see that we want to take our dense features and scale them to a z-score, and we don't want to do that batch by batch, because the mean for each batch is going to differ and there are going to be fluctuations; we want to do that across the entire data set. We may want to normalize these things across the entire data set. We build a vocabulary; we bucketize for the wide part of our model; we emit our transform function, and into the trainer we go.
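Here is a hedged sketch of the two exports described above, using the estimator-era TensorFlow and tensorflow_model_analysis export APIs; the feature spec, label name ('tip'), paths, tiny training step, and estimator are illustrative assumptions, not the Chicago Taxi code itself.

```python
import tensorflow as tf
import tensorflow_model_analysis as tfma

# Hypothetical feature spec; 'tip' stands in for the label column.
FEATURE_SPEC = {
    'x': tf.io.FixedLenFeature([], tf.float32),
    'tip': tf.io.FixedLenFeature([], tf.int64),
}

# Serving input: parse only the features from incoming tf.Example protos.
serving_receiver_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
    {'x': FEATURE_SPEC['x']})


def eval_receiver_fn():
    # The eval graph additionally parses and exposes the label, which is the
    # extra annotation that lets the evaluation batch job find the features,
    # the prediction, and the label.
    serialized = tf.compat.v1.placeholder(tf.string, shape=[None])
    features = tf.io.parse_example(serialized, FEATURE_SPEC)
    return tfma.export.EvalInputReceiver(
        features=features,
        receiver_tensors={'examples': serialized},
        labels=features['tip'])


def train_input_fn():
    # Tiny in-memory training set, just enough to materialize a checkpoint.
    features = {'x': tf.constant([1.0, 2.0, 3.0])}
    labels = tf.constant([0, 1, 1])
    return tf.data.Dataset.from_tensors((features, labels))


estimator = tf.estimator.LinearClassifier(
    feature_columns=[tf.feature_column.numeric_column('x')])
estimator.train(train_input_fn, steps=1)

# Saved model #1: for TensorFlow Serving.
estimator.export_saved_model('/tmp/serving_model', serving_receiver_fn)

# Saved model #2: for TensorFlow Model Analysis.
tfma.export.export_eval_savedmodel(
    estimator=estimator,
    export_dir_base='/tmp/eval_model',
    eval_input_receiver_fn=eval_receiver_fn)
```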
You heard earlier today about TF Estimators, and here is a wide and deep estimator that takes our transformed features and emits two saved models. Now we're in TensorFlow Model Analysis, which reads in the eval saved model and runs it against all of the raw data. We call render_slicing_metrics from the Jupyter Notebook (a sketch of this step follows below), and you see the UI. The thing to notice here is that this UI is immersive, right? It's not just a static picture that you look at, go "Huh," and then walk away from. It lets you see your errors broken down by bucket or broken down by feature, and it lets you drill in, ask questions, and be curious about how your models are actually treating various subsets of your population. Those subsets may be the lucrative subsets you really want to drill into.

And then you want to serve your models, so our example has a one-liner here that you can run to serve your model. Then you make a client request; the thing to notice here is that we're making a gRPC request to that server. We're taking our feature tensors, serializing them into the gRPC request, sending them to the server, and back comes a probability. But that's not quite enough, right? We've heard a little bit of feedback about this server, and the thing that we've heard is that gRPC is cool, but REST is really cool. I tried. This is actually one of the top feature requests on GitHub for model serving. You can now pack your tensors into a JSON object, send that JSON object to the server, and get a response back to [inaudible]. Much more convenient, and I'm very excited to say that it'll be released very soon. Very soon. I see the excitement out there.

Back to the end-to-end picture. You can try all of these pieces end to end on your local machine, because they're using Apache Beam's direct runner, and the direct runner allows you to take your distributed jobs and run them all locally. If you swap in Apache Beam's Dataflow runner, you can run against the entire data set in the cloud, and the example also shows you how to run the big job against the cloud version as well. We're currently working with the community to develop a runner for Apache Flink and a runner for Spark. Stay tuned to the TensorFlow blog and to our GitHub. You can find the example at tensorflow/model-analysis. And back to Clemens.

Thank you, Raz. (applause) Alright, so we've heard about Transform. We've heard how to train models, how to use Model Analysis, and how to serve models. But I hear you say you want more. Right? Is that enough? You want more? Alright, you want more. And I can think of why you want more. Maybe you read the paper we published last year and presented at KDD about TensorFlow Extended. In this paper we laid out the broad vision of how this platform works within Google, all of the features that it has, and all the impact that we've had by using it. Figure one, which shows all of these boxes, describes what TensorFlow Extended actually is. Although it's overly simplified, this is still much more than we've discussed today. Today, we spoke about these four components of TensorFlow Extended. Now, it's important to highlight that this is not yet an end-to-end machine learning platform; this is just a very small piece of TFX. These are the libraries that we've open-sourced for you to use, but we haven't yet released the entire platform. We're working very hard on this, because we've seen the profound impact that it had internally, how people could start using this platform to apply machine learning in production using TFX.
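As a concrete illustration of the evaluation step Raz walked through (reading the eval saved model and rendering sliced metrics in the notebook), here is a rough sketch assuming the early, 2018-era tensorflow_model_analysis API; the paths, the slicing column trip_start_hour, and the exact argument names are assumptions that may differ in later releases.

```python
import tensorflow_model_analysis as tfma

# Slice the evaluation by a feature column, e.g. hour of day in the Chicago
# Taxi example; the empty spec gives the overall (unsliced) metrics.
slice_spec = [
    tfma.SingleSliceSpec(),
    tfma.SingleSliceSpec(columns=['trip_start_hour']),
]

# Reads the eval saved model and runs it, as a Beam job, over the raw
# evaluation data, aggregating metrics per slice.
eval_result = tfma.run_model_analysis(
    model_location='/tmp/eval_model/1234567890',  # placeholder export dir
    data_location='/tmp/eval_data.tfrecord',      # placeholder eval data
    slice_spec=slice_spec)

# Inside a Jupyter notebook this renders the interactive slicing UI.
tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')
```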
And we've been working very hard to actually make more of these components available to you. In the next phase, we're looking at our data components and looking to make those available, so that you can analyze your data, visualize the distributions, and detect anomalies, because it's an important part of any machine learning pipeline to detect changes and shifts in your data and to detect anomalies. After this, we're looking at some of the horizontal pieces that help tie all of these components together, because if they're only single libraries, you still have to glue them together yourself; you still have to use them individually. They have well-defined interfaces, but you still have to combine them by yourself. Internally we have a shared configuration framework that allows you to configure the entire pipeline, and a nice integrated frontend that allows you to monitor the status of these pipelines, see progress, and inspect the different artifacts that have been produced by all of the components. So this is something that we're also looking to release later this year. And I think you get the idea: eventually we want to make all of this available to the community, because internally hundreds of teams use this to improve our products. We really believe that this will be as transformative to the community as it has been at Google, and we're working very hard to release more of these technologies, and the entire platform, so we can see what you can do with them for your products and for your companies. Keep watching the TensorFlow blog for a more detailed announcement about TFX and our future plans.

As mentioned, you can already use some of these components today. Transform is released, Model Analysis was just released yesterday, Serving is also released, and the end-to-end example is available under the shortlink, and you can find it on the model analysis [inaudible]. So with this, thank you from both myself and Raz, and I'm going to ask you to join me in welcoming a special external guest, Patrick Brand, who's joining us from Coca-Cola and is going to talk about applied AI at Coca-Cola. Thank you. (applause) ♪ (music) ♪