[LOGO MUSIC]

KONSTANTINOS KATSIAPIS: Hello, everyone. My name is Gus Katsiapis. And together with my colleagues Kevin Haas and Tulsee Doshi, we will talk to you today about TensorFlow Extended, and the topic covers two areas. Let's go into that.

I think most of you here who have used ML in the real world realize that machine learning is a lot more than just a model. It's a lot more than just training a model. And especially when machine learning powers your business or powers a product that actually affects your users, you absolutely need reliability. So, today, we'll talk about how, in the face of massive data and the real world, you build applications that use machine learning and are robust to the world they operate in. Today's talk will have two parts. The first part will be about TensorFlow Extended, otherwise known as TFX. This is an end-to-end machine learning platform. And the second part of the talk will be about model understanding and how you can actually get insights into your business by understanding how your model performs in real-world situations.

OK. So let's get started with the first part, TensorFlow Extended, otherwise known as TFX. We built TFX at Google. We started building it approximately two and a half years ago. And a lot of the knowledge that went into building TFX actually came from experience we had building other machine learning platforms within Google that preceded it, that preceded TensorFlow even. So TFX has had a profound impact at Google, and it's used throughout several Alphabet companies and also in several products within Google itself. And several of those products that you can see here-- like Gmail or Ads or YouTube-- have pretty large scale. So they affect billions of users. So this is one more reason for us to pay extra attention to building systems that use ML reliably.

Now, when we started building TFX, we published a paper about it, and we promised we would eventually make it available to the rest of the world. So over the last few years, we've been open sourcing aspects of it, and several of our partners externally have actually been pretty successful deploying this technology, several of the libraries we have offered over time. Just to call out an interesting case study, Twitter published a fascinating blog post where they spoke about how they ranked tweets with TensorFlow and how they used TensorFlow Hub in order to do transfer learning and share word embeddings within their organization. And they also showcased how they used TensorFlow Model Analysis in order to have a better understanding of how their model performs-- not just globally over the population, but over several slices of their population that were important to the business. We'll be talking more about this later, especially in the model understanding part.

Now, I think most of you here are either software developers or software engineers or are very much familiar with software processes and technologies. So I think most of you probably recognize several of the themes presented in this slide, like scalability, extensibility, modularity, et cetera. But my conjecture is that most people think about those concepts in terms of code and how to build software.
Now, with the advent of machine learning, we are building applications that are powered by machine learning, which means that those applications are fundamentally powered by data. So if you just think about code and you don't think about data, you're only thinking about half of the picture. You can optimize one half amazingly, but if you don't think about the other half, you cannot do better than that half. So I would actually encourage everyone to take each of those concepts and ask, how does this concept apply to data as opposed to just code? And if you can apply these concepts to both data and code, you can build a holistic application that is actually robust and powers your products. We will go into each of those individually and see how they apply to machine learning.

OK. Let's start with scalability. Most of you know that, when you start your business, it might be small. But the reality is that, as your business grows, so might your data. So, ideally, you want a solution that is able to scale over time together with your business. Ideally, you would be able to write a single piece of software that could operate on a laptop, because you want to experiment quickly, but could also operate on a beefy machine with tons of processors or a ton of accelerators. And you could also scale it over hundreds or thousands of machines if you need to. So this flexibility in terms of scale is quite important. And ideally, each time you hop from kilobytes to megabytes to gigabytes to terabytes, you wouldn't have to use different tools, because you have a huge learning curve each time you change the technology under the covers. So the ideal here is a machine learning platform that is able to work on your laptop but can also scale on any cloud you would like.

OK. Now, let's talk about extensibility. Everyone here understands that you can have libraries and components that make up your system, and you can have things that work out of the box. But you always want to customize them a little bit to meet your goals. You always want to put custom business logic in some part of your application. And this is similar for machine learning. To take a concrete example, when you feed data into a machine learning model, you need to do multiple transformations to put the data in a format that the model expects. So as a developer of an ML application, you want the transformation flexibility that an ML platform can provide to you-- whether that's bucketizing, creating vocabularies, et cetera. And that's just one example, but this applies pervasively throughout the ML process.

OK. Let's talk a little bit about modularity. All of you probably understand the importance of having nice APIs and reusable libraries that allow you to build bigger and bigger systems. But, going back to our original question, how does this apply to artifacts produced by machine learning pipelines? How does this apply to data? Ideally, I would be able to take the reusable parts of a model that was trained to recognize images and put them in my model that predicts kinds of chairs. So ideally, we would be able to reuse parts of models as easily as we reuse libraries. So check out TensorFlow Hub, which actually allows you to reuse the reusable parts of machine learning models and plug them into your own infrastructure.
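As a hedged illustration of that kind of reuse, a pre-trained image module from TensorFlow Hub can be dropped into a new model as just another layer. This is a minimal sketch: the module URL, input size, and number of output classes are examples, and the exact API surface (hub.KerasLayer here) depends on the TensorFlow Hub version you use.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Reuse the "reusable part" of an image model (a feature-vector module from
# TensorFlow Hub) and train only a small head on top for our own task,
# e.g. classifying kinds of chairs. URL and sizes are illustrative.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    trainable=False)  # keep the transferred weights frozen

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    feature_extractor,                                   # transferred representation
    tf.keras.layers.Dense(5, activation="softmax"),      # our own chair classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```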
And going a step further, how does this apply to artifacts? Machine learning platforms usually produce lots of data artifacts, whether that's statistics about your data or something else. And many times, those operate in a continuous fashion. Data continuously arrives into the system, and you have to continuously produce models that mimic reality, that understand reality, quickly. And if you have to redo computation from scratch each time, it means that you sometimes cannot keep up with real time. So somehow you need to be able to take artifacts that are produced over time and merge them easily and quickly, so that you can keep up with real time as new data arrives.

OK. Moving on to testability, most of us are familiar with unit tests, integration tests, regression tests, performance tests. All of those are about code. What does it mean to write a test about data? ML and data are very much intertwined. What is the equivalent of a unit test or an integration test? If we take a step back, when we write tests, we encode expectations of what happens in code. So when we deal with applications that have both code and data, we need to write expectations both in terms of the code-- how the code behaves-- and in terms of the shape of the data that goes into this black-box process. So I would say that the equivalent of a unit test or an integration test for data is writing expectations about the types of the data that go into your system, the distributions you expect, the values that are allowed, the values that are not allowed, et cetera. And we'll be discussing those later.

If we take this a step further, code oftentimes gives us very strong contracts. When you have a sorting algorithm, you can expect that it will do exactly what the contract promises. When you have a machine learning model, you can basically think of it as a function that was generated automatically from data. So you don't have contracts as strong as before. So in order to set expectations about how those black-box functions that were created from data behave, we need to set expectations about what the data that went into them was. And this relates a lot to the testability point we just mentioned.

OK. Moving on a little bit more to a systems perspective, ideally when you build an application, you wouldn't have to code everything yourself. You would be able to reuse several components out of the box that accept well-defined configuration, and the configuration is ideally also flexible enough for you to parameterize it a little bit and customize it to your needs. So reuse as many black boxes as possible, but touch them up when you need to. This is very similar for machine learning. Ideally, I wouldn't have to rebuild everything from scratch. I would be able to reuse components, libraries, modules, pipelines. And I would be able to share them with multiple folks within my business or publicly.

OK. Now, once you have a system and it's actually responsible for the performance of your product or your business, you need to know what's going on. So you need to be able to continuously monitor it and get alerted when things are not moving as one would expect. And, once again, here I would say that data is a first-class citizen.
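To make that concrete, this is roughly what "unit tests for data" look like with TensorFlow Data Validation: compute statistics, infer (and then curate) a schema that encodes your expectations, and check newly arriving data against it. A minimal sketch; the file paths are placeholders, and the exact API surface can differ between TFDV releases.

```python
import tensorflow_data_validation as tfdv

# Compute summary statistics over the training data (path is a placeholder).
train_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://my-bucket/train/*.tfrecord")

# Infer a schema: expected types, value domains, presence requirements.
# In practice you review and curate this schema rather than trusting it blindly.
schema = tfdv.infer_schema(train_stats)

# Later, validate newly arrived (or serving) data against those expectations.
new_stats = tfdv.generate_statistics_from_tfrecord(
    data_location="gs://my-bucket/new_day/*.tfrecord")
anomalies = tfdv.validate_statistics(statistics=new_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # e.g. unexpected values, missing features
```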
So unexpected changes to the data coming into your system-- whether that's your training system or your serving system-- need to be monitored, because otherwise you cannot know what the behavior of the system will be, especially in the absence of those strong contracts we discussed earlier. So both data and models need to be first-class citizens, and we need to be able to monitor the performance of the models over time as well.

Now, if we apply those concepts, we can build better software. And if we apply the concepts we just discussed to machine learning, we can build better software that employs machine learning. So this should, in principle, allow us to build software that is safer. And safe is a very generic word, but let me try to define it a little bit here. I would call it robustness to environment changes. So, as the data that undergirds your system changes, you should have a system that is robust to it, that performs properly in light of those changes, or at least you get notified when those changes happen so that you can change the system to become more robust in the future.

And how do we do this? It's actually hard. I think the reality is that all of us build on collective knowledge and experience. When we reuse libraries, we build on the shoulders of the giants that built those libraries. And machine learning is not any different. The world outside is complex, and if you build a system that tries to mimic it or predict it, that system has to mirror some of those complexities. So what comes to the rescue here is, I would say, automation and best practices. If we apply best practices for machine learning that have been proven useful in various other circumstances, they're probably useful for your business as well. And many of those are difficult. We oftentimes spend days or months debugging situations. And then, once we've done that, we can encode the best practices we learned into the ML software so that you don't need to reinvent the wheel in this area. The key here is that learning from the pitfalls of others is very important to building your own system.

OK. So how do we achieve this in machine learning? Let's look at a typical user journey of using ML in a product and what it looks like. In the beginning, you have data ingestion. As we discussed, data and code are tightly intertwined in machine learning. Sometimes, in order to change your machine learning algorithm, you have to change your data, and vice versa. So these need to be tightly intertwined, and you need something that brings the data into the system and applies the best practices we discussed earlier. So it shuffles the training data so that downstream processes can operate efficiently and learn faster. Or it splits your data into training and eval sets in a way that makes sure there is no leakage of information during the split. For example, if you have a financial application, you want to make sure that information from the future does not leak into the data you're training on, just to give a concrete example.

OK. Once we are able to generate some data, we need to make sure that data is of good quality. And why is that? The answer is very simple. As we discussed, garbage in, garbage out. This applies especially well to machine learning, because ML is this kind of black box that is very complicated.
So if you put things in that you don't quite understand, there is no way for you to understand the output of the model. Data understanding is actually required for model understanding. We'll be talking about this later as well. Another thing here is that catching errors as early as possible is critical for your machine learning application. You reduce wasted time, because you've identified the error early on, where it's easier to actually spot. And you also decrease the amount of computation you perform. I don't need to train a very expensive model if the data that's going into it is garbage. So this is key. And I think the TL;DR here is that you should treat data as you treat code. It is a first-class citizen in your ML application.

OK. Once you have your data, sometimes you need to massage it in order to fit it into your machine learning algorithm. So oftentimes, you might need to build vocabularies, or you need to normalize your continuous features in order to feed them into your neural network or your linear algorithm. And, in order to do that, you often need full passes over the data. The key thing is that, when you train, you do these full passes over the data. But when you serve, when you evaluate your model, you actually have one prediction at a time. So how can we create a hermetic representation of a transformation that requires a full pass over the data, and be able to apply that hermetic representation at serving time, so that my training and serving ways of doing things are not different? If those are different, my model will produce bogus predictions. So we need processes that ensure those things are hermetic and equivalent. And ideally, you would have a system that does that for you.

OK. Now that we have data, now that we have data that is of good quality because we've validated it, and now that we have transformations that allow us to feed the data into the model, let's train the model. That's where the magic happens, or so we think, when in reality everything else before it is actually needed. But the chain doesn't stop here. Once you produce a model, you actually need to validate the quality of the model. You need to make sure it passes thresholds that you think are sufficient for your business to operate. And ideally, you would do this not just globally-- how does this model perform on the total population of data-- but also on each individual slice of the user base you care about-- whether that's different countries or different populations or different geographies or whatever. So, ideally, you would have a view of not just how the model does holistically, but how it does on each slice you're interested in.

Once you've performed this validation, you now have something that we think is of good quality, but we want to have a separation between our training system-- which oftentimes operates at large scale and at high throughput-- and our serving system, which actually operates with low latency. When you try to evaluate the model, sometimes you want a prediction immediately-- within seconds or even milliseconds. So you need a separation between the training part of the system and the serving part of the system, and you need clear boundaries between those two. So you need something that takes a model, decides whether it's good or not, and then pushes it into production. Once your model is in production, you're now able to make predictions and improve your business.
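Coming back to the transformation step for a moment, this is roughly what a hermetic, full-pass transformation looks like with tf.Transform: the preprocessing function below computes vocabularies and normalization statistics over the whole data set at training time, and the resulting transform graph can then be applied one example at a time at serving time. A minimal sketch; the feature names are made up, and exact APIs vary by TensorFlow Transform version.

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Preprocessing applied identically at training and serving time.

    Full-pass analyzers (scale_to_z_score, compute_and_apply_vocabulary)
    are computed once over the data set and baked into the transform graph.
    Feature names here are illustrative.
    """
    outputs = {}
    # Normalize a numeric feature using mean/stddev computed over all the data.
    outputs['price_normalized'] = tft.scale_to_z_score(inputs['price'])
    # Build a vocabulary over a string feature and map it to integer ids.
    outputs['color_id'] = tft.compute_and_apply_vocabulary(inputs['color'])
    # Pass an already-clean feature (e.g. the label) through unchanged.
    outputs['label'] = inputs['label']
    return outputs
```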
So as you can see, machine learning is not just about that middle part. It's not just about training your model. Machine learning is about the end-to-end process of using it to improve an application or a product.

So TFX has, over the course of several years, open sourced several libraries and components that make use of those libraries. The libraries are very modular, going back to the things we discussed earlier, and they can be stitched into your existing infrastructure. But we also offer components that understand the context they are in-- Kevin will be talking about this later-- and they can connect to each other and operate in unison, as opposed to operating as isolated pieces. And we also offer some horizontal layers that allow you to make those connections, both in terms of configuration-- you have a single way to configure your pipeline-- and in terms of data storage and metadata storage. So those components understand both the state of the world, what exists out there, and how they can be connected with each other.

And we also offer an end-to-end system. So we give you the libraries that allow you to build your own system if you want-- build your own car, if you will-- but we also offer the car itself. So we offer an end-to-end system that gives you well-defined, simple configuration to use. It has several components that work out of the box to help you with the user journey we just discussed, which, as we saw, has multiple phases. So we have components that work for each of those phases. And we also have a metadata store at the bottom that allows you to track basically everything that's happening in this machine learning platform. And, with this, I would like to invite Kevin Haas to talk a little bit more about TFX and its internals.

KEVIN HAAS: Thanks, Gus. So my name is Kevin Haas, and I'm going to talk about the internals of TFX. First, a little bit of audience participation. How many out there know Python? Raise your hands. All right. Quite a few. How many know TensorFlow? Raise your hands. Oh, quite a few. That's really good. And how many of you want to see code? Raise your hands. A lot fewer than the people who know Python. OK. So I'm going to get to code in about five minutes.

First, I want to talk a little bit about what TFX is. First off, let's talk about the component. The component is the basic building block of all of our pipelines. When you think about a pipeline, it's an assembly of components. So, as an example, we have ModelValidator. ModelValidator is one of our components, and it is responsible for taking two models-- the model that we just trained as part of this pipeline execution and the model that is running in production. We then measure the accuracy of both of these models. And if the newly trained model is better than the model in production, we tell a downstream component to push that model to production. So what we have here is two inputs-- two models-- and an output-- a decision whether or not to push.

The challenge here is that the model that's in production has not been trained by this execution of the pipeline. It may not have been trained by this pipeline at all. So we don't have a reference to it. Now, an easy answer would be, oh, just hard-code it somewhere in your config, but we want to avoid that as well. So we get around this by using another project that Google put together, called ML Metadata. With ML Metadata, we're able to get the context of all prior executions and all prior artifacts.
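As a rough illustration of the kind of lookup this enables, the ml-metadata library can be queried directly for previously recorded artifacts. This is a minimal sketch, not the exact code a TFX component runs: the SQLite path and the artifact type name are placeholders that depend on how your store was set up and which TFX version registered the types.

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connect to the metadata store (here a local SQLite file; path is a placeholder).
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = '/tmp/tfx_metadata/metadata.db'
connection_config.sqlite.connection_mode = 3  # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(connection_config)

# Look up every model artifact recorded by prior pipeline runs.
# The type name ('Model' here) depends on the TFX version that wrote the store.
model_artifacts = store.get_artifacts_by_type('Model')
for artifact in model_artifacts:
    print(artifact.id, artifact.uri)  # e.g. the URI of a previously pushed model
```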
ML Metadata really helps us a lot in being able to say what the current production model is, because we can query the ML Metadata store and ask for the URI, the artifact, of the production model. Once we have that, we have both inputs that need to go to the validator to do our testing. Interestingly enough, it's not just the production model that we get from ML Metadata-- we get all of our inputs from ML Metadata. If you look here, when the trainer emits a new model, it writes it to ML Metadata. It does not pass it directly to the next component, ModelValidator. When ModelValidator starts up, it gets both of the models from ML Metadata, and then it writes the validation outcome back to the ML Metadata store. This does two things. One, it allows us to compose our pipelines a lot more loosely, so we don't have tightly coupled pipelines. And two, it separates the orchestration layer from how we manage state and the rest of our metadata management.

So a little bit more about how a component is configured. We have three main phases in our components. It's a very specific design pattern that we've used for all of TFX, and we find it works really well with machine learning pipelines. First, we have the driver. The driver is responsible for additional scheduling and orchestration that we may use to determine how the executor behaves. For example, if we're asked to train a model using the very same examples as before, the very same inputs as before, the very same runtime parameters as before, and the same version of the estimator, we can pretty much guess that we're going to end up with the same model we started with. As a result, we may choose to skip the execution and return a cached artifact. This saves us both time and compute.

Next, we have the executor. The executor is responsible for the business logic of the component. In the case of ModelValidator, this is where we test both models and make a decision. In the case of the trainer, this is where we train the model using TensorFlow. Going all the way back to the beginning of the pipeline, in the case of ExampleGen, this is where we extract the data out of a data store, out of BigQuery, or out of the file system in order to generate the examples that the rest of the pipeline trains with.

Finally, we have the publishing phase. The publishing phase is responsible for writing whatever happened in this component back to the metadata store. So if the driver decided to skip work, we write that back. If an executor created one or more new artifacts, we also write that back to ML Metadata.

So artifacts and metadata management are a crucial element of managing our ML pipelines. If you look at this, this is a very simple task dependency graph. We have transform. When it's done, it calls trainer. It's a very simple finish-to-start dependency graph. While we can model our pipelines this way, we don't, because task dependencies alone are not the right way to model our goals. Instead, we actually look at this as a data dependency graph. We're aware of all the artifacts that are going into these various components, and the components will be creating new artifacts as well. In some cases, we can actually use artifacts that were not created by the current pipeline run or even this pipeline configuration. So what we need is some sort of system that's able to both schedule and execute components but is also artifact-aware and maintains a history of all the previous executions. So, what's the metadata store?
First off, the metadata store keeps information about the trained models-- for example, the artifacts, and the types of the artifacts, that have been produced by previous components. Second, we keep a list of all the components, all the versions, all the inputs, and all the runtime parameters that were provided to these components. So this gives us a history of what's happened. Now that we have this, we actually have a bi-directional graph between all of our artifacts and all of our components. This gives us a lot of extra capabilities, where we can do advanced analytics and metrics on our system.

An example of this is something called lineage. Lineage is where you want to know how all your data has passed through the system and all of the components. So, for example, what were all the models that were created from a particular data set? That is something we can answer with ML Metadata. On the flip side, what were all the artifacts that were created by a very particular version of a component? We can answer that question as well using ML Metadata. This is very important for debugging, and it's also very important for inquiries when somebody asks, how has this data been consumed by any of your pipelines?

Another example where prior state helps us is warm starting. Warm starting is a case in TensorFlow where you incrementally train a model using the checkpoints and the weights of a previous version of the model. So here, because we can find the previous version of the model using ML Metadata, we're able to go ahead and warm start the model, saving us both time and compute.

Here's an example of TensorFlow Model Analysis, also known as TFMA. This allows us to visualize how a model has been performing over time. This is a single model on a time series graph, so we can see whether the model has been improving or degrading. Tulsee is going to be talking a little bit more about TFMA in a bit.

Finally, we have the ability to reuse components. Now, I mentioned before that caching is great. Caching is great in production, but it's also great when you're building your model the first time. When you think about it, you have your pipeline, you're probably working with a static data set, and you're probably not changing your features that much. You're probably spending most of your time on the estimator. So with TFX, we more or less cache everything up to that part. So as you're running your pipeline, your critical path is the model itself.

So you're probably wondering, how do I develop models in this? This is where the code comes in. At the core, we use TensorFlow estimators. You can build an estimator from a Keras model, you can use a canned estimator, or you can go ahead and create your own using the low-level ops. We don't care. All we need is an estimator. Once we have the estimator, the next thing we do is put it into a callback function. In the case of the trainer, we have a trainer callback function. And you'll see here that the estimator, plus a couple more parameters, are passed back to the caller of the callback function. This is how we call TensorFlow for train and evaluate. Finally, you add your code-- pretty much the file that has the callback function and the estimator-- into the TFX pipeline. So you'll see there in the trainer, there's a module file, and the module file has all that information. This is what we use to call back to your function.
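A module file along these lines might look roughly like the sketch below. This is illustrative, not the exact TFX contract: the callback name (trainer_fn here), its arguments, and the feature names are assumptions that vary across TFX versions, but the shape-- a user-provided function that builds and returns an estimator plus its train and eval specs-- is the idea being described.

```python
# my_trainer_module.py -- a hypothetical module file referenced by the Trainer
# component. Names and signatures are illustrative; check the TFX version you
# use for the exact callback contract.
import tensorflow as tf

def trainer_fn(hparams, schema):
    """Builds an estimator plus the train/eval specs that TFX will drive."""
    feature_columns = [
        tf.feature_column.numeric_column('price_normalized'),
        tf.feature_column.categorical_column_with_identity('color_id', 50),
    ]
    estimator = tf.estimator.LinearClassifier(feature_columns=feature_columns)

    def input_fn():
        # Placeholder: a real module would read the transformed examples
        # produced by the upstream Transform component.
        features = {'price_normalized': [[0.1], [0.5]], 'color_id': [[1], [2]]}
        labels = [[0], [1]]
        return tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

    return {
        'estimator': estimator,
        'train_spec': tf.estimator.TrainSpec(input_fn, max_steps=10000),
        'eval_spec': tf.estimator.EvalSpec(input_fn),
    }
```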
That callback is what generates the saved model. Earlier, I was also talking about data dependency graphs. If you notice here on the graph, we don't actually explicitly say that transform runs and then the trainer runs. Instead, there's an implied dependency graph based on the fact that the outputs of transform are required as inputs for the trainer. So this allows us to couple task scheduling with our metadata awareness.

At this point, we've talked about components, we've talked about the metadata store, and we've talked about how to configure a pipeline. So now, what I'll do is talk about how to schedule and execute using some of the open source orchestrators. We've modified our internal version of TFX to support Airflow and Kubeflow, two popular open source orchestrators. We know that there are additional orchestrators, not all of which are open source, so we built an interface that allows us to add additional orchestrators as needed. And we'd love contributions. So if somebody out there wants to go ahead and implement a third or a fourth orchestrator, please let us know via GitHub, and we'll help you out. Our focus is primarily on extensibility. We want to allow people to extend ML pipelines, build graphs other than the ones we've been showing today, and also add new components, because the components we provide may not be all the ones you need for your particular machine learning environment.

So, putting it all back together, this is the slide that Gus showed earlier. At the very beginning, we have the ExampleGen component that's going to extract and generate examples. Then we go through the phases of data validation. And assuming that the data is good, we go ahead and do feature engineering. Provided that the feature engineering completes, we train a model, validate the model, and then push it to one or more serving systems. This is our typical ML pipeline. While some of the components can change-- the graph structure can change-- that's pretty much how we do it. What you see here on the right is Airflow, and on the left is Kubeflow. We use the very same configuration for both of these implementations. And the key thing that we're looking for in TFX is portability. We want portability where the same configuration of a pipeline can move between orchestrators. We also want to be portable across the on-prem, local machine, and public cloud boundaries. So going from my single machine up to running in Google Cloud with, for example, Dataflow is really a one-line change in my pipeline, just to reconfigure Beam.
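As a rough sketch of what such a pipeline configuration can look like in code (component constructors, keyword names, and runner imports have shifted across TFX releases, so treat everything below as illustrative rather than the exact API, and note that several required arguments are elided):

```python
# A hypothetical TFX pipeline covering the phases discussed above.
# Some required arguments (train/eval args, eval config, push destination,
# metadata connection) are elided for brevity.
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer,
                            Evaluator, Pusher)
from tfx.orchestration import pipeline

def create_pipeline():
    example_gen = CsvExampleGen(input_base='/data/shoes')            # ingestion + split
    statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
    schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
    example_validator = ExampleValidator(                            # data validation
        statistics=statistics_gen.outputs['statistics'],
        schema=schema_gen.outputs['schema'])
    transform = Transform(                                            # feature engineering
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        module_file='my_transform_module.py')
    trainer = Trainer(                                                # training
        module_file='my_trainer_module.py',
        examples=transform.outputs['transformed_examples'],
        transform_graph=transform.outputs['transform_graph'],
        schema=schema_gen.outputs['schema'])
    evaluator = Evaluator(                                            # model validation
        examples=example_gen.outputs['examples'],
        model=trainer.outputs['model'])
    pusher = Pusher(model=trainer.outputs['model'])                   # push to serving

    return pipeline.Pipeline(
        pipeline_name='shoes_ctr',
        pipeline_root='/tmp/tfx/shoes_ctr',
        components=[example_gen, statistics_gen, schema_gen, example_validator,
                    transform, trainer, evaluator, pusher])

# The same pipeline object is then handed to an orchestrator of your choice,
# for example Airflow or Kubeflow, via that orchestrator's DAG runner.
```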
So that's it for how the internals of TFX work. Next up, my colleague Tulsee will be talking about model understanding.

TULSEE DOSHI: Awesome. Thank you. Hi, everyone. My name is Tulsee, and I lead product for the ML fairness effort here at Google. ML fairness, like many goals related to modeling, benefits from debugging and understanding model performance. So, today, my goal is to walk through a brief example in which we leverage the facets of TFX that Gus and Kevin spoke about earlier to better understand our model performance.

So let's imagine that you're an e-tailer, and you're selling shoes online. You have a model, and this model predicts click-through rates, which help inform how much inventory you should order. A higher click-through rate implies a higher need for inventory. But, all of a sudden, you discover that the AUC and prediction accuracy have dropped on men's dress shoes. Oops. This means that you may have over-ordered certain shoes or under-ordered certain inventory. Both cases could have a direct impact on your business. So what went wrong? As you heard in the keynote yesterday, model understanding is an important part of being able to diagnose and address possible causes of these kinds of issues. And today, we want to walk you through how you can leverage TFX to think more about these problems.

So first things first, you can check your inputs with the built-in TensorFlow Data Validation component. This component allows you to ask questions like, are there outliers? Are some features missing? Is there something broken? Or is there a shift in distribution in the real world, changing the way your users behave, that might be leading to this output? For example, here's a screenshot of what TensorFlow Data Validation might look like for your example. Here, you can see all the features that you're using in your data set-- for example, price or shoe size. You can see how many examples these features cover. You can see what percentage of them are missing. You can see the mean, the standard deviation, min, median, max. And you can also see the distribution of how those features map across your data set. For example, if you look at shoe size, you can see the distribution from sizes 0 to 14. And you can see that sizes 3 to 7 seem to be a little bit missing. So maybe we don't actually have that much data for kids' shoes.

You can now take this a step further-- going beyond the data to actually ask questions about your model and your model performance using TensorFlow Model Analysis. This allows you to ask questions like, how does the model perform on different slices of data? How does the current model's performance compare to previous versions? With TensorFlow Model Analysis, you get to dive deep into slices. One slice could be men's dress shoes. Another example could be what you see here, where you can actually slice over different colors of shoes. The graph in this example shows how many examples have a particular feature, and the table below allows you to dive into the metrics that you care about, to understand not just how your model is performing overall, but actually taking that next step to understand where your performance may be skewed. For example, for the color brick, you can actually see an accuracy of 74%. Whereas, for shoes of color light gray, we have an accuracy of about 79%.

Once you find a slice that you think may not be performing the way it should, you may want to dive in a bit deeper. You now want to start understanding why this performance went off. Where is the skew? Here, you can extend this with the What-If Tool. The What-If Tool allows you to understand the input your model is receiving and ask and answer what-if questions. What if the shoe was a different color? What if we were using a different feature? With the What-If Tool, you can select a data point, look at its features, and change them. You can play with the feature values to see how the example might change. Here, for example, we've selected a data point. You can go in and change the value of a feature and actually see how the classification output would change as you change these values. This allows you to test your assumptions, to understand where there might be correlations that you didn't expect the model to pick up and how you might go about tackling them.
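For instance, inside a notebook, launching the What-If Tool over a handful of examples and a trained estimator looks roughly like this. A minimal sketch: the example-loading helper, estimator, and feature spec are placeholders you would supply yourself, and the WitConfigBuilder/WitWidget API shown is the Jupyter-widget flavor, which may differ slightly between releases.

```python
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

# 'examples' is assumed to be a list of tf.train.Example protos sampled from
# your eval data, and 'estimator' / 'feature_spec' come from your trained model.
# How you load these is up to you; load_sample_examples() is a hypothetical helper.
examples = load_sample_examples()
config_builder = (
    WitConfigBuilder(examples)
    .set_estimator_and_feature_spec(estimator, feature_spec))

# Renders the interactive widget in a Jupyter notebook: select a data point,
# edit feature values (e.g. shoe color), and watch the prediction change.
WitWidget(config_builder, height=800)
```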
The What-If Tool is available as part of the TFX platform, and it's part of the TensorBoard dashboard. You can also use it as a Jupyter widget. Training data, test data, and trained models can be provided to the What-If Tool directly from the TFX metadata store that you heard about earlier.

But the interesting thing here is that CTR is really just the model's proxy objective. You want to understand CTR so that you can think about the supply you should buy, your inventory. And so your actual business objectives depend on something much larger. They depend on revenue. They depend on cost. They depend on your supply. So you don't just want to understand when your CTR is wrong. You actually want to understand how getting this wrong could affect your broader business-- the cost of misprediction. So in order to figure this out, you decide you want to join your model predictions with the rest of your business data to understand that larger impact.

This is where some of the component functionality of TFX comes in. You can actually create a new component with a custom executor. You can customize this executor such that you join your model predictions with your business data. Let's break that down a bit. So you have your trainer. You train a model. And then you run the evaluator to get the results. You can then leverage your custom component to join in business data and export every prediction to a SQL database, where you can actually quantify this cost.

You can then take this a step further. With the What-If Tool, we talked a little bit about the importance of assumptions, of testing what you think might be going wrong. You can take this farther with the idea of understandable baselines. Usually, there are a few assumptions we make about the ways we believe our models should be performing. For example, I might believe that weekends and holidays are likely to have a higher CTR for my shoes than weekdays. Or I may have the hypothesis that certain geographic regions are more likely to click on certain types of shoes than others. These assumptions are rules about how I believe my model should perform. So I could actually create a very simple rule-based model that encodes these prior beliefs. This baseline expresses my priors, so I can use it to tell when my model might be uncertain or wrong, or to dive in deeper if my expectations were in fact wrong. It can help indicate when the deep model is overgeneralizing, or when maybe my expectations are overgeneralizing. So here again, custom components can help you. You can actually build a baseline trainer with these simple rules whose evaluator also exports to your SQL database. Basically, you stamp each example with its baseline prediction. So now, you can run queries not just over the model predictions and your business data, but also over your baseline assumptions, to understand where your expectations are violated.

Understandable baselines are one way of understanding your models, and you can extend even beyond this by building custom components that leverage new and amazing research techniques in model understanding. These techniques include [INAUDIBLE], which Sundar touched on yesterday in the keynote, but also path-integrated gradients, for example. Hopefully, this gave you one example of the many ways that we hope the extensibility of TFX will allow you to go deeper and extend the functionality for your own business goals.
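To give a feel for the custom-executor extension point described above, an executor is essentially a class with a Do method that reads input artifacts and writes outputs. The sketch below is hypothetical: the base-class import path and Do signature follow the general TFX executor pattern but differ between versions, and the prediction-loading helper, artifact names, and SQL schema are stand-ins for your own business logic.

```python
import sqlite3
from tfx.components.base import base_executor

class JoinPredictionsWithBusinessData(base_executor.BaseExecutor):
    """Hypothetical custom executor: joins model predictions with business data
    and exports the joined rows to a SQL database for misprediction-cost analysis."""

    def Do(self, input_dict, output_dict, exec_properties):
        # input_dict/output_dict map artifact names to lists of artifacts;
        # the names used here ('predictions', exec_properties['db_path'])
        # are illustrative, not a fixed TFX contract.
        predictions_uri = input_dict['predictions'][0].uri
        predictions = load_predictions(predictions_uri)   # hypothetical helper

        conn = sqlite3.connect(exec_properties['db_path'])
        for example_id, predicted_ctr in predictions:
            # Insert each prediction so it can be joined in SQL with existing
            # business rows (revenue, cost, inventory) and baseline predictions.
            conn.execute(
                'INSERT INTO prediction_join (example_id, predicted_ctr) '
                'VALUES (?, ?)', (example_id, predicted_ctr))
        conn.commit()
        conn.close()
```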
Overall, with TensorFlow Extended you can leverage many out-of-the-box components for your production model needs. This includes things like TensorFlow Data Validation and TensorFlow Model Analysis. TFX also provides flexible orchestration and metadata, and building on top of the out-of-the-box components can help you further your understanding of the model so that you can truly achieve your business goals. We're excited to continue to expand TensorFlow Extended for more of your use cases and see how you extend and expand it as well. We're excited to continue to grow the TFX community with you. And come to our office hours tomorrow, if you're still around at I/O, to talk to us and learn more, and for us to learn from you about your needs and use cases as well. Thank you so much for joining us today, and we hope this was helpful. [APPLAUSE] [MUSIC PLAYING]