ROBERT CROWE: I'm Robert Crowe, and we are here today to talk about production ML pipelines. So we're not going to be talking about ML modeling too much, or different architectures. This is really all focused on when you have a model and you want to put it into production so that you can offer a product, a service, or some internal service within your company, and it's something that you need to maintain over the lifetime of that deployment. Normally when we think about ML, we think about modeling code, because it's the heart of what we do. Modeling and the results that we get from the amazing models that we're producing these days, that's the reason we're all here: the results we can produce. It's what papers are written about, for the most part, overwhelmingly. The majority are written about architectures, results, and different approaches to doing ML. It's great stuff. I love it. I'm sure you do too. But when you move to putting something into production, you discover that there are a lot of other pieces that are very important to making the model that you spent a lot of time putting together available and robust over the lifetime of a product or service that you're going to offer to the world, so that people can really experience the benefits of the model that you've worked on. And those pieces are what TFX is all about.

In machine learning, we're familiar with a lot of the issues that we have to deal with. Where do I get labeled data? How do I generate labels for the data that I have? I may have terabytes of data, but I need labels for it. Does my labeled data cover the feature space that I'm going to see when I actually run inference against it? Is my dimensionality minimized, or can I do more to simplify my feature vector and make my model more efficient? Have I really got the predictive information in the data that I'm choosing? And then we need to think about fairness as well. Are we serving all of the customers that we're trying to serve fairly, no matter where they are, what religion they are, what language they speak, or what demographic they might be? You want to serve those people as well as you can; you don't want to unfairly disadvantage anyone. And we may have rare conditions too, especially in things like health care, where we're making a prediction that's going to be pretty important to someone's life, and it may be on a condition that occurs very rarely.

But a big one when you go into production is understanding the data lifecycle. Because once you've gone through that initial training and you've put something into production, that's just the start of the process. You're now going to try to maintain that over a lifetime, and the world changes. Your data changes. Conditions in your domain change. Along with that, you're now doing production software deployment. So you have all of the normal things that you have to deal with in any software deployment, things like scalability: will I need to scale up, and is my solution ready to do that? Can I extend it? Is it something that I can build on? Modularity, best practices, testability: how do I test an ML solution? And security and safety, because we know there are attacks on ML models that are getting pretty sophisticated these days.

Google created TFX for us to use. We created it because we needed it. It was not the first production ML framework that we developed.
We've actually learned over many years, because we have ML all over Google, taking in billions of inference requests really at planet scale. And we needed something that would be maintainable and usable at a very large production scale, with large data sets and large loads, over a lifetime. So TFX has evolved from earlier attempts, and it is now what most of the products and services at Google use. And now we're also making it available to the world as an open-source product, available for you to use for your production deployments. It's also used by several of our partners and other companies that have adopted TFX. You may have heard talks from some of these at the conference already. And there's a nice quote there from Twitter, where they did an evaluation. They were coming from a Torch-based environment, looked at the whole suite, the whole ecosystem of TensorFlow, and moved everything that they did to TensorFlow. One of the big contributors to that was the availability of TFX.

The vision is to provide a platform for everyone to use. Along with that, there are some best practices and approaches that we're trying to really make popular in the world, things like strongly typed artifacts, so that when your different components produce artifacts, those artifacts have a strong type. Pipeline configuration, workflow execution, being able to deploy on different platforms, different distributed pipeline platforms, using different orchestrators, different underlying execution engines-- trying to make all of that as flexible as possible. There are some horizontal layers that tie together the different components in TFX. And we'll talk about components here in a little bit. And we have a demo as well that will show you some of the code and some of the components that we're talking about. The horizontal layers-- an important one there is metadata storage. Each of the components produces and consumes artifacts. You want to be able to store those, and you may want to do comparisons across months or years to see how things changed, because change becomes a central theme of what you're going to do in a production deployment.

This is a conceptual look at the different parts of TFX. On the top, we have tasks-- a conceptual look at tasks, things like ingesting data, training a model, or serving the model. Below that, we have libraries that are available, again, as open-source components that you can leverage. They're leveraged by the components within TFX to do much of what they do. And on the bottom row, in orange (a good color for Halloween), we have the TFX components. And we're going to get into some detail about how your data will flow through the TFX pipeline to go from ingesting data to a finished, trained model on the other side.

So what is a component? A component has three parts. This is a particular component, but it could be any of them. Two of those parts, the driver and the publisher, are largely boilerplate code that you could change. You probably won't. A driver consumes artifacts and begins the execution of your component. A publisher takes the output from the component and puts it back into metadata. The executor is really where the work is done in each of the components. And that's also the part that you can change. So you can take an existing component, override the executor in it, and produce a completely different component that does completely different processing. Each of the components has a configuration, and for TFX that configuration is written in Python. And it's usually fairly simple.
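As a rough illustration of how small that configuration typically is, here is a minimal sketch of a single component being configured in Python. This assumes a TFX release from around the time of this talk (roughly 0.14/0.15); exact keyword names have changed across versions, and `example_gen` stands for an upstream ingestion component assumed to be defined earlier.

```python
# Minimal sketch: configuring one TFX component in Python.
# Assumes a late-2019 TFX release; keyword names vary between versions.
from tfx.components import StatisticsGen

# 'example_gen' is an ExampleGen component assumed to be defined earlier;
# StatisticsGen only needs to know which examples artifact to analyze.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
```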
Some of the components are a little more complex, but most of them are just a couple of lines of code to configure. The key aspect here, which I've alluded to, is that there is a metadata store. The component will pull data from that store as it becomes available. So there's a set of dependencies that determines which artifacts that component depends on. It'll do whatever it's going to do, and it's going to write the result back into metadata. Over the lifetime of a model deployment, you start to build a metadata store that is a record of the entire lifetime of your model: the way that your data has changed, the way your model has changed, the way your metrics have changed. It becomes a very powerful tool. Components communicate through the metadata store. So an initial component will produce an artifact and put it in the metadata store. The components that depend on that artifact will then read it from the metadata store, do whatever they're going to do, put their result into it, and so on. And that's how we flow through the pipeline.

So, the metadata store I keep talking about: what is it? What does it contain? There are really three kinds of things that it contains. First, the artifacts themselves: they could be trained models, they could be data sets, they could be metrics, they could be splits. There's a number of different types of objects that are in the metadata store. Those are grouped into execution records. So when you execute the pipeline, that becomes an execution run, and the artifacts that are associated with that run are grouped under that execution run. Again, when you're trying to analyze what's been happening with your pipeline, that becomes very important. And also the lineage of those artifacts-- which artifact was produced by which component, which consumed which inputs, and so on.

So that gives us some functionality that becomes very powerful over the lifetime of a model. You can find out which data a model was trained on, for example. If you're comparing the results of two different model trainings that you've done, tracing it back to how the data changed can be really important. And we have some tools that allow you to do that. TensorBoard, for example, will allow you to compare the metrics from, say, a model that you trained six months ago and the model that you just trained now, to try to understand. You could see that it was different, but why? Why was it different? And warm-starting becomes very powerful too, especially when you're dealing with large amounts of data that could take hours or days to process. Being able to pull that data from cache if the inputs haven't changed, rather than rerunning that component every time, becomes a very powerful tool as well.

So there's a set of standard components that are shipped with TFX. But I want you to be aware from the start that you are not limited to those standard components. This is a good place to start, and it'll get you pretty far down the road. But you will probably have needs-- you may or may not-- where you need to extend the components that are available. And you can do that, in a couple of different ways. This is sort of the canonical pipeline that we talk about. So on the left, we're ingesting our data. We flow through, we split our data, we calculate some statistics against it. And we'll talk about this in some detail. We then make sure that we don't have problems with our data, and try to understand what types our features are.
We do some feature engineering, we train. This probably sounds familiar: if you've ever been through an ML development process, this mirrors exactly what you always do. Then you're going to check your metrics across that. And we do some deep analysis of the metrics of our model, because that becomes very important. And we'll talk about an example of that in a little bit. And then you have a decision, because let's assume you already have a model in production and you're retraining it, or maybe you have a new model that you're training. The question becomes: should I push this new model to production, or is the one I already have better? Many of you have probably had the experience of training a new model that actually didn't do as well as the old one did. Along with that, we also have the ability to do bulk inference on inference requests. So you may be in a batch request environment, where you're pulling in data in batches, running requests against it, and then taking that result and doing something with it. That's a very common use case. This is actually kind of new. We have components to do that as well.

This is the Python framework for defining a pipeline. That's a particular component, the Transform component, and the configuration that you need to use for it. But on a very simple level, that's how you set up components. And at the bottom, you can see there's a list of components that are returned in that call. Those are going to be passed to a runner that runs on top of whatever orchestrator you're using. Here's a little bit more complex example, a little bit harder to read, but it gives you an idea: there are several components there. The dependencies between components, and the dependencies on artifacts that I just talked about, are defined in code like that. So you'll see, for example, that StatisticsGen depends on the output of ExampleGen.

So now let's talk about each of the standard components. ExampleGen is where you ingest your data. It's going to take your data and convert it to TensorFlow examples. This is actually showing just two input formats, but there's a reasonably long list of input formats that you can have. And it'll also do your splits for you. So you may want just training and eval, or maybe you want a validation split as well. That's what ExampleGen does. And then it passes the result on to StatisticsGen. StatisticsGen: because we all work with data, we know you need to dive into the data and make sure that you understand the characteristics of your data set. Well, StatisticsGen is all about doing that in an environment where you may be running that many times a day. It also gives you tools to do visualization of your data, like this. So for example, that's the trip start hour feature for this particular data set. And just looking at that histogram tells me a lot about an area that I need to focus on. In the 6 o'clock hour, I have very little data. So I want to go out and get some more data, because if I try to use this and I run inference requests at 6:00 AM, it's going to be overgeneralizing. So I don't want that. SchemaGen is looking at the types of your features. It's trying to decide: is it a float, is it an int, is it a categorical feature? And if it's a categorical feature, what are the valid categories? So SchemaGen tries to infer that. But you as a data scientist need to make sure that it did the job correctly. So you need to review that and make any fixes that you need to make.
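As a hedged sketch of what this first stretch of the pipeline might look like in code (assuming a TFX release from around late 2019, with keyword names that differ between versions, and an illustrative data path):

```python
# Sketch: ingesting data, computing statistics, and inferring a schema.
# Assumes a late-2019 TFX release; argument names vary across versions.
from tfx.components import CsvExampleGen, StatisticsGen, SchemaGen
from tfx.utils.dsl_utils import external_input

# Ingest CSV files from a (hypothetical) directory and split into train/eval.
example_gen = CsvExampleGen(input=external_input('/path/to/csv_data'))

# Compute full-pass statistics over each split.
statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])

# Infer feature types and domains from those statistics; the resulting
# schema should be reviewed and curated by a data scientist.
schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
```

Each component's outputs feed the next one through the metadata store, which is how the dependencies described earlier are expressed.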
ExampleValidator then takes those two results, the outputs of SchemaGen and StatisticsGen, and looks for problems with your data. So it's going to look for things like missing values, values that are zero and shouldn't be zero, categorical values that are really outside the domain of that category, things like that: problems in your data.

Transform is where we do feature engineering, and Transform is one of the more complex components. As you can see from the code there, it could be arbitrarily complex, because depending on the needs of your data set and your model, you may have a lot of feature engineering that you need to do, or you may just have a little bit. The configuration for it is fairly standard, just one line of code there with some configuration parameters. But it has a key advantage: it's going to take your feature engineering and convert it into a TensorFlow graph. That graph then gets prepended to the model that you're training as the input stage to your model. And what that means is that you're doing the same feature engineering, with the same code, exactly the same way, both in training and in production when you deploy to any of the deployment targets. So that eliminates a problem you may have run into, where you have two different environments, maybe even two different languages, and you're trying to do the same thing in both places and hoping it's correct. We call that training/serving skew, and this eliminates that possibility.

Trainer. Well, now we're coming back to the start. Trainer does what we started with: it's going to train a model for us. So this is TensorFlow. And the result is going to be a SavedModel, plus a little variant of the SavedModel, the eval SavedModel, which has a little extra information that we're going to use for evaluation. Trainer has the typical kinds of configuration that you might expect, things like the number of steps and whether or not to use warm-starting. And you can use TensorBoard, including comparing execution runs between the model that you just trained and models that you've trained at some time in the past. So TensorBoard has a lot of very powerful tools to help you understand your training process and the performance of your model. Here's an example where we're comparing two different execution runs.

Evaluator uses TensorFlow Model Analysis, one of the libraries that we talked about at the beginning, to do some deep analysis of the performance of your model. So it's not just looking at the top-level metrics, like what is the RMSE or the AUC for my whole data set. It's looking at individual slices of your data set, and slices of the features within your data set, to really dive in at a deeper level and understand the performance. Things like fairness become much more manageable by doing that. If you don't do that sort of analysis, you can easily have gaps in the performance of your model that may be catastrophic. So this becomes a very powerful tool, and there are some visualization tools that we'll look at as well that help you do that.

ModelValidator asks that question that I talked about a little while ago: you have a model that's in production, and you have this new model that you just trained. Is it better or worse than what I already have? Should I push this thing to production? And if you decide that you're going to push it to production, then Pusher does that push. Now, production could be a number of different things.
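To make that concrete, here is a hedged sketch of how the middle and tail of the pipeline might be configured, continuing the earlier snippet. It assumes a late-2019 TFX release, an illustrative module file name, and example features from the Chicago taxi data set used later in the demo; keyword names differ across versions.

```python
# Sketch: feature engineering, training, validation, and pushing.
# Assumes a late-2019 TFX release; names and paths are illustrative.
import tensorflow_transform as tft
from tfx.components import Transform, Trainer, ModelValidator, Pusher
from tfx.proto import trainer_pb2, pusher_pb2

# In the module file: a preprocessing_fn that Transform compiles into a
# TensorFlow graph, so the exact same feature engineering runs in training
# and in serving (avoiding training/serving skew).
def preprocessing_fn(inputs):
  outputs = {}
  outputs['trip_miles_scaled'] = tft.scale_to_z_score(inputs['trip_miles'])
  outputs['payment_type_id'] = tft.compute_and_apply_vocabulary(
      inputs['payment_type'])
  return outputs

transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file='taxi_utils.py')            # contains preprocessing_fn

trainer = Trainer(
    module_file='taxi_utils.py',            # contains the training function
    transformed_examples=transform.outputs['transformed_examples'],
    transform_graph=transform.outputs['transform_graph'],
    schema=schema_gen.outputs['schema'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))

# ModelValidator decides whether the new model beats the current baseline;
# Pusher only deploys models that pass that check ("blessed" models).
model_validator = ModelValidator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'])

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=model_validator.outputs['blessing'],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory='/serving/model/dir')))
```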
You could be pushing it to a serving cluster using TensorFlow Serving. You could be pushing it to a mobile application using TensorFlow Lite. You could be pushing it to a web application or a Node.js application using TensorFlow.js. Or you could even just be taking that model and pushing it into a repository with TensorFlow Hub that you might use later for transfer learning. So there are a number of different deployment targets, and you can do all of the above with Pusher. Bulk inference is handled by that component we talked about a little while ago, where we're able to take bulk inference requests, run inference across them, and do that in a managed way that allows us to take the result and move it along.

All right, orchestrating. We have a number of tasks in our pipeline. How do we orchestrate them? Well, there are different ways to approach it. You can do task-aware pipelines, where you simply run a task, wait for it to finish, and run the next task. And that's fine. That works. But it doesn't have a lot of the advantages that you get with a task- and data-aware pipeline. This is where our metadata comes in. By setting up the dependencies between our components and our metadata artifacts in a task- and data-aware pipeline, we're able to take advantage of a lot of the information in the artifacts we've produced over the lifetime of that product or service, of that ML deployment.

Orchestration is done through an orchestrator. And the question is: which orchestrator do you have to use? Well, the answer is you can use whatever you want to use. We have three orchestrators that are supported out of the box: Apache Airflow; Kubeflow, for a Kubernetes containerized environment; and Apache Beam. Those three are not your only selections. You can extend that to add your own orchestration. But in the end, you're going to end up with essentially the same thing, regardless of which orchestrator you use: a Directed Acyclic Graph, or DAG, that expresses the dependencies between your components, which are really a result of the artifacts that are produced by your components. So here are three examples. They look different, but if you actually look at them, they are the same DAG.

We get this question a lot, so I want to address it. What's this Kubeflow thing and this TFX thing, and what's the difference between the two? The answer is that Kubeflow is really focused on a Kubernetes containerized environment, and it's a great deployment platform for running in a very scalable, manageable way. Kubeflow Pipelines uses TFX. So you're essentially deploying TFX in a pipeline environment on Kubernetes, and that's Kubeflow Pipelines. But TFX can be deployed in other ways as well. So if you don't want to use Kubeflow Pipelines, if you want to deploy in a different environment, maybe on-prem in your own data center or what have you, you can use TFX in other environments as well.

One of the things that we do, because we're working with large data sets and some of these operations require a lot of processing, is distribute that processing over a pipeline. So how do we do that? Well, a component that uses a pipeline is going to create a pipeline for the operations that it wants to do. It's going to hand that pipeline off to a cluster. It could be a Spark cluster, could be a Flink cluster, could be Cloud Dataflow. The map-reduce happens on the cluster, and it comes back with a result.
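For reference, here is a hedged sketch of what handing the finished component graph to an orchestrator can look like, using the Beam-based runner (the Airflow and Kubeflow runners follow the same pattern). It assumes a late-2019 TFX release and illustrative paths; module locations have moved between releases, and the component variables come from the earlier sketches.

```python
# Sketch: assembling components into a pipeline and running it on Beam.
# Assumes a late-2019 TFX release; paths and names are illustrative.
from tfx.orchestration import metadata, pipeline
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner

tfx_pipeline = pipeline.Pipeline(
    pipeline_name='taxi_pipeline',
    pipeline_root='/path/to/pipeline_root',    # where artifacts are written
    components=[example_gen, statistics_gen, schema_gen,
                transform, trainer, model_validator, pusher],
    metadata_connection_config=metadata.sqlite_metadata_connection_config(
        '/path/to/metadata.db'),               # the ML Metadata store
    enable_cache=True)                         # reuse unchanged artifacts

# Swap in the Airflow or Kubeflow runner to target a different orchestrator.
BeamDagRunner().run(tfx_pipeline)
```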
But we want to support more than just one or two types of distributed pipelines. So we're working with Apache Beam to add an abstraction over the native layer of those pipelines, so that you can take the same code and run it on different pipelines without changing it. That's what Apache Beam is. There are different runners for different things like Flink and Spark. There's also a really nice one for development, called the direct runner or local runner, that allows you to run even just on a laptop. But here's the vision for Beam. There's a whole set of distributed pipelines out there that are available. They have strengths and weaknesses, and in a lot of cases you will already have one that you've stood up, and you want to try to leverage that resource. So by supporting all of them, you're able to use the installation you already have. You don't have to spin up a completely different cluster to do that. You can leverage, or just expand, the ones that you already have. Beam allows you to do that, and also with different languages. Now, in this case we're only using Python. But Beam as a vision, as an Apache project, allows you to work with other languages as well through different SDKs. And now I'd like to introduce Charles Chen, who's going to give us a demo of TFX running on what is actually just a laptop-class system in the cloud here. So you'll get to see some live code.

CHARLES CHEN: Thank you, Robert. So now that we've gone into detail about TFX and TFX components, I'd like to take the chance to make this really concrete with a live demo of a complete TFX pipeline. This demo uses the new experimental TFX notebook integration. The goal of this integration is to make it easy to interactively build up TFX pipelines in a Jupyter or Google Colab notebook environment, so you can try your pipeline out before you export the code and deploy it into production. You can follow along and run this yourself in the Google Colab notebook at this link here. For the interactive notebook, we introduce one new concept: the interactive context. In a production environment, like Robert said, we would construct a complete graph of pipeline components and orchestrate it on an engine like Airflow or Kubeflow. By contrast, when you're experimenting in a notebook, you want to interactively execute and see results for individual components. So we construct an interactive context, which can do two things. The first is that it can run the component we define; this is context.run. And the second is that it can show a visualization of the component's output; this is context.show.

Let's get started. Here's the overview we've seen of a canonical TFX pipeline, where we go from data ingestion to data validation, feature engineering, model training, and model validation to deployment. We'll go through each of these in the notebook. Here's the notebook here. And the first thing we do is a bit of setup. This is the pip install step: we run pip install to install the Python package and all of its dependencies. And if you're following along, after doing the installation, you need to restart the runtime so that the notebook picks up the new versions of some dependencies. Next, we do some imports, set up some paths, and download the data. And finally, we create the interactive context. Once we've done this, we get to our first component, which is ExampleGen, which ingests data into the pipeline. Again, this is just a couple of lines of code. And we run this with context.run.
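A hedged sketch of those first notebook cells, assuming the experimental interactive integration from around TFX 0.15 (module paths and keyword names have since changed) and an illustrative data directory:

```python
# Sketch: creating the interactive context and running ExampleGen in a notebook.
# Assumes a late-2019 TFX release; paths and keyword names are illustrative.
from tfx.components import CsvExampleGen
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext)
from tfx.utils.dsl_utils import external_input

# Creates a temporary pipeline root and an ephemeral metadata store.
context = InteractiveContext()

# Ingest the downloaded CSV data and split it into train/eval.
example_gen = CsvExampleGen(input=external_input('/tmp/data'))
context.run(example_gen)   # execute just this component, interactively
```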
After we're done, we're ready to use the data, which has been ingested and processed into splits. We take the output of ExampleGen and use it in our next component, StatisticsGen, which analyzes the data and outputs detailed statistics. This component can also be used standalone, outside of a TFX pipeline, with the TensorFlow Data Validation package. You can see that our input here is the output of ExampleGen again. And after we're done, we can visualize this with context.show. For each split, we get detailed summary statistics and a visualization of our data, which we can dig into to ensure data quality even before we train a model. After that, we can use SchemaGen to infer a suggested schema for our data. We see that visualized here. This includes the type and domain for each feature in your data set. For example, for categorical features, the domain is inferred to be all the values we've seen so far. This is just the starting point, and you can then edit and curate the schema based on your domain knowledge. Once we have the schema, we can use the ExampleValidator to perform anomaly detection. That is, find items in your input data that don't match your expected schema. This is especially useful as your pipeline evolves over time, with new data sets coming in. We visualize this, and any unexpected values are highlighted. If you see anomalies, you might want to either update your schema or fix your data collection process.

After our data is validated, we move on to data transformation, or feature engineering. This is done in the Transform component. The first thing we do is write a bit of common code and a preprocessing function with TensorFlow Transform. You can look at this code in more detail on your own. It defines the transformations we do using TensorFlow Transform. And this means, for each feature of your data, we define the individual transformations. This comes together in the Transform component, where feature engineering is performed and we output the transform graph and the engineered features. This will take a bit of time to run. And after that's done, we get to the heart of the model with the Trainer. Here, we define a training function that returns a TensorFlow estimator. We build the estimator and return it from the function. And this is just TensorFlow. Once we have this, we run the Trainer component, which is going to produce a trained model for evaluation and serving. You can watch it train here. It'll give you the loss, evaluate, and then produce a SavedModel for evaluation and serving.

After we've trained the model, we have the Evaluator component. This uses the standalone TensorFlow Model Analysis library. In addition to overall metrics over the entire data set, we can define more granular feature column slices for evaluation. The Evaluator component then computes metrics for each data slice, which you can explore with interactive visualizations. What makes TensorFlow Model Analysis really powerful is that in addition to the overall metrics we see here, we can analyze model performance on granular feature column slices. So here we see the metrics rendered across one slice of our data, and we can do even more granular things with multiple columns of our data. After we've evaluated our model, we come to the ModelValidator. Based on a comparison of your model's performance against an existing baseline model, this component checks whether or not the model is ready to push to production.
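A hedged sketch of those evaluation cells, continuing the earlier snippets and assuming a late-2019 TFX release; the slicing proto, keyword names, and output keys are version-dependent, and the slicing column is illustrative.

```python
# Sketch: slice-based evaluation and the model-validation check in the notebook.
# Assumes a late-2019 TFX release; keyword names and keys vary by version.
from tfx.components import Evaluator, ModelValidator
from tfx.proto import evaluator_pb2

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],
    model_exports=trainer.outputs['model'],
    feature_slicing_spec=evaluator_pb2.FeatureSlicingSpec(specs=[
        evaluator_pb2.SingleSlicingSpec(),                    # overall metrics
        evaluator_pb2.SingleSlicingSpec(
            column_for_slicing=['trip_start_hour'])]))        # per-slice metrics
context.run(evaluator)
# The resulting metrics can then be explored with the TensorFlow Model
# Analysis visualization tools, overall and per slice.

# ModelValidator compares the new model against the last "blessed" model.
model_validator = ModelValidator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'])
context.run(model_validator)
```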
Right now, since we don't have any existing models, this check will by default return true. You can also customize this check by extending the ModelValidator executor. The output of this check is then used in the next component, the Pusher. The Pusher pushes your model, again, to a specific destination for production. This can be TensorFlow Serving, a file system destination, or a cloud service like Google Cloud Platform. Here, we configure the Pusher to write the model to a file system directory. Once we've done this, we've now architected a complete TFX pipeline. With minimal modifications, you can use your new pipeline in production on something like Airflow or Kubeflow. Essentially, this means getting rid of your usage of the interactive context and creating a pipeline object to run. For convenience, we've included a pipeline export feature in the notebook that tries to do this for you. So here, first we do some housekeeping: we mount Google Drive. We select a runner type. Let's say I want to run this on Airflow. We set up some paths. We specify the components of the exported pipeline. We do the pipeline export. And finally, we use this export cell to get a zip archive of everything you need to run the pipeline on an engine like Airflow. With that, I'll hand it back to Robert, who will talk about how to extend TFX for your own needs.

ROBERT CROWE: Thanks, Charles. All right, so that was the notebook. Let me just advance here. All right, custom components. So again, these are the standard components that come out of the box with TFX, but you are not limited to those. You can write your own custom components. So let's talk about how to do that. First of all, you can do a semi-custom component by taking an existing component and working with the same inputs, the same outputs, essentially the same contract, but replacing the executor by just overriding the existing executor. And that executor, remember, is where the work is done. So changing that executor is going to change that component in a very fundamental way. If you're going to do that, you're going to extend the base executor and implement a Do function. And that's what the code looks like. There's a custom config dictionary that allows you to pass additional things into your component. So this is a fairly easy and powerful way to create your own custom component. And this is how you would fit a custom component into an existing pipeline: it just fits in like any other component. You can also, though, do a fully custom component, where you have a different component spec, a different contract, different inputs and outputs that don't exist in an existing component. Those are defined in the component spec, which gives you the parameters, the inputs, and the outputs of your component. And then you are going to need an executor for that as well, just like before. But it takes you even further. So your executor, your inputs, your outputs: that's a fully custom component.

Now, I've only got three minutes left, so I'm going to go through a quick example of really trying to understand why model understanding and model performance are very important. First of all, I've talked about the data lifecycle a couple of times: trying to understand how things change over time. The ground truth may change. Your data characteristics, the distribution of each of your features, may change. Conditions in the world may change. You may have different competitors.
You may expand into different markets, different geographies. Styles may change. The world changes. Over the life of your deployment, that becomes very important. So in this example, and this is a hypothetical example, this company is an online retailer who is selling shoes, and they're trying to use click-through rates to decide how much inventory they should order. And they discover that suddenly-- they've been going along, and now on a particular slice of their data-- not their whole data set, just a slice of it-- things have really gone south. So they've got a problem. What do they do?

Well, first of all, it's important to understand the realities around this. Mispredictions do not have uniform cost: across your business, or whatever service you're providing, different parts of it will have different costs. The data you have is never the data that you wish you had. And the model objective, as in this case, is often a proxy for what you really want to know. So they're trying to use click-through rates as a proxy for ordering inventory. But the last one, that the real world doesn't stand still, is the key here. You need to really understand that when you go into a production environment.

So what can they do? Well, their problems are not with the data set that they used to train their model. Their problems are with the current inference requests that they're getting. And there's a difference between those two. So how do they deal with that? Well, they're going to need labels. Assuming they're doing supervised learning, they're going to need to label those inference requests somehow. How can they do that? If they're in the fortunate position of being able to get direct feedback, they can use their existing processes to label that data. For example, if they're trying to predict the click-through rate and they have click-through rate data that they're collecting, they can use that directly. That's great. Many people are not in that situation. So you see a lot of environments where you're trying to use things like semi-supervision, with humans labeling a subset of the data that you have, so you can try to understand how things have changed since you trained your model. Weak supervision is a very powerful tool as well, but it's not that easy to use in a lot of cases. You need to try to apply historical data, or other types of modeling, or heuristics. And in many cases, those give you a labeling signal that is not 100% accurate. But it gives you some direction you can work with, and there are modeling techniques to work with that kind of signal.

TensorFlow Model Analysis and the Fairness Indicators that you may have seen today-- we're out on the show floor with them-- are great tools to try to understand this and identify the slices and the problems that you have with your data. But first things first: you need to check your data, look for outliers, and check your feature space coverage. How well does your data cover the feature space that you have? Use the tools that we give you in TensorFlow Data Validation and TensorFlow Model Analysis. We also have the What-If Tool, a very powerful tool for doing exploration of your data and your model. And in the end, you need to quantify the cost, because you are never going to get to 100%. So how much is that extra 5% worth? In a business environment, you need to understand that. And that's TFX. TFX is, again, what Google built because we needed it. And now we want you to have it too.
So it's available as an open-source platform that we encourage you all to build on. And since it's open source, we hope you're going to help us contribute and build the platform to make it better in the future as well. So on behalf of myself and my colleague Charles, thanks very much. [APPLAUSE]