  • [MUSIC PLAYING]

  • CLEMENS MEWALD: My name is Clemens.

  • I'm the product lead for TensorFlow Extended,

  • the end-to-end machine learning platform

  • that we built for TensorFlow.

  • And we have a lot of exciting announcements,

  • so let's jump right in.

  • A lot of you may be familiar with this graph.

  • We published this in a paper in 2017.

  • And the main point that I usually make on this graph

  • is that there's more to machine learning than just the training

  • part.

  • In the middle, the trainer piece,

  • that's where you train your machine learning model.

  • But if you want to do machine learning in production

  • reliably and in a robust way, you actually

  • need all of these other components

  • before and after, and in parallel, to the training

  • algorithm.

  • And often I hear, sometimes from researchers, well,

  • I really only do research.

  • I only care about training the machine learning model

  • and I don't really need all of these upstream and downstream

  • things.

  • But what I would argue is that research often

  • leads to production.

  • And what we want to avoid is researchers

  • having to re-implement their hard work,

  • in a model that they've built, when they want to put

  • the model into production.

  • That's actually one of the main reasons

  • why we open sourced TensorFlow because we really

  • wanted the research community to build the models in a framework

  • that we can then use and actually move into production.

  • A second comment that I hear often

  • is, well, I only have a very small data set

  • that fits in a single machine.

  • And all of these tools are built to scale up

  • to hundreds of machines.

  • And I don't really need all of these heavy tools.

  • But what we've seen time and time again at Google

  • is that small data today becomes large data tomorrow.

  • And there's really no reason why you

  • would have to re-implement your entire stack just

  • because your data set grew.

  • So we really want to make sure that you

  • can use the same tools early on in your journey

  • so that the tools can actually grow with you and your product,

  • with the data, so that you can scale the exact same code

  • to hundreds of machines.

  • So we've built TensorFlow Extended as a platform

  • at Google, and it has had a profound impact

  • on how we do machine learning in production

  • and on becoming an AI-first company.

  • So TFX really powers some of our most important Alphabet

  • companies.

  • Of course, Google is just one of the Alphabet companies.

  • So TFX is used at six different Alphabet companies.

  • And within Google, it's really used

  • with all of the major products.

  • And also, all of the products that

  • don't have billions of users [INAUDIBLE] this slide.

  • And I've said before that we really

  • want to make TFX available to all of you

  • because we've seen the profound impact it

  • has had on our business.

  • And we're really excited to see what

  • you can do with the same tools in your companies.

  • So a year ago we talked about the libraries

  • that we had open sourced at that point in time.

  • So we talked about TensorFlow Transform, the training

  • libraries, Estimators and Keras, TensorFlow Model Analysis,

  • and TensorFlow Serving.

  • And I made the point that, back then, as today, all of these

  • are just libraries.

  • So they're low-level libraries that you still

  • have to use independently and stitch together

  • to actually make them work and train models for your own use cases.

  • Later that year, we added TensorFlow Data Validation.

  • So that made the picture a little more complete.

  • But we're still far away from actually being done yet.

  • However, it was extremely valuable to release

  • these libraries at that point in time

  • because some of our most important partners

  • externally have also had a profound impact with some

  • of these libraries.

  • So we've just heard from our friends at Airbnb.

  • They use TensorFlow Serving in that case study

  • that they mentioned.

  • Our friends at Twitter just published this fascinating blog

  • post of how they used TensorFlow to rank tweets

  • on their home timeline.

  • And they've used TensorFlow Model Analysis to analyze

  • that model on different segments of the data

  • and used TensorFlow Hub to share some of the word embeddings

  • that they've used for these models.

  • So coming back to this picture.

  • For those of you who've seen my talk last year,

  • I promised everyone that there would be more.

  • Because, again, this is only the partial platform.

  • It's far away from actually being an end-to-end platform.

  • It's just a set of libraries.

  • So today, for the very first time,

  • we're actually sharing the horizontal layers

  • that integrate all of these libraries

  • into one end-to-end platform, into one end-to-end product,

  • which is called TensorFlow Extended.

  • But first, we have to build components out

  • of these libraries.

  • So at the top of this slide, you see in orange, the libraries

  • that we've shared in the past.

  • And then in blue, you see the components

  • that we've built from these libraries.

  • So one observation to be made here is that, of course,

  • libraries are very low level and very flexible.

  • So with a single library, we can build many different components

  • that are part of a machine learning pipeline.

  • So in the example of TensorFlow Data Validation,

  • we used the same library to build

  • three different components.

  • And I will go into detail on each one of these components

  • later.

  • So what makes a component?

  • A component is no longer just a library.

  • It's a packaged binary or container

  • that can be run as part of a pipeline.

  • It has well-defined inputs and outputs.

  • In the case of Model Validation, it's

  • the last validated model, a new candidate model,

  • and the validation outcome.

  • And that's a well-defined interface

  • of each one of these components.

  • It has a well-defined configuration.

  • And, most importantly, it's one configuration model

  • for the entire pipeline.

  • So you configure a TFX pipeline end to end.

  • And some of you may have noticed,

  • because Model Validation needs the last validated model,

  • it actually needs some context.

  • It needs to know what was the last model that was validated.

  • So we need to add a metadata store that actually provides

  • this context, that keeps a record of all

  • of the previous runs so that some of these more advanced

  • capabilities can be enabled.

  • So how does this context get created?

  • Of course, in this case, the trainer produces new models.

  • Model Validator knows about the last validated model

  • and the new candidate model.

  • And then downstream from the Validator,

  • we take that new candidate model and the validation outcome.

  • And if the validation outcome is positive,

  • we push the model to the serving system.

  • If it's negative, we don't.

  • Because usually we don't want to push

  • a model that's worse than our previous model

  • into our serving system.

  • So the Metadata Store is new.

  • So let's discuss why we need this

  • and what the Metadata Store does.

  • First, when most people talk about machine learning

  • workflows and pipelines, they really

  • think about task dependency.

  • They think there's one component and when that's finished,

  • there's another component that runs.

  • However, all of you who actually do machine learning

  • in production know that we actually need data dependency,

  • because all of these components consume artifacts and create

  • artifacts.

  • And as the example of Model Validation has shown,

  • it's incredibly important to actually know

  • these dependencies.

  • So we need a system that's both task and data aware so

  • that each component has a history of all

  • of the previous runs and knows about all of the artifacts.

  • So what's in this Metadata Store?

  • Most importantly, type definitions

  • of artifacts and their properties.

  • So in our case, for TFX, it contains the definition

  • of all of the artifacts that are being consumed and produced

  • by our components and all of their properties.

  • And it's an extensible type system,

  • so you can add new types of artifacts,

  • if you add new components.

  • And you can add new properties to these artifacts,

  • if you need to track more properties of those.

  • Secondly, we keep a record of all of the executions

  • of the components.

  • And with that execution, we store

  • all of the input artifacts that went into the execution,

  • all of the output artifacts that were produced,

  • and all of the runtime configuration

  • of this component.

  • And, again, this is extensible.

  • So if you want to track things like the code snapshot that

  • was used to produce that component,

  • you can store it in the Metadata Store, as well.
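
As a rough illustration, here is a minimal sketch of what recording to this store can look like using the ML Metadata (MLMD) library that backs the TFX Metadata Store; the 'SavedModel' type, its 'export_path' property, and the local SQLite paths are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of the Metadata Store, assuming a local SQLite backend.
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.sqlite.filename_uri = '/tmp/tfx/metadata.db'   # hypothetical path
config.sqlite.connection_mode = 3                     # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(config)

# The type system is extensible: register an artifact type with custom properties.
model_type = metadata_store_pb2.ArtifactType()
model_type.name = 'SavedModel'                        # illustrative type name
model_type.properties['export_path'] = metadata_store_pb2.STRING
model_type_id = store.put_artifact_type(model_type)

# Record an artifact of that type, as a component's publisher would.
model = metadata_store_pb2.Artifact()
model.type_id = model_type_id
model.properties['export_path'].string_value = '/tmp/tfx/Trainer/model'  # hypothetical
[model_id] = store.put_artifacts([model])
```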

  • So, putting these things together

  • allows us to do something we call lineage tracking

  • across all executions.

  • Because if you think about it, if you

  • know every execution, all of its inputs and all of its outputs,

  • you can piece together a story of how an artifact was created.

  • So we can actually, by looking at an artifact,

  • say what were all of the upstream executions

  • and artifacts that went into producing this artifact,

  • and what were all of the downstream runs

  • and downstream artifacts that were produced using

  • that artifact as an input?

  • Now, that's an extremely powerful capability,

  • so let me talk you through some of the examples of what

  • this enables.

  • The first one is a pretty straightforward one.

  • Let's say I want to list all of the training

  • runs that I've done in the past.

  • So in this case, I am interested in the trainer

  • and I want to see all of the training runs

  • that were recorded.

  • In this case, I had two training runs.

  • And I see all of the properties of these training runs.

  • This is pretty straightforward, yet nothing new to see here.
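
In code, such a query could look roughly like the hedged sketch below against the MLMD API; the 'Trainer' execution type name is an assumption, and the notebook tooling shown on the slide wraps queries like this in a nicer UI.

```python
# Hedged sketch: list all recorded training runs and their properties.
# Assumes 'store' is the MetadataStore from the earlier sketch and that trainer
# runs were registered under an execution type named 'Trainer'.
for run in store.get_executions_by_type('Trainer'):
    print(run.id, dict(run.properties))  # property values are MLMD Value protos
```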

  • However, I just spoke about lineage.

  • We can visualize that lineage and all this information

  • that we have.

  • The first comment to make on this slide

  • is that we're working on a better UI.

  • This is really just for demonstration purposes.

  • But if you look at the end of this graph to the right side,

  • you see the model export path.

  • This is the specific instance of a model that was created.

  • And as you can see, we see that the model

  • was created by the trainer.

  • And the trainer created this model

  • by consuming a Schema, Transform and Examples.

  • And, again, these are specific instances.

  • So the IDs there are not just numbering:

  • it's the Schema with ID number four and the Transform

  • with ID number five.

  • And for each one of those artifacts,

  • we also see how they were created upstream.

  • And this allows us to do this lineage tracking and to go

  • forward and backward through our artifacts.

  • The narrative I used was walking back from the model but,

  • similarly, you could look at your training data

  • and say, what were all of the artifacts that were produced

  • using that training data?

  • This slide shows a visualization of the data distribution

  • that went into our model.

  • Now, at first glance, this may not

  • be something earth shattering because we've done this before.

  • We can compute statistics and we can visualize them.

  • But if we look at the code snippet,

  • we're not referring to data or statistics.

  • We're referring to a model.

  • So we say for this specific model,

  • show me the distribution of data that the model was trained on.

  • And we can do this because we have a track

  • record of all of the data and the statistics that

  • went into this model.

  • We can do a similar thing in the other direction

  • of saying for a specific model, show me the sliced metrics that

  • were produced downstream by TensorFlow Model Analysis,

  • and we can get this visualization.

  • Again, just by looking at a model

  • and not specifically pointing to the output of TensorFlow Model

  • Analysis.
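
The component behind this wraps the TensorFlow Model Analysis library; a hedged sketch of the underlying call is below, with hypothetical paths (in TFX these would be looked up via the metadata store) and assuming the eval graph was exported by the trainer.

```python
# Hedged sketch of computing and rendering sliced metrics with TFMA.
import tensorflow_model_analysis as tfma

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path='/tmp/tfx/Trainer/eval_model_dir')     # hypothetical path
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    data_location='/tmp/tfx/ExampleGen/eval/*',                   # hypothetical path
    slice_spec=[tfma.slicer.SingleSliceSpec(columns=['trip_start_hour'])])
# Renders the interactive sliced-metrics view in a notebook.
tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')
```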

  • Of course, we know all of the models that were trained

  • and where all of the checkpoints lie

  • so we can start TensorBoard and point to some

  • of our historic runs.

  • So you can actually look at the TensorBoard

  • for all of the models that you've trained in the past.

  • Because we have a track record of all of the models

  • that you've trained, we can launch TensorBoard and point it

  • to two different directories.

  • So you can actually compare two models in the same TensorBoard

  • instance.

  • So this is really model tracking and experiment comparison

  • after the fact.

  • And we enable this by keeping a track record of all of this.
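
One way this can look in practice is to hand TensorBoard two named log directories; the sketch below is hedged, with run names and paths that are purely illustrative (they would come out of the metadata store).

```python
# Hedged sketch: launch TensorBoard over two recorded runs for side-by-side comparison.
import subprocess

logdirs = 'run_1:/tmp/tfx/Trainer/42,run_2:/tmp/tfx/Trainer/43'  # hypothetical paths
subprocess.Popen(['tensorboard', '--logdir', logdirs])
```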

  • And, if we have multiple models, you

  • can also look at the data distribution

  • for multiple models.

  • So this usually helps with debugging a model.

  • If you train the same model twice, or on different data,

  • and it behaves differently, sometimes it

  • can pay off to look at whether the data distribution has

  • changed between the two different ones.

  • And it's hard to see in this graph,

  • but here we're actually overlaying

  • two distributions of the statistics

  • for one model and the other.

  • And you would see if there's a considerable drift

  • between those two.

  • So all of these are enabled by this lineage tracking

  • that I just mentioned.

  • Another set of use cases is visualizing previous runs

  • over time.

  • So if you train the same model over time, over new data,

  • we can give you a time series graph of all of the evaluation

  • metrics over time, and you can see

  • if your model improves or gets worse over time

  • as you retrain it.

  • Another very powerful use case is carrying over state

  • from previous models.

  • Because we know that you've trained the model in the past,

  • we can do something we call warm starting.

  • So we can re-initialize the model

  • with weights from a previous run.

  • And sometimes we want to re-initialize the entire model

  • or maybe just an embedding.

  • And in this way, we can continue training

  • from where we left off with a new data set.
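
With Estimators, warm starting can be sketched roughly as below; the checkpoint path and feature columns are assumptions for illustration, not code from the talk.

```python
# Hedged sketch of warm starting: re-initialize a new training run from a
# previous run's checkpoint, either the whole model or just the embeddings.
import tensorflow as tf

warm_start = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from='/tmp/tfx/Trainer/previous_run',  # hypothetical path
    vars_to_warm_start='.*embedding.*')  # or '.*' to carry over the entire model

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,  # assumed to be defined elsewhere
    dnn_feature_columns=deep_columns,     # assumed to be defined elsewhere
    dnn_hidden_units=[100, 70, 50, 25],
    warm_start_from=warm_start)
```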

  • And another very powerful application of this

  • is being able to reuse previously computed outputs.

  • A very common workflow is to iterate on the model

  • and basically iterate on your model architecture.

  • Now, if you have a pipeline that ingests data,

  • applies transformations to your data,

  • and then you train a model--

  • every time you make a small change to your model,

  • you don't want to recompute everything upstream.

  • There's no reason why you would have to re-ingest your data,

  • re-compute the transform just because you changed something

  • in your model.

  • Because we have a track record of all of the previous steps,

  • we can make a decision of saying your data hasn't changed,

  • your transform code hasn't changed,

  • so we will reuse the artifacts that were produced upstream.

  • And you can just iterate much, much faster on your model.

  • So this improves iteration speeds,

  • and it also saves compute because you're not

  • re-computing things that you've already computed in the past.

  • So now, we've talked about components quite a bit.

  • Now how do we actually orchestrate TFX pipelines?

  • First, every component has something we

  • call a driver and a publisher.

  • The driver's responsibility is to basically retrieve state

  • from the Metadata Store to inform

  • what work needs to be done.

  • So in the example of Model Validation,

  • the driver looks into the Metadata Store

  • to find the last validated model,

  • because that's the model that we need

  • to compare with the new model.

  • The publisher then basically keeps the record of everything

  • that went into this component, everything that was produced,

  • and all of the runtime configuration,

  • so that we can do that lineage tracking

  • that I mentioned earlier.

  • And in between sits the executor.

  • And the executor is blissfully unaware of all of this metadata

  • stuff, because it's extremely important for us

  • to keep that piece relatively simple.

  • Because if you want to change the code in one

  • of these components, if you want to change the training code,

  • you shouldn't have to worry about drivers and publishers.

  • You should just have to worry about the executor.

  • And it also makes it much, much easier

  • to write new components for the system.

  • And then we have one shared configuration model

  • that sits on top that configures end-to-end TFX pipelines.

  • And let's just take a look at what that looks like.

  • As you can see, this is a Python DSL.

  • And, from top to bottom, you see that it

  • has an object for each one of these components.

  • From ExampleGen, StatisticsGen, and so on.

  • The trainer component, you can see, basically

  • receives its configuration, which says that your inputs

  • come from the Transform output and the schema that

  • was inferred.
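
A condensed, hedged sketch of that DSL is below. The component and keyword-argument names follow the 2019-era Chicago taxi example and have changed across TFX releases, and the data and module paths are hypothetical.

```python
# Hedged sketch of a TFX pipeline definition in the Python DSL (early-2019 API).
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator, Transform, Trainer)
from tfx.proto import trainer_pb2
from tfx.utils.dsl_utils import csv_input

data_root = '/path/to/chicago_taxi/data'     # hypothetical
taxi_module_file = '/path/to/taxi_utils.py'  # hypothetical user code

example_gen = CsvExampleGen(input_base=csv_input(data_root))
statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
validate_stats = ExampleValidator(stats=statistics_gen.outputs.output,
                                  schema=infer_schema.outputs.output)
transform = Transform(input_data=example_gen.outputs.examples,
                      schema=infer_schema.outputs.output,
                      module_file=taxi_module_file)
trainer = Trainer(module_file=taxi_module_file,
                  transformed_examples=transform.outputs.transformed_examples,
                  schema=infer_schema.outputs.output,
                  transform_output=transform.outputs.transform_output,
                  train_args=trainer_pb2.TrainArgs(num_steps=10000),
                  eval_args=trainer_pb2.EvalArgs(num_steps=5000))
```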

  • And let's just see what's inside of that trainer.

  • And that's really just TensorFlow code.

  • So in this case, as you can see, we just use an estimator.

  • And we use the estimator train_and_evaluate method

  • to actually train this model.

  • And it takes an estimator.

  • And we just use one of our canned estimators, in this case.

  • So this is a wide and deep model that you can just

  • instantiate and return.

  • But what's important to highlight here

  • is that we don't have an opinion on what this code looks

  • like, it's just TensorFlow.

  • So anything that produces a SavedModel as an output

  • is fair game.

  • You can use a Keras model that produces the inference graph

  • or, if you choose to, you can go lower level

  • and use some of the lower-level APIs in TensorFlow.

  • As long as it produces a SavedModel in the right format

  • that can be used by TensorFlow Serving,

  • or the eval graph that can be used in TensorFlow Model

  • Analysis, you can write any type of TensorFlow code you want.
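
A hedged sketch of what such a trainer module can amount to with a canned Estimator follows; the feature columns, input functions, and export name are placeholders, not the exact code on the slide.

```python
# Hedged sketch of user-provided trainer code: plain TensorFlow Estimator code
# that builds a wide-and-deep model and trains it with train_and_evaluate.
import tensorflow as tf

def build_estimator(config, wide_columns, deep_columns):
    # A canned wide-and-deep estimator; anything that exports a SavedModel works.
    return tf.estimator.DNNLinearCombinedClassifier(
        config=config,
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[100, 70, 50, 25])

def train_model(estimator, train_input_fn, eval_input_fn, serving_receiver_fn):
    train_spec = tf.estimator.TrainSpec(train_input_fn, max_steps=10000)
    exporter = tf.estimator.FinalExporter('chicago-taxi', serving_receiver_fn)
    eval_spec = tf.estimator.EvalSpec(eval_input_fn, steps=5000,
                                      exporters=[exporter])
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```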

  • So, if you've noticed, we still haven't

  • talked about orchestration.

  • So we now have a configuration system, we have components,

  • and we have a metadata store.

  • And I know what some of you may be thinking right now.

  • Is he going to announce a new orchestration system?

  • And the good news is no--

  • at least not today.

  • Instead, we talked to a lot of our users, to a lot of you,

  • and unsurprisingly found out--

  • whoops.

  • Can we go back one slide?

  • Yup.

  • Unsurprisingly found out that there's

  • a significant installed base of orchestration

  • systems in your companies.

  • We just heard from Airbnb.

  • Of course, they developed Airflow.

  • And there's a lot of companies that use Kubeflow.

  • And there's a number of other orchestration systems.

  • So we made a deliberate choice to support

  • any number of orchestration systems

  • because we don't want to make you adopt

  • a different orchestration system just

  • to orchestrate TFX pipelines.

  • So the installed base was reason number one.

  • Reason number two is we really want

  • you to extend TFX pipelines.

  • What we publish is really just our opinionated version

  • of what a TFX pipeline looks like

  • and the components that we use at Google.

  • But we want to make it easier for you

  • to add new components before and after and in parallel

  • to customize the pipeline to your own use cases.

  • And all of these orchestration systems

  • are really made to be able to express arbitrary workflows.

  • And if you're already familiar with one of those orchestration

  • systems, you should be able to use them for your use case.

  • So here we show you two examples of what

  • that looks like with Airflow and Kubeflow pipelines.

  • So on the left you see that same TFX pipeline, configured

  • and executed on Airflow.

  • And there, in my example, we use this for a small data

  • set so we can iterate on it fast on a local machine.

  • So in the Chicago taxicab example we use 10,000 records.

  • And on the right side, you see the exact same pipeline

  • executed on Kubeflow pipelines, on Google Cloud

  • so that you can take advantage of Cloud Dataflow and Cloud ML

  • Engine and scale it up to the 100 million [INAUDIBLE]

  • in that data set.

  • What's important here is it's the same configuration,

  • it's the same components, so we run the same components

  • in both environments, and you can

  • choose how you want to orchestrate them

  • in your own favorite orchestration system.
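
Roughly, switching orchestrators means handing the same pipeline definition to a different runner. The sketch below is hedged: the runner import paths and config shapes follow an early TFX release and may have moved since, and create_pipeline() stands in for the DSL definition shown earlier.

```python
# Hedged sketch: run the same pipeline definition on Airflow locally, or on
# Kubeflow Pipelines on Cloud, just by swapping the runner.
import datetime
from tfx.orchestration.airflow.airflow_runner import AirflowDAGRunner
# from tfx.orchestration.kubeflow.runner import KubeflowRunner  # Cloud alternative

airflow_config = {
    'schedule_interval': None,
    'start_date': datetime.datetime(2019, 1, 1),
}
AirflowDAGRunner(airflow_config).run(create_pipeline())
# KubeflowRunner().run(create_pipeline())  # same components, different environment
```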

  • So this is what this looks like if it's put together.

  • TFX goes all the way from your raw data

  • to your deployment environment.

  • We discussed a shared configuration model

  • at the top, the metadata system that keeps track of all the

  • runs no matter how you orchestrate

  • those components, and then two ways

  • that we published of how to orchestrate them

  • with Airflow and with Kubeflow pipelines.

  • But, as mentioned, you can choose to orchestrate a TFX

  • pipeline in any way you want.

  • All of this is available now.

  • So you can go on GitHub, on github.com/tensorflow/tfx

  • to check out our code and see our new user guide

  • on tensorflow.org/tfx.

  • And I also want to point out that tomorrow we

  • have a workshop where you can get

  • hands-on experience with TensorFlow Extended,

  • from 12:00 to 2:00 PM.

  • And there's no prerequisites.

  • You don't even have to bring your own laptop.

  • So with this, we're going to jump into an end-to-end example

  • of how to actually go through the entire workflow

  • with the Chicago taxicab data set.

  • And just to set some context:

  • So the Chicago taxi data set is a record

  • of cab rides in Chicago for some period of time.

  • And it contains everything that you would expect.

  • It contains when the trip started, when they ended,

  • where they started and where they ended,

  • how much was paid for it, and how it was paid.

  • Now, some of these features need some transformation,

  • so latitude and longitude features need to be bucketized.

  • Usually it's a bad idea to do math

  • with geographical coordinates.

  • So we bucketize them and treat them as categorical features.

  • Vocab features, which are strings, need to be integerized

  • and some of the Dense Float features need to be normalized.

  • We feed them into a wide and deep model.

  • So, the Dense features we feed into the deep part of the model

  • and all of the others we use in the wide part.

  • And then the label that we're trying to predict

  • is a Boolean, which is if the tip is

  • larger than 20% of the fare.

  • So really what we're doing is we're

  • building a high tip predictor.

  • So just in case there are any cab drivers in the audience

  • or listening online, come find me later

  • and I can help you set this up.

  • I think it would be really beneficial to you

  • if you could predict if a cab ride gives a high tip or not.

  • So let's jump right in.

  • And we start with data validation and transformation.

  • So the first part of the TFX pipeline is ingesting data,

  • validating that data-- if it's OK--

  • and then transforming it such that it can

  • be fed into a TensorFlow graph.

  • So we start with ExampleGen. And the ExampleGen component really

  • just ingests data into a TFX pipeline.

  • So it takes as input your raw data.

  • We ship by default capabilities for CSV and TF Records.

  • But that's of course extensible as we

  • can ingest any type of data into these pipelines.

  • What's important is that, after this step,

  • the data is in a well-defined place where we can find it--

  • in a well-defined format because all

  • of our downstream components standardize on that format.

  • And it's split between training and eval.

  • So you've seen the configuration of all of these components

  • before.

  • It's very minimal configuration in most of the cases.
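
A hedged sketch of that minimal configuration, using the CSV flavor of ExampleGen, is below; the helper and argument names follow an early TFX release and the data path is hypothetical.

```python
# Hedged sketch: ingest raw CSV data into the pipeline with CsvExampleGen.
# The component emits the data as TF Records, split into train and eval.
from tfx.components import CsvExampleGen
from tfx.utils.dsl_utils import csv_input

example_gen = CsvExampleGen(input_base=csv_input('/path/to/chicago_taxi/data'))
```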

  • Next, we move on to data analysis and validation.

  • And I think a lot of you have a good intuition why

  • that is important.

  • Because, of course, machine learning is just

  • the process of taking data and learning models

  • that predict some field in your data.

  • And you're also aware that if you feed garbage in,

  • you get garbage out.

  • There will be no hope of a good machine learning model

  • if the data are wrong, or if the data have errors in them.

  • And this is made even worse if you have continuous pipelines

  • that train on data that was produced by a bad model,

  • because then you're just reinforcing the same problem.

  • So first, what I would argue is that data understanding

  • is absolutely critical for model understanding.

  • There's no hope in understanding why

  • a model is mis-predicting something if you don't

  • understand what the data looked like

  • and if the data was OK that was actually fed into the model.

  • And the question you might ask as a cab

  • driver is why are my tip predictions bad in the morning

  • hours?

  • And for all of these questions that I'm highlighting here,

  • I'm going to try to answer them with the tools

  • that we have available in TFX later.

  • So I will come back to these questions.

  • Next, we really would like you to treat your data

  • as you treat your code.

  • There's a lot of care taken with code these days.

  • It's peer reviewed, it's checked into shared repositories,

  • it's version controlled, and so on.

  • And data really needs to be a first-class

  • citizen in these systems.

  • And with this question, what are the expected

  • values for our payment types, that's

  • really a question about the schema of your data.

  • And what we would argue is that the schema

  • needs to be treated with the same care

  • as you treat your code.

  • And catching errors early is absolutely critical.

  • Because I'm sure, as all of you know,

  • errors propagate through the system.

  • If your data are not OK, then everything else downstream

  • goes wrong as well.

  • And these errors are extremely hard to correct for or fix

  • if you catch them relatively late in the process.

  • So really catching those problems as early as possible

  • is absolutely critical.

  • So in the taxicab example, you would ask a question

  • like is this new company that I have in my data set a typo

  • or is it actually a real company which

  • is a natural evolution of my data set?

  • So let's see if we can answer some of these questions

  • with the tools we have available,

  • starting with Statistics.

  • So the StatisticsGen component takes in your data

  • and computes statistics.

  • The data can be training or eval data;

  • it can also be serving logs--

  • in which case, you can look at the skew between your training

  • and your serving data.

  • And the statistics really capture the shape of your data.

  • And the visualization components we

  • have draw your attention to things

  • that need it, such as a feature that is missing

  • most of the time, which is actually highlighted in red.

  • The configuration for this component is minimal, as well.
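
This component wraps TensorFlow Data Validation; a minimal, hedged sketch of the underlying library calls, with a hypothetical CSV path, is:

```python
# Hedged sketch: compute and visualize statistics with TensorFlow Data Validation,
# the library that the StatisticsGen component wraps.
import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_csv('/path/to/taxi/train.csv')  # hypothetical
tfdv.visualize_statistics(train_stats)  # notebook visualization of the data's shape
```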

  • And let me zoom into some of these visualizations.

  • And one of the questions that I posed earlier

  • was why are my tip predictions bad in the morning hours?

  • So one thing you could do is look at your data set

  • and see that for trip start hour, in the morning

  • hours between 2:00 AM and 6:00 AM,

  • you just don't have much data because there's not

  • that many taxi trips at that time.

  • And not having a lot of data in a specific area of your data

  • can mean that your model is not robust, or has higher variance.

  • And this could lead to worse predictions.

  • Next, we move on to SchemaGen. SchemaGen

  • takes as input the output of StatisticsGen,

  • and it infers a schema for you.

  • In the case of the Chicago taxicab example,

  • there's very few features, so you

  • could handwrite that schema.

  • Although, it would be hard to handwrite what you expect

  • the string values to look like.

  • But if you have thousands of features,

  • it's hard to actually handwrite that expectation.

  • So we infer that schema for you the first time we run.

  • And the schema really represents what you expect from your data,

  • and what good data looks like, and what

  • values your string features can take on, and so on.

  • Again, very minimal configuration.
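
In library form, that inference step is roughly the following hedged sketch, continuing from statistics computed as above (the CSV path is hypothetical):

```python
# Hedged sketch: infer a schema from computed statistics and display it.
import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_csv('/path/to/taxi/train.csv')  # hypothetical
schema = tfdv.infer_schema(train_stats)
tfdv.display_schema(schema)  # shows expected types, presence, and string domains
```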

  • And the question that we can answer,

  • now, is what are expected values for payment types?

  • And if you look here at the very bottom,

  • you see the field payment type can

  • take on cash, credit card, dispute, no charge, key card,

  • and unknown.

  • So that's the expectation of my data

  • that's expressed in my schema.

  • And now the next time I run this,

  • and this field takes on a different value,

  • I will get an anomaly--

  • which comes from the ExampleValidator.

  • The ExampleValidator takes the statistics and the schema

  • as an input and produces an anomaly report.

  • Now, that anomaly report basically

  • tells you if your data are missing features,

  • if they have the wrong valency, if your distributions have

  • shifted for some of these features.

  • And it's important to highlight that the anomaly

  • report is human readable.

  • So you can look at it and understand what's going on.

  • But it's also machine readable.

  • So you can automatically make decisions

  • based on the anomalies and decide

  • not to train a model if you have anomalies in your data.

  • So the ExampleValidator just takes as input statistics

  • and the schema.
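
At the library level, that check is roughly the following hedged sketch; the paths are hypothetical and the schema is the one inferred earlier.

```python
# Hedged sketch: validate new statistics against the curated schema and show anomalies.
import tensorflow_data_validation as tfdv

schema = tfdv.infer_schema(
    tfdv.generate_statistics_from_csv('/path/to/taxi/train.csv'))        # hypothetical
eval_stats = tfdv.generate_statistics_from_csv('/path/to/taxi/eval.csv')  # hypothetical
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
tfdv.display_anomalies(anomalies)  # human-readable; the proto itself is machine-readable
```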

  • And let me zoom into one of these anomaly reports.

  • Here you can see that the field company has taken

  • on unexpected string values.

  • That just means that these string values weren't

  • there in your schema before.

  • And that can be a natural evolution of your data.

  • The first time you run this, maybe

  • you just didn't see any trips from those taxi companies.

  • And by looking at it, you can say, well, all of these look

  • like they're normal taxicab companies.

  • So you can update your schema with this expectation.

  • Or if you saw a lot of scrambled text in here,

  • you would know that there's a problem in your data

  • that you would have to go and fix.

  • Moving on, we actually get to Transform.

  • And let me just recap the types of transformations

  • that we want to do.

  • I've highlighted them here in red--

  • in blue, sorry.

  • So we want to bucketize the longitude and latitude

  • features.

  • We want to convert the strings to ints, which

  • is also called integerizing.

  • And for the Dense features, we want to actually normalize them

  • to a mean of zero and a standard deviation of one.

  • Now, all of these transformations

  • require you to do a full pass of your data

  • to compute some statistics.

  • To bucketize, you need to figure out

  • the boundaries of the buckets.

  • To do a string to integer, you need

  • to see all of the string values that show up in your data.

  • And to scale to a Z-score, you need

  • to compute the mean and the standard deviation.

  • Now, this is exactly what we built TensorFlow Transform for.

  • TensorFlow Transform allows you to express

  • a pre-processing function of your data

  • that contains some of these transformations that require

  • a full pass of your data.

  • And it will then automatically run a data processing graph

  • to compute those statistics.

  • So in this case, you can see the orange boxes

  • are statistics that we require.

  • So for normalization, we require the mean

  • and the standard deviation.

  • And what TensorFlow Transform does

  • is it has a utility function that says scale to Z-score.

  • And it will then create a data processing graph for you

  • that computes the mean and the standard deviation

  • of your data, return the results,

  • and inject them as constants into your transformation graph.

  • So now that graph is a hermetic graph

  • that contains all of the information

  • that you need to actually apply your transformations.

  • And that graph can then be used in training

  • and in serving, guaranteeing that there's

  • no drift between them.

  • This basically eliminates the chances

  • of training/serving skew by applying

  • the same transformations.

  • And at serving time, we just need to feed in the raw data,

  • and all the transformations are done as part of the TensorFlow

  • graph.

  • So what does that look like in the TFX pipeline?

  • The Transform component takes in data, schema--

  • the schema allows us to parse the data more easily--

  • and code.

  • In this case, this is the user-provided pre-processing

  • function.

  • And it produces the Transform graph, which I just mentioned,

  • which is a hermetic graph that applies

  • the transformations, that gets attached

  • to your training and your serving graph.

  • And it optionally can materialize the Transform data.

  • And that's a performance optimization: when you want

  • to feed hardware accelerators really fast,

  • it can sometimes pay off to materialize

  • some transformations before your training step.

  • So in this case, the configuration of the component

  • takes in a module file.

  • That's just the file where you configure

  • your pre-processing function.

  • And in this code snippet, the actual code

  • is not that important.

  • But what I want to highlight is--

  • the last line in this code snippet

  • is how we transform our label.

  • Because, of course, the label is a logic expression of saying,

  • is the tip greater than 20% of my fare?

  • And the reason why I want to highlight this

  • is because you don't need analyze phases for all

  • of your transformations.

  • So in cases where you don't need analysis phases,

  • the transformation is just a regular TensorFlow graph

  • that transforms the features.

  • However, to scale something to Z-score, to integerize strings,

  • and to bucketize a feature, you definitely

  • need analysis phases, and that's what Transform helps you with.

  • So this is the user code that you would write.

  • And TF Transform would create a data processing graph

  • and return the results and the transform graph

  • that you need to apply these transformations.
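
A hedged sketch of such a preprocessing_fn for the taxi features follows; the feature names are illustrative, and the tft helper names follow the 2019-era API.

```python
# Hedged sketch of a TensorFlow Transform preprocessing_fn for the taxi example.
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    outputs = {}
    # Dense float feature: scale to mean 0 / stddev 1 (needs a full-pass analyze phase).
    outputs['trip_miles'] = tft.scale_to_z_score(inputs['trip_miles'])
    # String feature: build a vocabulary and map strings to integers.
    outputs['payment_type'] = tft.compute_and_apply_vocabulary(inputs['payment_type'])
    # Coordinates: bucketize instead of doing math on latitude/longitude.
    outputs['pickup_latitude'] = tft.bucketize(inputs['pickup_latitude'], num_buckets=10)
    # The label needs no analyze phase: it is an ordinary TensorFlow expression.
    outputs['tips'] = tf.cast(
        tf.greater(inputs['tips'], inputs['fare'] * 0.2), tf.int64)
    return outputs
```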

  • So now that we're done with all of this,

  • we still haven't trained our machine learning model yet,

  • right?

  • But we've made sure that we know that our data is

  • in a place where we can find it.

  • We know it's in a format that we can understand.

  • We know it's split between training and eval.

  • We know that our data are good because we validated them.

  • And we know that we're applying transforms consistently

  • between training and serving.

  • Which brings us to the training step.

  • And this is where the magic happens, or so they say.

  • But, actually, it's not because the training step in TFX

  • is really just the TensorFlow graph and the TensorFlow

  • training step.

  • And the training component takes in the output of Transform,

  • as mentioned, which is the Transform

  • graph and, optionally, the materialized data, a schema,

  • and the training code that you provide.

  • And it creates, as output, TensorFlow models.

  • And those models are in the SavedModel format,

  • which is the standard serialized model

  • format in TensorFlow, which you've heard quite a bit

  • about this morning.

  • And in this case, actually, we produce two of them.

  • One is the inference graph, which

  • is used by TensorFlow Serving, and another one

  • is the eval graph, which contains

  • the metrics and the necessary annotations

  • to perform TensorFlow Model Analysis.

  • And so this is the configuration that you've seen earlier.

  • And, again, the trainer takes in a module file.

  • And the code that's actually in that module file, again,

  • I'm just going to show you the same slide

  • again just to reiterate the point, is just TensorFlow.

  • So, in this case, it's the train_and_evaluate method

  • from Estimators and a canned estimator

  • that is returned here.

  • But again, just to make sure you're

  • aware of this, any TensorFlow code here

  • that produces the SavedModel in the right format is fair game.

  • So all of this works really well.

  • So with this, we've now trained the TensorFlow model.

  • And now I'm going to hand it off to my colleague,

  • Christina, who's going to talk about model evaluation

  • and analysis.

  • [MUSIC PLAYING]
