
SPEAKER: A machine learning workflow can involve many steps, from data prep to training to evaluation and more. It's hard to track these in an ad hoc manner, like in a set of notebooks or scripts. On top of that, monitoring and version tracking can become a challenge. Kubeflow Pipelines lets data scientists codify their ML workflows so that they're easily composable, shareable, and reproducible. Let's check out how it can help you achieve ML engineering best practices.

[MUSIC PLAYING]

If you want to go straight to the documentation, check out the link below to read more about Kubeflow Pipelines.

Kubeflow Pipelines helps drive data scientists to adopt a disciplined pipeline mindset when developing ML code and scaling it up to the cloud. It's a Kubernetes-native solution that helps you with a number of things, like simplifying the orchestration of machine learning pipelines and making experimentation easy, so you can try ideas, reproduce runs, and share pipelines. And you can stitch together and reuse components and pipelines to quickly create end-to-end solutions without having to rebuild every time, like building blocks. It also comes with framework support for things like execution monitoring, workflow scheduling, metadata logging, and versioning.

Kubeflow Pipelines is one of the Kubeflow core components, and it's automatically deployed during Kubeflow deployment.

Now, there are many ways of defining pipelines when it comes to data science. But for Kubeflow, a pipeline is a description of an ML workflow. Under the hood, it runs on containers, which provide portability, repeatability, and encapsulation, because they decouple the execution environment from your code runtime. When you break a pipeline down, it includes all the components in the workflow and how they combine, and that makes a graph. It also includes the definition of all the inputs, or parameters, needed to run the pipeline.

A pipeline component is one step in the workflow that does a specific task, which means it takes inputs and can produce outputs. An output of a component can become the input of other components, and so forth. Think of it like a function, in that it has a name, parameters, return values, and a body. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.
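To make the function analogy concrete, here is a minimal sketch using the Kubeflow Pipelines Python SDK (KFP v2 style). The component names and bodies are illustrative stand-ins, not real preprocessing or training code.

    from kfp import dsl

    # Each decorated function becomes a pipeline component: a named step
    # with typed parameters and a return value, just like the function
    # analogy above. The bodies are placeholders.
    @dsl.component
    def preprocess(rows: int) -> int:
        # Pretend to clean the data and report how many rows survived.
        return rows - 10

    @dsl.component
    def train(clean_rows: int) -> str:
        # Pretend to train a model on the cleaned data.
        return f"model-trained-on-{clean_rows}-rows"

    # The pipeline wires components together; passing one step's output
    # as another step's input is what forms the graph.
    @dsl.pipeline(name="demo-pipeline")
    def demo_pipeline(rows: int = 100):
        prep_task = preprocess(rows=rows)
        train(clean_rows=prep_task.output)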

Now, this is where Kubeflow Pipelines shines. A pipeline component is made up of code, packaged as a Docker image, that performs one step in the pipeline. That's right: the system launches one or more Kubernetes pods corresponding to each step in your pipeline. You can also leverage prebuilt components found on the Kubeflow GitHub page. Under the hood, the pods start Docker containers, and the containers start your programs.
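As a sketch of reusing a prebuilt component, the KFP SDK can load a component from its YAML specification. The URL below is a placeholder; point it at a real component.yaml from the kubeflow/pipelines repository.

    from kfp import components

    # Placeholder URL: substitute a real component.yaml from the
    # kubeflow/pipelines GitHub repository.
    COMPONENT_URL = (
        "https://raw.githubusercontent.com/kubeflow/pipelines/"
        "master/components/.../component.yaml"
    )

    # The YAML spec names the Docker image to run and declares the
    # component's inputs and outputs; loading it returns a factory you
    # can call inside a pipeline like any other step.
    prebuilt_op = components.load_component_from_url(COMPONENT_URL)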

Containers not only give you composability and scalability, but they also mean your teams can focus on one aspect of the pipeline at a time.

You can use the Kubeflow Pipelines SDK to programmatically upload pipelines and launch pipeline runs, for example, directly from a notebook, as in the sketch below.
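This is a minimal sketch, assuming the demo_pipeline defined earlier and a reachable Kubeflow Pipelines endpoint; the host URL is hypothetical.

    import kfp

    # Connect to the Kubeflow Pipelines API server; the host is a
    # placeholder for your deployment's endpoint.
    client = kfp.Client(host="http://localhost:8080")

    # Compile and launch the pipeline in one call, straight from a notebook.
    client.create_run_from_pipeline_func(
        demo_pipeline,
        arguments={"rows": 500},
    )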

But you can also work with pipelines via the Kubeflow UI, and that way you can leverage some of its powerful features, like visualizing the pipeline through a graph. As you execute a run, the graph shows the relationships between the pipeline's steps and their status.

Once a step completes, you'll also see the output artifact in the UI. As you can see here, the UI takes the output and can actually render it as a rich visualization. You can also get statistics on the performance of the model for performance evaluation, quick decision-making, or comparison across different runs.

And we can't forget about the experiments in the UI, which let you group a few of your pipeline runs and test different configurations of your pipelines. You can compare the results between experiments and even schedule recurring runs.
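Experiments and recurring runs are also scriptable through the same client. This sketch assumes the client and pipeline from the earlier examples; the experiment name, job name, schedule, and package path are illustrative.

    from kfp import compiler

    # Compile the pipeline to a package file that the scheduler can rerun.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")

    # Group related runs under an experiment, then schedule a recurring
    # run with a cron expression (here, 02:00 every day).
    exp = client.create_experiment(name="config-sweep")
    client.create_recurring_run(
        experiment_id=exp.experiment_id,  # KFP v2 SDK attribute; older SDKs expose .id
        job_name="nightly-training",
        cron_expression="0 0 2 * * *",
        pipeline_package_path="demo_pipeline.yaml",
    )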

Because they're built on the flexibility of containers, Kubeflow Pipelines are useful for all sorts of tasks, like ETL and CI/CD, but they're most popularly used for ML workflows. While you can deploy it on your own installation of Kubeflow, a new hosted version of Pipelines on Google Cloud's AI Platform lets you deploy a standalone version of Kubeflow Pipelines on a GKE cluster in just a few clicks. You can start using it by checking out the AI Platform section in the Google Cloud console.

Kubeflow Pipelines handles the orchestration of ML workflows and hides the complexities of containers as much as possible. You get continuous training in production, automatic tracking of metadata, and reusable ML components. You can clone and iterate on pipelines and leverage the power of the UI to visualize and compare models. Stay tuned to learn more about what you can do with Kubeflow.

[MUSIC PLAYING]
