SPEAKER: A machine learning workflow can involve many steps, from data prep to training to evaluation and more. It's hard to track these in an ad hoc manner, like in a set of notebooks or scripts. On top of that, monitoring and version tracking can become a challenge. Kubeflow Pipelines lets data scientists codify their ML workflows so that they're easily composable, shareable, and reproducible. Let's check out how it can help you achieve ML engineering best practices.

[MUSIC PLAYING]

If you want to go straight to the documentation, check out the link below to read more about Kubeflow Pipelines.

Kubeflow Pipelines helps data scientists adopt a disciplined pipeline mindset when developing ML code and scaling it up to the cloud. It's a Kubernetes-native solution that helps you with a number of things, like simplifying the orchestration of machine learning pipelines and making experimentation easy, so you can try ideas, reproduce runs, and share pipelines. And you can stitch together and reuse components and pipelines to quickly create end-to-end solutions without having to rebuild every time, like building blocks. It also comes with framework support for things like execution monitoring, workflow scheduling, metadata logging, and versioning. Kubeflow Pipelines is one of the Kubeflow core components, so it's automatically deployed during Kubeflow deployment.

Now, there are many ways of defining pipelines when it comes to data science. But for Kubeflow, a pipeline is a description of an ML workflow. Under the hood, it runs on containers, which provide portability, repeatability, and encapsulation, because they decouple the execution environment from your code runtime. When you break a pipeline down, it includes all the components in the workflow and how they combine, and that makes a graph. It also includes the definition of all the inputs, or parameters, needed to run the pipeline.

A pipeline component is one step in the workflow that does a specific task, which means it takes inputs and can produce outputs. An output of a component can become the input of another component, and so forth. Think of it like a function, in that it has a name, parameters, return values, and a body. For example, a component can be responsible for data preprocessing, data transformation, model training, and so on.

Now, this is where Kubeflow Pipelines shines. A pipeline component is made up of code, packaged as a Docker image, that performs one step in the pipeline. That's right: the system launches one or more Kubernetes pods corresponding to each step in your pipeline. You can also leverage prebuilt components found on the Kubeflow GitHub page. Under the hood, the pods start Docker containers, and the containers start your programs. Containers not only give you composability and scalability; they also mean your teams can focus on one aspect of the pipeline at a time.

While you can use the Kubeflow Pipelines SDK to programmatically upload pipelines and launch pipeline runs -- for example, directly from a notebook -- you can also work with pipelines via the Kubeflow UI. That way you can leverage some of its powerful features, like visualizing the pipeline through a graph. As you execute a run, the graph shows the relationships between steps and their status. Once a step completes, you'll also see its output artifact in the UI. As you can see here, the UI takes the output and can actually render it as a rich visualization. (A short code sketch of defining a pipeline with the SDK follows below.)
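To make the component-and-pipeline idea concrete, here is a minimal sketch using the Kubeflow Pipelines SDK (the kfp package, v1-style API). The component functions, pipeline name, and file name are illustrative assumptions, not from the video.

```python
import kfp
from kfp import dsl
from kfp.components import create_component_from_func

def preprocess(message: str) -> str:
    """Toy preprocessing step; a real component would prep data."""
    return message.upper()

def train(data: str) -> str:
    """Toy training step that consumes the preprocessing output."""
    return "model trained on: " + data

# Each function is packaged as a containerized component.
preprocess_op = create_component_from_func(preprocess, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")

@dsl.pipeline(name="intro-pipeline", description="Preprocess, then train.")
def intro_pipeline(message: str = "hello kubeflow"):
    # Wiring one component's output to another's input defines the graph.
    prep_task = preprocess_op(message)
    train_op(prep_task.output)

# Compile to a package you can upload via the UI or the SDK client.
kfp.compiler.Compiler().compile(intro_pipeline, "intro_pipeline.yaml")
```

Each step here runs in its own container (the stock python:3.9 image), which is exactly the decoupling of code from execution environment described above.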
You can also get statistics on the model's performance, useful for evaluation, quick decision-making, or comparison across different runs. And we can't forget about experiments in the UI, which let you group a few of your pipeline runs and test different configurations of your pipelines. You can compare the results between experiments and even schedule recurring runs; a short code sketch of this follows below.

Because they're built on the flexibility of containers, Kubeflow Pipelines are useful for all sorts of tasks, like ETL and CI/CD, but they're most popularly used for ML workflows. While you can deploy it on your own installation of Kubeflow, a new hosted version of Pipelines on Google Cloud's AI Platform lets you deploy a standalone version of Kubeflow Pipelines on a GKE cluster in just a few clicks. You can start using it by checking out the AI Platform section in the Google Cloud console.

Kubeflow Pipelines handles the orchestration of ML workflows and hides the complexities of containers as much as possible. You get continuous training in production, automatic tracking of metadata, and reusable ML components. You can clone and iterate on pipelines and leverage the power of the UI to visualize and compare models. Stay tuned to learn more about what you can do with Kubeflow.

[MUSIC PLAYING]
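As a companion to the pipeline definition sketched earlier, here is a hedged sketch of launching and scheduling runs programmatically with the kfp v1 client. The endpoint URL, experiment name, and job names are placeholders, not values from the video.

```python
import kfp

# Hypothetical endpoint; point this at your own Kubeflow Pipelines host.
client = kfp.Client(host="http://<your-kfp-endpoint>")

# Group related runs under an experiment so they can be compared in the UI.
experiment = client.create_experiment(name="intro-experiment")

# Launch a one-off run from the compiled pipeline package.
client.run_pipeline(
    experiment_id=experiment.id,
    job_name="intro-run",
    pipeline_package_path="intro_pipeline.yaml",
    params={"message": "hello kubeflow"},
)

# Or schedule a recurring run; KFP cron expressions use six fields
# (seconds first), so this fires at the top of every hour.
client.create_recurring_run(
    experiment_id=experiment.id,
    job_name="intro-recurring",
    cron_expression="0 0 * * * *",
    pipeline_package_path="intro_pipeline.yaml",
)
```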