LIZ FONG-JONES: Hi there. I'm Liz, a Site Reliability Engineer, or SRE, at Google. And I teach Google Cloud customers how to build and operate reliable services.

SETH VARGO: And I'm Seth, a Developer Advocate at Google focused on infrastructure and operations. And Liz and I are here to settle things once and for all. Which is better, DevOps or SRE?

LIZ FONG-JONES: Whoa there, Seth. Hold on a second. I'm not sure you're really looking at this in the right way. But first of all, maybe we should clarify some things. What do you think DevOps is?

SETH VARGO: So that's a great question, Liz. Back in the day, operators and developers had a lot of contention. Developers used to throw their code over the metaphorical wall, and operators were responsible for keeping that code running in production. Operators had little understanding of the code bases, and developers had little understanding of operational practices. But developers were concerned with shipping code, and operators were concerned with reliability. This misalignment often caused tension within the organization.

LIZ FONG-JONES: So if I understand you correctly, you're saying that the developers were responsible for features, and the operators were responsible for stability, meaning the developers wanted to move faster to get their features out faster and the operators wanted to move slower to keep things stable? I could see how that would cause a lot of tension.

SETH VARGO: Exactly. So DevOps is a set of practices and a culture designed to break down those barriers between developers, operators, and other parts of the organization. I break DevOps down into five key areas. First, reduce organizational silos. By breaking down barriers across teams, we can increase collaboration and throughput. Second, accept failure as normal. Computers are inherently unreliable, so we can't expect perfection.
And when we introduce humans into the system, we get even more imperfection. Third, implement gradual change. Not only are small, incremental changes easier to review, but in the event that a gradual change does introduce a bug in production, it allows us to reduce our mean time to recover, making it simple to roll back. Fourth, we need to leverage tooling and automation. And fifth, we need to measure everything. Measurement is a critical gauge for success. And without a way to measure if our first four pillars were successful, we would have no way of knowing if they were. So, Liz, you've been an SRE at Google for over 10 years now. Do you think any of the way that I described DevOps aligns with your experience as an SRE?

LIZ FONG-JONES: It's sounding very familiar. Because, if you think about DevOps as a philosophy, SRE is a prescriptive way of accomplishing that philosophy. So if DevOps were an interface in a programming language, you might almost say that SRE is a concrete class that implements DevOps. Let's take a look at how that is. So, Seth, when you talked about eliminating organizational silos, what I thought about is the fact that we share ownership of production with our developers. And we use the same tooling in order to make sure everyone has the same view and same approach to working with production. When you talked about accepting accidents and failure as normal, what I thought about is the fact that, similar to many DevOps practitioners, we have blameless postmortems, where we make sure that the failures that happen in our production systems don't happen the exact same way more than once. And we accept the failures as normal by encoding a concept of an error budget of how much the system is allowed to go out of spec. And then third, we talked about making gradual changes. And when you said that, I thought about the fact that we canary things, that we roll things out to a small percentage of the fleet before we move them out for all users.
And then fourth, when you talked about leveraging tooling and automation, what I thought about is the fact that we try to eliminate manual work as much as possible. So we measure how much toil we have, and then we try to automate this year's job away. And then fifth, when you talked about measuring everything, I thought about exactly that: measuring the amount of toil that we have and measuring the reliability and health of our systems.

SETH VARGO: I really like that. Class SRE implements DevOps. We should get that on a shirt or something. But just like a class in a programming language, there might be additional functions or methods that don't necessarily correspond to that interface. Or the class might implement multiple interfaces. Do you think SRE is like that?

LIZ FONG-JONES: I absolutely think that's the case, because SRE doesn't do things in the exact same way that other people who implement DevOps elsewhere might want to. So we'll talk a little bit more about those differences, such as how exactly SLOs work, which are a very specific concept that we implement in order to make SRE successful.

SETH VARGO: Great. Well, that settles it, then. It turns out that DevOps and SRE aren't two competing methods, but rather close friends designed to help break down organizational barriers to deliver better software faster. Thank you, everyone, for watching. Please be sure to check the description below for more links, and don't forget to subscribe to our channel. Stay tuned for our next video, where we will discuss the differences between SLIs, SLOs, and SLAs.
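
The "class SRE implements DevOps" line from the conversation can be sketched in code. What follows is only an illustration of the metaphor, not anything shown in the video: it uses Python, where an abstract base class stands in for an interface, and every method name is made up here from the five areas Seth lists. The extra `allowed_failures` method is a hypothetical example of "additional methods" beyond the interface, and its formula (the share of requests an SLO target permits to fail) is an assumption about how an error budget might be computed.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five key areas from the talk, written as an abstract interface."""

    @abstractmethod
    def reduce_organizational_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self) -> str: ...

    @abstractmethod
    def implement_gradual_change(self) -> str: ...

    @abstractmethod
    def leverage_tooling_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...


class SRE(DevOps):
    """One concrete way to satisfy the interface, using the practices Liz names."""

    def reduce_organizational_silos(self) -> str:
        return "share ownership of production with developers"

    def accept_failure_as_normal(self) -> str:
        return "blameless postmortems and error budgets"

    def implement_gradual_change(self) -> str:
        return "canary changes on a small percentage of the fleet"

    def leverage_tooling_and_automation(self) -> str:
        return "measure toil and automate it away"

    def measure_everything(self) -> str:
        return "measure toil, reliability, and system health"

    # Extra method that the interface does not require (hypothetical).
    def allowed_failures(self, slo_target: float, total_requests: int) -> float:
        """Assumed formula: the fraction of requests the SLO target lets fail."""
        return (1.0 - slo_target) * total_requests


if __name__ == "__main__":
    sre = SRE()
    print(isinstance(sre, DevOps))
    print(sre.implement_gradual_change())
```

Because `DevOps` only declares abstract methods, it cannot be instantiated on its own, which loosely mirrors the point that DevOps is a philosophy while SRE is one prescriptive way of carrying it out.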
What's the Difference Between DevOps and SRE? Posted by Marsen Lin on 2018/09/02.