Better, Faster, Cheaper: What Does It Mean for Ops? – John Willis

So I’m botchagalupe. Faster, cheaper, better. I’m not gonna tell you about my background. If you actually want to know me, you gotta remember that horribly spelled botchagalupe thing up there: that’s my Gmail, that’s my Twitter, and all the presentations I’ve done over the last seven years are there, plus my bio stuff. The bigger thing is the DevOps Handbook. I’ve been one of the co-organizers of these events, the DevOps Enterprise Summit, and also one of the co-organizers of the original DevOpsDays. A lot of start-ups: I was the ninth person at Chef and helped build the customer-facing business there. I got lucky and sold a company to Dell, and self-appointed myself the director of DevOps. About two and a half years ago I sold a company called SocketPlane to Docker. So, a lot of start-ups. That’s the book. This is worth mentioning: I spent a year developing this course. It was supposed to be three months, and the Linux Foundation is really mad at me, but it was a labor of love, because I included everything. I mean everything: Senge, Deming, Conway, I mean … and a lot of technology. It’s about 15 hours of videos, and it’s free, from the Linux Foundation. If people ask me, “John, can you come train DevOps?” I’m like, “I don’t have to. It’s free. It’s everything I know about DevOps.” Alright, so I’m a big shot, right? Hey man, I’m a big shot in DevOps. So last year I’m in China. I don’t know why I do these poses, and apparently I do it a lot, so I’m now trying to do it less, but these knuckleheads over in Seattle, friends of mine actually, decided they would go have some fun with this thing. Then I think they caught me in another pose and started another thread. So the thing is, you can have a PhD, you can start the DevOps movement, but unless you’ve been meme’d, you ain’t nothing. And then there I am, of course, with Muhammad. So there you go. Alright. So now, to get serious, folks: what if I told you you could be 2,000 times faster than your competitors?
What if I told you you could be 100 times more reliable than your competitors? Maybe 200 times, depending on who’s counting. What if I told you you could have both? You’ve kinda seen this already; Gene talked about it this morning. You’re certainly gonna see it tomorrow if you’re seeing the [inaudible 00:02:33] presentation about the DevOps survey. I’ve got a little bit of a side story on this faster-and-more-reliable idea. I call it the immutable service delivery pattern, right? Even though these all certainly overlap, I would say that DevOps will make you faster. Of course, there is resilience built into DevOps. But I would say that combining it with a container strategy, and my favorite container happens to be Docker, will make it cheaper. I’ll walk you through some of that. All that’s great, but now you have a billion containers running somewhere, and they’re all over the place. Wait a minute: how do I fix things when they go haywire? This takes us back to what we learned from Lean, particularly a book called Toyota Supply Chain. I think if you combine those ideas, you get kind of the best of everything. You get to be very fast. You get to containerize everything. You can have gazillions of these containers out there, but just like an auto manufacturer, you can have metadata; you can have a bill of materials. There’s a reason a car company knows which brakes are in a certain car, across all the cars it has distributed: it has bills of materials. It has manifests. It knows how to do a recall and which cars need to be recalled. So you can build that into your structure if you learn from some of the things Toyota and Lean have taught us. I’m gonna walk you through that.

So, faster. I’m not gonna spend … I’ve got 25 minutes, so I have to cut somewhere. Sorry, DevOps conference, I’m gonna cut out the DevOps stuff. No, the truth is you’ve heard a lot of this; this talk is usually for an audience that hasn’t heard much about DevOps. So I’m gonna go a little fast, but I’m around, and there’s a longer version: if you wrote down the botchagalupe thing, you can find the longer version of this. I’m gonna assume there are certain things you’ve heard already, or that you’re gonna hear over and over. I do think this is my favorite all-time picture of what DevOps is all about, right? One thing I always tell people: people get all hung up on the Dev, the Ops. Why is it …? Could it be more Ops? When could we sell? It’s a metaphor. It’s a metaphor for two teams with a wall between them that just needs to be demolished, and what you really have on top is shortening lead times, right?

We wanna be faster. We wanna take ideas and … The classic was “from the aha to the ka-ching.” How do you get from an idea to making money, and how do you make that faster? Well, you have to shorten lead times; we learned that from Lean. How do you do that correctly and resiliently? You get really good at amplifying feedback loops. So this picture tells you everything you kind of need to know. Early on in the DevOps movement, if you will, Damon, my DevOps Café podcast co-host, and I accidentally codified this thing called CAMS: Culture, Automation, Measurement, and Sharing. It’s kinda stuck as a loose taxonomy. Gene, if you’ve read The Phoenix Project, and I certainly hope you have … well, actually you can get a free copy of the DevOps Handbook. We built the book around this idea of the Three Ways. The First Way is left-to-right flow. The Second Way is amplifying feedback loops: right-to-left telemetry, getting really good at … I’ll show you some of the other slides. And the Third Way is continuous learning, right?
What I realized is that these actually map really well onto each other. That’s something I figured out along the way, although CAMS came much earlier. Culture is culture is culture; there are lots of presentations and ideas on how you deal with culture. But automation certainly aligns with the First Way, measurement with the Second Way (telemetry), and sharing is kind of the Third Way. You’ll hear more and more about the DevOps Handbook, but really we break it up by the Three Ways, we concentrate a lot on case studies on [inaudible 00:06:38], and the culture is extremely important. In fact, one of the things that came out really early on with CAMS is: if you can’t get the culture right, don’t even bother with the … One of the sayings from one of our customers was “AMS, not CAMS,” as in, you can’t just do AMS. Automation, measurement, and sharing without the culture is like going fast, maybe in the wrong direction. You have to get the culture right. You’ve seen this, and if you haven’t, you’re gonna get a way better explanation of it tomorrow afternoon from Nicole and Jez, and they’ll give you the 2017 data. This is 2015; there are people who absolutely wanted the 2015 version, so this is 2015. I’ve been doing IT operations for 35 years now. The DevOps movement was interesting: if you put a stake in the ground around 2009, 2010, you saw this kinda beautiful movement of operations and development, and things actually made sense for the first time in my career. One of the things we learned empirically along the way was that you could go fast and be more resilient, but we didn’t have … It was like you had to have a case study and say, “Well, they did it.” But then Nicole came in, in 2015, and worked on the survey. What we found from the statistically sound data, and if anybody has read the book, The Science of DevOps, she explains the psychometrics, is that it’s based on your culture: the generative cultures, and Nicole and
Jez will explain this more tomorrow, are 200 times faster than the pathological cultures. That’s what the survey data says. That’s great, but depending on which survey and which data, they’re also 168 times better at resolving issues, in MTTR. In 2016, it was 2,500. That was Gene’s slide this morning. What this really means is that the iron triangle, where you kinda pick two, or, even simpler, “be reliable or be fast: you have two choices, you can’t do both,” breaks down. We saw pattern after pattern where that wasn’t true, and now we have statistical data to prove it’s not true. You can do both. So what’s the linchpin? You get a free book. What’s the linchpin of going fast and resilient? Come on, somebody. Culture, right? It’s culture. That’s what glues the two things together. Otherwise you can’t, and you’re in that broken model where, when you go fast, everything breaks.

This is from 2006. Imagine it: 2006. Jez Humble, Dan North, and Chris Read presented this at Agile. This is basically how the feedback loop works, right? Everything’s automated, if possible. Somebody commits code, and it goes through the gates. It hits one gate: red, it goes back. It gets recommitted and hits the next gate: red, it goes back. Over time, you’re amplifying the feedback loop and getting really resilient. It has to constantly improve itself; it’s kind of an antifragile thing. If we look at this from Lean, there’s this concept of … not a concept, it was a thing at Toyota called the Andon Cord. When they were manufacturing cars at Toyota, there was a rope. Anybody on the line, lowest status, didn’t matter, if they didn’t like what they saw, pulled the rope, and the rope stopped the line. The first thing the line manager would say to that person was “thank you,” before they even investigated. It was psychologically safe to stop the production line, and the person who would normally yell at you in most organizations said “thank you” before they even knew what had happened. There’s a story about a Toyota plant in Kentucky that was producing 2,200 cars a day. An auto industry analyst went and said, “How do you produce 2,200 cars a day? That’s amazing!” You know what the answer was? “We pull the Andon Cord 5,000 times a day.”

If you get that, you get this idea. If you don’t get it, let me show you what Google does. These numbers are like four years old. This number I actually screwed up on the slide: that’s 100 million. It’s over 100 million. They run 100 million tests a day. That is that Kentucky plant; that’s this on steroids. Sorry, people trying to take pictures, I’m bouncing around. That’s the point: the more you build this antifragility into your infrastructure, those gates, that telemetry, right? Again, there’s a newer slide. Some of the Google folks promised me they’d send it and they haven’t, but all these numbers have almost doubled. They’re all phenomenal, but the one for me is the 100 million test cases run daily. How do you produce really resilient software that … You could hate Google or love Google or whatever, but think about the tools they offer. They’re pretty freaking resilient. There’s the legendary Amazon story of an 11.6-second mean time between deploys into production, right?
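To make the commit-through-gates loop and the Andon Cord concrete, here is a minimal Python sketch. It is an illustration, not a reconstruction of the 2006 pipeline or of Toyota’s line: the gate names, the `Commit` fields, and the `DeploymentLine` class are all hypothetical. A failing gate pulls the cord, which stops the line for that commit and sends it back to the developer, and every pull is recorded instead of being punished.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Commit:
    sha: str
    unit_ok: bool = True          # hypothetical test outcomes attached to a commit
    integration_ok: bool = True
    scan_ok: bool = True


# A gate is a name plus a check; both are illustrative placeholders.
Gate = Tuple[str, Callable[[Commit], bool]]

GATES: List[Gate] = [
    ("unit tests", lambda c: c.unit_ok),
    ("integration tests", lambda c: c.integration_ok),
    ("security scan", lambda c: c.scan_ok),
]


@dataclass
class DeploymentLine:
    gates: List[Gate]
    cord_pulls: List[str] = field(default_factory=list)  # every stop is logged, never hidden

    def run(self, commit: Commit) -> bool:
        """Push a commit through each gate in order; stop the line at the first red gate."""
        for name, check in self.gates:
            if not check(commit):
                # Pull the Andon Cord: halt here and send the commit back to the developer.
                self.cord_pulls.append(f"{commit.sha}: red at {name}")
                return False
        return True


line = DeploymentLine(GATES)
first_try = line.run(Commit("a1b2c3", integration_ok=False))  # red, goes back
second_try = line.run(Commit("d4e5f6"))                       # fixed and recommitted: green
print(first_try, second_try, line.cord_pulls)
```

The recorded pulls are the feedback: watching how often and where the line stops over time is one rough way to tell whether the loop is actually tightening.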
These stories have been told over and over, and it’s the same type of mentality. You have to build that level of, A, culture and, B, resilience, building that kind of Andon Cord into your infrastructure and your delivery pipeline. Right about now is when I’m looking at faces and half the people in here are like, “This guy is so full of shit. He’s never been in an enterprise.” That’s not true; I have. So I shamelessly stole this from a guy named Pete Cheslock, who originally said it about DevOps and security. I see a couple of you. Right now you’re like, “Oh, he’s totally doing the unicorn poop thing, and we’re gonna have to shovel it, because our manager’s in the room and he’s gonna go back and ask us why we don’t do this, and it can’t be done in the enterprise.” Absolutely not. Well, we know that’s not true: you’re all here. We’re going into our fourth year of this summit, in San Francisco in November. So we have stories. I mean, this is a short list, like Ticketmaster, all following the same principles, right? Early on … Gene talked this morning about that first DevOps Enterprise Summit. One of the things going into it was that we knew the enterprise could do DevOps the same way the web-scale companies did. The large consulting companies were like, “Oh no, you guys are a bunch of kids. You don’t know what you’re doing.” I’m not a kid. I’m 58, right?
I told Gene that when the customers presented at that first DevOps Enterprise Summit, they’d talk about, A, how hard it was, but, B, that they didn’t compromise. They did it literally the way the web-scale companies, the Twitters, the Googles, were doing it, to a certain extent, and they were having success. It was hard. Take Ticketmaster: they did it the way you’re hearing about all over today and tomorrow. Nordstrom, 20% shorter lead time. Same concepts: value stream mapping. Target, USAA, ING, all these companies. There are videos of all of them out there; there are hundreds now. Stories that didn’t compromise, that are enterprise, and it was hard. Don’t take anything I’m saying right here and make it sound like, “Oh, he thinks it’s easy.” No, it’s incredibly hard in the enterprise. The older the enterprise, the harder it is. That doesn’t mean … You have to do it. You’ve got no choice. Your competition is right around the corner, right?

So that was my DevOps part, right? Now I’m gonna talk about Docker. I think most people know about Docker these days, so I’ll just give you an overview. I will say, if you don’t know it, IBM had a nice white paper. It’s pretty outdated, but it describes the difference between hypervisor-based compute and what we would call OS-level compute. Hypervisors basically get the whole stack: you take a bare-metal machine, you carve it up, and you’re running the full stack, compute, everything, the operating system. The difference with a container is that you share the kernel with the host, so your compute instance is typically an order of magnitude smaller and an order of magnitude faster to start up and shut down. So you get this incredible density. At a high level, we call this OS-level virtualization. It provisions in milliseconds. It basically almost matches bare-metal runtime performance, because you don’t have a hypervisor brokering all the in and out of memory and all the things that happen in [inaudible 00:15:41]. What we find is that almost anything can be containerized: old XP applications; I recently saw a Fortran application containerized. So the idea that it’s just greenfield and cloud native, that’s not true. You can containerize pretty much anything. Containers are lightweight because you can now start thinking about building only the minimum of what you need. It changes the paradigm from running an application where, okay, here’s your VM, it’s got all this stuff, and I’m kind of lazy, so I’m just gonna leave all that stuff there. This paradigm kinda forces you to change the way you think, because you start thinking minimal. It’s easier to start thinking inside out: what is the minimum I need to run this thing? It’s a lot easier. You don’t have to be a Linux kernel expert to figure out, “Hey, I can build this just-enough operating system and put the application on it.” It gets even better. Why Docker?
Linux containers have been around for a while, and a lot of people can do them. At the end of the day, we didn’t invent containers, but we did invent the simplicity of the workflow. That’s why everybody is running around with their heads on fire about Docker: we took a really good technology and made it easy for people to use by adding a workflow. A pull, a push: we emulated Git. There’s a lot more to that story, but the workflow behind Docker is very malleable, especially for developers, which is really … It’s very rare in our industry that there’s a solid developer movement, where the developers are like, “We want this. Get out of our way,” right? Docker has definitely been one of those. Alright, moving on. We’ve always been hit with this idea that Docker is insecure, right? What’s interesting is that about three years ago somebody showed a slide that said, “Docker’s amazing. It’s awesome, but don’t run it in production.” What’s unfortunate is that some people still have those slides in their deck today, and that’s bullshit. I would argue, and I think I would win the argument, that today you can run a container more securely than a VM. People joke to us about our unicorn status, all the investment we got, all the money we got. We put a lot of that money into security. We have built-in image scanning. You can sign a container image, and the signature can be checked on both ends: on the push to the [inaudible 00:18:15] and on the deployment. So you can put policy on signed images. You’re guaranteed … well, not guaranteed, there’s no guarantee in life, but there is a really good shot that any known vulnerabilities have been caught and aren’t in the container image, and you know the provenance of that image. We’ve got a trusted registry. We’ve got LDAP support, so you can tie policy into your internal security. Encrypted, read-only containers. You can build a container that uses user namespaces, so you
can run a [inaudible 00:18:53] container that won’t have root access to the host. That’s a big deal. We have all the LSM (Linux Security Modules) support. This is a big deal. The DoD, unofficially, loved this, because if you know what you’re doing, and I’ll actually show you an example a little later, you can turn off all the syscall kernel opportunities and then work your way back, adding only the things you need. It’s not a trivial process, but if you know what you’re doing, you can make the thing incredibly secure, because now you can say the application only needs these three capabilities and turn everything else off by default. So now you’ve got it in a really locked-down state. We’ve got secrets management, so in your composition you can now … we’ve got a kind of secrets engine that can basically decompose them on the fly. So you can put in passwords and tokens, if you want. This is what I’m going to show you at the end: we have, and I’ll show you in a few minutes, a thing called an immutable operating system. It’s not coming soon; it’s actually available now. It’s called LinuxKit, where the operating system itself is actually built just in time. It can actually build from the YAML files.
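The “turn everything off, then add back only what the application needs” approach corresponds to Docker’s seccomp profiles. These are JSON files passed with `docker run --security-opt seccomp=<file>`. The Python sketch below generates one with a default-deny action and a single allow rule. The allowlist is a hypothetical placeholder: a real container needs many more syscalls just to start, and the usual way to find the right set is to trace the workload. Linux capabilities are a separate layer, controlled with `--cap-drop ALL` plus `--cap-add`.

```python
import json
from typing import Dict, Iterable


def default_deny_profile(allowed: Iterable[str]) -> Dict:
    """Build a seccomp profile in the JSON shape Docker accepts:
    every syscall fails by default; only the listed names are allowed."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",  # anything not allowlisted returns an error
        "syscalls": [
            {"names": sorted(set(allowed)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


def is_allowed(profile: Dict, syscall: str) -> bool:
    """Check whether a syscall matches one of the profile's allow rules."""
    return any(
        syscall in rule["names"] and rule["action"] == "SCMP_ACT_ALLOW"
        for rule in profile["syscalls"]
    )


# Hypothetical starting allowlist; a real service will need more than this.
ALLOWLIST = ["read", "write", "openat", "close", "exit_group"]

profile = default_deny_profile(ALLOWLIST)
# Save this output to a file, then for example:
#   docker run --security-opt seccomp=profile.json <image>
print(json.dumps(profile, indent=2))
```

Starting from an allowlist rather than a denylist means a forgotten syscall fails loudly in testing instead of silently staying available in production.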

This is a big deal. I’m gonna skip this; it’s a great presentation about immutable infrastructure. All I mean by immutable infrastructure is the idea that all your service definitions become artifacts that are basically read-only the minute they leave the developer’s desktop. So imagine that you basically commit, it goes green through the pipeline, and the developer wears the pager. The binary artifact they tested in their development environment is the exact same binary artifact that’s running in production. That’s a big deal. The other thing I wanted to show you that’s really interesting is this modernization; I’m just not gonna have time to do it properly, but this is pretty cool. I was talking about this thing called LinuxKit. Basically, when people define Docker things, they can create this thing called Compose: they build composition definitions for a service. What’s really interesting here, and I’m gonna use some buzzwords, is that it really is the ability to create a converged infrastructure: your network, compute, and storage in one human-readable, complete service definition. If you look at this real quickly, I have micro-segmentation. That’s [inaudible 00:21:20] famous word for building multiple interfaces, but I can segment. I can have the front end be pretty open, then the middle tier, and the front end will never be on the database tier’s network. So I can build my networks, and I can have my networks be encrypted. I can use overlays; I can use VXLAN. I can mix and match at the service level, with multiple interfaces, and I can do the same with storage: I can have multiple storage interfaces. I can be using [inaudible 00:21:48] at one level and Cassandra at another. All of this is in a human-readable file. I commit that, and it builds my infrastructure. But what’s even better now is that with that same commit, I can actually build a just-in-time operating
system, [inaudible 00:22:05] based. This creates an immutable operating system with the same properties: now my service and composition leave my desktop all immutable, a completely defined converged infrastructure, with an operating system built specifically for this service, or these services. I even get to define, at the kernel level, how the [inaudible 00:22:30] structure is. Not only is it a read-only operating system when it’s executing, but basically anything that runs on it is a container. You can’t run anything on it unless it’s a container. I’m gonna be giving a presentation in Portland later this year, and I’m gonna challenge the attendees to compromise it. I’m gonna run it and say, “By the way, I’m not a security wonk,” and I’m gonna challenge a bunch of security people: “Go ahead and compromise this. I dare you.” I don’t think they’ll be able to do it. First of all, [inaudible 00:23:02] operating system. Second, they’d basically have to implement a container, and then compromise another running container that might have capabilities and network segmentation in place. It’s gonna be really tricky. Finally, I’m a big fan of Deming. The thing I wanted to point out is this book called Toyota Supply Chain, which works really well here. If you wanna really go fast, and you want all this stuff, the problem is you’ve gotta be good at the principles Toyota was really good at. One of them is … you know, I’m gonna skip this because I’m running out of time. Sorry. That’s the problem with doing a 45-minute presentation in 25 minutes. If I have time, I’ll go back. In that book, Toyota followed three principles: fewer, better suppliers; higher-quality products; and tracking what you use. Right? So it gets pretty simple. They call it the four V’s, right?
You wanna decrease variety. You wanna increase velocity. You wanna decrease variation. You wanna increase visibility. Variety is key because you basically start thinking, and Deming would say this, that you want to reduce the number of suppliers you have. In fact, that prior chart shows that GM had more suppliers than Toyota, but they did more in-house work. That sounds weird, right? But it’s true. The more suppliers … So, the healthcare.gov debacle in the US? What was that, two years to develop? Maybe three? They had 17 Java logging frameworks. Google, by the way, if you read the Google SRE book: two kernels, two Java logging frameworks. I mean, they even have a team they call the death squad that will actually come into your office: “Hey, you’re running an outdated kernel.” “Yeah, I got it. Could you leave?” “No, no, my job is just to sit here and wait till …” It’s kinda … [inaudible 00:25:00] is actually in the death squad.
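Decreasing variety starts with knowing how much of it you have. Here is a minimal Python sketch. It assumes you can already extract each service’s declared dependencies; the service names, the inventory, and the choice of which packages count as logging frameworks are all hypothetical. It counts the distinct logging frameworks in use and lists which services depend on each.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical inventory: service -> declared dependencies
SERVICES: Dict[str, List[str]] = {
    "checkout": ["log4j", "jackson-databind"],
    "search": ["logback-classic", "jackson-databind"],
    "billing": ["log4j", "commons-logging"],
}

# Which dependency names count as "a logging framework" is also an assumption.
LOGGING_FRAMEWORKS: Set[str] = {"log4j", "logback-classic", "commons-logging", "slf4j-api"}


def variety(services: Dict[str, List[str]], category: Set[str]) -> Dict[str, List[str]]:
    """Map each framework in `category` to the services that depend on it."""
    usage: Dict[str, List[str]] = defaultdict(list)
    for service, deps in services.items():
        for dep in deps:
            if dep in category:
                usage[dep].append(service)
    return dict(usage)


usage = variety(SERVICES, LOGGING_FRAMEWORKS)
print(f"{len(usage)} logging frameworks in use")
for framework, users in sorted(usage.items()):
    print(f"  {framework}: {', '.join(users)}")
```

In these terms, the talk’s Google example (“two of everything”) means keeping this count at two or fewer per category, while the healthcare.gov example would show 17.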

If you’re on the phone with Gene, and this is why I’m running out of time, Gene says to do something and I say, “I’ll do it, Gene, just give me a couple days.” He’ll say, “No, no, why don’t you do it right now?” And then he’ll just wait, and it’s like, “Alright, I’ll do it now.” But yeah, Google, across the board, right? Two of everything. You get these efficiencies. How many Java frameworks do you have? How many operating systems do you have? Right? There’s some magic in this consolidation. Then you say, “Oh, it’s hard. We run old versions of this and that, and oh my goodness.” Tough. If your CTO’s standing up and saying we wanna be more like Google, then point out, “Hey, then help me get rid of these 18 versions of a logging framework.” I wrote a paper on this called Docker and the Three Ways of DevOps. Again, if you look at what Google does: you learn faster, you limit frameworks, you limit vendors, you work in small batches. These are the things we started talking about in the DevOps Handbook. This is where immutability comes in with containers, right?
So when it leaves a developer’s desktop, nothing, including the operating system, changes. There’s no opportunity to change. I love Chef; I still [inaudible 00:26:14] Chef. But the bottom line is that if you’re applying infrastructure as code at every level, you’re adding variation. There’s a great paper called Order Matters about how any of these little things, like a script or a bad return … there are so many ways that, at scale, you will get variation if you’re not immutable. So: immutability. And visibility is kind of the last part, which is that you can put metadata in a container. Because it’s a binary, you know it’s exactly what left the developer’s desktop. On the commit, you put the commit [inaudible 00:26:44] on it. You can put other metadata in there. You can put a bill of materials in there; you can put all that stuff. So now if you have thousands or millions of these things running, it’s very easy to basically say, “Gimme all the deployed code that has this thing, this framework,” and you get a list. So it’s actually more manageable. You’ve got millions and millions of them, but you have all this metadata, it’s included in the binary, and you know it’s guaranteed to be in there. There’s no mishap: you’ve got the commit [inaudible 00:27:18], you’ve got the bill of materials, you’ve got everything you need, and anything else you wanna add, because you can add metadata to a container image till the day is long. This is somewhat [inaudible 00:27:29] R.I. Pienaar, the inventor of MCollective. He did this very early on. If you want the blog article, I’ll give it to you. So, back to this. The only point now is that there’s one last overlay that’s really important, which is the DevSecOps overlay. So we’ve got this, right?
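The recall idea, querying running images by the metadata baked into them, can be sketched in Python as follows. It assumes each image carries its source commit in a label and a bill of materials as `name==version` entries. `org.opencontainers.image.revision` is a real OCI pre-defined annotation key, but using it here is a choice for illustration, not something the talk specifies; the digests, commits, and components are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

REVISION = "org.opencontainers.image.revision"  # OCI pre-defined key for the source commit


@dataclass
class Image:
    digest: str                                      # content address of the immutable image
    labels: Dict[str, str] = field(default_factory=dict)
    bom: List[str] = field(default_factory=list)     # hypothetical "name==version" entries


def recall(images: List[Image], component: str) -> List[Tuple[str, str]]:
    """Return (digest, commit) for every image whose bill of materials includes `component`."""
    hits = []
    for img in images:
        names = {entry.split("==", 1)[0] for entry in img.bom}
        if component in names:
            hits.append((img.digest, img.labels.get(REVISION, "unknown")))
    return hits


# Made-up fleet of running images
running = [
    Image("sha256:aaa", {REVISION: "3f9e2c1"}, ["log4j==1.2.17", "jackson-databind==2.8.9"]),
    Image("sha256:bbb", {REVISION: "8d41b07"}, ["logback-classic==1.2.3"]),
    Image("sha256:ccc", {}, ["log4j==1.2.17"]),
]

print(recall(running, "log4j"))
```

In practice the labels would be set at build time, for example with `LABEL` in a Dockerfile, and read back with `docker inspect`. This sketch works on plain Python objects so it runs on its own.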
That’s great, and we’ve been really good at this. We did development, we got operations, we got QA, we really won. And now, just as the developers finally got on board, the security people are getting on board too, with this idea of adding their overlay into this thing. I kind of adapted, stole, adapted … DJ Schleen gave this presentation at RSA earlier this year. If you look up RSA and DJ Schleen, you’ll see this is an overlay on their already existing Dockerized immutable infrastructure. You see all the threat modeling, the static analysis in development; they’re injecting all that stuff into the automation pipeline. This was Aetna, right? Because it was DJ Schleen. But here’s the thing I wanna end on. I promised you this 2,000 times faster and 100 or 200 times more reliable, so the [inaudible 00:28:41] really interesting. About three or four years ago, they started down this DevOps path. They tracked security defects per 10,000 lines of code. They started out at ten. Through DevOps practices, they got it down to four. Then they applied the Toyota supply chain idea of minimizing suppliers; that was really the only principle, the single principle: just minimize suppliers. Get down to two Java logging frameworks. They got it down to one. Then they went with a Docker immutable model for delivery, in production. These are production applications. They got it down to .01. In fact, at RSA, and that was a year ago, they said the number is actually zero now. And that’s a pen-test, bug-fest zero. This is not a made-up zero. So imagine taking ten security defects … By the way, a bug is a bug is a bug is a bug, right?
Security is a bug. Let’s stop bifurcating security from everything else. Imagine being able to follow these principles; the principles I described to you are the ones they followed, and they took them from basically ten down to zero. I’m being generous and saying .01 or .1, but you know. The kinda last thing: the guy at RSA who gave the presentation, DJ, I’d never met him before, and he comes up to me and says, “Dude, I love Docker.” So I’m like, “Dude, I didn’t write it. Calm down.” And we had this conversation about why he loved Docker, and he said, “Here’s the thing, John.” If this part doesn’t make you go “ha!”, then maybe you’re in the wrong business, I don’t know. He said, “With Docker, it’s one service, one container, one read-only file system,” and here is the one that should get you to jump up and cheer, “one port.” I see some smiles in the audience, right? Imagine that. One port. How much time have you all wasted on firewall rules and ports and all that, right? Imagine that. One service. I like to repeat this; it gives me the chills. One service. One container. One read-only file system. One port. Right? There’s some … if you’ve been in this business for more than three years, this is some magic. Anyway, that’s my presentation. Thank you very much.