On the origins of the term DevOps
In 2010, I had the official title of DevOps at NASA, working on the Nebula project. I am the first person in my own network of friends and colleagues to hold that title. At the time, and still today, I thought the title was ridiculous. I kept asking them to change my title to Wizard. They never did. I still aspire to hold the title of Wizard.
Today my title is “Automation Engineer”. But I’ve been involved in some pretty serious industrial-grade datacenter automation for going on 7 years now. That is very rare in our industry, regardless of what snake oil someone may be trying to sell you. Not only that, I’ve had the distinct misfortune of operating, both in the past and today, at enterprise scale. For you Silicon Valley types, that’s 10,000 physical machines or more. Silicon Valley will call any old thing enterprise; that doesn’t mean it actually is.
Now the reason I mention this enterprise-scale thing is that scale is at the very heart of the rise of the term DevOps. Basically, DevOps began as a term in Silicon Valley among serial startup employees. With the rise of elastic computing and on-demand virtualization, the growth from tens of servers to thousands of instances changed the way many small to medium-sized businesses operated. In Silicon Valley, where coders are in much greater supply, they began doing what they do best: reinventing the wheel.
On the Architecture of Pythonic Automation
I always like to use the example of Loudcloud when talking to folks involved in OpenStack today. Some would think their cloud stuff is new, and frankly it’s not. Not at all. It’s literally close to 15 years old. Older, if you want to make comparisons to mainframe automation and LPARs. But I won’t do that. The reason I point to Loudcloud is Opsware. The Opsware (later HP) server automation product is architecturally very similar to the design OpenStack has settled on. Which makes sense, since both are service-oriented architectures written in Python.
In both OpenStack and Opsware, you have a message bus linking scalable services that communicate either over RPC or as RESTful applications. You have a central state database. You have decentralization, federation, and regionalization needs. Additionally, you have identity management and RBAC. I could go on, but you get the drift. Modern automation is not significantly different from the automation that’s been going on since the late 90s in most enterprise environments.
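To make the pattern concrete, here is a minimal, in-process sketch of that bus-plus-services shape in Python. This is not OpenStack or Opsware code; the service names, the message format, and the use of `queue.Queue` as a stand-in message bus are all illustrative assumptions.

```python
import queue

# Toy in-process "message bus": topics map to queues.
# Purely illustrative; real systems use RabbitMQ, ZeroMQ, etc.
bus = {"compute": queue.Queue()}

# Stand-in for the central state database.
state_db = {}

def api_service(instance_id):
    """API frontend: publish an RPC-style request onto the bus."""
    bus["compute"].put({"method": "start_instance",
                        "args": {"instance_id": instance_id}})

def compute_service():
    """Worker service: consume one message from the bus, update state."""
    msg = bus["compute"].get()
    if msg["method"] == "start_instance":
        state_db[msg["args"]["instance_id"]] = "running"
        return {"status": "ok"}

api_service("vm-42")        # the API tier publishes a request...
result = compute_service()  # ...and a compute worker consumes it
print(state_db)             # {'vm-42': 'running'}
```

The point is the shape, not the code: frontends and workers never call each other directly; they share only the bus and the state database, which is what lets each tier scale horizontally.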
Frankly, the reason CloudStack and Eucalyptus never caught on is that enterprise folks looked at their architectures and immediately saw the scalability issues inherent in them. CloudStack is actually, finally, attempting to break its system into a more service-oriented and horizontally scalable design. When Nova was initially being designed, this was an imperative design goal, and it remains one of the core design guidelines across all of OpenStack.
On distributed systems
One of the newer paradigms in technology is PaaS, with Hadoop being among the best-known examples. As with anything, Hadoop builds upon a mountain of past advances in technology. But it is one of the most advanced concepts in distributed systems today, and the ramifications of our current degree of abstraction are not, at this point, well understood. Everyone is out of their depth in this field at the moment, and anyone who tells you otherwise is just plain lying to you.
In the earliest days of automation we were given the idea of the Turing machine. What Alan Turing was attempting to do was distill the bare essentials of logic necessary to perform all logical programming concepts with the least hardware necessary. The reason for this is simple: generic compute devices. Simple to build, simple to maintain, simple to replace. It’s an idea that crops up time and again.
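As a toy illustration of just how little machinery that takes, here is a minimal Turing machine interpreter in Python. The single-state rule table (which simply flips every bit and halts on a blank) is my own example, not anything from Turing's paper.

```python
def run(tape, rules, state="q0", pos=0, blank="_"):
    """Run a Turing machine: tape is a list of symbols, rules maps
    (state, symbol) -> (symbol_to_write, move, next_state)."""
    while state != "halt":
        symbol = tape[pos] if pos < len(tape) else blank
        write, move, state = rules[(state, symbol)]
        if pos < len(tape):
            tape[pos] = write
        else:
            tape.append(write)       # extend the tape on demand
        pos += 1 if move == "R" else -1
    return "".join(tape)

rules = {
    ("q0", "0"): ("1", "R", "q0"),   # flip 0 -> 1, keep scanning right
    ("q0", "1"): ("0", "R", "q0"),   # flip 1 -> 0
    ("q0", "_"): ("_", "R", "halt"), # hit the blank: nothing left to do
}

print(run(list("1011"), rules))  # 0100_
```

Three rules and a while loop: that is the whole "hardware". Everything else in computing is layered on top of something this small.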
Dennis Ritchie once famously said, “UNIX is basically a simple operating system, but you have to be a genius to understand the simplicity.” Every engineering teacher ever has told their students to ‘keep it simple, stupid’. This is known as the KISS method. Basically, engineers have a nasty tendency to build Rube Goldbergesque contraptions, and the results are generally pretty horrific. The greater the degree of complexity in anything, the more opportunities for failure. The more opportunities for failure, the greater the opportunity for exponential failures. So, in avoiding catastrophe, it’s always a good idea to keep things as simple as possible.
More to the point, people suck at complexity. If you ever had a solid physics teacher, you learned how to solve Fermi problems. Named after the famed physicist Enrico Fermi, these are problems so hugely complex you could never answer them exactly. Something like: how many leaves are there on Earth? Or, how many books are in that library? Or, what is the estimated blast yield of a hydrogen bomb? Fermi would solve problems like these with astounding accuracy in his own head. The way he did this was by breaking down what seems to be a hugely complex problem through simple estimates and a reliance on sampling as an effective method for inference. What I mean is, say, for the library: he’d guess how many books are on a shelf. How many shelves are in a case. How many cases in a row. How many rows on a floor. How many floors per building. Multiply it all up, and Bob’s your uncle.
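The library estimate above can be written out as a few lines of arithmetic. Every number below is a made-up guess, there purely to show the technique of multiplying up rough estimates.

```python
# Fermi estimate of books in a hypothetical library.
# Each factor is a rough guess, accurate to maybe a factor of 2;
# the chain of simple estimates still lands in the right ballpark.
books_per_shelf = 30
shelves_per_case = 6
cases_per_row = 10
rows_per_floor = 20
floors = 3

estimate = (books_per_shelf * shelves_per_case *
            cases_per_row * rows_per_floor * floors)
print(f"Roughly {estimate:,} books")  # Roughly 108,000 books
```

Errors in the individual guesses tend to partially cancel, which is why the final figure is usually within an order of magnitude of the truth.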
Engineering, and automation in general, rely on the principles of KISS. Keep it simple. The problem with distributed systems is that we run dangerously afoul of these core engineering principles. Now, we do a pretty good job of shaving off some of the complexity in modern distributed systems architectures. For instance, by adopting the horizontally scalable, service-oriented view that EC2 promoted, we end up treating physical hardware as lossy. We expect to lose it and adapt our software to plan around it. We also promote the use of simple services. But what we haven’t done yet is escape the need for a state machine. We still need to be able to ask what the exact value of a given attribute is for a given node. And because we need that, we end up building pathways to query that information.
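That “pathway to query that information” usually boils down to something like the following sketch. The store, node names, and attributes here are all hypothetical; the point is that even in a so-called distributed design, the authoritative answer still comes from one central state store.

```python
# A stand-in for the central state store every such system grows.
# In practice this is a clustered database or key-value store.
state_store = {
    "node-17": {"status": "active", "cpu_count": 32},
    "node-18": {"status": "lost"},  # hardware is lossy; plan for it
}

def get_attribute(node, attribute, default=None):
    """Answer 'what is attribute X of node Y right now?' from the
    single authoritative store."""
    return state_store.get(node, {}).get(attribute, default)

print(get_attribute("node-17", "cpu_count"))          # 32
print(get_attribute("node-18", "status"))             # lost
print(get_attribute("node-99", "status", "unknown"))  # unknown
```

However many services sit in front of it, the query path converges on that one lookup, which is exactly the centralization the next paragraph complains about.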
The modern distributed architecture is anything but distributed. Most rely on centralized or ‘clustered’ databases or key-value stores to provide the information they desire. Some require API catalogs. Many require deployment and command-and-control to originate at a gateable, centralized authority. We’re still building complex verticals. We’re still facing the added complexities of trust relationships developed in outside industries and transposed onto technology. We’re seeing complexity spread throughout our environments like cancer, and just as destructively.
DevOps goes rogue
DevOps was more than reinventing distributed systems theory or automation engineering principles and practice. DevOps was a group of folks saying: we need to go rogue. We need to hoist the Jolly Roger and set an example for the world. Things can be simpler. Or at least that’s what I’ve seen. I follow Jeff Lindsay’s GitHub all the time. He seems to be on a chronic quest to simplify automation. He’s mostly not successful, but every now and again he raises some eyebrows. And that’s how progress is made.
Developers have the capacity to contribute to this field in exciting ways. But DevOps wasn’t supposed to mean developers running operations. That was NEVER the idea. Every operations guy ever would cringe at the very thought of it. Let’s be frank: talking a developer down from a Friday afternoon deploy is effectively the ops equivalent of talking a jumper off a ledge. This is something that even senior devs have problems grasping. It’s not that we don’t trust you. It’s not that your one-line change isn’t utterly benign. It’s that on a Friday afternoon, people have a tendency to fuck up. It’s not a time that promotes full attention to detail and calm, cool adherence to procedure. People are too busy thinking ahead to drinks that night or camping that weekend, instead of thinking about what it is they are seeing and whether or not everything is checking out as expected. There are some experience-taught values that end up in the operations track for tech folks. At the same time, we want developers involved because they can bring new insight. And some of our preconceived notions, or even our experience-learned tendencies, will be challenged. This is going to lead to a good amount of culture clash.
I’ve said this countless times: when building a team to do ANYTHING, it’s community first and everything else second. You can’t put four of the world’s best in a room and expect success. You’ll end up with one guy standing on a mountain of three bodies and not much to show for the investment. Teams work because people mesh well together, share load, and inspire each other to greater success. Sometimes the two worst and the three middle-of-the-road candidates will outperform all the rockstars and show ponies you can fit in your office. It’s about how well the team can interact. This is especially important with DevOps, because you are going to have people who honestly and truly disagree, for very good opposing reasons. And they are going to have to learn to accept each other’s ideas and not jump down each other’s throats when something does go wrong. Going rogue isn’t anything special. Going rogue and surviving is.
Ducks out of water
Imagine if you were working as a dev or an ops guy/gal and suddenly you were transferred, for three weeks out of the month, into the accounting office, because you must be good at math. You most likely wouldn’t like that. It’s not your strength, so you’d be mediocre at best; you’d not be doing what is your strength, so your skills would be atrophying; and you’d be wondering why you are suddenly on an entirely new career path. Yup. For some devs, and some ops, DevOps is that. Tommy the rack monkey who swaps disks, verifies backups, files tickets, and provisions systems may not be the best addition to a DevOps team. Since he can’t code and doesn’t really care about automating anything, he’ll just be confused as to why he is there, and wander off. Jill the developer, who doesn’t know what a subnet is and would rather focus on optimizing a travelling salesman algorithm, might not be a great addition either. Jill wouldn’t give two shits about building a bunch of APIs off the same frameworks to deal with the world of unix nerd trivia that is operations tasks. She’d frankly start considering some Friday deployments. And god help you if you grabbed your infosec guy and told him to go write an API for anything. He MIGHT actually deliver something in the next 10 years. Certainly he’d have written 30-some-odd papers and filed a thousand vulnerability reports on half the world’s REST frameworks by the time he was done.
Not everyone wants to solve the problems of industrial automation, be it cloud mechanics or the mechanics of vivisecting a cow and packaging it for consumption. Yes, it’s coding. Yes, it’s technology. No, not everyone is going to excel at it. Just because you work on a turbine engine and know how to balance it does not mean you are going to be taking out Jester during Top Gun training. Abstraction layers attract their own supporters. I for one love automating datacenters. I dig it. For me, this is the clear path forward, and I see a lot of excitement here in the future. That being said, there is a whole great big wide world out there waiting for developers to get their hands on it and change it (hopefully to our betterment).
Engineering is built around the idea of repeatable, quantifiable best practices. Most companies are not looking to go rogue and seek adventure in the seven layers. Automation is what people want. They want to spend more money on developing their line of business and less money on everything else. To do that, you need teams that are interested in, and capable of, dividing a complex task into simple mechanisms, then assembling them, optimizing them, securing them, and deploying them quickly and cheaply. And to do all of that, you need a very solid understanding of what your risk margins are with regard to costs all around. The genius isn’t in the solution. You won’t finish with a chrome-plated, tube-filled mass of gleaming metal hurling sparks and belching brimstone. If all goes well, it will look as utterly mundane and cheap as possible. The solutions should be simple. Much to the pity of your marketing folks.
DevOps as a movement was an attempt to reimagine automation practice and design principles to meet the needs of a changing technological landscape. There was less of a focus on engineering because the goal was a research and development one. That’s awesome. And the idea of mixing two skill sets to cross-pollinate ideas and promote skill growth is great too. I’d love to see security show up in the mix as well. It worked out really well for us at Ames to have a couple of security guys in our sprint planning sessions, helping us drive our tickets and our sprints. It helped being able to assign them tickets and be assigned tickets directly by them. It helps to have people working on a project fully engaged as members of the team. But don’t expect everyone on earth to shift over into a new culture and survive. More to the point, realize that at some point most companies want to solidify their research and hone it into a real engineering best practice, or maybe even a product. Realize that design is the marriage of research and development as well as engineering, and even a little bit of marketing.
We’re building abstraction layers so that people can focus more on creating their stuff than on solving the problems of our own industry. A developer doesn’t want to be working on unix trivia problems, and an operations team doesn’t want to be figuring out how to optimize their log parsers. But people do need to bridge those divides and build the abstraction necessary to bring ideas together for the common good. Whether they are developers, operations, security, or baristas hardly matters. There is a reward, and it’s probably worth the effort to reach for it, painful though it may be at times.