Another year behind us in the post-modern Internet world -- a world where cloud computing provides near-instant infrastructure and application platforms like Drupal provide near-instant data-driven applications [still a world where you should be following an End-of-Year IT Checklist, however]. Sadly, most of the IT world is stuck in an era where IT operations ("Ops") and developers ("Dev", see later note about "Dev" applying to more than just code-from-scratch) are separate -- and often warring -- factions. While Dev groups are usually driven by user needs for frequent delivery of new features, Ops groups focus more on availability, stability of IT services and IT cost efficiency. These two contradicting goals create a "gap" between Development and Operations, which slows down IT's delivery of business value. Here are the typical characteristics of such a gap [BEGIN *1]:
- Developers are often not concerned about the impact of their code on Operations. They deliver their code without involving Operations in architectural decisions or code reviews.
- Developers fail to communicate configuration or environment changes necessary to run the updated code base.
- Developers apply configuration changes manually to their workstations and do not document each necessary step. Often, coming up with the necessary configuration parameters for software involves experimentation with various parameters. After reaching a working state it is often difficult to identify the minimal steps to reach the working state.
- Developers tend to use a tool set optimized for rapid development: Fast feedback on code changes, low memory consumption of runtime environment, etc. This tool set is very different from the target runtime environment in Operations where stability and performance trump flexibility requirements.
- As Developers work on desktop computers they tend to use Operating Systems optimized for desktop use. The runtime environment is usually running a server operating system. In development, the systems run locally on the developer's workstation. In Operations, the system is often distributed amongst various servers like web server, application server, database server, etc.
- Development is driven by functional requirements usually directly related to business needs. Operations is driven by non-functional requirements like availability, stability, performance, etc.
- Operations tries to minimize risk for delivering on non-functional requirements by avoiding change. If frequent change is avoided, but the amount of necessary change stays constant, every change will be bigger. Bigger changes involve more risk, as more areas are affected by any change. In trying to avoid change, Operations slows down the flow of new features to production and therefore slows down Development's ability to deliver features.
- Operations might not be fully aware of the application's internals making it hard to correctly define the runtime environment and update procedures. Development might not be fully aware of the runtime environment making it hard to correctly adapt the code accordingly. [END *1]
Yes, we've all seen those things happen, tragically. Creating a culture where Dev and Ops are brought together in a marriage of process and responsibilities is long overdue, and 2012 is the year to do it in YOUR organization. 2012 is the year of DevOps!
Curiously, I don't think that "Dev" has to be limited to groups that write code from scratch. In fact, you can and should be doing DevOps even if you don't have a single programmer in your organization! "Dev" in this context really represents any group which introduces new technologies and applications into the enterprise. For example, in healthcare there's the need for the Dev group -- that mostly buys canned applications from the healthcare vendors -- to be more tightly integrated with IT Ops, too. And all the same principles apply. Or, maybe your "Dev" group customizes some set of off-the-shelf applications for your Enterprise. Of course, it could be a group that writes code from scratch in a modern development environment like node.js, RoR, or Drupal. Or, if you are one of those (shameful) organizations that actually does write code in some legacy platform like .NET or Java (hang your heads!), your Dev group needs double the DevOps that everyone else does. DevOps works in IT organizations from 2 to 20,000 people, and everything from high-tech startups to traditional large enterprise -- it's not constrained by age or size.
What is DevOps? DevOps is a grand unification of philosophy around how to manage Development (programmers, application analysts, application owners, project managers) and IT Operations (system admins, network admins, security, data center, storage, database admin) in a tightly-integrated way. DevOps is the belief that working together as a collaborative team will produce better results, and break down barriers and finger pointing. This is accomplished through a combination of cultural and technical changes. Note that DevOps comes in all 65,536 shades of grey -- you can implement all, lots, some, or very little of it in your organization, your choice!
The cultural changes necessary to bring Dev and Ops together in a useful way vary between organizations, but here are a few suggestions:
- Bi-weekly meetings with mandatory attendance by both Dev and Ops staff: short (15 min) deep dives into a technical topic of interest (promotes better shared understanding of the environment).
- Find those special individuals that have both Dev and Ops skills and make them official liaisons between Dev and Ops.
- All Production environments are mirrored by identical Development environments that can be used for experiments and evaluation. Promotes a safe learning environment.
- Dev and Ops staff all have scheduled "office hours" at least once a week where anyone can come and ask any question or request help with a chronic or puzzling problem.
- Focus on automated testing of all infrastructure and software components. Dev and Ops commit to create and share their testing scripts. No app launches without automated testing in place at both the infrastructure and app level.
- An automated monitoring tool or platform watches the infrastructure and software layers 7x24, and pages Dev and Ops 7x24. Both Dev and Ops have 7x24 accountability for the performance and availability of the environment.
- Regular code reviews are required, and Ops is involved with code reviews. Regular infrastructure architecture/config/outage reviews are required, and Dev is involved with infrastructure reviews.
- Shared sign-off by Dev and Ops before any application goes live.
Bryan W. Taylor has an excellent post about the origins and components of DevOps. For some suggested technical aspects of DevOps, I've included Bryan's list here (condensed and adapted slightly):
- Infrastructure Automation. Use cloud and virtualization. Have standard images. No exceptions. Eliminate questions and "creativity" in the provisioning process. Use tools like Puppet, Jenkins, and Job. Measure your provisioning time. Cut it to minutes or seconds.
- Standardized Runbooks. Each application and service that you build can't have its own story. Developers don't get to change how their app is started, what its installation looks like, where its logs go, where its configuration goes, what container it deploys to. DevOps writes this once and stuff that doesn't comply doesn't ship.
- Fully Automated Deployments. The app should be in one artifact, its configuration in another. The deployer takes one bit of information (the app/service name) and looks for updates in the one standard way. If they exist, they are pulled down and installed. One click deployments, then...
- Continuous Deployment. One click deployment is one click too many. Build a pipeline and when all the tests pass, no clicks happen and the code is promoted and installed.
- Advanced Test Driven Development. Not just unit and integration tests. I'm talking English-language Behavior Driven Development (e.g., Cucumber), including for your UIs. 100% no exceptions. Have your quality/compliance team do audits to make sure. Even this is not enough!
- Minimal Marketable Features (MMFs). If it's possible to split your feature, do so. When developers finish stuff, assign them to in-flight MMFs first until those are "full". Stop starting and start finishing. Only pull new features into WIP when forced to because a free developer can't help anything in flight go faster. Management can juggle the roadmap or backlog all they want until it's WIP.
- Ship When Done. I've never understood timeboxed iterations. I call them calendar complacency. Many agile proponents haven't heard: Timeboxed Iterations are Dead.
- Runbook Automation - Take common failure modes and automate their responses. Have a socket leak slowly filling up your file handles? No!? Good. Monitor them and automate the bounce anyway. Have bad memory?!? No!? Good. Automate a from-scratch deployment anyway.
- Perpetual Beta. Let customers control who can see "Beta" stuff. Call this user acceptance testing, so that you can get rid of the waste. Let internal customers pull value by controlling when "beta" ends for a feature. Deliver features fast enough so there is always something in Beta.
- Automated Recovery. When web server #3 has some issue, what should the response be? Spin up a new VM, redeploy the app, put it in the pool, and throw away #3. Hone your ability to do this quickly. Measure it in seconds.
- Metrics. Measure stuff that matters. Time is money, so measure how long things take. MTTR is critical, so things like failover time, rebuild-from-scratch time, and app start time need to be measured. Performance is important, so latency, throughput, etc. should be quantified. There are only a couple of other useful things: measuring test code coverage is good, measuring cyclomatic complexity of your code is good.
- Process Tooling - Have a single source control solution. Allow all techs to see everything in it, across all teams. Invest in continuous integration environments, monitoring, and both runbook and infrastructure automation tools. DevOps should own administration of these, and be expected to use them to demonstrate waste elimination in the IT process. DevOps delivers metrics data, as above, and owns the plan and its execution to improve those metrics by leveraging software process tooling.
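To make the "Fully Automated Deployments" and "Continuous Deployment" items above a bit more concrete, here's a minimal Python sketch. It is only an illustration under assumptions of mine, not Bryan's implementation or any real tool's API: the names Artifact, Repository, and Deployer are hypothetical, and so is the convention that an app's configuration artifact is named "<app>-config". The deployer takes just the app/service name, looks for updates in one standard place, and promotes them with no clicks, but only if every check passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Artifact:
    """A versioned bundle. The app and its configuration are separate artifacts."""
    name: str
    version: int


@dataclass
class Repository:
    """Hypothetical stand-in for the 'one standard way' to look for updates."""
    latest: Dict[str, Artifact] = field(default_factory=dict)

    def publish(self, artifact: Artifact) -> None:
        self.latest[artifact.name] = artifact

    def find_update(self, name: str, installed_version: int) -> Optional[Artifact]:
        candidate = self.latest.get(name)
        if candidate is not None and candidate.version > installed_version:
            return candidate
        return None


@dataclass
class Deployer:
    repo: Repository
    # Each check stands in for a stage of the test pipeline (assumed interface).
    checks: List[Callable[[Artifact], bool]]
    installed: Dict[str, int] = field(default_factory=dict)

    def deploy(self, app_name: str) -> List[str]:
        """Take only the app/service name; return the names of promoted artifacts."""
        updates = []
        # Assumed naming convention: configuration lives in "<app>-config".
        for name in (app_name, f"{app_name}-config"):
            update = self.repo.find_update(name, self.installed.get(name, 0))
            if update is not None:
                updates.append(update)
        # Promote only when every check passes for every pending artifact.
        if not all(check(u) for u in updates for check in self.checks):
            return []
        for u in updates:
            self.installed[u.name] = u.version
        return [u.name for u in updates]
```

In this sketch promotion is all-or-nothing: if any check fails for either the app or its configuration, nothing is installed, so a half-updated service never reaches production. A scheduler or a commit hook calling deploy("billing") on every change would be the "no clicks" part.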
Interested in DevOps? If you're in/near Boulder, join us at the DevOps Boulder MeetUp (www.meetup.com/DevOps-Boulder).
Happy New Year -- make it a DevOps year!
[*1] Text adapted from http://en.wikipedia.org/wiki/DevOps under the Creative Commons Attribution-ShareAlike License.

