When Things Go Right

One of my favorite things about the devops community is blameless postmortems. John Allspaw at Etsy first popularized this with a blog post in 2012, but the theories go back to Sidney Dekker's "Just Culture" work on human error in medicine, emergency services, and other fields. They're the same ones that gave rise to agile software development and the devops movement in general:

Complex systems have complex failures.

When you do a blameless postmortem, rather than looking for a single root cause for a problem, you look at the whole situation, gathering stories from everybody involved. You assume that people were acting in good faith based on the best information they had at the time, and ask what particular confluence of beliefs and actions led to things breaking down. You ask "what about this system made it easier for things to go wrong, and how do we make it harder for things to go wrong in the future?"

It's magical to see in action. When people don't fear punishment, they're more likely to accurately recount their actions in a situation, and much better at problem-solving together.

The fundamental attribution error applies to more than failures, though. How many times have you seen something go well, and attributed it to how smart, driven, or just "awesome" a particular individual or team is?

As with failure, when good things happen, we humans tend to credit the most visible face of the success, rather than focusing on the environment and situation that stacked the odds. That's the fundamental attribution error again. We're more likely to think we made a great hiring decision in a PM or an engineer than we are to think about the way we structured the team, or the way we project-planned to ensure multiple paths to success. Not looking at the broader situation makes success harder to replicate, both with the same team in the future and with other teams across the business.

I think systems thinking can be scary to organizations, because it subverts your usual outcome-effecting levers. Business should be simple, right? Reward those who execute well, punish those who fail. But in the complex systems we work in, punishing failures makes subsequent failure more likely, and crediting a single person for success makes it harder to understand how to replicate that success.

We all think we're only hiring the top 1%. We all think we're better than average, but statistically we can't all be. Thankfully, though, there are a lot of problems in the world that need solving, and many of them come down to care, experience, and persistence. The runaway success of Agile, to me, comes down to recognizing that average people can do great work when they understand the problem and work in a system that supports them.

When my team does a sprint retrospective, we separate out "what went well" from personal thank-yous and gratitude. Sometimes there are heroics or somebody going above and beyond, and it's always good to recognize those efforts, but simultaneously, we want to think about how to create a world in which heroics aren't necessary. "What went well" is for the ways we worked well together that we aim to build on the next cycle. And it works: I regularly see one person's great idea one sprint turn into standard team practice the next.

It's natural to ask "why?" when things go wrong. Try asking the same question when things go right, and you might be surprised at how right things start going.