Agile in the Ether, IRL: Beyond DORA

In February 2024, Agile in the Ether had its first in-person event – I wrote a bit about that in a previous post. Every part of the day gave me lots of ideas for things to look into further or go and try – I’m writing a few blog posts so I remember what I wanted to do next.

Inside the Museum of Liverpool: a map of the city, in front of a window with a view of the Royal Liver Building.

Going beyond DORA: exploring the SPACE framework (or ‘DORA becomes a SPACE explorer’)

Sophie Weston talked through the DORA metrics (made famous by the brilliant Accelerate book), and some frameworks that its lead author Dr Nicole Forsgren has gone on to develop with other researchers (SPACE, DevEx). This talk, the group discussion following it, and various things I’ve read/heard before and since have all given me piles of things I want to look into further.

The Accelerate book described in detail how the initial DevOps Research and Assessment (DORA) research programme was carried out and what it found. There’s lots of interesting stuff in there, but the part that’s caught most people’s attention is what’s been called the “4 key metrics” or the “DORA metrics”: deployment frequency, lead time for changes, change failure rate, and time to restore service. You can see them, and answer a few quick questions to see how you compare to others, here: Take the DORA quick check.

I’ve heard from lots of people who’ve had positive experiences with examining these metrics; they’re simple to understand, there are coherent stories about why they’re important, and seeing how the rest of the industry performs on them can prompt you to investigate where you could start doing something different.

But there are also stories of dissatisfaction with them. These metrics don’t cover much about user needs, business value, and other outcomes we know are important. And there are examples of people feeling pressure to improve them: focusing on day-to-day changes in these stats and being expected to keep them trending in the right direction. If the DORA metrics are all you focus on, there’s no measure of what it’s like to work in that environment; isn’t this driving us towards the feature factory model that we all want to stay away from?

This isn’t at all what I’d expect using the DORA metrics to be like – but I’ve heard these examples from people, both in person and online, so they’re worth discussing.

Misunderstanding metrics

The first thing this made me think of: this kind of problem isn’t specific to one set of metrics. I was reminded of a Mastodon post from Willem van den Ende, which prompted me to quote the original wording of Goodhart’s Law:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

Pressure … control.

Common re-wordings of the law kind of miss that focus, e.g.:

“When a measure becomes a target, it ceases to be a good measure”

Willem linked to a post from Dr Cat Hicks that has lots of thoughtful examples of how measurement can stay useful and what gets in the way. That’s led me to read/watch lots more from Cat: absolutely stacks of info on the exact topics I was interested in. She’s been working on a “Developer Thriving” framework, researching learning cultures, and more.

In a blog post, she explains how much context matters when deciding whether a metric is useful:

“For example: inside of a learning culture, software teams might use a velocity measure entirely differently, and use slowdowns as a signal for learning investment. But software teams that have a high contest culture, and expect to get punished for learning? Those teams might shy away entirely from velocity measures and consider them wildly inaccurate. So we can see learning culture as a core thing that changes the effect of “using a velocity measure” on a software team.”

This is a much-better-explained version of something I reflected on when writing about estimation approaches: “In a high-trust environment, almost anything can work. In a low-trust environment, almost nothing will.”

For the DORA metrics in particular:

They are very focused on mechanics … but that’s what they’re for. The research highlighted the surprising variation in these across companies, and how strongly these are linked to business outcomes. Being able to change software frequently and confidently gives you the ability to try things with users, and put all your good strategic ideas to the test.

They don’t mention the human side … but read further, and you’ll find the book and reports talk about that a lot. Improvements in DORA metrics are linked to reduced stress and burnout: in some places, every release needs months of planning, out-of-hours working, and possibly days of frantic fixing and finding out who messed up. To quote Charity Majors: “This is a quality of life issue”.

They can be hard to constantly track … but I’m not sure that’s useful anyway! Did you take the DORA quick check? You’ve got a good sense of where you are already. And if you decide you want to try moving performance levels on one of them, what you need to do isn’t “pay more attention to the next release so it doesn’t fail”, or “try to squeeze in one more release this week” – you need to look at substantial changes to your process and capabilities. It’s probably not useful to check the metrics again for a few months.

Take a look at the State of DevOps Report 2023; good advice starts right in the prelude:

“Fixating on performance metrics can lead to ineffective behaviors. Investing in capabilities and learning is a better way to enable success. Teams that learn the most improve the most.”

SPACE and DevEx frameworks

I’d heard of these frameworks a while ago, but hadn’t had a chance to look into them – so I was very pleased Sophie’s workshop included a tour through them, and a chance to discuss what people thought. There’s an overview (and an interview with the authors) on The Pragmatic Engineer site.

These have a different focus from the DORA metrics; they’re about developer experience, because the authors believe improving that will improve developer productivity. The “DevEx: What actually drives productivity” paper shows the three core dimensions that framework uses:

Figure 1 from the paper, “Three core dimensions of developer experience”: a triangle labelled “DevEx”, with its corners labelled “Flow State”, “Feedback Loops” and “Cognitive Load”.

“Flow state” caught my attention: I’ve heard a bit about it in the past, including some views that it’s hard to define and might be more linked to enjoying an activity than achieving productivity. But I’ve barely looked into it, and don’t have any kind of expertise that would help me judge which claims are right. Best I can do for now is read a lot of sources and try to see if there’s a consensus.

Wikipedia has a page full of links and quotes about flow state – I’ll keep coming back to this. One paper (freely available) describes the concerns I mentioned above in detail.
Dr Cat Hicks has a thread on flow state, and its use in DevEx particularly: “A continual critique of “flow” from psychs is that this is just a word we have used to mean “everything good about problem solving.””
I enjoyed Cedric Chin’s detailed discussion of the book Deep Work, which covers flow state and lots of other concepts. Flow state is described as “filling your days with inherently enjoyable periods of concentrated work”.
I’ve been pointed to the book Flow, which introduced the concept … but I’m currently more interested in what the wider opinion is on it.

Another topic that came up: why developer experience, particularly? How separate is that from all the other roles that make up a team?

I think it’s often unhelpful to single out one role; lots of things are just about how humans work, and we’d do better to consider the wider picture. Dr Cat Hicks (yes, again) started an excellent discussion on this.
Some aspects of developers’ work are different from others’ on a team, in ways that aren’t obvious – Eleanor Mollett moved from delivery manager to a developer role and has shared lots of interesting reflections (3 months in, and other posts).
I’d love to see a helpful combination of these – here’s what’s important for everyone, here are some things to keep in mind for specific roles. And I’d hate to see any measures or metrics get misapplied in the ways discussed earlier in this post.

More to come

That’s my “further thoughts and reading” done for two of the four workshops we had on our IRL meetup day. If I’m quick, we’ll cover the whole lot before the next day gets booked.