Why do we measure things, or in this case, agile teams? Well, we might want to know what is likely to happen in the future (prediction), or we might want to know if we’re getting any better (progress), or we might want to see how much value we’re getting out of one team versus another (productivity). In order to make sound business decisions, we’ve got to know about all these things. But too often we forget that measurements of human teams building new software are not exact. Not only that, they sometimes conflict with the qualitative results we see in reality. As an agile leader, it’s important to counterbalance graphs and calculations with quite a bit of Management By Walking Around, a healthy dose of Gut Feel, and a heaping portion of Actually Using The Software Yourself.
Please realize that I’m not arguing against metrics. They’re a crucial part of any agile endeavor and the lean enterprise. But let me tell you a story about measuring teams.
Which Teams Are Giving The Most Value?
There once was a product that had teams working in two locations. Several of those were in the United States, where tech salaries tend to be quite high. Others were in a different country – let’s call it Freedonia as a Marx Brothers tribute – where the team cost was significantly less (60-80% of the cost of a US team). Both sets of teams estimated stories using Fibonacci series points (1, 2, 3, 5, 8, 13 …) and management tracked the teams’ velocity over time.
After about a year of this, management looked at the velocity charts and it was very apparent that the Freedonia teams were consistently delivering more story points per sprint. When they factored in the lower cost of the Freedonia teams, the difference was even more stark. Conclusion? Invest in expanding the Freedonia teams!
Hm, you think. He wouldn’t be telling this story if that was the end of it. And you’re right. Here are just a few of the problems with the conclusion that was drawn.
Problem 1 – Lack of Normalization – This one’s the easiest to spot and fix. The teams in the two countries were not estimating relative to the same baseline. Instead, the Freedonia teams tended to assign much larger point values for comparable stories. So the US teams would consistently estimate and complete 35 points while the Freedonia teams sometimes did up to 80!
Problem 2 – Externalities – If you’ve studied economics, you know that externalities are (roughly) costs or benefits that don’t get quantified as part of an economic transaction. In this case there were some major unbalancing issues in play. Most notably, the US teams and their associated business colleagues spent a very large amount of time attending to the technical and requirements needs of the Freedonia teams, because of their overall lower level of familiarity with the product. That is, the Freedonia teams were creating some drag on the US teams (through no fault of their own). Also, due to time zone, language, and product owner location issues, the cost to transfer information to the Freedonia teams was much higher.
Problem 3 – Customer Value – Remember, neither story points nor business value on SAFe’s 1-10 scale equals customer value. Customer value equals customer value. At some point, someone decided to step way back and look at a list of the features delivered by each set of teams over a period of about 12 months. It looked something like this:
|US Teams|Freedonia Teams|
|---|---|
|Internationalization of product|Accounting sync enhancements|
|Metrics dashboard and widgets|Bulk document upload|
|HTML5 screen template builder|Task management (later redone by US teams)|
|Complex international accounting|Spike - doc management (not implemented)|
|Third party accounting software sync|Spike - additional accounting sync|
|Roles and permissions system| |
Without knowing too much about the specific product, it’s pretty obvious that the US teams delivered many more meaty, releasable features, while the Freedonia teams took more of an assisting role extending existing features. There were also several spikes that ended up as lost sunk costs since they didn’t end up as released features, and most notably a feature that had a large number of design flaws and bugs which had to be rewritten by a US team.
This doesn’t imply that the US teams were “better”; but they did have more domain knowledge and the advantage of being collocated with the Product Management, architecture, and UX teams. At the very least, deciding to expand in one location over the other should have taken into account the qualitative track record and the other situational advantages of the US teams.
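To make Problem 1 a little more concrete, here is a minimal sketch of what normalizing velocity against a shared baseline could look like. The raw velocities (35 and 80) come from the story above; everything else is an assumption for illustration, including the `normalized_velocity` helper and the idea that both sets of teams re-estimate one agreed reference story (here, hypothetically, the US teams call it a 5 and the Freedonia teams call it a 13).

```python
# Illustrative sketch only. The reference-story estimates (5 and 13) are
# invented for this example; only the raw velocities 35 and 80 are from the story.

def normalized_velocity(raw_points, reference_story_points, baseline_points):
    """Rescale a team's raw velocity onto a shared baseline.

    reference_story_points: points this team assigns to an agreed reference story.
    baseline_points: points the shared baseline assigns to that same story.
    """
    scale = baseline_points / reference_story_points
    return raw_points * scale


# Treat the US teams' scale as the baseline (an arbitrary choice).
us = normalized_velocity(35, reference_story_points=5, baseline_points=5)
freedonia = normalized_velocity(80, reference_story_points=13, baseline_points=5)

print(f"US teams (normalized):        {us:.1f}")
print(f"Freedonia teams (normalized): {freedonia:.1f}")
```

Under these made-up calibration numbers, the apparent gap in favor of the Freedonia teams disappears once both velocities sit on the same scale. Even then, this only addresses Problem 1; it says nothing about externalities or customer value.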
Change Is Gonna Do a Number on Your Metrics
My background is in science, where experiments are tightly controlled and the goal is, when possible, to change one variable at a time in order to determine its effects on a system. It’s important to realize that although we talk about agile experiments, they are not and will never be scientific experiments. You’re not building the same feature multiple times (I hope), and products, teams, and environments change so frequently that most metrics need to be taken with a shaker of salt.
Off the top of my head, here are a few things that can affect team velocity (and delivered business value), which is one of the most important metrics in agile teams:
- someone new joined the team
- someone left the team
- the Definition of Done was made more stringent
- production issues or other operational distractions
- new Product Owner or ScrumMaster
- vacation or illness of key personnel
- larger organizational changes
- moving offices
- variable sprint length (yes, people do this)
- quality of stories and requirements for a specific feature
- new technologies being introduced
I’m sure there are others, but you get the point. You really aren’t ever comparing apples to apples when you’re measuring your teams.
Baby, Bathwater, Etc.
Having said all this, I love me some metrics. I love pretty graphs and charts, especially when I can draw a trend line through them that shows teams getting faster and better. But I try to be aware of their limitations in describing what’s really happening in my teams and why. Most importantly, I balance them with cultivating a deep knowledge of my teams and products as they exist in the real, non-mathematical world. I talk to the developers and testers. I play with the software and read users’ anecdotal feedback.
I’ll leave you with a short checklist you can use to evaluate your metrics and be appropriately careful about the conclusions you draw from them:
- Certainty – What is the level of certainty of this metric, and do all parties understand the error margin? Or are some parties taking SWAGs (Sophisticated Wild-Ass Guesses) as hard truth?
- Hackability – How likely is it that people are deliberately or subconsciously gaming the metric to please management?
- Variability – What changes to the teams or system are occurring over time that might affect the metric?
- Predictiveness – If you step back and look at the team or system in a qualitative, not quantitative, way, does the metric square with what you’re seeing on the ground?
Please think about this and balance metrics with other observations, and do your best to ensure that more perspectives than just raw statistics make their way up and down the leadership chain. Good luck and happy measuring!