Winning the Numbers Game

Nate Buchanan
February 8, 2024

The Path #4

I’ve been in consulting long enough to have learned a few self-evident truths about the corporate world:

  1. Nobody wants to attend meetings on Friday afternoon
  2. People will occasionally talk on mute no matter what
  3. Proving anything is difficult

We could spend entire editions of this newsletter unpacking the first two - in fact, at Pathfindr we have our top people working on technology to prevent meetings from being scheduled after noon on Friday, and we keep a “talking on mute” jar that offenders must contribute to after each incident. It’s going to fund our EOY get-together.

But that third one is what we’re focusing on this week.

It’s hard to get a roomful of executives to definitively agree on something, particularly when it comes to things like savings from a particular tool implementation or transformation program. There are often too many stakeholders with competing agendas, too many sources of information that can contradict each other, and lots of ins, outs, and what-have-yous.

For example, I spent a lot of time early in my career managing teams tasked with automating test cases as a way of accelerating the SDLC (Software Development Lifecycle). We would dutifully review the manual test suite to identify the best test cases to automate, spend time automating them, and execute them in place of the manual tests to speed up our next regression run. In theory, we should be able to calculate a time saving - and, if you know the going hourly rate for a manual tester, a cost saving - by comparing the time it takes to execute the tests manually against the time of the automated run. Pretty simple, right?
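
On paper, the arithmetic really is that simple. Here’s a minimal sketch of the back-of-the-envelope calculation - every figure in it (timings, suite size, hourly rate) is a made-up number for illustration, not a benchmark from a real engagement:

```python
# Back-of-the-envelope savings from automating a regression suite.
# All figures are illustrative assumptions, not data from a real project.

manual_minutes_per_test = 20      # average time for a tester to execute one case
automated_minutes_per_test = 2    # average machine time for the same case, automated
tests_per_regression = 400        # cases executed each regression cycle
regressions_per_year = 12         # monthly regression runs
tester_hourly_rate = 50.0         # assumed loaded hourly cost of a manual tester

minutes_saved_per_run = (manual_minutes_per_test - automated_minutes_per_test) * tests_per_regression
hours_saved_per_year = minutes_saved_per_run * regressions_per_year / 60
cost_saved_per_year = hours_saved_per_year * tester_hourly_rate

print(f"Hours saved per year: {hours_saved_per_year:,.0f}")   # 1,440
print(f"Cost saved per year: ${cost_saved_per_year:,.0f}")    # $72,000
```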

One would think.

In reality, it can be incredibly difficult for the benefits from something like test automation to be felt (or even measured) concretely. There could be a wide range of reasons for this, but here are a few:

  1. People don’t trust the time savings. Although it’s relatively easy to measure the speed of an automated test run, manual testers rarely “punch” in or out as they execute tests, so it’s hard to compare the automation against a baseline.
  2. People don’t trust the tests. If you’re really skilled at automation, it can seem magical, like something too good to be true. Leaders start worrying that coverage has slipped when a test run that previously took a week gets cut down to a day, then a few hours. Worries about quality issues abound, however unfounded they may be.
  3. There’s no consistent system of record. You might be able to measure benefits at the individual team level, provided everyone on the team uses their test management system the same way. But in larger enterprises, it’s very common for different teams to use the same system in different ways, or sometimes to use a different system entirely. This makes it very difficult to aggregate data on benefits and communicate it in a way that is consistent at the enterprise level - something that everyone can agree on.

You can massage numbers until you’re blue in the face, but ultimately the real benefit from test automation needs to be felt at the stakeholder level. If you’re a Product Owner or Delivery Manager able to run tests faster, you can either A) execute more tests in the same amount of time or with the same size team, giving yourself more confidence in the quality of the solution, or B) execute the same number of tests with fewer people, so they can work on something else. If you do either with no drop in quality, then you can KNOW you’ve achieved some kind of benefit. And if you look hard enough, you can find the data to quantify it.

These lessons that I learned the hard way many years ago have new relevance today, when many people seem to “know” (as opposed to KNOW) that LLM-powered applications are beneficial, but hard numbers on exactly how beneficial are difficult to come by. Here are three different ways of thinking about quantifying benefits from AI.

  1. Unit-Based - these are improvements in processes that can be measured in discrete units. A factory making widgets is one example, but perhaps a more interesting one is a call centre for customer support. If you implement an AI solution that helps call centre agents handle more calls and leave customers happier with their service on average, it’s easy to quantify the benefit.
  2. Rubric-Based - these are processes that involve units of work that are difficult to measure against each other because they vary in size and complexity. A development team writing code is a good example. Because each new application or change request could involve 30 lines of code or 30,000, it’s difficult to do an apples-to-apples comparison of two teams (one using a tool like GitHub Copilot and one coding manually) if they are not working on the exact same project. This is where an external rubric - such as story points or t-shirt sizing - can help normalise metrics across different types of projects so they can be compared against each other, similar to a pound-for-pound ranking in boxing. There’s a quick sketch of this after the list.
  3. Experience-Based - these are processes that are very difficult to quantify because they vary so widely in how different people execute them. Examples of these include common consulting tasks such as creating decks, analyzing spreadsheets, or preparing for a sales conversation. Because every engagement, client, and opportunity is different, it’s almost impossible to put hard numbers against savings from implementing AI to help with these activities. However, asking practitioners whether they feel more productive, and comparing that with slightly more objective measures of their performance - such as customer satisfaction or general “throughput” - can be useful in putting a bit more rigour around it.
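
To make the rubric-based idea a bit more concrete, here’s a minimal sketch of a story-point-normalised comparison between two teams. The team names, figures, and the “story points per engineer-week” metric are all hypothetical - they’re only there to show how a rubric lets you compare teams that aren’t working on the same project:

```python
# Rubric-based comparison: normalise delivered work by story points so that
# teams on different projects can be compared on roughly equal terms.
# Teams and figures below are hypothetical, for illustration only.

teams = [
    {"name": "Team A (AI-assisted)", "story_points": 96, "engineers": 6, "weeks": 4},
    {"name": "Team B (manual)",      "story_points": 72, "engineers": 6, "weeks": 4},
]

def points_per_engineer_week(team):
    """Story points delivered per engineer per week."""
    return team["story_points"] / (team["engineers"] * team["weeks"])

for team in teams:
    print(f"{team['name']}: {points_per_engineer_week(team):.2f} story points per engineer-week")
```

The same pattern works with t-shirt sizes mapped to rough point values; the important part is that the rubric, not raw output, becomes the unit of comparison.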

As AI gets more user-friendly, and as teams’ workflows adjust to incorporate it more fully, it will become increasingly easy to quantify the benefit, because the AI will be integrated into the systems you already use. You won’t need to pull data out of a different system to see that before you updated your processes to replace manual bottlenecks with AI solutions, your team produced X, whereas now they produce Y. Until then, I encourage you to take a big-picture view of benefits, and don’t be afraid to test hypotheses, experiment, and hitch your wagon to numbers you believe in.
