Functional Zero Bugs

A perennial activity in fast-growing companies is managing defects. Equivalently, fixing bugs. Bugs get reported, then routed, then triaged (or triaged, then routed), then scheduled, and then, at some point long after the bug was first reported, fixed. Maybe. That is a lot of cognitive activity by a lot of people to rework something.

“Bug free” is almost surely unreachable: shipping new code virtually guarantees new bugs. What’s needed is a process where bugs can be fixed very quickly, which is the notion behind Functional Zero.

Backstory

Functional Zero apparently emerged in the social services sector, specifically, homelessness management. The following links provide more context:

We’d like bugs to be rare and brief. How do we make this happen? Consider the following:

  • Brief: Fix bugs immediately after they are reported. Immediately really does mean right away. Whatever feature story is currently in progress, put a pin in it and switch context to fixing the bug.

  • Rare: Reduce Change Failure Rate. Easier said than done.
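Change Failure Rate is commonly defined (for example, in the DORA metrics) as the share of deployments that lead to a failure needing remediation. A minimal sketch of computing it, assuming a hypothetical `Deployment` record with a `caused_failure` flag and made-up data:

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    version: str
    caused_failure: bool  # True if the deploy needed a hotfix or rollback


def change_failure_rate(deployments: list[Deployment]) -> float:
    """Return the fraction of deployments that caused a failure (0.0 if none)."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.caused_failure)
    return failures / len(deployments)


# Illustrative data only.
history = [
    Deployment("v1", False),
    Deployment("v2", True),
    Deployment("v3", False),
    Deployment("v4", False),
]
print(f"Change failure rate: {change_failure_rate(history):.0%}")
```

Tracking this number per measurement period gives a rough signal of whether bugs are actually becoming rarer.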

We need to consider both brief and rare in some detail, but let’s check out a couple of teams which operate with a Functional Zero mindset.

Case studies

Both of the following teams implemented and practice Functional Zero. For each team, key characteristics of the team and its workflow are listed. Both are teams in a technology-driven health care provider company.

Team 1

Team 1’s remit included insurance eligibility, revenue cycle management, and data engineering support for business intelligence. Internal stakeholders included the CEO, who depended on the team to supply accurate revenue numbers for fundraising.

  • All engineers had some form of direct customer facing responsibility.
  • Stakeholders were both internal (eligibility, BI) and external (RCM).
  • Code base was mostly greenfield development by the team, fully tested (> 95%), fully linted from the start. Mostly Python and Ruby services, with some responsibility in a Rails monolith.
  • Sentry errors handled immediately.
  • “You ship it you fix it.”

The engineering maturity of this team, coupled with mission clarity outside the purview of normal product operations, allowed this team to mostly self-manage. They achieved very high trust with their stakeholders (which included the CEO), and were able to make many technical decisions unilaterally. The team was even able to function without a Product Manager when the then-current PM entered graduate school.

Team 2

Team 2’s primary mission was internal tooling, with platform support as an important secondary mission.

  • Internal tools supporting logistics team, feature support.
  • Rails monolith
  • Mostly internal stakeholders
  • All engineers had at least some direct customer interaction, even if intermittently depending on project.
  • Bug backlog of 5-7 bugs week over week, mostly long-running issues Product would neither prioritize nor close.

In contrast to Team 1, Team 2 was oriented more along traditional product lines. In this case, the team rapidly paid down the bug backlog until a number of very long-running issues were exposed. As mentioned, Product was unwilling either to dedicate resources to fixing these bugs or to close them as “Won’t Fix.” Accordingly, each of these bugs was revisited every sprint during backlog review.

Functional Zero Bugs

Having examined two teams which implemented Functional Zero, let’s turn to implementing it ourselves. The first step is understanding that Functional Zero is a process, and it’s permanent. Remove the Functional Zero process and the bug backlog will start to grow. Having committed to the process, here’s one route to success:

  1. Establish a measurement cadence which is longer than the average lead time of bugs. Daily is too short, biweekly probably too long, weekly should be about right.
  2. Put all the existing bugs into a bug backlog, or label them as backlog, or find some way of denoting these as pre-existing bugs.
  3. Pay down new bugs within the measurement period. That is, the lead time for bugs must be less than the measurement cadence.
  4. Pay down backlog or pre-existing bugs at a consistent rate, for example, fixing 1 or 2 bugs from the backlog every measurement period.
  5. Measure and record on the predetermined cadence. For example, Fridays at close of business, or Wednesdays at noon. The day and time don’t matter as much as ensuring the bug count is recorded every period.
  6. Provided incoming bugs are fixed within the bounds of the measurement period, the number of open bugs will trend to 0 over time: nothing new joins the backlog while the backlog shrinks every period. It’s inexorable, a mathematical fact.
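The arithmetic behind the last step can be sketched with a small simulation. The function name `simulate_open_bugs` and all of the numbers (starting backlog, weekly arrivals, fix capacity) are assumptions chosen for illustration, not measurements from either team:

```python
def simulate_open_bugs(
    backlog: int,
    arrivals: int,
    new_fix_capacity: int,
    backlog_fixes: int,
    weeks: int,
) -> list[int]:
    """Return the open-bug count recorded at the end of each weekly period.

    arrivals:         new bugs reported per week
    new_fix_capacity: new bugs fixed within the week they are reported
    backlog_fixes:    pre-existing bugs paid down per week
    """
    counts = []
    for _ in range(weeks):
        # New bugs not fixed within the period spill into the backlog.
        spill = max(0, arrivals - new_fix_capacity)
        backlog = max(0, backlog - backlog_fixes) + spill
        counts.append(backlog)
    return counts


# Every new bug is fixed within the week: the backlog drains to zero.
print(simulate_open_bugs(backlog=10, arrivals=3, new_fix_capacity=3,
                         backlog_fixes=2, weeks=6))
# → [8, 6, 4, 2, 0, 0]

# New bugs are never fixed within the week: the backlog grows.
print(simulate_open_bugs(backlog=10, arrivals=3, new_fix_capacity=0,
                         backlog_fixes=2, weeks=6))
# → [11, 12, 13, 14, 15, 16]
```

The two runs differ only in whether new bugs are fixed inside the measurement period, which is the condition the whole process depends on.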

Step 3 is worth repeating: when a bug is reported, fix it right away.
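One way to check that bugs really are fixed within the measurement period is to compare each bug’s lead time to the cadence. A minimal sketch, assuming a weekly cadence and hypothetical `(reported, fixed)` timestamp pairs exported from an issue tracker:

```python
from datetime import datetime, timedelta

CADENCE = timedelta(days=7)  # assumed weekly measurement period


def slow_fixes(bugs: list[tuple[datetime, datetime]]) -> list[timedelta]:
    """Return lead times (fixed - reported) that are not shorter than the cadence."""
    return [
        fixed - reported
        for reported, fixed in bugs
        if fixed - reported >= CADENCE
    ]


# Illustrative timestamps only.
bugs = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),    # fixed the same day
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 11, 10)),  # took nine days
]
for lead_time in slow_fixes(bugs):
    print(f"Lead time over cadence: {lead_time.days} days")
```

An empty result for a given period means every bug fixed in that period met the lead-time requirement.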

Here’s a cartoon to help visualize Functional Zero:

This illustrates the “brief” aspect of Functional Zero. Putting it into practice will address the “rare” aspect.

Functional Zero in practice

Implementing Functional Zero ensures programmers have an opportunity to learn from their mistakes, hopefully not making the same mistakes in the future. The benefits compound: the faster bugs are fixed, the faster the benefits accrue, which further reduces the number of bugs.

How can this work?

First and most necessary, absolute, unwavering commitment from leadership to allow teams to implement the Functional Zero process. Half measures will not work. Functional Zero is All In Or Not In.

For the teams responsible for fixing defects, the following will help ensure success:

  1. A very clear notion of what the involved code is attempting to accomplish, from a business perspective.
  2. Technical expertise to understand why the bug is occurring, and fix it as far upstream as possible. Ideally and if required, enough refactoring to design the bug out of existence. Make it so that bug is impossible going forward.

Once a team has committed to Functional Zero, feature delivery will probably slow down at the start. The team and company gain once the process is in place and working. Having really good test coverage helps tremendously; Functional Zero gets easier as test coverage increases.

Benefits

One of the great things about this system is that it is a system: a process which, once put into place, need not be revisited. Once the Functional Zero process is in place, benefits include:

  1. Fixing bugs immediately after being reported is an excellent forcing function to encourage people to write better code. Everybody hates being interrupted.
  2. There is no bug backlog to review, which reduces the number of humans in the loop.
  3. Bugs do not need to be assigned severity levels, further reducing the number of required humans in the loop.
  4. Meetings are fewer and shorter.
  5. Customers are happy, possibly raising NPS.
  6. Developers are happy, possibly raising eNPS.

These benefits compound over time.

Challenges

Functional Zero may require a large initial investment which slows down feature delivery. It may not be appropriate for business cases where there is a high tolerance for bugs.

When facing existential crises around product market fit, Functional Zero is probably not appropriate.

In an existing code base riddled with technical debt, implementing Functional Zero will be very challenging. Bugs may be very old, and may be triggered by previously unknown side effects. Test coverage may be low, further increasing the challenge.

The process has to be adopted in whole to accrue the benefits. Partial adoption will not remove humans from the loop. As above, Functional Zero is All In Or Not In. If it can’t be fully adopted, it may not be worth attempting.