Sustaining Agility

Here are four core steps in a software development lifecycle and six essential practices that any engineering team would benefit from.

Jacob Singh

Published February 18, 2022


Jacob Singh, former CTO of Grofers, is currently part of Sequoia India’s CTO in Residence program.

This post is the last in a four-part series on building an Engineering, Product and Design (EPD) organization for startups.

In the previous three posts we learned to optimize for the speed of learning rather than the speed of doing, to be wrong faster, and to start less and finish more.

This final post covers how to maintain these practices over time through minimally viable processes.  Although many of the techniques discussed originate from Scrum, this is not a recommendation to “do Scrum” or any particular off-the-shelf framework; it is an invitation to distill the meaning of these practices and experiment on your own to find what works for you, your team, and this moment in time.

I’ll be sharing my recipe: the four core steps in a software development lifecycle and six essential practices that any engineering team would benefit from.

But first, a crucial question I’d ask you not to skip:

Why have process at all?

Process is simply a set of rules we follow instead of our innate desires in the moment.  If your in-the-moment desires were already optimal, you wouldn’t need a process to begin with.  So let’s be clear: a process is not a pleasant thing. No one likes to follow rules.

However, we do want to spend our willpower and energy on the most important things.  Processes help us to achieve that by providing a template of how and when to do certain activities that don’t need to be reconsidered every day, freeing us up to spend our energy on the important things.

There is no more miserable human being than one in whom nothing is habitual but indecision, and for whom the lighting of every cigar, the drinking of every cup, the time of rising and going to bed every day, and the beginning of every bit of work, are subjects of express volitional deliberation. Full half the time of such a man goes to the deciding, or regretting, of matters which ought to be so ingrained in him as practically not to exist for his consciousness at all.

William James, Psychology: Briefer Course.

A well-implemented process eventually disappears into our organizational subconscious and becomes implicit.  This frees up our creativity for the things that matter.

And while that’s true, process is also a costly thing.  Yes, it takes time to build new habits, and it can annoy some people, but there’s a bigger cost to be wary of: the removal of purpose from work. A machinist in a factory does not see their work the way a blacksmith does. A McDonald’s employee is not a chef.

So while reading the following, please consider Dude’s Law, coined by the late David Hussman, a deep thinker and an influential teacher of mine.


Dude’s Law holds that value equals “Why” divided by “How”. All process is “How”. All purpose, customer empathy, vision, joy, craftsmanship and camaraderie is “Why”. You can’t divide by zero, so you do need some “How”, but always make sure it is balanced with a generous helping of “Why”.

With that disclaimer out of the way, on to the four core steps and six practices.

The virtuous cycle: Four core steps

As opposed to a vicious cycle, a virtuous cycle is one in which the outcome of a series of actions makes the next attempt at those actions even better.  In software development, this is typically known as a sprint or an iteration.

It has four essential steps:

  • Plan: Set a goal for the next X hours/days/weeks/months
  • Execute: Work on our goal intensely and collaborate quickly
  • Review: Check how we did against our goal, align stakeholders
  • Retrospect: Inspect how we worked and make a plan to improve next sprint

Do you remember the difference between complex and complicated environments from the first post?  In complex environments, you don’t know what you don’t know. Only through regular introspection of our own behaviors and processes can we make progress towards better behaviors and better processes. So we need to view our own ways of working not as a well factored, easy to rationalize machine, but as a series of experiments and improvements we make over time.

Notice in the above definition, I’m not saying that every sprint cycle needs to result in a single release. I’m not saying you need to write user stories. I’m not saying you need to estimate in story points. Nor am I saying you need to do standups or hold hands and sing Kumbaya in a retrospective.

You can use a Kanban board, you can release 50x a day, you can release once a year and you can do all your check-ins via carrier pigeons if you want.  But each step here is important.  You have to have a reliable “probe – measure – respond” loop.  And every time you complete the loop, you should be getting just a little bit better at it.

This post is not going to go into exactly how to plan, execute, review, or retrospect.  These topics are huge in and of themselves (I have linked to some excellent resources in the footer), but I’d like you to ask yourself:

  • Do all four steps happen in your current cycle?
  • Do you perform all four consistently?
  • If so, are they perfunctory or actually useful?

If you answered no to any of the above (or found them merely perfunctory), read on for some practices that might help you.

The six essential practices

Of course, there are dozens more I’d like to put on this list, but here is my stab at six things you’d have to work very hard to convince me not to do on any team I’m leading:

  1. Checklists and the five whys
  2. Fists to Five
  3. At least one test
  4. Never cry wolf: Clean logs and alerts
  5. Accessible product metrics
  6. Early and often code reviews

Checklists and the Five Whys

Checklists are the primary vehicle for change.  When we finish a sprint and retrospect on why we missed our goal, the reason generally falls into one of four categories:

  1. We didn’t estimate well because we didn’t have enough information or experience
  2. Something came up from a third party outside the team we didn’t anticipate
  3. We had production issues due to previous code
  4. We got the work “mostly” done, but not quite finished so it “spilled over”.

These are all normal failures, but none are actually inevitable.

Start by asking the Five Whys, and when you arrive at a hypothesis about the root cause, add it to your checklists.  I generally keep three, but you could have more.  The example items below are for illustration only; you have to make your own:

Definition of Ready: What we need to have before taking work in the sprint

  • Designs are attached to the story in JIRA for stories with UI
  • Deployment budget is already approved for new services

Definition of Done: When is work considered “done-done” or ready to release

  • All regression tests pass
  • Code is formatted as per guidelines
  • Code has been reviewed by another engineer
  • Deployment script is ready to run with no manual steps.

Sprint planning checklist: What we should check before we consider planning finished

  • Do any stories in the sprint have external dependencies?  If so, have we made a risk mitigation plan?
  • Do any stories involve database updates? Have we informed the data engineering team?

If you look at these example items, you can imagine a scenario where the team screwed up and then put that item in place. That’s the idea. Every retrospective, you can add (and delete, yes, please delete) items as you learn more. Some teams also keep a release checklist, an analysis checklist, and so on.
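Not every checklist item can live in a wiki and rely on memory; the ones that can be automated should be. As a minimal sketch (the pytest and black commands here are assumptions, substitute whatever your own checklist actually requires), a team might keep a small script in the repo that runs the automatable parts of its Definition of Done before merge:

```python
# A sketch of pushing the automatable parts of a Definition of Done into a
# pre-merge script, so they never depend on anyone's memory or willpower.
# The specific commands (pytest, black) are assumptions; substitute your own.
import subprocess
import sys

CHECKS = [
    ("All regression tests pass", ["pytest", "-q"]),
    ("Code is formatted as per guidelines", ["black", "--check", "."]),
]

def main() -> int:
    failures = 0
    for description, command in CHECKS:
        result = subprocess.run(command)
        status = "OK" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {description}")
        failures += result.returncode != 0
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

The human-judgment items (risk mitigation plans, informing the data engineering team) stay on the written checklist; the script just keeps the mechanical ones from slipping.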

Fists to Five

This is a very simple practice, and often considered a silly one, but I find it underpins all the other practices, so it’s worth understanding.

At the end of sprint planning, the whole team simultaneously shows a number of fingers, from a fist (0) to five (5). The number indicates their confidence that the team (not the individual) will achieve the goal.  0 = attempting this plan will probably result in Covid-22; 5 = we will nail this entire plan and probably solve that whole Israel-Palestine problem on the side.

If anyone votes two or below, we keep planning and then vote again.

I know this sounds like a preschool game, but the point is that once everyone votes three or above, we’re all in.  Everyone, regardless of their role, regardless of previous arguments, understands the plan and is committed to it.  There will be no finger pointing or “I told you so” later.  We are confident not that we will get “our work” done, but that the entire team will succeed at its goal.

At least one test

The point of operating in complex environments, as opposed to complicated ones, is that in complex environments we are optimizing for learning. Learning without changing is a waste of time. But changing software as we go, not over-architecting but releasing in small vertical slices, requires that we re-architect, or “refactor”, our code regularly.  Doing so puts the system at great risk.

As such, it is impossible to practice agile software development without tests. You can get away with manual testing at early stages, but as you scale the number of engineers and number of features in the app, you get “developers to the power of features” complexity.

I’m not saying to aim for 100% test coverage and every layer of the testing pyramid. I’m saying to start with at least one test that runs automatically and regularly. Build a culture of developers relying on that test, and they will be inclined to add more as they add complexity.
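To make “at least one test” concrete, here is roughly what that first test can look like with pytest. The apply_discount function is a made-up example standing in for whatever piece of your own logic you refactor most often:

```python
# test_pricing.py -- one small test that runs automatically on every push.
# apply_discount is hypothetical; in a real codebase it would live in your
# application code and simply be imported here.
import pytest

def apply_discount(total: float, percent: float) -> float:
    """Return the order total after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total * (1 - percent / 100)

def test_ten_percent_discount_is_applied():
    assert apply_discount(total=500, percent=10) == 450

def test_invalid_discount_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(total=500, percent=150)
```

Wire it into CI so it runs on every push; once the team trusts that the suite catches regressions, adding the second and the tenth test gets much easier.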

Never cry wolf: Clean logs and alerts

This seems obvious, but teams very quickly get messy with logs, and alerts become heavy with false positives. At my last job, I joined, got added to a mailing list, and started getting 5,000 alerts a day. I asked someone how they manage all these alerts, and their response was to show me how to make a Gmail filter.

Don’t be that team.

Every alert should be taken seriously. Logs should be analyzed on a monthly or quarterly basis and pruned. Think of it like keeping your finances in order – a little discipline goes a long way.
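Most of that discipline comes down to being deliberate about severity. Here is a small illustration using Python’s standard logging module (the handler setup and thresholds are assumptions, not a prescription): reserve ERROR for things a human must act on, and keep noisy detail at DEBUG so it can be pruned without losing the signal.

```python
import logging

# Routine operational detail goes to the normal log stream at INFO and above.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# A separate handler that only sees ERROR and above. In a real setup this is
# the one wired to paging or a team channel, so it must stay quiet unless a
# human genuinely needs to act.
alert_handler = logging.StreamHandler()
alert_handler.setLevel(logging.ERROR)
alert_handler.setFormatter(logging.Formatter("ALERT %(name)s: %(message)s"))

logger = logging.getLogger("payments")
logger.addHandler(alert_handler)

logger.debug("Raw gateway response: %s", {"status": "ok"})   # noise: filtered out
logger.info("Payment captured for order %s", "ORD-1042")     # normal operation
logger.warning("Retrying gateway call, attempt %d", 2)       # worth reviewing later
logger.error("Payment failed after retries, order %s", "ORD-1042")  # wakes someone up
```

If an alert fires and nobody needs to act, either fix the underlying issue or demote it; a monthly pass over the noisiest loggers keeps the list honest.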

Accessible product metrics

Similar to the point above, if product metrics become confusing, hard to read, or unreliable, you end up with a couple of problems:

  1. You don’t know what your users are doing, so you make worse decisions
  2. People stop caring about the impact of features and start focusing on “the requirements”

Point 2 is where software teams go to die.  Don’t do that.

Everyone on the team (not just PMs and analysts) should know exactly how users are using their product or component and should know the impact of all their changes. This is crucial both for motivation and for impact.

Please use tools like CleverTap, Mixpanel, Segment and/or Amplitude for tracking and analyzing user behavior.
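The SDK call differs across these tools, so here is only the shape of the idea: a thin in-house wrapper that gives every engineer a one-line way to emit a product event. The send_to_analytics function is a stand-in for whichever vendor client you actually adopt.

```python
from datetime import datetime, timezone
from typing import Optional

def send_to_analytics(payload: dict) -> None:
    """Stand-in for the vendor SDK (CleverTap, Mixpanel, Segment, Amplitude, ...).
    Swap in the real client call here."""
    print("analytics event:", payload)

def track(user_id: str, event: str, properties: Optional[dict] = None) -> None:
    """One consistent entry point for product events, so instrumenting a
    feature is a one-liner for any engineer on the team."""
    send_to_analytics({
        "user_id": user_id,
        "event": event,
        "properties": properties or {},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

# Example: instrumenting the social-login feature from the next section.
track("user-123", "Login Completed", {"method": "google"})
```

Pair the wrapper with a shared event-naming convention and a dashboard everyone can open, so “how are users actually using this?” is a thirty-second question, not a ticket to the analytics team.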

Early and often code reviews and trunk-based development

Linus Torvalds (creator of the Linux kernel) got so frustrated trying to manage his mailing list full of patches sent as text files that he sat down and banged out Git, which went on to become by far the most dominant version control system in history.

One big reason this happened is GitHub, which pioneered the concept of pull requests (PRs).  PRs are fantastic for their original use case: thousands of developers around the world working independently, on their own schedules, with their own goals, collaborating on various features with no shared roadmap. They allow for asynchronous, decentralized code reviews and complex branching.

Is this how your team works?  Most likely not. Most likely, you’ve got a small team that talks all the time, with shared context, working on fewer than a dozen repositories, with a shared roadmap.

In this case, it makes much more sense to limit branches (which have the potential for complexity and error) and to avoid formal asynchronous PRs. Here’s how I typically prefer to do this:

Task: Make a social login with Google

Devs: Priya (primary) and Lauren (reviewer)

  • Priya: “Let’s talk about how we should architect this on the whiteboard.”
  • (1hr later)
  • Lauren: “Okay cool, I think we’re on the same page, why don’t you bang out the stubs and get a basic round trip going with no UI, I’ll take a look then.”
  • (2hr later)
  • Priya: “yo, I just put the method signatures in, how’s this look?” (simple screen share)
  • Lauren: “This looks good, but I think that we should use a singleton for config. Try this library…”
  • (1d later)
  • Priya: “Okay, I got this working, can we pair a bit at 2 while I work out the react stuff, I’m kinda new to it”
  • Lauren: “Sure thing…”
  • (etc, etc.)

Now imagine, in a week, when the whole thing is ready for formal QA. When Priya sends the PR across, Lauren will need to spend almost no time on it. She will be looking only for errors in the last iteration, and maybe helping to write a test. Contrast this with getting a 20,000-line PR the day before the sprint ends on something you’ve never looked at, finding a ton of issues, but writing “LGTM” because time is running out. Yeah, don’t do that.

If you only take away one thing

There is no right way.

Anyone who tells you otherwise is trying to sell something you don’t need. The most important thing in all of this is discipline. It all starts with discipline and accountability.

If you have, at the very least, the discipline to hold a brief retrospective every week on how things are going and commit to one change to try for the next week, you have the power to create a world class organization.

That’s it. The process of incremental progress. Small changes, made slowly and intentionally, hardened with disciplined practice.

This post is part four of a four-post series on leadership in high-uncertainty environments. Read part one, “Optimize for the speed of learning, not the speed of doing,” part two, “Be wrong faster,” and part three, “Start Less, Finish More,” on our blog.

Some additional resources

I hope you found this series useful.  I’d like to share a few resources I’ve found helpful specifically related to software development practices:

  • Heart of Agile (Alistair Cockburn) – similar to this post but better, distilling the core values.
  • Retromat – Great guide on running innovative retrospectives
  • Atlassian playbooks – no nonsense clear guide to typical Scrum practices and more
  • Mountain Goat – Fantastic insights and guides to running practices from Mike Cohn.

And some people who’ve influenced me greatly and I respect (amongst MANY MANY others):

  • Martin Fowler: Architect and software philosopher.  Deep look at many topics related to refactoring and testing in particular.
  • Dave Thomas: Hilarious and insightful author of the Pragmatic Programmer.  TDD legend who doesn’t write tests.
  • Jez Humble: Father (or at least godfather) of SRE and DevOps.  Author of the State of DevOps report.
  • John Cutler: Product and EPD management thinker, evangelist for Amplitude.
  • Jeff Patton: Product management guru / humourist.