How to tackle technical debt

Photo by Ehud Neuhaus on Unsplash.

Mining for debt

Recently on Slack one of my colleagues shared this comic from Monkey User.

I thought it was a great metaphor. 

The world of software moves extremely fast. Inside a given company the codebase is constantly changing with the addition of new features. Outside the company is an entire world of open source software development, shipping updates to all of the libraries, frameworks and databases that are being used.

With time, piling on ever more code creates moments where the team needs to stop and take a step back. They will need to think of a different way of moving forward that is more maintainable, controlled and less prone to bugs. 

Even if the internal codebase changes extremely slowly, external dependencies are always releasing new versions, requiring the team to upgrade them before they reach end of life. This can create further technical debt as APIs deprecate or breaking changes are introduced.

Quite often engineers struggle to make their case for prioritizing tech debt work. Why?

  • Lack of empowerment: They might not think it is their place to speak up about it; instead they expect that more senior people will dictate when to take stock, refactor, upgrade libraries or storage or so on.
  • Inability to persuade: They might not be able to construct an argument to spend time on it in a way that non-technical people that dictate the work streams will understand.
  • Apathy: They may have already lost hope that Product or any higher-ups will listen to them, and therefore silently let the codebase or system degrade. “Features are more important,” they say. “They’ll never listen to us.”

All of these situations are a shame. They’re also not acceptable. But they’re fixable. Let’s have a look at them in turn.

It’s someone else’s problem

As an engineer, if you think that it is someone else’s problem to point out that there is a technical debt issue beginning to get out of hand, then – and I’m sorry to say it – you’re wrong. There are a number of traits that an excellent engineer will have, and pride in their work and a keen interest in the future state of the code are two of them.

Those committing code will know best about how the codebase is currently written and organized. They will be the first to begin to notice the bad smells. As they realize that continual dirty hacks are the only way of moving forward, it’s their duty to raise the flag.

The creation of technical debt is inevitable; as inevitable as the slow erosion of a chalk and lime coast by lapping waves, or the weathering of an old building. We should be comfortable with the fact that it is going to happen, and is likely happening right now, and we should be especially comfortable with alerting others when it starts to feel bad. We should fix the broken roof tiles before they become a leak.

Talk to your team about it. Talk to the other engineers that work on that codebase. Build consensus that there is a problem and that something should be done about it. Don’t wait for someone else to point it out. It is as much your responsibility as it is everyone else’s.


Constructing the argument succinctly

Now that a technical debt problem has been identified, we’ll need to think about how best to argue for getting the time and space to fix it. 

Many engineering departments are building a product that makes the company money by selling it to external users. Some service internal users. I work in SaaS, and I would say that the expectations of our users are:

  • That our applications are available no matter the time of day or day of week.
  • That we’ll be continually adding new and innovative features to our products.

These expectations are pretty well understood by everyone in the business, regardless of whether they work in commercial, engineering, product, marketing, or wherever. That’s a good thing, because if you use one or both of them to construct your arguments about tackling particular pieces of technical debt, then it’s hard to be ignored.

Rephrasing the above two bullet points with a focus on thinking about technical debt:

  • The platform should be acceptably fast, correct (enough) and should have a very low likelihood of going catastrophically wrong with no prior warning. It is a very bad thing for business when this happens.
  • The codebase should be easy and efficient to work in as we continually add more stuff to it. If we can’t maintain a reasonable speed of adding new stuff, we begin to lose out to competitors, and the rest of the business wonders why we are getting slower, inviting lots of fruitless arguments about developer productivity.

We need to tie our arguments to these reasons. If engineers argue for doing technical debt work in a way that doesn’t make sense to the non-technical layperson, then it’s very hard for them to win hearts and minds in the business. Everyone else will wonder what the engineers are up to rather than shipping features.

Technical debt shouldn’t be fixed because it’s “obvious” or “the code could be better” or “it’s annoying” or a particular framework is now “the latest thing”. Those reasons may be entirely true, but the argument needs work.

Let’s have a look at some different scenarios.

  • “We need to upgrade Postgres.” OK, I totally understand. But we need to think of a better way of phrasing this to the non-technical person. What does the upgrade bring us? Is it some critical security patches? Does it have a positive effect on the speed at which the application is going to work? Does it have new features in the query language that will allow us to query the data in a new or better way?
  • “We need to refactor AnalysisPipeline.scala!” Nobody outside Engineering has any idea what AnalysisPipeline.scala does. Probably only a few in the department even know. Does it lack tests, and is it therefore causing a lot of bugs in written documents that are challenging to fix once they’re committed to storage? Is the class such a big monolithic mess that it is too hard to add new features at the rate that the business expects? Is it taking five times as long to work on as it would if it were split out into multiple classes, methods, modules or services?
  • “This service needs a rewrite.” Sure, it probably does. But what’s the real reason? Is it stuck on a framework that is now years beyond end of life and nobody knows how it works? Is it an area of the code that is going to have a lot of changes in the coming year, but the risk of it breaking is too high to keep adding to it quickly? Will the speed or stability of this particular service be much better if, instead of working with it, we just start again, taking advantage of the knowledge and technology that we have now?

Getting better at justifying why technical debt needs to be fixed isn’t just a skill that helps you get the clearance of your team lead or product owner to start working on it: it can also help you make up your own mind as to whether something is a real long term issue for the coming year or just a short term frustration for the current sprint.

Nobody will listen

If nobody will listen to your arguments about addressing technical debt, then first check that you’re constructing those arguments properly, as mentioned in the sections above. You are? Ace.

If a common pushback is that there are too many features queued up to build, then there may be an underlying worry from your product manager or line manager that fixing the technical debt will be a slippery slope that goes on forever and destroys productivity. 

One answer to this is to try your best to estimate the effort that it will take to fix it, and, better still, break that down into phases or milestones that can be incrementally worked on.

A tactic that works well to please both Product and Engineering is to balance periods of feature delivery with periods of tidy up and refactoring. In It Doesn’t Have To Be Crazy At Work, the creators of Basecamp pitch for periods of 6 weeks building followed by 2 weeks paying down technical debt. 

At Brandwatch we have employed similar tactics with a period of a team delivering a big ticket feature being followed by a fallow period where the team prioritizes and executes their most pressing technical debt concerns, such as refactoring, improving monitoring and writing documentation. The bonus to this way of doing things is it gives your product managers and designers time to ruminate on the next big thing.
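
When pitching a cadence like this, it can help to show non-technical colleagues what it costs in plain numbers. The following is a minimal sketch in Python; the `Cadence` class and its method names are made up for illustration, and the 52-week year is a simplifying assumption that ignores holidays and overlap between cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cadence:
    """A repeating cycle of feature work followed by debt work (hypothetical helper)."""

    build_weeks: int
    debt_weeks: int

    @property
    def cycle_weeks(self) -> int:
        # Length of one full build-then-tidy cycle.
        return self.build_weeks + self.debt_weeks

    def debt_share(self) -> float:
        # Fraction of each cycle spent paying down technical debt.
        return self.debt_weeks / self.cycle_weeks

    def debt_weeks_per_year(self, working_weeks: int = 52) -> float:
        # Assumption: the cadence repeats evenly across the whole year.
        return working_weeks * self.debt_share()


# The 6 weeks building / 2 weeks paying down split mentioned above.
cadence = Cadence(build_weeks=6, debt_weeks=2)
print(f"{cadence.debt_share():.0%} of each cycle goes to technical debt")
print(f"about {cadence.debt_weeks_per_year():.0f} weeks per year")
```

With the 6 and 2 week split, a quarter of each cycle goes to technical debt, which is about 13 weeks across a 52-week year. Stating the figure up front makes the trade-off explicit and bounded, which speaks directly to the slippery slope worry.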

Sometimes, however, there is a massive elephant in the room: a technical debt project so big that nobody wanted to talk about it, yet the swell has grown to the point where the wave is going to break – either with the codebase continuing to become a complete mess, or the platform becoming increasingly slow and unstable.

In this situation, honesty and transparency are the best policy. It is the job of the leaders in Engineering to elevate a large technical debt problem into a separate work stream in order to give it the recognition, space, and resources that it needs; typically a dedicated team over a longer period of time. 

In doing so, the principles above are just as valid: raise the flag, gain consensus, plot an approach, and make the problem understandable to the layperson. Make it clear that the future is brighter by doing this work.

Convince them that it would be silly not to do it because the future of the business depends on it. Then sort it out.

In summary

Remember that if you are an engineer, it’s your job to raise technical debt issues as early as possible, and to make sure that you are able to explain their impact in succinct and meaningful ways. Managers: it’s your job to listen and to create the space for the issues to get worked on.

Building a successful SaaS business requires a stable application and the ability to work quickly and efficiently: both of these things are impacted severely by technical debt, so don’t let it build up. Pay it down.

Switching to a remote manager

Photo by Marius Christensen on Unsplash.

git merge

In the last four weeks, I’ve made a transition from having my line manager based in the same office, which has been a situation I’ve been used to for all of my professional life, to having them be remote. In my case this has happened because of the merger of Brandwatch and Crimson Hexagon. The CTO of the combined company is now based in Boston, and I’m in Brighton, England.

I have a VP Engineering role, which, silly job title aside, means that I have a division of the Engineering department reporting to me, focussed around building our Analytics and Audiences applications. We have other divisions of Engineering focussed around our infrastructure and compute, our data platform and the Vizia product. At the time of writing, I have 38 people in my division.

I’ve been fortunate to have always had the CTO in the same office over the recent years. As the company has continued to grow at a fairly fast pace, I’ve had local support. Ideas, thoughts, gripes: they’ve been there in the same place or on the same timezone.

There have been a number of benefits to having the leader of the department co-located:

  • My staff have been able to get to know him easily. We’re all just around most days. This makes them feel connected all of the way up the chain with minimal effort.
  • The general narrative of what’s going on, such as happiness, morale, stress levels, has been observable by both myself and my manager.
  • If there’s ever a crisis – of people or of production systems – then, most of the time, 35 steps is all I’ve needed to get some counsel or a second opinion.

However, things are now quite different.

After our companies merged, the CTO role was given to the Engineering department leader in the other company, putting myself in an interesting position:

  • I now have a manager who is not in the same physical location, so I lose out on all of the informal in-person contact that I had before.
  • My manager is now 5 hours behind me, meaning I have fewer times of the day in which to speak to him.
  • The new CTO doesn’t initially know me or any of my people; only what we’re responsible for. The rest is a black box.

Letters across the pond

Over the last few weeks, as was to be expected after the merger, we’ve both been very busy, both with logistics and with traveling. Our weekly hour-long 1 to 1s often end before we’ve cleared every item on our agenda, and then we’re sliding into another meeting, which has been frustrating.

Because these weekly catch ups didn’t seem like enough time, and because email chains typically devolve into stasis, I started writing a weekly digest which I send each Friday afternoon. The idea was that I could take some time to properly summarize everything that was going on in my world and flag anything that I needed help with. 

This has been working really well. 

I write it in a Google Doc, which means that a lot of the smaller items can get covered off asynchronously via the comments. Larger items that are worth spending some more time on become the focus of our conversation in our 1 to 1, and that more precious face to face time is spent on the meat of the main issues, rather than on the periphery. Both of us enjoy written communication too, so this works very well. It also gives us an ideal chance to poke fun at our Britishisms and Americanisms.

Here’s roughly what I cover in the weekly document. It takes me about 30 minutes to write:

  • Any interesting developments in any of the ongoing work streams, such as new links to demos, updates on estimates, or anything particularly good or bad that’s unfolding.
  • The latest on what’s next in the project pipeline from conversations with Product.
  • The general feel within the teams, such as happiness and morale. Are any of them overworked, or, on the contrary, spinning the wheels while waiting for a decision on the next thing? Are the teams right sized and is this looking true for the coming months?
  • An in-depth look at anything that’s front of mind right now, such as hiring, or thoughts about backend architecture and scaling, or contemplations over cool ideas we could pitch to the Product team.
  • A list of “documents of interest”, such as designs for upcoming features or architecture, or the fortnightly product and engineering updates that get sent out. I don’t expect any of these to be read in detail, but they’re there to satisfy any curiosity.
  • Occasionally a light sprinkling of GIFs. Because life’s too short to not use that one of Kermit furiously slapping the typewriter.
Yes, that one.

Soap opera rather than novel

I’ve been trying to open up my black box as much as possible to give my new manager a view into the decisions that I make on a day to day basis, and to allow my thought processes to be observed and discussed. However, the style of writing was challenging at first: how do I make the digest interesting and not a labour?

Given that my new manager was taking the role of the reader and I was the author, I didn’t really know where to start or how to collate my thoughts. But then I came to realize that it wasn’t my job to be the creator of a novel, thoroughly documenting everything that happened. Instead I needed to take the position of a screenwriter of a soap opera: an inventor of a regular rolling feed of narrative that is easy to soak in, letting the reader learn the characters and plot lines gradually by osmosis.

Tuning into The Wire halfway through Season 3 can leave you feeling a little lost and overwhelmed by the detail, but switching on EastEnders a couple of times during the week allows you to (assuming you want to…) follow along pretty easily. I decided to be more EastEnders, except with less arguing and fighting in the Queen Vic.

I scatter the document with parts prefixed with “Your thoughts please…” where I’d like to get some input. We usually chat on the comments around these parts.

Getting comfortable with async await

Although I expected the experience to be more jarring at first, I think that I am getting more comfortable with a predominantly asynchronous relationship. 

There can be some benefits to having a remote manager, after all:

  • Because our face to face time is more valuable, we prepare more for when we do talk, meaning that conversations are rewarding.
  • We do a lot of written communication, which allows us to think more deeply about what we’re saying and how we’re saying it before presenting it to one another.
  • We have to continually operate from a place of trust, since we cannot easily insert ourselves into each other’s worlds to observe and come to our own conclusions. I like this.
  • I feel like I have to step up and represent my people more, in terms of my personal accountability and in promoting their cause, which can only be a good thing.
  • The introduction of even more extreme timezone differences across the now global Engineering department means we need to get better at being a company that supports flexible remote working, fast. I would like to think that being forced to break our predominantly European timezone habits will make it easier for us, in time, to hire people remotely all over the world.

But, still, a quick chat in the kitchen is nice, and is missed.

Why do we have process anyway?

Photo by Med Badr Chemmaoui on Unsplash.

Have you ever had a colleague complain about there being too much process? Or pointed out, powerlessly, that an existing process is pointless? I would posit that you have. In fact, it may have been you doing the complaining. 

I’ve certainly been there. 

I’ve been there in times where my jaw has dropped when opening up the description of the deployment process for an older part of the system, and in times where I’ve torn my hair out as a simple approval for a flight has gone round and round in endless repeating circles.

“Surely it must be easier than this,” I hear you cry and hear myself cry.

Yes, processes can be extremely annoying. But they don’t all have to be annoying. And, in fact, they should do more good than harm.

What is a process anyway?

Process is a very abstract term: in the dictionary, which I have just opened, it is defined as a series of actions or steps taken in order to achieve a particular end. In business in general, this could be the process in order to get some budget signed off or to hire a new person. 

If you’re an engineer reading this post, then the word process may make you think about the steps that your JIRA tickets take from being created to being complete, or the way that deployments of your live application are done, or even the forms you need to fill in for your annual review.

Regardless of whatever the end is that the process is there for, the definition of that process is meant to make that end result more replicable, predictable and consistent regardless of the individual(s) that execute it. 

Process as code

You could think of a process as the code that the company has written about how to do a particular thing. Everyone else following that process is executing it to achieve the same desired, consistent result.

But, as all programmers know, code evolves over time. Old code gets stale and misses out on the latest updates and features. The original authors of that code have also learned more and become better programmers since they wrote it.

Have you ever gone back to code you wrote years ago and wondered what on earth you were doing at the time? The same is true of processes. Companies can grow, shrink, mature or completely reform: when they do, you can’t expect the processes to stay the same.

Processes, like code, shouldn’t be set in stone. They should be revisited, tweaked, refactored, rewritten and deleted. 

Arguing against processes

If processes don’t change despite being inefficient, then you’ll need to work on getting them changed. But why do processes get left alone and not continually updated?

Sometimes a business won’t want to change processes because those in charge are lazy: they’d rather not have to invest any effort into something that works well enough, regardless of how inefficient it is in practice. Tut.

Additionally, a process can often become orphaned because it was created to ensure that knowledge can be handed over (e.g. the process of deploying the live application) or to ensure the consistency expected by a particular individual (e.g. the annual review process as defined by the head of HR). As time passes, the relevance of those processes fades and they need updating. But those who defined them may no longer care as deeply about them, since their attention is now elsewhere.

Still, in the same way that the management of a business would expect their employees to be efficient and diligent with their work, that same management should also expect to continually update the collective ways of working that they have codified in order to demonstrate that they are being efficient and diligent also. 

A process should fundamentally serve those that are continually applying it. If those closest to the ground want to make it better, then they should absolutely be allowed to change that process for themselves. It’s likely they know more than the original authors about the latest and greatest way to get that work done.

Let chaos reign, rein in chaos

Still, even the best processes don’t always have their place. There is also a good argument for process to not exist.

When a team is embarking on innovative or unknown work, process seems only to get in the way, constraining ways of working and thinking. Chaos often breeds innovation, so why should we stifle it? But, on the other hand, if there’s no process, aren’t we all going to hurtle into the sea in a fiery ball?

There’s a neat quote that suggests a way of dealing with these situations:

Let chaos reign, then rein in chaos.

Andy Grove

Often during innovative or chaotic periods, it is best to let go of wanting to control things and let them unfold in a natural way, even if it seems like bedlam.

Chaos often gives birth to a new way of working; an emergent practice which can be the new best way of tackling a problem. After people converge on this new way naturally, then the emergent practice can be codified into the new process. Chaos is transitory, but necessary. Things will stabilize eventually.

A practical example is running R&D projects. Often truly innovative developments need space and lack of constraints for those that are working on them to explore all aspects of the problem. Too many meetings, checkpoints, reporting upwards and real-world engineering constraints can kill creativity.  

Although it can be deeply uncomfortable, especially for directive-driven managers, to think that an investment into R&D is “not being managed”, letting go of the situation and trusting staff to do things the way that they want to can, paradoxically, get better results.

As the R&D project begins to move into a prototype and requires more input from designers and engineers, then chaos can be reined in. 

But not before.

So, in summary

Process is good when it creates predictable results. However, like code, processes must be continually updated and revised to stay relevant, and the updating should be done by those closest to the process: the staff actually executing it. 

Creative and innovative projects can benefit from having no process at first, so that a new practice can emerge from the chaos and eventually become the new way of doing things. Let chaos reign, then rein in chaos.

Now, according to my own process, this is where I hit “publish”.