Algorithms to make you more effective

Shakey the Robot. (Source: Computer History Museum)

Your focus and how to protect it

Earlier on in my career, I was under the impression that success was strongly tied to saying yes.

That quick favor? No problem. That interesting idea that someone just mentioned to me in the kitchen? I should probably prototype that. Jumping on that call to a client? Why not. 

Always saying yes is how I thought I could be most helpful and how I could open myself up to the most opportunity. I mean, there’s even a very funny book about it.

Well, unfortunately saying yes all of the time, even with good intention and kindness, is a path towards being extremely nice but ultimately ineffective. 
Being effective, on the other hand, involves two strands of management of the self:

  1. Being able to organize my time and my mind so that I have the best chance of being as productive as possible.
  2. Saying yes to the most impactful pieces of work and politely refusing those that are not.

We’ll get on to exploring the types of things that you should be spending your time on shortly, but first, let’s zoom in a little more into how you spend your time.

Taking inspiration from algorithms

You, just like everyone else at your company, and like everyone else in your industry, partners and competitors alike, have exactly the same amount of time in the day. What really matters is how best you use it. People who understand how they best function and subsequently arrange their day around their productivity traits can be dramatically more productive than those that do not. 

To get some inspiration, let’s look at computers: specifically CPUs.

Context switching

Your operating system is doing a great number of different things at once. 
All of the applications that you are running execute inside many running processes. Now, just out of curiosity, what’s my laptop doing right now? As I open up Activity Monitor on my MacBook Pro as I write this sentence, this is what it looks like:

Activity Monitor.
Processes running on my laptop at the time of writing.

As you can see from the bottom of the window, there are hundreds of different processes and thousands of active threads. However, my laptop only has 8 CPU cores on which these processes can be executed. All of these processes are able to operate on a handful of cores by a neat trick: context switching. 

In order to give us – the slow humans – the illusion that the computer is doing many things in parallel, all of these tasks rapidly switch between one another, executing the next bit of work, then switching to the next process, executing for a bit, then switching, and so on. This switching happens extremely quickly, often at the rate of hundreds or thousands of times per second. To us mere mammals, everything appears to be happening at once.

However, context switching is expensive

Instead of being able to execute instructions continuously in one process, all of this multitasking requires administration: stopping a process involves saving state and then loading in new state for the new process. The CPU isn’t doing anything useful while this context switching is happening. The less context switching that occurs, the more instructions that are executed on the CPU cores. 
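To put rough numbers on that overhead, here is a minimal Python sketch of a round-robin scheduler. It is a toy model rather than how a real operating system schedules work: the `simulate` function, the "tick" units and the switch cost of 2 are all assumptions made up for illustration.

```python
def simulate(task_lengths, time_slice, switch_cost):
    """Run tasks round-robin and return (total_ticks, ticks_spent_switching).

    A switch cost is paid whenever the worker moves from one task to a
    different one. All numbers are arbitrary "ticks", not real timings.
    """
    remaining = list(task_lengths)
    total = 0
    switching = 0
    current = None  # index of the task we last worked on
    while any(r > 0 for r in remaining):
        for i, r in enumerate(remaining):
            if r <= 0:
                continue  # this task is already finished
            if current is not None and current != i:
                # Save the old state, load the new one: no useful work done.
                total += switch_cost
                switching += switch_cost
            current = i
            work = min(time_slice, r)
            remaining[i] -= work
            total += work
    return total, switching


# Three tasks of 100 ticks each, switching every 10 ticks.
frequent = simulate([100, 100, 100], time_slice=10, switch_cost=2)
# The same three tasks, each run to completion before moving on.
focused = simulate([100, 100, 100], time_slice=100, switch_cost=2)

print(frequent)  # (358, 58)
print(focused)   # (304, 4)
```

The useful work is identical in both runs (300 ticks). The only difference is how much time disappears into switching.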

Do you see where this is going?

The first step of protecting your focus is to realize that frequently context switching between your own tasks involves expending effort on administration rather than on impactful output. The longer that you can spend working on one task continuously, the more effective that you will be in aggregate. 

Aside from being interrupted by someone, you can manage your own environment to ensure that you don’t context switch excessively:

  • Close all other windows and tabs while you’re working on something. Resist the temptation to peek at your notifications, or just disable them.
  • Block out periods of deep work in your calendar where you declare yourself uninterruptible. There is the concept of offline hours which we’ve tried at various times in the office.
  • Drive your focus away from reactive messaging. Batch process emails, DMs and chats at specified times of the day. Again, you’re just like a computer: batch processing is often more efficient than serial execution.
  • Don’t start on a new task until you’ve finished the one that you’re doing. Although having multiple tasks on the go gives the illusion of productivity through busywork, just remember that CPU loading and saving state again, and again, and again. Inefficient, inefficient, inefficient.

Some context switching isn’t bad. Often a manager’s job relies on context switching between many different issues. But limiting it increases throughput of individual tasks.

So we’ve looked at CPUs to direct our thinking about how to focus better on the tasks you’re working on. But can computing teach us anything about saying no to work that isn’t impactful?

I think that it can.

Pruning search trees

Search is a classic computer science problem. I’m not talking about Internet search engines here, though – I’m thinking about pathfinding. Given two places on a map, how do you decide the best route from A to B?

Let’s get our imaginations working.

Pretend that you are in London, standing at Trafalgar Square. You need to get to Regent’s Park. 

You have absolutely no idea how to get there and you have nothing at your disposal to help you out: no map, no people to ask, and no phone. The only way that you can probe your way to Regent’s Park is to effectively guess by walking in random directions for an unbounded amount of time, and it doesn’t take a stretch of the imagination to predict that you’re going to be quite lost quite quickly. 

A map of London.
Where do I go? (Source: Google Maps)

That’s not going to work.

Now imagine this time that you are standing at Trafalgar Square looking at a map. This time you can see the destination on the map, and you put your finger on it. That’s your first heuristic: the straight line measurement between your current location and the destination. 

But which way should you go? There are sixty thousand roads within the six square miles of central London, so plotting out the shortest route ahead of time is a massive, complex search space. Iterating through all of the possibilities will leave you standing here for weeks. 

Instead, you decide to use your heuristic: you walk down the first road that’s roughly in the straight-line direction of your destination, walk to the next junction, and then look at the best direction to go based on your new orientation towards your destination. You repeat this process, and eventually you get there.

Neat! This is something that we’ve been making computers do for over 60 years. 

The A* search algorithm works in a similar way. It is a best-first search that aims to find the lowest-cost route from point A to point B. At each step it uses a heuristic to estimate the cost remaining, which avoids the need to exhaustively explore each potential path ahead of time. 

In the London route finding scenario above, the smallest cost is the least amount of walking required between the starting point and the destination. Typically the A* search algorithm performs this “walk” over a graph data structure.

A weighted graph. (Source: wikimedia.org)

Routes between places (A-E in the diagram) are represented as weighted edges on a graph, where the weights (the numbers in the diagram) represent the distance between those places. At each step of the algorithm, like in the scenario walking through London above, the algorithm expands the next possible steps and scores each one – typically by adding the distance travelled so far to a heuristic estimate of the distance remaining – to pick the one that costs the least. 

Repeated application of this heuristic ensures a speedy arrival at the destination.
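For the curious, here is a minimal Python sketch of A* over a small weighted graph. The graph, its weights and the `ESTIMATE` table are hypothetical values invented for this example (they are not taken from the diagram above), and the estimates are chosen so that they never overstate the true remaining distance, which A* needs in order to return the cheapest route.

```python
import heapq


def a_star(graph, estimate, start, goal):
    """Return (cost, path) for the cheapest route, or (None, []) if none exists.

    graph:    dict of node -> {neighbor: edge weight}
    estimate: dict of node -> guessed remaining cost to the goal
    """
    # Each frontier entry is (f, g, node, path) where g is the cost so far
    # and f = g + estimate is the priority used to pick the next step.
    frontier = [(estimate[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # a cheaper way to this node was already found
        for neighbor, weight in graph[node].items():
            new_g = g + weight
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                entry = (new_g + estimate[neighbor], new_g, neighbor, path + [neighbor])
                heapq.heappush(frontier, entry)
    return None, []


# Hypothetical example data.
GRAPH = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8, "E": 10},
    "D": {"B": 5, "C": 8, "E": 2},
    "E": {"C": 10, "D": 2},
}
ESTIMATE = {"A": 7, "B": 6, "C": 7, "D": 2, "E": 0}

print(a_star(GRAPH, ESTIMATE, "A", "E"))  # (10, ['A', 'C', 'B', 'D', 'E'])
```

The priority queue is what does the pruning: routes that look expensive simply sit at the back of the queue and are never expanded.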

An animation of A* search.
A* search in action. Note how the path is found without needing to explore the whole search space. (Source: imgur.com)

But how does this algorithm apply to the way that you manage your focus? I believe there are two themes which are related:

  • You can define a heuristic to prove that what you are working on is the most impactful task at any given time.
  • Then you can apply your heuristic to guide your choice of work, dramatically pruning your mental search space by focussing on the most important thing. You can say no to everything else with good reason.
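As a playful illustration of those two points, here is a small Python sketch that ranks a to-do list by a heuristic and says no to whatever doesn't fit. Everything in it is an assumption for the sake of the example: the `Task` fields, the made-up impact and effort numbers, and the choice of "impact per hour" as the heuristic. Your own heuristic will look different.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    impact: int  # hypothetical 1-10 guess at progress towards your goal
    effort: int  # hypothetical estimate in hours, must be at least 1


def triage(tasks, capacity_hours):
    """Split tasks into (yes, no) lists of names.

    Tasks are ranked by impact per hour (an assumed heuristic) and accepted
    in that order for as long as they fit in the available hours.
    """
    ranked = sorted(tasks, key=lambda t: t.impact / t.effort, reverse=True)
    yes, no, used = [], [], 0
    for task in ranked:
        if used + task.effort <= capacity_hours:
            yes.append(task.name)
            used += task.effort
        else:
            no.append(task.name)
    return yes, no


tasks = [
    Task("de-risk the launch", impact=9, effort=6),
    Task("prototype the kitchen idea", impact=3, effort=4),
    Task("optional status meeting", impact=1, effort=1),
    Task("unblock a neighboring team", impact=6, effort=2),
]

yes, no = triage(tasks, capacity_hours=8)
print(yes)  # ['unblock a neighboring team', 'de-risk the launch']
print(no)   # ['optional status meeting', 'prototype the kitchen idea']
```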

Defining your own heuristic

I can’t predict what the most important thing that you should be working on right now is. What is it? 

At a very high level, as a manager, I typically follow the formula that Andy Grove stated in High Output Management:

A manager’s output = the output of their organization + the output of the neighboring organizations under their influence. 

This formula allows me to prioritize the numerous things I could be working on each day, especially on days where I have free time and the luxury to choose activities. Instead of getting overwhelmed, I can prune my own search space accordingly by making sure that what I am doing has an impact on the largest possible number of people.

For example, if you have an important product launch coming up, then de-risking that launch as soon as possible may be your primary heuristic. If it’s growing your organization after receiving funding, then it’s that. If you’re an individual contributor working on the architecture of your application, then your heuristic could be continually improving the speed to serve data, or designing a plan to scale that architecture over the coming years. 

Optimize towards taking the shortest path to achieving that goal at all times. Yes, you can be A* search!

Shakey the Robot: invented by the researchers that also invented A* search. Yes, this is you now. (Source: Wikipedia).

The best part about defining your own heuristic for choosing the work that you should be doing is that you have a bulletproof reason for how you are prioritizing your time. 

Refusing that meeting where your attendance is not completely necessary, or opting out of other peripheral work, is no longer a matter of letting anyone down personally: your reasons are justifiable because you are laser focussed on a goal that will be maximally impactful for the company.

So what can you do?

Think about how to create the best conditions for you to work. Prevent those context switches as much as you can. Then be smart about what you are working on throughout the week: what is your heuristic that guides you towards your goal? Prune everything else away.

The case for building from source

Photo by Oscar Nord on Unsplash.

Previously I wrote about the inflection point that a particular part of your architecture will reach before you need to roll your own specialized piece of infrastructure. The summary of that article was that you probably won’t end up doing that unless you reach product market fit and then have a real success that drives scale.

What this means for most of us is that we’ll be reusing existing software for the bulk of our application infrastructure. This is absolutely fine. There is a wealth of fantastic open source projects out there. From my background as a backend engineer, I know that there’s pretty much always going to be a great storage system that does what I want it to do.

At Brandwatch we run large Apache Solr and HBase clusters in production, and we are extremely grateful for the open source community that has enabled us to use them to support our business.

However, as good as these projects are, occasionally there are bugs. Sometimes there are big bugs. If they occur, not only are all of your customers dealing with unexpected downtime, but you might have absolutely no idea what the problem is, or how to fix it.

Let’s explore some advantages of building open source projects that comprise your core infrastructure from source. But first, let’s consider our mindset when looking at the downloads page of a popular open source database.

Which should I download?

Earlier on in my career, when navigating to a project’s website in order to download a database to use, I would often see two options and have the following thoughts:

  • Downloading the binary: This option is for people who want to use the database; developers like me who just want to run it and store some data.
  • Downloading the source: This option is for people that want to have a dig around and see how it works, or for those that want to contribute to the project themselves.

For a long while, this is what we used to do in production as well: download the binary and then deploy and run it. If we needed to upgrade it, we’d download the latest binary and replace the older version. But, with time, we realized the decision to download the source was more nuanced.

This change came with scale.

Dependence

As time passed, and as the company and our data storage needs grew, we began to elevate the demands we placed on our storage technologies beyond the levels that you can easily find help and documentation for. We’d see errors or odd behavior that Google couldn’t help with, and those same issues also perplexed contributors on the project mailing lists.

This is where we started to worry.

At large data volumes you begin to discover that not all new features added to open source projects have been thoroughly tested at the same scale you are running at – and who would expect them to be? After all, this is free software that you just so happen to be running a successful SaaS business with. We should all be grateful that we get such a head start.

But, regardless of that head start, you begin to get locked in. Any dramatic increase in scale results in an increase in your dependence on these systems: your decision to use them becomes more complex to revert when they are supporting your customers around the clock.

Keeping up to date

Due to your increased dependence on a given system, you’ll want to more closely track the progress and roadmap of the project so that you can continue to upgrade and reap the benefits of new features and optimizations. You’ll also want to make sure that you swiftly apply new security patches.

However, even the simple act of upgrading to the latest version opens the door to more risk on your production environment:

  • Breaking changes. Some open source projects change extremely quickly and can move through non-backwards compatible versions with regularity. If these are properly communicated, you’ll need to do the work to support them. This is especially hard with storage systems. And, even worse, sometimes there are breaking changes that aren’t so well communicated, and you only notice when your own application breaks!
  • Weird regressions. These are typically a pain to track down. You may notice that your application is getting slower or buggier with time, only for the problem to ultimately lie within code that you haven’t written yourself. The code of open source systems isn’t necessarily the first place you’d look, either. Surely someone else has thoroughly tested it, right?
  • Unknown maturity of a subset of features. You may eagerly upgrade to the latest version of a database because new functionality has been released that you are keen to use, only to find out that it hasn’t been tested extensively at scale. We struggled with the maturity of Solr being able to store indexes in HDFS to the point that we gave up and returned to local storage on SSDs. It took many weeks of testing to reach that conclusion.

If you heavily depend on a technology in production to the point that it is business critical, there is a case for building projects from source and running your own builds in production, in order to be in control of your own destiny.

But why?

Reasons to build from source

Over the last few years, we have been migrating one of our larger data stores into Solr. We store hundreds of terabytes of data that users interact with via searches and facets (a word used to mean aggregations in Solr terminology). Data is continually ingested, updated and deleted: it is mutable.

While we were scaling our deployment, we hit numerous pain points that we had to overcome. In fact, some of these issues were deal breakers for that technology being able to work. We were able to overcome these hurdles much more easily by building our Solr deployment from source.

Familiarization with the build process

To begin with, building a project from source expands your knowledge of how it is put together. Large projects don’t always have trivial build systems and they can take some time for you to get your head around. You’ll learn about the dependencies and required configuration. You’ll learn about what gets packaged up and bundled ready for running on your production servers, and what really happens when the system is started and stopped.

This process might help you learn techniques that you can use in your own projects going forward, even if that is how not to do things! Also, if you happen to need to debug or patch the system yourself in the future, you will save a lot of panic down the road if you have already experienced how the build process works.

You can also decide how you roll out updates of the system when they are released upstream. When you create a new build, how long should you allow it to run on your testing or staging environment before allowing it to go live? How can you maximize your chances of experiencing any unexpected behavior or broken functionality before it hits production? Should you write your own integration tests to prove that the new build still works with your own application in the way you expect?

Speed of patches to production

If you’re depending on a technology in production at scale, it’s likely you’ll experience a few bugs. Building from source allows you to isolate them and fix them fast. Rather than an upstream bug causing panic while you wait for the project to push out an official patch, if your system is broken then you can patch the source yourself, build it, and release it locally for use immediately whilst waiting for the rollout of official patches upstream. If you fix it first, then you’ve just contributed to the project.

The same is true for quickly applying the patches of others. While waiting for the maintainers to approve and release a submitted patch that affects your system, you can apply that patch to your own version and see if it solves your issue. Later, when everything is fixed officially upstream, you can build the latest upstream version and get back on track.

Not only does building from source allow you this flexibility in patching, but since it has already forced you to go through the process of building and releasing yourself, the stressful times of production system bugs should be isolated to just diagnosing and fixing bugs, rather than additionally needing to work out how to build the project and release it.

Debugging, configuring and testing

When investigating issues that you may suspect have come from an open source project, building from source allows you greater flexibility in debugging and testing.

When finding issues we suspected to originate from Solr, running our own build from source made it easier to attach the debugger. Solr is written in Java, and possessing the exact source code that produced the JAR file allowed us to remotely use the Java debugger via our IDEs to test our assumptions about where performance issues were occurring. This further improved our understanding of the system and allowed us to submit a number of patches upstream that improved performance specifically for our use case.

We also noticed that a number of open source storage projects have hard-coded – but useful – variables that are not configurable via the command line. Building from source allows you to make them configurable. A notable example for us is the Solr recovery thread pool size, for which we now use a different setting in production than the default.
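To show the general shape of that kind of change, here is a minimal sketch. Solr itself is written in Java, so this Python snippet is purely illustrative and is not Solr's code: the `RECOVERY_THREADS` variable name, the default of 4 and the `recovery_thread_pool_size` function are all hypothetical.

```python
import os

# Hypothetical default. In the original code this would have been a
# hard-coded constant with no way to override it.
DEFAULT_RECOVERY_THREADS = 4


def recovery_thread_pool_size(environ=os.environ):
    """Return the pool size, preferring an override from the environment."""
    raw = environ.get("RECOVERY_THREADS")  # hypothetical variable name
    if raw is None:
        return DEFAULT_RECOVERY_THREADS
    size = int(raw)
    if size < 1:
        raise ValueError("RECOVERY_THREADS must be at least 1")
    return size


print(recovery_thread_pool_size({}))                          # 4
print(recovery_thread_pool_size({"RECOVERY_THREADS": "16"}))  # 16
```

The default stays exactly as it was, so anyone not setting the override sees no change in behavior. That keeps a patch like this small and easy to carry across upstream releases.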

So what should you build from source?

The pragmatic reader will have realized, clearly, that you don’t want to build all of your dependencies from source. As mentioned previously, at Brandwatch we build two storage technologies from source: Solr and HBase. Our decision to do this was guided by the following principles:

  • They are our primary storage systems. Our core competencies are in data enrichment, storage and analysis. These systems enable a lot of that functionality.
  • We run them at a large scale compared to what is documented. We’re nowhere near the size of deployments at Facebook or Apple, but we do run at a scale where documented evidence of this scale is scarce. We absolutely need to know how these technologies work.
  • They are written in technologies where we have in-house expertise. Both Solr and HBase are written in Java, and most of our backend developers use Java as their primary language. We have lots of people who can understand the code and make changes to it if needed.
  • Bugs and breaking changes affect our bottom line. If there is a big issue in either Solr or HBase then our customers are going to be affected. We need to be able to take control of the situation if this were ever to occur. We’ve noticed that HBase doesn’t change much, but Solr moves at a fast pace and the risk of regressions is high.

Being able to contribute upstream is also fantastic for the motivation and engagement of your engineers: what better than to be paid to contribute to the greater good? It’s nice to give back to the community, and it helps attract new engineers who want to work on these technologies at scale.

In summary

Building your mission critical systems from source allows you to be in control of your own destiny. We’ve been able to fix issues and improve the speed of Solr to make it even more of a great fit for our use case. Notably we recently improved the speed of updates to existing documents and also added the ability to facet on functions.

That massive email


Goodbye, cruel world. Photo by MARVIN TOLENTINO on Unsplash.

My laptop is going out of the window

That email.

It’s been watching you all day. Lurking.

You’ve skirted around it, and you’ve turned your attention to other things: to Slack conversations, to pull requests, and even to writing that API documentation that’s been dangling at the bottom of your to-do list for weeks.

But it’s still there.

And now you have nothing else to distract you.

You open it. A wall of text appears on your screen. It’s even got proper formatting and numbered lists. What on Earth is this gigantic essay all about?

You feel your energy escape from your soul via your eyes, sucked into the event horizon of wordy dialogue that must have taken the author hours to write.

You sigh and begin reading through it.

Time passes.

By the time you get to the end, you’ve lost your train of thought and have forgotten the points that you wanted to write in your reply. You scroll right up to the beginning of the email and start reading it again.

Like before, you reach the bottom of the text, but – surprise, surprise – it’s taken so long to read that you’ve forgotten what you wanted to say.

You sigh again, but for longer this time.

Frustrated, you try a different technique.

You hit the “Reply” button so that you can write your response in sequence as you read through it. The in-line reply window opens, and now you can’t fit your response and the original email on the screen at the same time: no matter which way you resize it, all of the text and boxes dance around like a marionette.

Your brow furrows and you scratch the back of your head.

Feeling inspired, you open a new window, side by side with the original window, so you can fit the email on your screen and compose your reply at the same time.

So far so good.

You slowly read through the wall of text, making bullet point notes on it as you go. There are about six main talking points that you’ve extracted, and you expend some effort in polishing them up to form a solid narrative. You read, then re-read what you’ve written.

It looks reasonable. You sound intelligent. That makes a change.

You move your mouse cursor towards the “Send” button. As you click, you notice a message popping up at the bottom of the screen.

2 new message(s) in thread. Click here to show.

Ah, damn it.

You click to show the messages. Two more walls of text. You read them both.

The first reply has said everything you’d already written. Why did you bother? At least your opinion has been validated.

The second reply is quite confused. You don’t think they’ve read the original email properly, and they’ve tailed off into the realms of the bizarre, almost as if they’ve translated the text into Spanish, then into French, then into Klingon, then back into English.

What are they on about?

A blip sound.

1 new message(s) in thread. Click here to show.

You click.

It’s the original author’s out of office email.

A blip sound.

1 new message(s) in thread. Click here to show.

Oh, wonderful: your colleague who was very slow to write their initial reply has just chimed in on some points in the first email, taking things in a completely disparate direction.

You begin scanning the content to understand why they didn’t quite get the point.

1 new message(s) in thread. Click here to show.

Click.

Out of office again. Sigh.

You rest your head in the palm of your hand.

This conversation makes no sense any more, and you’ve been sitting here reading, writing and peeling apart these replies for 20 minutes.

You consider what your employer’s insurance is like on their equipment, and whether it covers your laptop “accidentally” launching itself out of the window in a bid for freedom.

Smash.

There’s a time and place for email

Let me begin by stating that email is brilliant. I love email.

It’s archival, the threading system works well, and since Ray Tomlinson sent the first ever message in 1971 (to himself, as a test – allegedly it may have read “QWERTYUIOP”), I would argue that email has done a fantastic job of bringing the world even closer together. The asynchronicity bridges timezones. Whole businesses are run via email communication.

Yet, there are times that email isn’t as good as other forms of communication. But let’s stay positive and look at what it’s good at first:

  • Archival notices. Since email hangs around forever, and since it is easily searchable, email is perfect for making timestamped announcements that everyone will see and refer back to.
  • Newsletters. In my experience, whatever may feel like oversharing is rarely received that way. Sending out regular newsletters to your team, department or company is an excellent use of email to inform and increase your visibility.
  • Conversations with a narrow focus. Emails that cover one concise topic can allow people globally to contribute, assuming that the purpose of the message is to solicit opinion.
  • Follow ups to ratify decisions. After having a meeting or a decision point, a follow up email is a perfect way of putting in writing what has just happened so everyone is aligned.

So that’s the good stuff. But what’s email bad for?

  • Conversations with many active authors. The little story above is obviously an exaggeration, but “hot” email threads with lots of active participants begin to feel like a series of sliding doors. Everything gets confusing, communication is poor, effort is wasted, and nobody gets anything done. Consider a flurry of email thread activity as a signal to jump in a Slack channel, or do a video call, or start a shared document.
  • Anything requiring a quick response. Email isn’t like a DM, and comes with no guarantee of timeliness of response. People have very different approaches to their email. Some batch, some practice inbox zero, some simply get so much that they forget to reply. If you want a quick response, send a DM or make a call.
  • Topics that have many layers of context. Extremely complex subjects with many sub-contexts become extremely difficult to reply to. The email format doesn’t support levels of nesting without being fairly creative. Maybe another medium is better. If you feel like you need a deep breath after reading something complex, suggest another medium to discuss it further.

If you’re going to write an email for the reasons above, don’t. Please save others the pain!

Help your reader out

As well as using email for the right purpose, there are some ways that you can be courteous of the fact that when your recipients open your message, they are giving up their time to you.

For non-trivial email content, you can specify the actions that you want the readers to take, even if those actions are just to read it and do nothing else. The recipient, upon digesting the first couple of lines, can rest assured that even if a big block of text is coming up, they need only understand it, and not compose a short essay in response. They’ll thank you for it.

Additionally, for longer emails you can provide a short summary at the top to ease the reader in gently, or let them make the decision as to whether they want to read the whole thing or not. You may even find that as you write the summary, you can delete large parts of the text that follows as they’re not needed after all.

And, if after all, it isn’t something that is best suited to being communicated via email, you could maybe try some alternatives:

  • Having a discussion in a private meeting room
  • Walking over to someone’s desk to ask them a question
  • Creating a Slack channel or group DM
  • Writing your thoughts in a shared document and soliciting comments
  • Going for a walk around the block to chat about it
  • Chatting about it over lunch
  • Just getting on with something using your best judgement

There are plenty of alternatives that you could be trying rather than firing up Gmail.

So, in summary

Be a good email citizen. Otherwise this laptop gets it.