Why do smart engineers do stupid things?
If you understand how smart people can blow things up through a series of seemingly logical decisions, then you’ll know when you need a management solution and whether the process du jour might work.
When software is brand new and managed by a small team, it’s easy to make good decisions:
The code base is small, and you don’t think about the future because you’re just trying to get off the ground. As the team grows and the product matures, it becomes increasingly difficult to see the big picture when making decisions.
This is fine when local goals align with global priorities, like writing high-quality code.
However, the best local decisions can sometimes be harmful when extrapolated to a broader context over time.
Here we’ll look at a few common ways this problem can manifest.
When faced with a problem, engineers like to find the best solution. However, this often requires complex analysis. The alternative is to apply a general heuristic that works well on average but may not always be the best.
This situation comes up frequently with software-development guidelines. It’s tempting to make exceptions for extenuating circumstances — but doing so erodes consistency. The cost of this can be hard to grasp since the effects are often subtle and long-term, while the benefits of flexibility are more apparent and immediate.
While test-coverage flexibility can make better local decisions possible, the inconsistency it can create may take a major toll over time.
First, flexibility is only better if people actually make the right decisions; less experienced engineers may make more mistakes. For decision flexibility to be valuable, one must specify who can make decisions, train those people, establish a review process, and evaluate decision quality as part of performance reviews.
Another cost of flexibility is the extra decision-making overhead, which comes in the form of discussions and the documentation of decisions.
One way sensible local decisions can lead to bad global outcomes is when there are significant costs or benefits that only have an impact in aggregate. Because no single decision will tip the scale, people don’t consider aggregate effects when making each decision.
Perhaps the most familiar aggregate effect in software engineering is the fixed cost of using a particular technology or method, which compounds as more of them come into play.
Each language requires recruiting engineers who know the language, establishing training programs, creating development practices, purchasing tools, managing knowledge, etc. Having multiple languages also makes it harder to allocate developers to projects.
Even a small amount of code in a different language can incur high global fixed costs, and eliminating a language entirely can yield large savings.
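The arithmetic behind this can be sketched in a few lines. Everything here is hypothetical: the `total_cost` helper, the language names, and the cost figures are made up for illustration, with the fixed cost standing in for recruiting, training, tooling, and so on.

```python
def total_cost(projects, fixed_cost_per_language):
    """Sum per-project costs plus one fixed cost for each distinct language.

    projects: list of (language, project_cost) pairs, in arbitrary cost units.
    """
    languages = {language for language, _ in projects}
    project_costs = sum(cost for _, cost in projects)
    return project_costs + fixed_cost_per_language * len(languages)


# Hypothetical baseline: three projects, all in one language.
one_language = [("python", 50), ("python", 50), ("python", 50)]

# One team switches languages because it is locally cheaper (40 instead of 50).
two_languages = [("python", 50), ("python", 50), ("go", 40)]

print(total_cost(one_language, fixed_cost_per_language=100))   # 150 + 100 = 250
print(total_cost(two_languages, fixed_cost_per_language=100))  # 140 + 200 = 340
```

Under these assumed numbers, the switch saves 10 units locally but adds 100 units of fixed cost globally. No single project's decision reveals this, because the fixed cost only shows up in the aggregate.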
The same principle applies to using other technologies and practices, like different libraries or frameworks, operating systems, or even agile planning processes.
Keeping up with best practices is an ongoing battle for engineering teams. This is particularly challenging with a large codebase, where it is impractical to migrate to new technology all at once.
The common way to deal with this is to refactor code progressively, moving things over piece by piece. Because the main benefit of refactoring is lowering maintenance overhead, it makes sense to prioritize modules that require the most maintenance first.
Here are some questions to consider to avoid falling into the trap of overlooking aggregate effects:
In this article, we focused on local versus global optimization and covered a few ways people make bad decisions.
While these examples will hopefully prevent certain mistakes, they’re by no means comprehensive.
Lasting success requires ongoing vigilance by engineering leaders who have a global perspective.