More time is spent reading software than writing it, so software (and most other forms of writing) need to be optimized for readability.
Readability is subjective. There is a term called Cognitive Complexity (not to be confused with Cyclomatic Complexity), used by SonarQube, that tries to objectively measure readability. There is an excellent white paper explaining it here (alt). While that’s a good start, I believe it’s worth discussing other factors affecting readability.
The ISO CPP guidelines have some great recommendations, of which some may be repeated here.
Naming
Your name is one of the most important parts of your identity. It is the same with code. Each module, class, function, and variable has a unique identity and fulfills a unique purpose identified by its name.
The precision and length of a name can correspond to the scope: if a name is seen often, it’s probably a highly reusable and generic entity. These entities can have shorter, more vague (reusable) and generic names. If a name is seen less often, it’s probably in a small scope and may be an implementation detail. These entities can have longer, more precise and narrow names. Using a generic name for a precise concept or a precise name for a generic concept forces readers to think about the concept in the wrong scope, and reduces readability.
The precision of naming takes away from the uniqueness of seeing — Pierre Bonnard [1]
Pierre Bonnard was an artist, and made the statement above in the context of art. Writing is also a form of art, as is writing software. I interpret this statement as claiming that the more precisely something is named, the less mysterious the thing is: the intention and functionality has been narrowed down. If the name is not precise enough, readers need to make assumptions, and may be wrong. If the name is too precise, readers may be overwhelmed with the level of detail, and the definition may be too brittle to survive a refactor.
I don’t mean to understate the difficulty of finding an appropriate name. However, if this is a consistent issue, it may be because the roles of the components in the system are not properly defined. It may also be due to a lack of understanding or application of design patterns. Design patterns have ubiquitous terminology. While one may unknowingly use design patterns in their code with their own custom terminology, a reader who is already familiar with design patterns would grasp the concepts quicker if existing terminology is reused.
When it comes to naming standards like capitalization, camelCase or snake_case and so on, it’s extremely important to consistently uphold those standards throughout the codebase. Naming standards can help a reader identify, at a glance, what the type or scope of something is. If the standards are not consistently applied and enforced, then readers will either not trust the standards, in which case they’re no longer effective, or worse, they will trust the standards and be misguided.
> My preferred naming conventions (click to expand)
Namespaces
- Namespaces must be lower case
- This allows differentiating free functions from static class member functions
- Allow common abbreviations but don’t enforce short namespaces
- Use one top level namespace with one nested layer such that cross-namespace interactions are reduced
- Exceptions to the one-nested-layer may be inline namespaces, such as for ABI versioning, or detail/implementation namespaces that aren’t meant to be seen by the users
Classes
- Classes must start with capital letters
- PascalCase
- Only the first letter in an acronym is capitalized, e.g. Rpc instead of RPC
Variables
- Lower case
- Forbid Hungarian Notation dependent on type
- Do not prefix/suffix pointer types with “p”, booleans with “b” and so on.
- The type of an object is less relevant than the purpose of the object. If the object survives a refactoring, it may have a different type but fulfill the same purpose.
- Must use Hungarian Notation dependent on scope
- The scope is relevant to the purpose of a variable
- Prefer adding an
m_
ands_
prefix for class member and static class member variables respectively. I don’t use non-const globals, but I might suggest ag_
prefix for those who do. - Prefer ALL_CAPS for const global variables or enumerations
- Discourage prefixing local variables or function parameters
- snake_case
Reducing Scope and Context Switches
All code is technical debt. More code is typically more debt. Less code is typically less debt.
On average, the less code a reader has to read, the easier they will understand a concept. This is one of the tangential benefits of software abstraction layers: you only need to know the abstraction, not the myriad of implementations or implementation details on the other side.
The same applies on a smaller scale.
Class Definition Layout
When we begin reading a class definition, we read top-down. When we want to know how to use a class, we need to know how to construct it (via the constructor), then we look at the public member functions. Ideally, we never look at the private (or even protected) attributes: they are implementation details.
class MyClass {
public:
void func1(int, char);
void func2(long, int);
void func3(int*);
// more functions here...
// Constructor
MyClass() = default;
};
In the example above, the reader encounters member functions before they see the constructor. This is counter-intuitive. If the constructor was at the top, the caller would have a stronger idea of what this class was capable of, whether it’s default constructible or not, and what it’s dependencies (constructor arguments) are. Consider re-ordering it as follows:
class MyClass {
public:
// Constructor
MyClass() = default;
void func1(int, char);
void func2(long, int);
void func3(int*);
// more functions here...
};
Now, if someone is only interested in how to create the object and call func1
, they need not bother skim through func2
and func3
to find all that they need.
This example did not show protected or private members, but the same logic applies. Since users of the class do not need to know what the protected or private members are, they should be declared at the bottom of the class definition.
Reducing Indentation and Returning Early
Consider the following function that calculates a price based on a provided price, tax, and coupon code:
enum class ErrorCode { INTERNAL_ERROR, BAD_COUPON };
std::expected<int, ErrorCode> calculate_subtotal(int price, float tax, std::string coupon_code) {
if (is_valid(coupon_code)) {
int price_after_discounts = price - calculate_discount(coupon_code);
if (price_after_discounts > 0) {
if (tax >= 0) {
return price_after_discounts + (price_after_discounts * tax);
} else {
return std::unexpected{ErrorCode::INTERNAL_ERROR};
}
} else {
return 0;
}
} else {
return std::unexpected{ErrorCode::BAD_COUPON};
}
}
There are 3 nested if-statements, each with a corresponding else
case. It’s fairly obvious that each condition statement is checking for a special case that won’t impact the general case. As we read through the function we mentally “queue” the special cases that may be handled in an else
block down the road. Finally, when we get to the beginning of the chain of else
blocks, we spend extra effort matching the else
to the corresponding if
, and again to recall the semantics of that if
. This is a nasty context switch, even if only for a few seconds.
Consider the refactored version:
enum class ErrorCode { INTERNAL_ERROR, BAD_COUPON };
std::expected<int, ErrorCode> calculate_subtotal(int price, float tax, std::string coupon_code) {
if (!is_valid(coupon_code)) {
return std::unexpected{ErrorCode::BAD_COUPON};
}
int price_after_discounts = price - calculate_discount(coupon_code);
if (price_after_discounts < 0) {
return 0;
}
if (tax <= 0) {
return std::unexpected{ErrorCode::INTERNAL_ERROR};
}
return price_after_discounts + (price_after_discounts * tax);
}
Now, there is a maximum of one indentation level. More importantly, we immediately see how the various cases are handled. If we’re debugging and trying to find out what happens with a bad coupon, we can stop reading this function almost immediately, whereas previously we had to manually find where the if
block ended. Yes, most IDEs nowadays can fold the if
expression, but that’s still an extra step to do, and more code to read.
Furthermore, the early return immediately allows readers to know that the function ends. The alternative not only involves looking for the matching else
, they also have to keep in mind that the function may continue even after the block has ended.
Const Correctness
ISO CPP has a better writeup on const correctness.
Most people have probably seen or heard of in
, out
, or in-out
function parameters. I’ve seen code that looked like this:
#define IN
#define OUT
#define INOUT
struct LargeStruct { /* some large member variables */ };
/**
* @param input[in] Some input parameter
* @param in_and_out[inout] Some input and output parameter
* @param output[out] Some output parameter
*/
void foo(IN LargeStruct& input, INOUT LargeStruct& in_and_out, OUT LargeStruct& output);
What if I told you there was a way to have compile-time versions of the preprocessor IN
, OUT
, and INOUT
macros? Yes, the keyword we’re looking for is const
. If a parameter is only an input parameter, then it is logically const, and can be passed by const&
instead. If there is no const
, then it must be either an output, or an input-output parameter. This is now enforced by the compiler.
struct LargeStruct { /* some large member variables */ };
/**
* @param input[in] Some input parameter
* @param in_and_out[inout] Some input and output parameter
* @param output[out] Some output parameter
*/
void foo(LargeStruct const& input, LargeStruct& in_and_out, LargeStruct& output);
Better yet, if a parameter is only an output parameter then it can also be returned from the function:
struct LargeStruct { /* some large member variables */ };
/**
* @param input[in] Some input parameter
* @param in_and_out[inout] Some input and output parameter
* @return Some output parameter
*/
LargeStruct foo(LargeStruct const& input, LargeStruct& in_and_out);
I know most legacy codebases return error codes, but the introduction of std::expected
, or a absl::StatusOr
are much more readable alternatives.