Easy Type-Safe Integer Types In C++

a post by @snej in Thought Palace

I'm working on a somewhat intricate C++ project (a persistent storage manager) that internally uses a lot of different integral types: page numbers, page offsets, page-cache indexes, bucket indexes, hash codes, transaction sequence numbers... It's very easy to get these mixed up, especially by passing parameters in the wrong order when a function takes more than one of these types; the results of that would be pretty bad.

It would be great if I could declare each of these as a different type, and the compiler would stop me from assigning a value of one type to a different one. The Nim language has an easy way to do this: I can declare type PageNo = distinct int and the "distinct" keyword tells the compiler to forbid implicit conversions between PageNo and any other integer type.

tl;dr: enum class

It turns out C++ can do this too; it's just not as intuitive. The secret is enum class. Added in C++11, this is a more restrictive version of the familiar C enum: its values cannot be implicitly converted to or from integers. Another C++11 addition is that enums can specify which integer type represents them. Put these together and you get an extremely simple way to declare a type-safe integer type. For example:

enum class PageNo : uint32_t { };   ///< Represents a page number in the database file

It may seem weird to declare an "empty" enum with no constants, but it's perfectly valid. An enum with a fixed underlying type can hold any value of that type, so the constants aren't necessary. What we get here is a type-safe form of uint32_t. The "class" keyword in the declaration means we must use explicit conversions to create PageNo values:

...
long filePos = ftell(file);
PageNo result = PageNo(filePos / kPageSize);   // parentheses: braces would reject the long as a narrowing conversion

...or to get their integer values:

std::string readPage(PageNo page) {
    fseek(file, uint32_t(page) * kPageSize, SEEK_SET);
    ...
}

Explicit conversion isn't much trouble, since the only thing creating PageNos out of thin air is the low-level page allocator, and the only thing that needs to convert them into file positions is the I/O module. Everything else can just treat them as opaque tokens.
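To see what the compiler actually enforces, here is a minimal sketch built on the PageNo declaration above. The helper toFileOffset and the kPageSize value are assumptions made up for illustration; they are not taken from the project.

```cpp
#include <cstdint>
#include <type_traits>

enum class PageNo : uint32_t { };   // same declaration as in the post

constexpr uint64_t kPageSize = 4096;   // hypothetical page size

// No implicit conversion exists in either direction...
static_assert(!std::is_convertible<uint32_t, PageNo>::value, "int -> PageNo must be explicit");
static_assert(!std::is_convertible<PageNo, uint32_t>::value, "PageNo -> int must be explicit");

// ...yet the type is stored exactly like its underlying integer.
static_assert(std::is_same<std::underlying_type<PageNo>::type, uint32_t>::value, "underlying type is uint32_t");
static_assert(sizeof(PageNo) == sizeof(uint32_t), "no size overhead");

// Hypothetical I/O-side helper: the one place that unwraps a PageNo.
constexpr uint64_t toFileOffset(PageNo page) {
    return uint64_t(page) * kPageSize;   // explicit conversion, widened before multiplying
}
```

Widening to uint64_t before the multiplication is a choice I made for the sketch so that large page numbers don't overflow 32-bit arithmetic.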

Note: I know I'm not the first person to figure this out, or even to blog about it. But I suspect this trick isn't as well known as it should be, so I felt inspired to spread the news.

Note: Turns out C++17's std::byte type is defined this way; it's simply enum class byte : unsigned char { }.

Initialize Safely

Breaking news! I just today learned that, starting in C++17, there are two ways to initialize an enum class value, and one is safer than the other. The drawback of the usual functional style -- PageNo(1234) -- is that since it's an explicit conversion, it will happily truncate its argument with a "narrowing conversion". So for example PageNo(0x100000000) turns out to be identical to PageNo(0), because the upper bit of 0x100000000 gets chopped off in conversion to uint32_t.

The safer style of initialization uses curly braces: PageNo{1234}. Brace initialization forbids narrowing conversions, so it fails at compile time if the argument is a constant too big to fit in a PageNo, or a non-constant value of a type (such as long) that uint32_t can't fully represent. (But again, this is only available if you're using C++17 or later.)
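Here is a small sketch contrasting the two styles. The names kFits and checkTruncation are invented for this example, and the rejected lines are left as comments because they would stop compilation.

```cpp
#include <cassert>
#include <cstdint>

enum class PageNo : uint32_t { };

// Brace style (C++17): accepted because the constant fits in uint32_t.
constexpr PageNo kFits = PageNo{1234};

// These would be rejected at compile time as narrowing conversions:
//   PageNo tooBig{0x100000000};
//   long n = 5;  PageNo fromLong{n};

// Functional style compiles, but silently keeps only the low 32 bits.
inline void checkTruncation() {
    PageNo wrapped = PageNo(0x100000000);
    assert(wrapped == PageNo(0));
    assert(PageNo(0x100000001) == PageNo(1));
}
```

When the source really is a wider runtime value, one option is to range-check it yourself and then convert with parentheses.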

Adding Constants

As a bonus, you can of course add constants to your declaration if it's appropriate. For example, maybe my sequence numbers start at 1 and I want to use 0 to mean "none" (assuming I'm not using std::optional, which is another discussion):

enum class Sequence : uint64_t { None = 0, First = 1 };

In an enum class the constants are scoped, so I have to refer to Sequence::None, which is of course safely wrapped as a Sequence, not a raw integer.
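As a rough sketch of how such constants read in use, here are two helpers on top of the Sequence declaration above. The function names isValid and next are hypothetical, chosen only for this example.

```cpp
#include <cstdint>

enum class Sequence : uint64_t { None = 0, First = 1 };

// True if `s` names a real sequence number rather than the "none" marker.
constexpr bool isValid(Sequence s) {
    return s != Sequence::None;
}

// The sequence number that follows `s`.
constexpr Sequence next(Sequence s) {
    return Sequence(uint64_t(s) + 1);
}
```

Values without a named constant, such as Sequence(42), remain perfectly legal; the constants just give names to the special ones.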

Readability FTW!

Type-safety isn't the only benefit. I find that code becomes more readable when more variables and parameters are clearly named after specific types. In function prototypes, the parameter name often becomes unnecessary:

class InteriorNode {
    ...
    PageNo childAtIndex(BucketIndex);
    ...
};

That's so much more informative than uint32_t childAtIndex(int16_t)!
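The distinct parameter types also catch the swapped-argument mistake that motivated all this. The sketch below checks that with C++17's std::is_invocable_v; the function slotAddress and the two size constants are assumptions for illustration, and BucketIndex is given an int16_t underlying type to match the prototype above.

```cpp
#include <cstdint>
#include <type_traits>

enum class PageNo      : uint32_t { };
enum class BucketIndex : int16_t  { };

constexpr uint64_t kPageSize = 4096;   // hypothetical
constexpr uint64_t kSlotSize = 8;      // hypothetical

// Hypothetical function taking two different "integer" parameters.
constexpr uint64_t slotAddress(PageNo page, BucketIndex bucket) {
    return uint64_t(page) * kPageSize + uint64_t(bucket) * kSlotSize;
}

using SlotAddressFn = decltype(&slotAddress);

// The intended argument order is accepted...
static_assert(std::is_invocable_v<SlotAddressFn, PageNo, BucketIndex>, "correct order compiles");
// ...while swapped arguments and raw integers are rejected by the compiler.
static_assert(!std::is_invocable_v<SlotAddressFn, BucketIndex, PageNo>, "swapped order is an error");
static_assert(!std::is_invocable_v<SlotAddressFn, uint32_t, int16_t>, "raw integers are an error");
```

With plain uint32_t and int16_t parameters, all three calls would have compiled.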

Adding Functionality

Opacity is great, but in real code you do need even opaque tokens to have some minimal functionality. Most importantly, you want to tell whether two of them have the same value. Fortunately C++ allows == and != comparisons between two values of the same enum class, as well as <, <=, >, >=.
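Because the comparison operators are built in (and the standard library has provided std::hash for enum types since C++14), these types work with standard algorithms and containers without any extra code. A minimal sketch, with hypothetical helper names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

enum class PageNo : uint32_t { };

// Highest page number in `pages`, or PageNo{0} if the list is empty.
inline PageNo highestPage(const std::vector<PageNo>& pages) {
    PageNo highest{0};
    for (PageNo p : pages)
        highest = std::max(highest, p);   // std::max uses the built-in <
    return highest;
}

// Number of distinct page numbers in `pages`.
inline std::size_t countDistinct(const std::vector<PageNo>& pages) {
    std::unordered_set<PageNo> seen(pages.begin(), pages.end());   // uses == and std::hash
    return seen.size();
}

inline void comparisonDemo() {
    std::vector<PageNo> pages = {PageNo{7}, PageNo{3}, PageNo{7}};
    assert(highestPage(pages) == PageNo{7});
    assert(countDistinct(pages) == 2);
}
```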

Arithmetic Operators

You can't do arithmetic, though. enum class types do not have operators for +, -, etc. This is good, since in many domains these wouldn't make sense. Why would you need to add two PageNos, or multiply two Dollars? (Though if you really wanted to, you could cast them to integers first.)

But in each domain, some arithmetic operators may make sense. Adding or subtracting two Dollar values produces another Dollar value. Multiplying two Pixels produces a SquarePixels result. Sometimes it makes sense to combine enums and plain numbers, like adding an integer to a BucketIndex while searching a bucket.

You can get these operators by defining them yourself:

using BucketIndex_t = int;
enum class BucketIndex : BucketIndex_t { None = -1, First = 0 };

static constexpr inline BucketIndex operator+ (BucketIndex b, int i) {return BucketIndex(BucketIndex_t(b) + i);}
static constexpr inline BucketIndex operator- (BucketIndex b, int i) {return BucketIndex(BucketIndex_t(b) - i);}
static constexpr inline BucketIndex& operator++ (BucketIndex &b) {b = b + 1; return b;}
static constexpr inline BucketIndex& operator-- (BucketIndex &b) {b = b - 1; return b;}

Note: You may notice that for DRY purposes I've declared a type alias for the underlying int type. If I always use BucketIndex_t instead of int in my conversions, I isolate the underlying type of BucketIndex in one spot, making it easy to change in the future.
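The same pattern covers the Dollars and Pixels cases mentioned earlier. The sketch below is only an illustration: the three types and the raw() helper are invented here, and raw() uses std::underlying_type_t (C++14) as an alternative to a hand-written alias, so no operator has to name the underlying type.

```cpp
#include <cstdint>
#include <type_traits>

enum class Dollars      : int64_t { };
enum class Pixels       : int32_t { };
enum class SquarePixels : int64_t { };

// Hypothetical helper: converts any enum to its underlying integer type.
template <typename E>
constexpr std::underlying_type_t<E> raw(E e) {
    return static_cast<std::underlying_type_t<E>>(e);
}

// Money plus or minus money is still money.
constexpr Dollars operator+ (Dollars a, Dollars b) { return Dollars(raw(a) + raw(b)); }
constexpr Dollars operator- (Dollars a, Dollars b) { return Dollars(raw(a) - raw(b)); }

// Multiplying two lengths yields an area, which is a different type.
constexpr SquarePixels operator* (Pixels w, Pixels h) {
    return SquarePixels(int64_t(raw(w)) * raw(h));   // widen first to avoid 32-bit overflow
}
```

Operators I didn't define, such as Dollars * Dollars or Dollars + Pixels, still fail to compile, which is the point. If you're on C++23, std::to_underlying in <utility> does the same job as raw().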

Formatted Output

Another roadblock I ran into was writing these type-safe values to std::cerr in my logging code. Without implicit conversions to integers, std::ostream has no idea what to do with them, and the compiler gives me errors. I started out by just wrapping them in explicit conversions, but when that became too annoying I added some custom output operators:

static inline std::ostream& operator<< (std::ostream &out, PageNo p) {
    return out << "p." << uint32_t(p);
}

As you can see, I took the opportunity to add some adornment to make it clear that a logged number is a page number. (With other types, I've done things like writing them as hex or zero-padding them.)
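For the hex and zero-padded variants, something along these lines should work. HashCode is a hypothetical type for this sketch, and the "#" prefix is just an example adornment. The operator saves and restores the stream's formatting state so that later output isn't left in hex.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>

enum class HashCode : uint32_t { };   // hypothetical type for this example

// Writes a HashCode as '#' followed by 8 zero-padded hex digits.
inline std::ostream& operator<< (std::ostream &out, HashCode h) {
    std::ios_base::fmtflags savedFlags = out.flags();
    char savedFill = out.fill('0');              // fill() returns the previous fill character
    out << '#' << std::hex << std::setw(8) << uint32_t(h);
    out.fill(savedFill);
    out.flags(savedFlags);                       // back to decimal for whatever comes next
    return out;
}

inline void formatDemo() {
    std::ostringstream out;
    out << HashCode(0xBEEF) << ' ' << 255;
    assert(out.str() == "#0000beef 255");
}
```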

Limitations And Further Steps

The problem with the enum class approach is that, once you start adding functionality, the type declaration starts to sprout boilerplate and become less clean. Unfortunately there's no way to factor out this boilerplate (short of using the preprocessor, ick.)

There is a more powerful way to make type-safe values in C++, requiring less boilerplate, but it needs more work up front. It involves creating a template class that wraps the raw type. (And the raw type can be anything, not just an integer; it could be a double or even a std::string.) If this interests you, take a look at foonathan's strong_typedef template library.

Personally, I think that library looks great, but I haven't yet gotten around to using it. It's just due to the friction involved in adding a new dependency to my project, getting it to build, and learning its API.

The thing I like about enum class is that it's built-in, with no setup. Now whenever I find myself about to implement a distinct type with a plain int or uint32_t or whatever, I stop myself and quickly add a one-line enum class declaration. The result is cleaner and much safer code. And I know that if I need enough functionality that this becomes unwieldy, I can grab a library to simplify it.