I'm working on a somewhat intricate C++ project (a persistent storage manager) that internally uses a lot of different integral types: page numbers, page offsets, page-cache indexes, bucket indexes, hash codes, transaction sequence numbers... It's very easy to get these mixed up, especially by passing parameters in the wrong order when a function takes more than one of these types; the results of that would be pretty bad.
It would be great if I could declare each of these as a different type, and have the compiler stop me from assigning a value of one type to another. The Nim language has an easy way to do this: I can declare
type PageNo = distinct int
and the "distinct" keyword tells the compiler to forbid implicit conversions between PageNo and any other integer type.
tl;dr: enum class
It turns out C++ can do this too; it's just not as intuitive. The secret is enum class. Added in C++11, this is a more restrictive version of the familiar C enum: an enum class value cannot be implicitly converted to or from an integer. Another C++11 addition is that enums can specify which integer type represents them. Put these together and you get an extremely simple way to declare a type-safe integer type. For example:
enum class PageNo : uint32_t { }; ///< Represents a page number in the database file
It may seem weird to declare an "empty" enum with no constants, but it's perfectly valid. An enum with a fixed underlying type can hold any value of that type, so the constants aren't necessary. What we get here is a type-safe form of uint32_t. The "class" keyword in the declaration means we must use explicit conversions to create PageNo values:
...
long filePos = ftell(file);
PageNo result = PageNo(filePos / kPageSize);  // functional cast: braces would reject the narrowing from long
...or to get their integer values:
std::string readPage(PageNo page) {
fseek(file, uint32_t(page) * kPageSize, SEEK_SET);
...
}
Explicit conversion isn't much trouble, since the only thing creating PageNos out of thin air is the low-level page allocator, and the only thing that needs to convert them into file positions is the I/O module. Everything else can just treat them as opaque tokens.
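To make the "opaque token" idea concrete, here is a minimal sketch. The kPageSize value and the fileOffset helper are assumptions made for illustration; they are not taken from the actual storage manager. The static_asserts check that no implicit conversion exists in either direction.

```cpp
#include <cstdint>
#include <type_traits>

enum class PageNo : uint32_t { };       // page number in the database file
enum class BucketIndex : int { };       // a second distinct type, for contrast

constexpr uint32_t kPageSize = 4096;    // assumed page size, for illustration only

// Hypothetical I/O-layer helper: the one place a PageNo is unwrapped.
// Widening to 64 bits first avoids overflow for large page numbers.
constexpr uint64_t fileOffset(PageNo page) {
    return uint64_t(uint32_t(page)) * kPageSize;
}

// No implicit conversions in either direction, nor between distinct enum types.
static_assert(!std::is_convertible<uint32_t, PageNo>::value, "int -> PageNo must be explicit");
static_assert(!std::is_convertible<PageNo, uint32_t>::value, "PageNo -> int must be explicit");
static_assert(!std::is_convertible<BucketIndex, PageNo>::value, "distinct types don't mix");

// fileOffset(3);               // would not compile: an int is not a PageNo
// fileOffset(BucketIndex{3});  // would not compile: wrong type
```

Passing arguments in the wrong order now becomes a compile error instead of a corrupted file.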
Note: I know I'm not the first person to figure this out, or even to blog about it. But I suspect this trick isn't as well known as it should be, so I felt inspired to spread the news.
Note: It turns out C++17's std::byte type is defined this way; it's simply enum class byte : unsigned char { }.
Initialize Safely
Breaking news! I just today learned that, starting in C++17, there are two ways to initialize an enum class value, and one is safer than the other. The drawback of the usual functional style, PageNo(1234), is that since it's an explicit conversion, it will happily truncate its argument with a "narrowing conversion". So for example PageNo(0x100000000) turns out to be identical to PageNo(0), because the upper bit of 0x100000000 gets chopped off in conversion to uint32_t.
The safer style of initialization uses curly braces: PageNo{1234}. Brace initialization does not permit narrowing conversions, so the compiler will reject a constant that is too big to fit in a PageNo, and will also complain about a non-constant value of a wider integer type. (But again, this is only available if you're using C++17 or later.)
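The difference between the two styles can be checked at compile time. This is a small sketch that assumes a C++17 compiler; the variable names are only for illustration.

```cpp
#include <cstdint>

enum class PageNo : uint32_t { };

// Functional-style cast: an explicit conversion, so the value is silently
// reduced modulo 2^32.
constexpr PageNo truncated = PageNo(0x100000000);
static_assert(truncated == PageNo(0), "the high bit was dropped");

// Brace initialization (C++17): accepted because 1234 fits in a uint32_t.
constexpr PageNo ok = PageNo{1234};
static_assert(uint32_t(ok) == 1234, "value preserved");

// constexpr PageNo bad = PageNo{0x100000000};  // rejected: narrowing conversion
```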
Adding Constants
As a bonus, you can of course add constants to your declaration if it's appropriate. For example, maybe my sequence numbers start at 1 and I want to use 0 to mean "none" (assuming I'm not using std::optional, which is another discussion):
enum class Sequence : uint64_t { None = 0, First = 1 };
In an enum class the constants are scoped, so I have to refer to Sequence::None, which is of course safely wrapped as a Sequence, not a raw integer.
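As a rough sketch of how such constants might be used, here are two helpers built on the Sequence type. Both next and isValid are hypothetical names invented for this example.

```cpp
#include <cstdint>

enum class Sequence : uint64_t { None = 0, First = 1 };

// Hypothetical helper: the sequence number that follows `s`.
constexpr Sequence next(Sequence s) {
    return Sequence(uint64_t(s) + 1);
}

// Hypothetical helper: true unless `s` is the "none" sentinel.
constexpr bool isValid(Sequence s) {
    return s != Sequence::None;
}

static_assert(next(Sequence::None) == Sequence::First, "0 is followed by 1");
```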
Readability FTW!
Type-safety isn't the only benefit. I find that code becomes more readable when more variables and parameters are clearly named after specific types. In function prototypes, the parameter name often becomes unnecessary:
class InteriorNode {
...
PageNo childAtIndex(BucketIndex);
...
};
That's so much more informative than uint32_t childAtIndex(int16_t)!
Adding Functionality
Opacity is great, but in real code you do need even opaque tokens to have some minimal functionality. Most importantly, you want to tell whether two of them have the same value. Fortunately C++ allows == and != comparisons between two values of the same enum class, as well as <, <=, >, >=.
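Because the comparison operators are built in, enum class values work with the standard algorithms without any extra code. The uniqueSorted helper below is a hypothetical example, not something from the project.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

enum class PageNo : uint32_t { };

// Hypothetical helper: sorts page numbers and removes duplicates, relying
// only on the built-in < and == for values of the same enum class.
inline std::vector<PageNo> uniqueSorted(std::vector<PageNo> pages) {
    std::sort(pages.begin(), pages.end());
    pages.erase(std::unique(pages.begin(), pages.end()), pages.end());
    return pages;
}
```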
Arithmetic Operators
You can't do arithmetic, though. enum class types do not have operators for +, -, etc. This is good, since in many domains these wouldn't make sense. Why would you need to add two PageNos, or multiply two Dollars? (Though if you really wanted to, you could cast them to integers first.)
But in each domain, some arithmetic operators may make sense. Adding or subtracting two Dollar values produces another Dollar value. Multiplying two Pixels produces a SquarePixels result. Sometimes it makes sense to combine enums and plain numbers, like adding an integer to a BucketIndex while searching a bucket.
You can get these operators by defining them yourself:
using BucketIndex_t = int;
enum class BucketIndex : BucketIndex_t { None = -1, First = 0 };
static constexpr inline BucketIndex operator+ (BucketIndex b, int i) {return BucketIndex(BucketIndex_t(b) + i);}
static constexpr inline BucketIndex operator- (BucketIndex b, int i) {return BucketIndex(BucketIndex_t(b) - i);}
static constexpr inline BucketIndex& operator++ (BucketIndex &b) {b = b + 1; return b;}
static constexpr inline BucketIndex& operator-- (BucketIndex &b) {b = b - 1; return b;}
Note: You may notice that for DRY purposes I've declared a type alias for the underlying int type. If I always use BucketIndex_t instead of int in my conversions, I isolate the underlying type of BucketIndex in one spot, making it easy to change in the future.
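Here is a sketch of the bucket-search case mentioned above, using two of those operators. The findKey function and its flat key array are assumptions made up for this example; the real bucket layout is surely different.

```cpp
#include <cstdint>

using BucketIndex_t = int;
enum class BucketIndex : BucketIndex_t { None = -1, First = 0 };

constexpr BucketIndex operator+ (BucketIndex b, int i) { return BucketIndex(BucketIndex_t(b) + i); }
constexpr BucketIndex& operator++ (BucketIndex &b)     { b = b + 1; return b; }

// Hypothetical linear search over `count` keys. Returns the index of the
// first match, or BucketIndex::None if the key is absent.
inline BucketIndex findKey(const uint32_t *keys, int count, uint32_t key) {
    const BucketIndex end = BucketIndex::First + count;
    for (BucketIndex i = BucketIndex::First; i < end; ++i) {
        if (keys[BucketIndex_t(i)] == key)
            return i;
    }
    return BucketIndex::None;
}
```

The only explicit conversion is at the array subscript, which is the one place where the index really is used as a raw integer.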
Formatted Output
Another roadblock I run into is writing these type-safe values to std::cerr in my logging code. Without implicit conversions to integers, std::ostream has no idea what to do with them and gives me errors. I started out by just wrapping them in explicit conversions, but when that became too annoying I added some custom output operators:
static inline std::ostream& operator<< (std::ostream &out, PageNo p) {
    return out << "p." << uint32_t(p);  // uint32_t is PageNo's underlying type
}
As you can see, I took the opportunity to add some adornment to make it clear that a logged number is a page number. (With other types, I've done things like writing them as hex or zero-padding them.)
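A quick way to see the operator in action is to write into a std::ostringstream. The describe helper is a hypothetical stand-in for real logging code.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

enum class PageNo : uint32_t { };

inline std::ostream& operator<< (std::ostream &out, PageNo p) {
    return out << "p." << uint32_t(p);
}

// Hypothetical helper that builds a log message mentioning a page number.
inline std::string describe(PageNo p) {
    std::ostringstream s;
    s << "reading " << p;
    return s.str();
}
```

With this in place, describe(PageNo{42}) produces the string "reading p.42".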
Limitations And Further Steps
The problem with the enum class approach is that, once you start adding functionality, the type declaration starts to sprout boilerplate and become less clean. Unfortunately there's no way to factor out this boilerplate (short of using the preprocessor, ick).
There is a more powerful way to make type-safe values in C++, requiring less boilerplate, but it needs more work up front. It involves creating a template class that wraps the raw type. (And the raw type can be anything, not just an integer; it could be a double or even a std::string.) If this interests you, take a look at foonathan's strong_typedef template library.
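To give a feel for the wrapper-template idea, here is a bare-bones sketch. It is not the API of foonathan's library; Strong, Dollars, FileName and the tag types are all names invented for this example.

```cpp
#include <string>
#include <utility>

// Tag is an otherwise unused type whose only job is to make each
// instantiation a distinct, non-interchangeable type.
template <typename T, typename Tag>
class Strong {
public:
    constexpr explicit Strong(T value) : _value(std::move(value)) { }
    constexpr const T& get() const { return _value; }

    friend constexpr bool operator== (const Strong &a, const Strong &b) { return a._value == b._value; }
    friend constexpr bool operator!= (const Strong &a, const Strong &b) { return !(a == b); }
    friend constexpr bool operator<  (const Strong &a, const Strong &b) { return a._value < b._value; }

private:
    T _value;
};

// Hypothetical distinct types wrapping a double and a std::string.
using Dollars  = Strong<double, struct DollarsTag>;
using FileName = Strong<std::string, struct FileNameTag>;
```

The comparison operators are written once in the template and shared by every wrapped type, which is where the boilerplate savings come from.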
Personally, I think that library looks great, but I haven't yet gotten around to using it. It's just due to the friction involved in adding a new dependency to my project, getting it to build, and learning its API.
The thing I like about enum class is that it's built-in, with no setup. Now whenever I find myself about to implement a distinct type with a plain int or uint32_t or whatever, I stop myself and quickly add a one-line enum class declaration. The result is cleaner and much safer code. And I know that if I need enough functionality that this becomes unwieldy, I can grab a library to simplify it.