It’s a smol_world

a post by @snej in Thought Palace

About a month ago, for some reason I got a bee in my bonnet to build a simple garbage-collected memory manager and object model. Maybe it’s because in my day job I’ve been working with a very complex one (V8)? So I did. It’s called smol_world. You can read all about it in the README at that link.

It even has a theme song — a first for me! — which goes

It’s a world of laughter
a world of song
It’s a world where pointers
are four bytes long…

How it’s smol

As the song implies, I used 32-bit “pointers”, which aren’t literally pointers but relative offsets. One of the goals was to make everything as compact as possible, because on modern CPUs cache locality is more important than the number of instructions executed.
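For illustration, a “smol pointer” can be as simple as a signed 32-bit byte offset stored where a 64-bit pointer would otherwise go. This sketch makes the offset relative to the field’s own address; smol_world’s actual encoding may differ:

    #include <cstdint>

    // Sketch of a 32-bit relative pointer: a signed byte offset from this
    // field's own address, instead of a 64-bit absolute address.
    // Only works while the target lives within ±2GB of the field.
    // (A real version would also reserve an encoding for null.)
    template <typename T>
    class RelPtr {
    public:
        void set(T* target) {
            _offset = std::int32_t(reinterpret_cast<std::intptr_t>(target) -
                                   reinterpret_cast<std::intptr_t>(this));
        }
        T* get() const {
            return reinterpret_cast<T*>(reinterpret_cast<std::intptr_t>(this) + _offset);
        }
    private:
        std::int32_t _offset;   // 4 bytes instead of 8
    };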

There’s only so much cache
We must clean up our trash
It’s a smol world after all!

Heap blocks & objects

I’m proud that I got the overhead for a typical heap block down to 2 bytes, and that includes the block length, so variable-size objects like strings and arrays don’t have to store a separate size field. The string “wow” takes 5 bytes, and an array of length 10 takes 42 bytes (2 + 10 × 4).
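One hypothetical way to pack that 2-byte header (the real layout may differ) is a few bits of type tag plus the block’s byte count:

    #include <cstdint>

    // Hypothetical 2-byte block header: type tag + byte length, so strings
    // and arrays don't need a separate size field. Field widths are made up.
    struct BlockHeader {
        std::uint16_t typeTag  : 3;    // string, array, dict, blob, symbol, ...
        std::uint16_t byteSize : 13;   // up to 8 KB per block in this sketch
    };
    static_assert(sizeof(BlockHeader) == 2, "header should stay 2 bytes");

    // "wow":              2 (header) + 3 (chars)           = 5 bytes
    // array of length 10: 2 (header) + 10 * 4 (smol ptrs)  = 42 bytes

Speaking of data types…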

It’s a world of strings and arrays for you
Dictionaries and ints can be found here too
Everything’s quite compact, there’s nothing that we lack,
It’s a smol, smol world!

More prosaically: it has the ubiquitous JSON data types, plus blobs (binary strings) and symbols (unique strings used as dictionary keys).
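In other words, a type set along these lines (names illustrative, not smol_world’s actual identifiers):

    // Roughly the data model (illustrative names only):
    enum class Type {
        Null, Bool, Int,        // immediate values
        String, Blob, Symbol,   // byte-string flavors
        Array, Dict             // collections; Dict keys are Symbols
    };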

The uniqueness of symbols is managed with a sparse hash table, a clever merger of open hashing and sparse arrays that has only a few bits of overhead per entry.
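The sparse-array half of that trick looks roughly like this (a generic sketch of the technique, not smol_world’s code): each group of 32 logical slots keeps a one-bit-per-slot occupancy bitmap and stores only the occupied entries, found via popcount.

    #include <bit>        // std::popcount (C++20)
    #include <cstdint>
    #include <vector>

    // One group of a sparse array: 32 logical slots, but storage only for
    // the occupied ones. An empty slot costs one bit, not a whole table cell.
    template <typename T>
    struct SparseGroup {
        std::uint32_t  occupied = 0;   // 1 bit per logical slot
        std::vector<T> entries;        // only the occupied slots, in order

        bool contains(unsigned slot) const {
            return (occupied >> slot) & 1;
        }
        T& at(unsigned slot) {
            // Index into `entries` = number of occupied slots before `slot`.
            unsigned rank = std::popcount(occupied & ((1u << slot) - 1));
            return entries[rank];
        }
    };

The open-hashing side then treats those logical slots as its buckets, probing as usual but paying almost nothing for the empty ones.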

The GC is a pretty typical Cheney copying collector. It’s very simple, aside from some complexities introduced by using relative pointers: when copying from heap A into heap B there’s no guarantee that a smol pointer [32-bit offset] in heap A can point to heap B, because the two heaps might be more than 2GB away from each other. (I could have required all heaps to be located in the same 2GB address space, but didn’t want to.)
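For reference, the heart of a Cheney collector is only a couple dozen lines. Here’s a generic sketch with a toy object layout (not smol_world’s code, and ignoring the relative-pointer wrinkle):

    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Toy object: a forwarding slot, a size, and up to two child pointers;
    // just enough structure to show the algorithm.
    struct Obj {
        Obj*        forward = nullptr;          // set once the object has moved
        std::size_t size;                       // total byte size of this object
        Obj*        child[2] = {nullptr, nullptr};
    };

    // The destination ("to") space, with bump-pointer allocation.
    struct ToSpace {
        std::vector<char> buf;
        std::size_t used = 0;
        explicit ToSpace(std::size_t cap) : buf(cap) {}
        Obj* alloc(std::size_t n) {             // (no overflow check in this sketch)
            Obj* p = reinterpret_cast<Obj*>(buf.data() + used);
            used += n;
            return p;
        }
    };

    // Copy obj into to-space unless it already has a forwarding pointer.
    static Obj* evacuate(Obj* obj, ToSpace& to) {
        if (!obj) return nullptr;
        if (obj->forward) return obj->forward;  // already moved; reuse the copy
        Obj* copy = to.alloc(obj->size);
        std::memcpy(copy, obj, obj->size);
        copy->forward = nullptr;
        obj->forward = copy;                    // leave a forwarding pointer behind
        return copy;
    }

    // Cheney's algorithm: evacuate the roots, then sweep a scan pointer across
    // to-space, evacuating the children of every object that has been copied.
    void collect(std::vector<Obj*>& roots, ToSpace& to) {
        for (Obj*& r : roots) r = evacuate(r, to);
        std::size_t scan = 0;
        while (scan < to.used) {                // breadth-first traversal
            Obj* obj = reinterpret_cast<Obj*>(to.buf.data() + scan);
            for (Obj*& c : obj->child) c = evacuate(c, to);
            scan += obj->size;
        }
    }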

Thoughts & Lessons

At this point I feel like I’m done with it, for now. I wouldn’t say it’s ready for production use, but my flighty attention has moved on to other things. It’s likely I’ll circle back to it someday, though. What are my takeaways?

Moving objects is troublesome

I’m still torn between copying and non-copying garbage collectors. Copying makes memory allocation so simple (and fast): you just bump a pointer. Actually I should say relocating, not copying: a compacting collector has the same benefits; it’s just a lot harder to implement.
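That fast path really is just a pointer bump, something like:

    #include <cstddef>

    // The entire fast path of allocation in a relocating heap (names made up):
    void* bumpAlloc(char*& next, char* end, std::size_t size) {
        if (next + size > end)
            return nullptr;      // out of room: time to run the collector
        void* block = next;
        next += size;            // "bump the pointer"
        return block;
    }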

But the drawback of relocating is that external pointers into the heap have to be updated, which means you have to know where they are. You can’t use the conservative approach of scanning the stack looking for any 8-byte value that matches an object’s address, because if you guess wrong and update that ‘pointer’ you’ve just corrupted something in the stack. So this means that any external reference to an object has to be tracked explicitly somehow.

The approach I used was to have a C++ Handle<> template, a simple class that wraps an object reference. Its constructor adds its address to a list maintained by the heap, and its destructor removes it. When the GC runs, it scans that list and treats all the pointed-to references as ‘roots’ to be updated.
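In outline the pattern looks like this, simplified to raw pointers (the real class wraps smol_world’s object references):

    #include <algorithm>
    #include <vector>

    // Sketch of the handle pattern: the constructor registers the address of
    // the wrapped reference as a root; the destructor unregisters it; the
    // collector rewrites every registered slot in place when objects move.
    struct Heap {
        std::vector<void**> externalRoots;
        void registerRoot(void** slot)   { externalRoots.push_back(slot); }
        void unregisterRoot(void** slot) {
            externalRoots.erase(std::find(externalRoots.begin(),
                                          externalRoots.end(), slot));
        }
    };

    template <typename T>
    class Handle {
    public:
        Handle(T* obj, Heap& heap) : _ref(obj), _heap(heap) {
            _heap.registerRoot(reinterpret_cast<void**>(&_ref));
        }
        ~Handle() {
            _heap.unregisterRoot(reinterpret_cast<void**>(&_ref));
        }
        T* get() const        { return _ref; }
        T* operator->() const { return _ref; }
    private:
        T*    _ref;    // the GC updates this slot if the object is relocated
        Heap& _heap;
    };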

The problem is that I’m reluctant to use handles everywhere I refer to objects, because they’re expensive to construct and destruct. But if I make a mistake and keep a non-handle reference (a plain String or Array value), then call something that allocates memory and might trigger a GC … afterwards, that value is a garbage pointer. In the later part of my work, after I added test cases that parse large JSON files into object trees, I ran into quite a few bugs like this.

The approach V8 uses, apparently, is to add indirection. The V8 C++ API only exposes handles, and they appear to be pointers into an array managed by a HandleScope you have to create. The nice thing is that the Handles themselves are cheap to pass around, with no special ctor/dtor behavior. The drawback is that every reference to the object now goes through a double dereference. Maybe it’s worth it?
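Very loosely, the shape of that idea is below; this is a sketch of the indirection pattern, not V8’s actual API:

    #include <cstddef>

    struct Object;

    // The scope owns the slots; only the slots need to be visited (and
    // rewritten) by the GC. No bounds checking in this sketch.
    class HandleScope {
    public:
        Object** allocSlot(Object* obj) {
            _slots[_count] = obj;
            return &_slots[_count++];
        }
    private:
        Object*     _slots[256] = {};
        std::size_t _count = 0;
    };

    // A handle is just a pointer to a slot: trivially copyable, no ctor/dtor
    // bookkeeping, but every access pays one extra dereference.
    class Handle {
    public:
        explicit Handle(Object** slot) : _slot(slot) {}
        Object* operator->() const { return *_slot; }
    private:
        Object** _slot;
    };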

Object tables?

I’m sort of tempted by the old idea of an object table, as found in the original Smalltalk-80 VM. Object references aren’t pointers at all, but rather integer indexes into an Object Table, an array of pointers. The advantages are (a) you can relocate an object easily, since you only have to update its table entry; and (b) object references can be tiny. In Smalltalk-80 they were 16 bits!

(Of course that limited the VM to 65536 objects. Worse, one bit in an object reference was a tag that made the value a small integer instead, so really it could only have 32768 objects. Really. Nevertheless, ST80 was pretty capable; I spent two summers in college working on application software written in it.)

But what if object references were 24 bits? That gives you 8 million objects, and integers in the range ±4 million. Seems reasonable to me for what is after all a smol world.
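The arithmetic: reserving one bit as an integer tag leaves 23 bits, i.e. 2^23 ≈ 8.4 million object-table slots, or signed integers from −4,194,304 to 4,194,303. A purely hypothetical sketch of such a reference:

    #include <cstdint>
    #include <vector>

    // Hypothetical 24-bit object reference: the low bit tags small integers,
    // leaving 23 bits for either an object-table index or a signed int.
    struct Ref24 {
        std::uint8_t b[3];                          // 3 bytes, byte-aligned
        std::uint32_t bits() const { return b[0] | (b[1] << 8) | (b[2] << 16); }
        bool          isInt() const { return bits() & 1; }
        std::int32_t  asInt() const {               // sign-extend the 23-bit payload
            return std::int32_t(bits() << 8) >> 9;
        }
        std::uint32_t tableIndex() const { return bits() >> 1; }   // up to ~8.4M objects
    };

    // The object table itself: relocating an object only touches its entry.
    std::vector<void*> gObjectTable;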

“But Jens!” you say. “That means array items are 3 bytes! You’re talking crazy!” That’s actually a feature I didn’t get into above: smol_world totally ignores byte alignment and just puts everything on byte boundaries. It turns out that on modern CPUs you pay almost no cost for that, and it eliminates a lot of wasted padding.
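Unaligned access just means assembling the value from arbitrary byte offsets, or copying it with memcpy, which compilers typically turn into a single unaligned load on x86-64 and ARM64:

    #include <cstdint>
    #include <cstring>

    // Portable unaligned read: memcpy from an arbitrary byte address.
    std::uint32_t readU32(const void* unalignedPtr) {
        std::uint32_t value;
        std::memcpy(&value, unalignedPtr, sizeof(value));
        return value;
    }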

But what about the drawbacks? (a) Extra indirection for every object access, and (b) an additional 8 bytes of memory per object for its table entry. Yeah, maybe not.

Actual crazy talk

I’ve started thinking back to a key-value storage manager I wrote a few years ago (also unfinished…) and what synergy there might be. Databases almost always break the file up into fixed-size pages that contain keys and values and references to other pages. In practice, since keys and values are usually variable-length, storing them in pages is a bit of a Tetris game. In fact, in my implementation at least, each page had a malloc-type heap in it, a quite compact one since I knew the blocks would be smaller than 4KB.

But what if I made the pages bigger than 4KB (that’s supposed to be good for performance nowadays), like a few megabytes, and put my smol_world heaps in them? And created a big persistent object graph? One of my other interests is in alternatives to filesystems, especially the “Soup” used on the old Apple Newton device…

            do {
                It's a smol world after all...
                It's a smol world after all...
                It's a smol world after all...
                It's a smol, smol world
            } while(!insane);