Monday, July 30, 2007

Aligned strings

Rigid Bodies

Rigid-body dynamics has been slow going. I did some work on changing over from a GLUT-based application to my DirectX-based framework; I want to start making it into a real little game, and my DirectX framework gives me smoother animation updates, full keyboard and mouse control, and minimizes CPU usage (which makes it play nice with other running applications). It's a pain to switch from OpenGL to Direct3D. They're both decent APIs now but OpenGL (when paired with GLUT) just takes a lot less coding overhead for quick-and-dirty programs. It's much easier to think in OpenGL, at least for me. I've been using both for a decade, so it's not really a matter of familiarity with one over the other.

I could keep OpenGL as the rendering API and use DirectInput by itself for keyboard and mouse input. Some games do this, and it works all right. Microsoft has made sure that OpenGL never works quite as nicely as Direct3D for games, though. It's been years since I've done professional games programming in OpenGL on Windows, so maybe I'm all wrong, but back when I was doing it, you had to use an undocumented interface to get full-screen access with OpenGL, and when you did, all the other windows on the desktop got moved down to the lower-right corner of the screen. You may remember playing PC games like Quake 2 that had this effect. You'd exit your game and find all your windows huddled down in the corner.

Aligned Strings

When I was in high school in the early 90s I worked in a group that made an operating system. The OS had been originally written in PL/1 and semi-automatically translated to C shortly before I signed on.

In the C family of languages #include dependency management is a continual struggle, although the same problems apply to some extent to any large system written in a modular language. During development (i.e. all the time) the interfaces between modules are in flux; functions move about. Before IDE assistance it was tedious to figure out which header files to include. Moreover, over time source files end up including more header files than necessary, which slows down compilation.

Precompiled header files address this problem in one way. You essentially have one giant header file with every declaration, and every source file includes it. The gimmick is that that header file is pre-parsed so it loads more quickly.

We put together a different solution that was used for a decade or so. The idea was to scan all the source files and create a database of global symbol declarations. A second scan for uses of global symbols allowed the tool to fabricate a custom header file for each source module containing just the global declarations it needed. Each source file included exactly one header file, so compilation was quick.

The key to this was being able to understand the C code enough to extract global symbol declarations, and to identify global symbol usages. In C if you stay away from tricky use of the macro preprocessor it's not too hard to do. I wrote a parser for the subset of the C grammar dealing with global definitions. In theory we would have had to understand more of the language in order to recognize global symbol usages; we fudged on that by simply including declarations for global symbols whenever we saw their identifiers, regardless of whether the identifier was really a global symbol usage.

This brings me to the subject of parsing. In my tests at the time something like 80% of the time was spent breaking the input into tokens. It takes a lot of time simply to traverse all that text. It turned out that a lot of that time was being spent in string comparisons, looking up identifiers to see if they were reserved words.

To speed up string comparisons we came up with aligned strings. The idea is to treat strings as sequences of 32-bit (or 64-bit) words instead of as sequences of 8-bit characters. The string is kept aligned on a word boundary, and the end is padded with zeros to fill an integral number of words. Then, string comparisons can be done four times (or eight times) faster.

There's a small amount of memory overhead required due to the padding (1.5 bytes per string on average, if you're using 32-bit words), and you have to copy characters out of the input stream into aligned strings before comparing them against other aligned strings.

The only other downside I can think of off-hand is that strings don't necessarily sort in alphabetical order. On a big-endian architecture they do, since the character lowest in memory compares first, then the next. On a little-endian architecture the effective character comparison order is: 3, 2, 1, 0, 7, 6, 5, 4, and so on.

If the specific string ordering is not important, though, I think aligned strings are hard to beat. I'm surprised there hasn't been a string class with this behavior. Maybe there are some other downsides I haven't thought of. It worked great for us all those years ago, though.

No comments: