Sunday, December 08, 2013

Abusing the Preprocessor, Almost

I keep running into a mildly annoying problem in my C-language embedded software applications. It involves the declaration, definition and initialization of constant data tables. On a couple of recent projects, it became such a headache that I decided to do something about it, and to document it on my blog. Let’s get started with a simple example.

MAKING A LIST OF THINGS

It’s fairly common to use #defines or an enumeration to declare the states of a state machine. Consider the following states from a fictitious breathalyzer:
#define STATE_INIT              (0)
#define STATE_IDLE              (1)
#define STATE_TAKING_SAMPLE     (2)
#define STATE_STONE_COLD_SOBER  (3)
#define STATE_TIPSY             (4)
#define STATE_DRUNK             (5)
#define STATE_SHIT_FACED        (6)
#define STATE_PASSED_OUT        (7)
#define STATE_NUM_ITEMS         (8)

Alternatively, we can use an enumeration, which automatically assigns sequential values to the states:
enum BREATHALYZER_STATES {
  STATE_INIT=0,
  STATE_IDLE,
  STATE_TAKING_SAMPLE,
  STATE_STONE_COLD_SOBER,
  STATE_TIPSY,
  STATE_DRUNK,
  STATE_SHIT_FACED,
  STATE_PASSED_OUT,
  STATE_NUM_ITEMS,
};

Often we don’t care about the actual value assigned to such states, but there can be advantages to having them in sequence. One reason would be to easily obtain a text name representing the state. For example:
const char *state_names[STATE_NUM_ITEMS]={
  "initializing",
  "idle",
  "taking a sample",
  "stone cold sober",
  "tipsy",
  "drunk",
  "shit-faced",
  "passed out",
};
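A minimal sketch of how such a table is typically used, assuming a hypothetical helper state_name() that is not part of the original. It adds a bounds check, and because the array is sized by STATE_NUM_ITEMS, a state appended to the enumeration without a matching string leaves a null pointer in the last slot, which the helper also guards against. The array is declared const char *const here so that the pointers, and not just the characters, are constant data.

```c
#include <stddef.h>

/* The enumeration from above: values run 0 .. STATE_NUM_ITEMS-1. */
enum BREATHALYZER_STATES {
  STATE_INIT = 0,
  STATE_IDLE,
  STATE_TAKING_SAMPLE,
  STATE_STONE_COLD_SOBER,
  STATE_TIPSY,
  STATE_DRUNK,
  STATE_SHIT_FACED,
  STATE_PASSED_OUT,
  STATE_NUM_ITEMS
};

/* One string per state, in the same order as the enumeration. */
static const char *const state_names[STATE_NUM_ITEMS] = {
  "initializing",
  "idle",
  "taking a sample",
  "stone cold sober",
  "tipsy",
  "drunk",
  "shit-faced",
  "passed out",
};

/* Hypothetical helper: return the name of a state, or a fallback string
   when the value is out of range or its table entry was never filled in. */
const char *state_name(int state)
{
  if (state < 0 || state >= STATE_NUM_ITEMS || state_names[state] == NULL)
    return "unknown state";
  return state_names[state];
}
```

Note that the null check only catches a string missing at the end of the list; a string left out in the middle shifts every later name by one, which is exactly the synchronization problem discussed next.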

Another advantage to using the enumeration to define the states is that the code becomes easier to maintain if you need to insert a state later on. You don’t have to manually change all the numbers in the list of defines, as the enum statement automatically assigns them in sequence.

So why am I not happy with this? Because the information describing each state (its identifier, its number and its text string) usually has to live in two different files. The enumeration properly belongs in the header file, but the definition and initialization of the name strings must go in the .c file. For this trivial case, that’s just a nuisance, but it easily gets out of hand with more, and larger, sets of integer/string pairs.

CONSTANT DATA TABLES

Here is another, similar, application of the same concept. Consider a lookup table containing information about different units of measure (in this case, lengths):
enum UNITS {
  UNIT_METER=0, 
  UNIT_KILOMETER, 
  UNIT_CENTIMETER, 
  UNIT_MILLIMETER,
  UNIT_MILE,
  UNIT_YARD,
  UNIT_FEET,
  UNIT_INCH,
  UNIT_MIL,
  UNIT_NUM_ITEMS,
};

typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;
  const double off;
} UNIT_DEFN;

const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
  { "meter", "m", 1.0L, 0.0L },
  { "kilometer", "km",  1000.0L,   0.0L },
  { "centimeter", "cm",  0.01L, 0.0L },
  { "millimeter", "mm", 0.001L, 0.0L }, 
  { "mile",  "mi", 1609.344L, 0.0L },
  { "yard",  "yd", 0.9144L, 0.0L },
  { "foot", "ft", 0.3048L, 0.0L },
  { "inch", "in", 0.0254L, 0.0L },
  { "mil",  "mil", 2.54E-05L, 0.0L },
};
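The post doesn’t spell out how fact and off are meant to be used. A minimal sketch, assuming fact is the number of meters in one of the given unit and off is an additive offset applied after scaling (zero for every length unit here, but needed for units such as temperatures): the hypothetical unit_convert() goes through meters as a common base unit. The table is trimmed to four units, with plain double literals since the fields are double.

```c
/* Trimmed copy of the unit table (four units only, for illustration). */
enum UNITS {
  UNIT_METER = 0,
  UNIT_KILOMETER,
  UNIT_MILE,
  UNIT_FEET,
  UNIT_NUM_ITEMS
};

typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;  /* assumed: meters per one of this unit */
  const double off;   /* assumed: offset added after scaling (0 for lengths) */
} UNIT_DEFN;

const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS] = {
  { "meter",     "m",  1.0,      0.0 },
  { "kilometer", "km", 1000.0,   0.0 },
  { "mile",      "mi", 1609.344, 0.0 },
  { "foot",      "ft", 0.3048,   0.0 },
};

/* Hypothetical helper: convert between two units via the base unit,
   using the assumed convention  base = value * fact + off. */
double unit_convert(double value, enum UNITS from, enum UNITS to)
{
  double base = value * unit_defn[from].fact + unit_defn[from].off;
  return (base - unit_defn[to].off) / unit_defn[to].fact;
}
```

Converting one mile to feet gives 1609.344 / 0.3048, which is 5280 in exact decimal arithmetic; in double precision the result may differ from 5280 in the last few bits, so compare with a tolerance.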

Here again, it’s important to keep the indices and the constant table synchronized as changes are made, perhaps to add other length-type units such as furlongs, angstroms, and light-years.
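One partial safeguard, sketched here as an assumption rather than something from the original: leave the array size empty so the compiler counts the initializers, then compare that count with UNIT_NUM_ITEMS at compile time. This uses C11’s _Static_assert, and the ARRAY_COUNT macro name is hypothetical. It catches a missing or extra row, but not two rows that are swapped.

```c
enum UNITS {
  UNIT_METER = 0,
  UNIT_KILOMETER,
  UNIT_CENTIMETER,
  UNIT_MILLIMETER,
  UNIT_MILE,
  UNIT_YARD,
  UNIT_FEET,
  UNIT_INCH,
  UNIT_MIL,
  UNIT_NUM_ITEMS
};

typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;
  const double off;
} UNIT_DEFN;

/* No size between the brackets: the compiler sizes the array from
   the number of initializers actually written. */
const UNIT_DEFN unit_defn[] = {
  { "meter",      "m",   1.0,      0.0 },
  { "kilometer",  "km",  1000.0,   0.0 },
  { "centimeter", "cm",  0.01,     0.0 },
  { "millimeter", "mm",  0.001,    0.0 },
  { "mile",       "mi",  1609.344, 0.0 },
  { "yard",       "yd",  0.9144,   0.0 },
  { "foot",       "ft",  0.3048,   0.0 },
  { "inch",       "in",  0.0254,   0.0 },
  { "mil",        "mil", 2.54E-05, 0.0 },
};

/* Hypothetical helper: element count of a true array (not a pointer). */
#define ARRAY_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* C11: refuse to compile if the table and the enumeration disagree in length. */
_Static_assert(ARRAY_COUNT(unit_defn) == UNIT_NUM_ITEMS,
               "unit_defn needs exactly one row per enum UNITS value");
```

On a compiler that only supports C99, the same check can be approximated with the classic negative-array-size trick, for example typedef char unit_table_check[(ARRAY_COUNT(unit_defn) == UNIT_NUM_ITEMS) ? 1 : -1]; which fails to compile when the counts differ.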

Searching around the web, I found one solution to the synchronization problem. The downside is that it requires a compiler that supports C99. If you have C99, you can specify the index of each array element being initialized, using what are called “designated initializers”. With them, the above array looks like this:
const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
[UNIT_METER]={"meter", "m", 1.0L, 0.L},
[UNIT_KILOMETER]={"kilometer", "km", 1000.0L, 0.L},
[UNIT_CENTIMETER]={"centimeter", "cm", 0.01L, 0.L},
[UNIT_MILLIMETER]={"millimeter", "mm", 0.001L, 0.L}, 
[UNIT_MILE]={"mile", "mi", 1609.344L, 0.L},
[UNIT_YARD]={"yard", "yd", 0.9144L, 0.L},
[UNIT_FEET]={"foot", "ft", 0.3048L, 0.L},
[UNIT_INCH]={"inch", "in", 0.0254L, 0.L},
[UNIT_MIL]={"mil", "mil", 2.54E-05L, 0.L},
};
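A small, trimmed sketch of what designated initializers buy you (the four-unit enumeration is for illustration only): the rows are deliberately written out of order and still land at the right indices. It also shows the gap that remains: UNIT_YARD has no row, and instead of producing an error, its element is silently zero-initialized, leaving null name and abbr pointers.

```c
#include <stddef.h>

/* Trimmed enumeration for illustration only. */
enum UNITS {
  UNIT_METER = 0,
  UNIT_KILOMETER,
  UNIT_MILE,
  UNIT_YARD,
  UNIT_NUM_ITEMS
};

typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;
  const double off;
} UNIT_DEFN;

/* C99 designated initializers: each row names its own index, so the order
   of the rows in the source no longer matters. UNIT_YARD is deliberately
   left out; its element is zero-initialized (null pointers, 0.0 values). */
const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS] = {
  [UNIT_MILE]      = { "mile",      "mi", 1609.344, 0.0 },
  [UNIT_METER]     = { "meter",     "m",  1.0,      0.0 },
  [UNIT_KILOMETER] = { "kilometer", "km", 1000.0,   0.0 },
};
```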

That solves the concern about the order of the data table getting out of sync with the enumeration (a row that is forgotten entirely is still silently zero-filled, though). But we still have the issue of maintaining the table in two different files. Now is the time to abuse the preprocessor (1).

LET THE PREPROCESSOR DO THE WORK

I found an obscure solution to my problem, using the preprocessor in a most unusual manner (2). After a bit of head scratching, I hit on the following approach. First of all, make a “table” in the following format, which is actually one huge preprocessor macro:
#define UNIT_TABLE(F) \
F(UNIT_METER, "meter", "m", 1.0L, 0.0L)\
F(UNIT_KILOMETER, "kilometer", "km",  1000.0L, 0.0L)\
F(UNIT_CENTIMETER, "centimeter", "cm",  0.01L, 0.0L)\
F(UNIT_MILLIMETER, "millimeter", "mm", 0.001L, 0.0L)\
F( UNIT_MILE, "mile",  "mi", 1609.344L, 0.0L)\
F( UNIT_YARD, "yard",  "yd", 0.9144L, 0.0L)\
F( UNIT_FEET, "foot", "ft", 0.3048L, 0.0L)\
F( UNIT_INCH, "inch", "in", 0.0254L, 0.0L)\
F( UNIT_MIL, "mil",  "mil", 2.54E-05L, 0.0L)\
/**/

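Each row ends with a backslash, including the last one; the empty comment /**/ on the final line simply gives that last backslash something harmless to continue onto. With this “table” defined in just ONE place (the header file), it’s possible to generate the enumeration of indices,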
// Make the enumeration of the unit types
#define EXTRACT_ENUM( ID, NAME, ABBR, FACT, OFF ) ID,
enum UNIT_TYPES {
  UNIT_TABLE(EXTRACT_ENUM)
  UNIT_NUM_ITEMS,
};
#undef EXTRACT_ENUM

and define/initialize the table automatically:
// Makes the constant table of units
#define EXTRACT_DEFN( ID, NAME, ABBR, FACT, OFF ) \
  { NAME, ABBR, FACT, OFF },
const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
  UNIT_TABLE(EXTRACT_DEFN)
};
#undef EXTRACT_DEFN
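This pattern is widely known as “X macros”. To see all the pieces working together in one translation unit, here is a self-contained sketch that repeats the table and both extractors (with plain double literals instead of the long double L suffix, since the fields are double) and adds a third, hypothetical extractor, EXTRACT_ID_STRING, which uses the preprocessor’s # operator to turn each identifier into a string.

```c
typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;
  const double off;
} UNIT_DEFN;

/* The one and only table: each F(...) row describes one unit. */
#define UNIT_TABLE(F)                                    \
  F(UNIT_METER,      "meter",      "m",   1.0,      0.0) \
  F(UNIT_KILOMETER,  "kilometer",  "km",  1000.0,   0.0) \
  F(UNIT_CENTIMETER, "centimeter", "cm",  0.01,     0.0) \
  F(UNIT_MILLIMETER, "millimeter", "mm",  0.001,    0.0) \
  F(UNIT_MILE,       "mile",       "mi",  1609.344, 0.0) \
  F(UNIT_YARD,       "yard",       "yd",  0.9144,   0.0) \
  F(UNIT_FEET,       "foot",       "ft",  0.3048,   0.0) \
  F(UNIT_INCH,       "inch",       "in",  0.0254,   0.0) \
  F(UNIT_MIL,        "mil",        "mil", 2.54E-05, 0.0)

/* Extractor 1: keep only the ID column to build the enumeration. */
#define EXTRACT_ENUM(ID, NAME, ABBR, FACT, OFF) ID,
enum UNIT_TYPES {
  UNIT_TABLE(EXTRACT_ENUM)
  UNIT_NUM_ITEMS
};
#undef EXTRACT_ENUM

/* Extractor 2: keep the data columns to build the constant table. */
#define EXTRACT_DEFN(ID, NAME, ABBR, FACT, OFF) { NAME, ABBR, FACT, OFF },
const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS] = {
  UNIT_TABLE(EXTRACT_DEFN)
};
#undef EXTRACT_DEFN

/* Extractor 3 (hypothetical addition): stringize the ID column with #,
   giving a debug name such as "UNIT_YARD" for every enumerator. */
#define EXTRACT_ID_STRING(ID, NAME, ABBR, FACT, OFF) #ID,
const char *const unit_id_names[UNIT_NUM_ITEMS] = {
  UNIT_TABLE(EXTRACT_ID_STRING)
};
#undef EXTRACT_ID_STRING
```

Because every extractor is driven by the same UNIT_TABLE, adding a row (say, for furlongs) updates the enumeration, the data table and the debug names together.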

When expanded by the preprocessor, we get exactly what we want:
enum UNIT_TYPES {
  UNIT_METER, UNIT_KILOMETER, UNIT_CENTIMETER, UNIT_MILLIMETER, UNIT_MILE, UNIT_YARD, UNIT_FEET, UNIT_INCH, UNIT_MIL,
  UNIT_NUM_ITEMS,
};

const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
  { "meter", "m", 1.0L, 0.0L }, { "kilometer", "km", 1000.0L, 0.0L }, { "centimeter", "cm", 0.01L, 0.0L }, { "millimeter", "mm", 0.001L, 0.0L }, { "mile", "mi", 1609.344L, 0.0L }, { "yard", "yd", 0.9144L, 0.0L }, { "foot", "ft", 0.3048L, 0.0L }, { "inch", "in", 0.0254L, 0.0L }, { "mil", "mil", 2.54E-05L, 0.0L },
};

Well, not EXACTLY: that’s almost unreadable, because the macro expanded into one monster (wrapped) line of code! Where are the newlines? The preprocessor has no way to emit a newline from a macro expansion, and that’s one pitfall of this method. If you are tracking down a typo and need to examine the preprocessor output, you have to contend with this monster-long, wrapped line. I wrote a script to “un-wrap” the file, but for occasional debugging it’s probably sufficient to do it manually in your editor. In vim (I’m old-school), these ex commands do it (type each ^M as Ctrl-V followed by Enter):
:s/, /,^M  /g      (for the enum)
:s/}, /},^M  /g    (for the table)

Reformatting the output thusly yields the expected result:
enum UNIT_TYPES {
  UNIT_METER,
  UNIT_KILOMETER,
  UNIT_CENTIMETER,
  UNIT_MILLIMETER,
  UNIT_MILE,
  UNIT_YARD,
  UNIT_FEET,
  UNIT_INCH,
  UNIT_MIL,
  UNIT_NUM_ITEMS,
};

const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
  { "meter", "m", 1.0L, 0.0L },
  { "kilometer", "km", 1000.0L, 0.0L },
  { "centimeter", "cm", 0.01L, 0.0L },
  { "millimeter", "mm", 0.001L, 0.0L },
  { "mile", "mi", 1609.344L, 0.0L },
  { "yard", "yd", 0.9144L, 0.0L },
  { "foot", "ft", 0.3048L, 0.0L },
  { "inch", "in", 0.0254L, 0.0L },
  { "mil", "mil", 2.54E-05L, 0.0L },
};

This is exactly what we started with, but because it is automatically generated from the same “table” in the header file, the enumeration and the data table cannot get out of sync, and we only have to edit one file to change the data table.
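The same trick also removes the two-file nuisance from the breathalyzer example at the top of this post. A sketch, assuming a two-column STATE_TABLE layout and the extractor names below (these are illustrations, not from the original). In practice the table and the enumeration would go in the header and the state_names definition in the .c file, just as with unit.h and unit.c below.

```c
/* One row per state: identifier and human-readable name. */
#define STATE_TABLE(F)                          \
  F(STATE_INIT,             "initializing")     \
  F(STATE_IDLE,             "idle")             \
  F(STATE_TAKING_SAMPLE,    "taking a sample")  \
  F(STATE_STONE_COLD_SOBER, "stone cold sober") \
  F(STATE_TIPSY,            "tipsy")            \
  F(STATE_DRUNK,            "drunk")            \
  F(STATE_SHIT_FACED,       "shit-faced")       \
  F(STATE_PASSED_OUT,       "passed out")

/* Header part: the enumeration, numbered 0 .. STATE_NUM_ITEMS-1. */
#define EXTRACT_STATE_ENUM(ID, NAME) ID,
enum BREATHALYZER_STATES {
  STATE_TABLE(EXTRACT_STATE_ENUM)
  STATE_NUM_ITEMS
};
#undef EXTRACT_STATE_ENUM

/* Source-file part: the name strings, generated in the same order. */
#define EXTRACT_STATE_NAME(ID, NAME) NAME,
const char *const state_names[STATE_NUM_ITEMS] = {
  STATE_TABLE(EXTRACT_STATE_NAME)
};
#undef EXTRACT_STATE_NAME
```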

WRAP IT UP

Therefore, we have the following code snippets to place in the header and source files:
unit.h:

#ifndef UNIT_H
#define UNIT_H

typedef struct tagUNIT_DEFN {
  const char *const name;
  const char *const abbr;
  const double fact;
  const double off;
} UNIT_DEFN;

#define UNIT_TABLE(F) \
F(UNIT_METER, "meter", "m", 1.0L, 0.0L)\
F(UNIT_KILOMETER, "kilometer", "km",  1000.0L, 0.0L)\
F(UNIT_CENTIMETER, "centimeter", "cm",  0.01L, 0.0L)\
F(UNIT_MILLIMETER, "millimeter", "mm", 0.001L, 0.0L)\
F( UNIT_MILE, "mile",  "mi", 1609.344L, 0.0L)\
F( UNIT_YARD, "yard",  "yd", 0.9144L, 0.0L)\
F( UNIT_FEET, "foot", "ft", 0.3048L, 0.0L)\
F( UNIT_INCH, "inch", "in", 0.0254L, 0.0L)\
F( UNIT_MIL, "mil",  "mil", 2.54E-05L, 0.0L)\
/**/

// Make the enumeration of the unit types
#define EXTRACT_ENUM( ID, NAME, ABBR, FACT, OFF ) ID,
enum UNIT_TYPES {
  UNIT_TABLE(EXTRACT_ENUM)
  UNIT_NUM_ITEMS,
};
#undef EXTRACT_ENUM

extern const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS];

#endif // UNIT_H


unit.c:
#include "unit.h"

// Makes the constant table of units
#define EXTRACT_DEFN( ID, NAME, ABBR, FACT, OFF ) \
  { NAME, ABBR, FACT, OFF },
const UNIT_DEFN unit_defn[UNIT_NUM_ITEMS]={
  UNIT_TABLE(EXTRACT_DEFN)
};
#undef EXTRACT_DEFN

It isn’t pretty. In fact, it’s too ugly for this simple example. But it shows a “standard” method for declaring, defining and initializing constant data that can be maintained in just one place. The real reason I explored this solution is that I have some much larger applications where this “ugly” solution is actually pretty, and solves some serious maintenance headaches.

In the next article, I will expand on this approach to make a more general-purpose module.


NOTES:

(1) this phrase is shamelessly stolen from Mr. Michael Tedder’s blog post,

(2) This method is inspired by a posting on Stack Overflow by user Eyal. His method involves initializing a table in RAM, while I’m trying to initialize constant data. Each type of initialization using this method presents its own challenges.
