Designing a small API: Bit manipulation in C#

Published by marco on

A usable API doesn’t usually spring forth in its entirety on the first try. A good, usable API generally arises iteratively, improving over time. Naturally, when using words like good and usable, I’m obliged to define what exactly I mean by that. Here are the guidelines I use when designing an API, in decreasing order of importance:

Static typing & Compile-time Errors: Wherever possible, make the compiler stop the user from doing something incorrectly instead of letting the runtime handle it.
Integrates into standard practices: That is, do not invent whole new ways of doing things; instead, reuse or build on the paradigms already present in the language.
Elegance: Ideally, using the API should be intuitive, read like natural language and not involve a bunch of syntactic tricks or hard-to-remember formulations or parameter lists.
Clean Implementation: The internals should be as generalized and understandable as possible and involve as little repetition as possible.
CLS-Compliance: Cross-language compliance is also interesting and easily achieved for all but the most low-level of APIs

Using those guidelines, I designed an API to manage bits and sets of bits in C#. Having spent a lot of time using Delphi Pascal, I’d become accustomed to set and bit operations with static typing. In C#, the .Net framework provides the Set<T> generic type, but that seems like overkill when the whole idea behind using bits is to use less space. That means using enumerated types and the FlagsAttribute; however, there are some drawbacks to using the native bit-operations directly in code:

Bit-manipulation is more low-level than most of the rest of the coding a C#-developer typically does. That, combined with doing it only rarely, makes direct testing of bits error-prone.
The syntax for testing, setting and removing bits is heavy with special symbols and duplicated identifiers.

To demonstrate, here is a sample:

[Flags]
enum TestValues
{
  None = 0,
  One = 1,
  Two = 2,
  Three = 4,
  Four = 8,
  All = 15,
}

// Set bits one and two:
var bitsOneAndTwo = TestValues.One | TestValues.Two;

// Remove bit two :
var bitOneOnly = bitsOneAndTwo & ~TestValues.Two;

// Testing for bit two:
if ((bitsOneAndTwo & TestValues.Two) == TestValues.Two)
{
  …
}

As you can see in the example above, setting a bit is reasonably intuitive (though it’s understandable to get confused about using | instead of & to combine bits). Removing a bit is more esoteric, as the combination of & with the ~ (inverse) operator is easily forgotten if not often used. Testing for a bit is quite verbose and extending to testing for one of several flags even more so.

Version One

Therefore, to make things easier, I decided to make some extension methods for these various functions and ended up with something like the following:

public static void Include<T>(this T flags, T value) { … }
public static void Exclude<T>(this T flags, T value) { … }
public static bool In<T>(this T flags, T value) { … }
public static void ForEachFlag<T>(this T flags, Action<T> action) { … }

These definitions compiled and worked as expected, but had the following major drawbacks:

At the time, we were only using them with enum values, but code completion was offering the methods for all objects because there was no generic constraint on T.
Not only that, but much of the bit-manipulation code needed to know the base type of the arguments in order to be able to cast it to and from the correct types. There were a lot of checks, but it all happened at runtime.
The ForEachFlag() function was implemented as a lambda when it is clearly an iteration. Using a lambda instead makes it impossible to use break or continue with this method.

This version, although it worked, broke several of the rules outline above; namely: while it did offer compile-time checking, the implementation had a lot of repetition in it and the iteration did not make use of the common library enumeration support (IEnumerable and foreach). That the operations were available for all objects and polluted code-completion only added insult to injury.

Version Two

A natural solution to the namespace-pollution problem is to add a generic constraint to the methods, restricting the operations to objects of type Enum, as follows:

public static void Include<T>(this T flags, T value)
  where T : Enum
{ … }

public static void Exclude<T>(this T flags, T value)
  where T : Enum
{ … }

public static bool In<T>(this T flags, T value)
  where T : Enum
{ … }

public static void ForEachFlag<T>(this T flags, Action<T> action)
  where T : Enum
{ … }

.NET enum-declarations, however, do not inherit from Enum; instead, they inherit from Int32, by default, but can also inherit from a handful of other base types (e.g. byte, Int16). This makes sense so that enum-values can be freely converted to and from these base values. Not only will a generic constraint as defined above not have the intended effect, it’s explicitly disallowed by the compiler. So, that’s a dead-end.

The other, more obvious way of restricting the target type of an extension method is to change the type of the first parameter from T to something else. However, since enum types don’t inherit from Enum, what type do we use? Well, it turns out that Enum is a strange type, indeed. It can’t be used in a generic constraint and does not serve as the base class for enumerated types but, when used as the target of an extension method, it magically applies to all enumerated types!

I took advantage of this loophole to build the next version of the API, as follows:

public static void Include<T>(this Enum flags, T value) { … }
public static void Exclude<T>(this Enum flags, T value) { … }
public static bool In<T>(this T flags, Enum value) { … }
public static void ForEachFlag<T>(this Enum flags, Action<T> action) { … }

This version had two advantages over the first version:

The methods are only available for enumerated types instead of for all types, which cleans up the code-completion pollution.
The implementation could take advantage of the Enum.GetTypeCode() method instead of the is and as-operators to figure out the type and cast the input accordingly.

Version Three

After using this version for a little while, it became obvious that there were still problems with the implementation:

Though using Enum as the target type of the extension method was a clever solution, it turns out to be a huge violation of the first design-principle outlined above: The type T for the other parameters is not guaranteed to conform to Enum. That is, the compiler cannot statically verify that the bit being checked (value) is of the same type as the bit-set (flags).
The solution only works with Enum objects, where it would also be appropriate for Int32, Int64 objects and so on.
The ForEach method still has the same problems it had in the first version; namely, that it doesn’t allow the use of break and continue and therefore violates the second design-principle above.

A little more investigation showed that the Enum.GetTypeCode() method is not unique to Enum but implements a method initially defined in the IConvertible interface. And, as luck would have it, this interface is implemented not only by the Enum class, but also by Int32, Int64 and all of the other types to which we would like to apply bit- and set-operations.

Knowing that, we can hope that the third time’s a charm and redesign the API once again, as follows:

public static void Include<T>(this T flags, T value)
  where T : IConvertible
{ … }

public static void Exclude<T>(this T flags, T value)
  where T : IConvertible
{ … }

public static bool In<T>(this T flags, T value)
  where T : IConvertible
{ … }

public static void ForEachFlag<T>(this T flags, Action<T> action)
  where T : IConvertible
{ … }

Now we have methods that apply only to those types that support set- and bit-operations (more or less[1]). Not only that, but the value and action arguments are once again guaranteed to be statically compliant with the flags arguments.

With two of the drawbacks eliminated with one change, we converted the ForEachFlag method to return an IEnumerable<T> instead, as follows:

public static IEnumerable<T> GetEnabledFlags<T>(this T flags)
  where T : IConvertible
{ … }

The result of this method can now be used with foreach and works with break and continue, as expected. Since the method also now applies to non-enumerated types, we had to re-implement it to return the set of possible bits for the type instead of simply iterating the possible enumerated values returned by Enum.GetValues().[2]

This version satisfies the first design principles (statically-typed, standard practice, elegant) relatively well, but is still forced to make concessions in implementation and CLS-compliance. It turns out that the IConvertible interface is somehow not CLS-compliant, so I was forced to mark the whole class as non-compliant. On the implementation side, I was avoiding the rather clumsy is-operator by using the IConvertible.GetByteCode() method, but still had a lot of repeated code, as shown below in a sample from the implementation of Is:

switch (flags.GetTypeCode())
{
  case TypeCode.Byte:
    return (byte)(object)flags == (byte)(object)value;
  case TypeCode.Int32:
    return (int)(object)flags == (int)(object)value;
  …
}

Unfortunately, bit-testing is so low-level that there is no (obvious) way to refine this implementation further. In order to compare the two convertible values, the compiler must be told the exact base type to use, which requires an explicit cast for each supported type, as shown above. Luckily, this limitation is in the implementation, which affects the maintainer and not the user of the API.

Since implementing the third version of these “BitTools”, I’ve added support for Is (shown partially above), Has, HasOneOf and it looks like the third time might indeed be the charm, as the saying goes.

[1] The IConvertible interface is actually implemented by other types, to which our bit-operations don’t apply at all, like double, bool and so on. The .NET library doesn’t provide a more specific interface—like “INumeric” or “IIntegralType”—so we’re stuck constraining to IConvertible instead.↩

[2]

Which, coincidentally, fixed a bug in the first and second versions that had returned all detected enumerated values—including combinations—instead of individual bits. For example, given the type shown below, we only ever expect values One and Two, and never None, OneAndTwo or All.

[Flags]
enum TestValues
{
  None = 0,
  One = 1,
  Two = 2,
  OneOrTwo = 3,
  All = 3,
}

That is, foreach (Two.GetEnabledFlags()) { … } should return only Two and foreach (All.GetEnabledFlags()) { … } should return One and Two.

↩