bit-hacking

Storing Values with Bit Packing and Unpacking

Bit packing and unpacking is a handy way of consolidating multiple data values into a single variable, reducing the amount of data being stored or transmitted. The number of values that can be packed depends on the width of each value and the width of the type it is packed into. A byte holds 8 bits; larger integer types hold 16, 32 or 64 bits. The technique is especially efficient for values narrower than the smallest type a platform supports (such as boolean flags, which usually occupy at least a full byte each), because several of them can share a single larger value such as a 32-bit integer.

Bit flags are commonly used for implementing low-level features, such as storing file access-permissions or packing values into a single value before transmitting across a bus. However, they can be applied with equal ease to higher level tasks such as storing user preferences or choosing which widgets to display from an enumerated list. We will see here how to use bit flags to store font formatting preferences, and apply them later to a label.

Bitwise Operators

There are a few operators we need to understand before we can move on to the implementation. Bitwise operators, by definition, work on the individual bits inside a value. They map directly onto simple processor instructions, so they are typically at least as fast as arithmetic operators, and often faster than multiplication and division. We will use bitwise AND (&), bitwise OR (|) and left shift (<<) in this exercise.

A bitwise AND operation takes the binary representations of two values and performs a logical AND operation on each bit. The result is 1 in every position where both the bits are 1, and 0 if either or both bits are 0.

    001010
AND 011011
    ------
    001010

Bitwise OR, on the other hand, compares the two bits in each corresponding position and sets the result to 1 if either of them is 1, or to 0 if both of them are 0.

The bitwise left shift operator moves all the bits within a single value to the left by the number of places specified in the second operand. The value is padded with 0s on the right, and the leftmost bits that no longer fit in the value's width are dropped.

    001010 << 1 = 010100
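
The three operators can be tried out with a short sketch like the one below. The class and method names (BitwiseDemo, ToBinary, ShiftLeft) are made up for illustration. Note one assumption: a C# int is 32 bits wide, so the sketch masks the shifted result to a chosen width in order to imitate the 6-bit values used in the examples above.

```csharp
using System;

public static class BitwiseDemo
{
    // Formats a value as a fixed-width binary string, e.g. 10 -> "001010".
    public static string ToBinary(int value, int width)
    {
        return Convert.ToString(value, 2).PadLeft(width, '0');
    }

    // Result bit is 1 only where both operands have a 1.
    public static int And(int a, int b)
    {
        return a & b;
    }

    // Result bit is 1 where either operand has a 1.
    public static int Or(int a, int b)
    {
        return a | b;
    }

    // Shifts left, then keeps only the lowest `width` bits so that
    // bits pushed past the left edge are dropped (assumed width < 31).
    public static int ShiftLeft(int value, int places, int width)
    {
        int mask = (1 << width) - 1;
        return (value << places) & mask;
    }
}
```

With these helpers, ToBinary(And(10, 27), 6) gives 001010, which matches the AND example above (10 is 001010 and 27 is 011011). ToBinary(Or(10, 27), 6) gives 011011, and ToBinary(ShiftLeft(10, 1, 6), 6) gives 010100.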

Implementation

We set up a simple Windows Forms project and draw three checkboxes and one label on the form. The aim is to have the checkboxes control three font properties of the label: weight, style and underlining. All checkboxes are given appropriate labels and configured to execute the ChangeFormatting method of the form every time the CheckStateChanged event is fired. The code for this method is shown below.

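private void ChangeFormatting(object sender, EventArgs e)
{
    // Pack the three checkbox states into the lowest three bits of a byte:
    // bit 2 = underline, bit 1 = italic, bit 0 = bold.
    byte flags = (byte)(
        Convert.ToByte(this.chkUnderline.Checked) << 2 |
        Convert.ToByte(this.chkItalic.Checked) << 1 |
        Convert.ToByte(this.chkBold.Checked)
        );

    // Unpack each flag by ANDing with the matching FontStyle value,
    // then OR the results back together into a single FontStyle.
    FontStyle style = (FontStyle)(
        (flags & (byte)FontStyle.Underline) |
        (flags & (byte)FontStyle.Italic) |
        (flags & (byte)FontStyle.Bold)
        );

    // Rebuild the label's font with the same family and size but the new style.
    Font f = new Font(this.label1.Font.FontFamily, this.label1.Font.Size, style);

    this.label1.Font = f;
}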

Packing

In the first statement, the flags variable is populated with the values of each checkbox. We want to store the three flags in the lowest (rightmost) three bits of a single byte.

Position	Setting
7	Unused
6	Unused
5	Unused
4	Unused
3	Unused
2	Underline
1	Italic
0	Bold

In order to do so, we take the value of each boolean (either true or false), convert it into a byte (1 or 0), then shift it left by the appropriate number of positions. The underline flag is stored at bit position 2 (counting from 0 at the right), so we left-shift its value by 2. Similarly, the italic flag is stored at position 1, so its value is shifted by 1. The bold flag lives at position 0 and does not need to be shifted at all.

    00000001 << 2 = 00000100 // Underline
    00000001 << 1 = 00000010 // Italic
    00000001                 // Bold (no shifting required)

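A consolidated value can be generated by ORing the three values together. The three examples below show, in order, all three flags set, underline cleared, and italic cleared.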

    00000100
 OR 00000010
 OR 00000001
    --------
    00000111 // Decimal value 7

    00000000
 OR 00000010
 OR 00000001
    --------
    00000011 // Decimal value 3

    00000100
 OR 00000000
 OR 00000001
    --------
    00000101 // Decimal value 5
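
The packing step can be pulled out into a small helper that does not depend on the form. The sketch below is one way to do it; the class name FormatFlags and its constants are hypothetical, and the bit positions are taken from the table above.

```csharp
using System;

public static class FormatFlags
{
    // Bit positions from the table above (constant names are illustrative).
    public const int BoldBit = 0;
    public const int ItalicBit = 1;
    public const int UnderlineBit = 2;

    // Packs three booleans into the lowest three bits of a byte.
    public static byte Pack(bool bold, bool italic, bool underline)
    {
        // Convert.ToByte(bool) yields 1 for true and 0 for false.
        int packed =
            (Convert.ToByte(underline) << UnderlineBit) |
            (Convert.ToByte(italic) << ItalicBit) |
            (Convert.ToByte(bold) << BoldBit);

        return (byte)packed;
    }
}
```

Pack(true, true, true) returns 7, Pack(true, true, false) returns 3 and Pack(true, false, true) returns 5, matching the three worked examples.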

The decimal value can then be stored in a database or other persistent storage system as an integer or byte. This is more compact than storing three separate boolean fields. The information can also be transmitted across systems as a packed unit, to be unpacked later only when the preferences have to be applied to a display element.

In our example, we are unpacking and applying the values immediately for brevity. But a more practical situation would probably involve serializing the value somewhere, then deserializing and applying the font properties later at another location.

Unpacking

In order to apply the font styles to a display element, the individual value of each style parameter must be extracted from the packed value and then applied. The .NET framework defines a value for each of these style parameters in the System.Drawing.FontStyle enumeration. The values used in this example are listed below.

Setting	Decimal Value	Binary Value
Regular	0	00000000
Bold	1	00000001
Italic	2	00000010
Underline	4	00000100

You will notice that each non-zero value is double the value of its predecessor, which moves the digit 1 one position to the left with every step. This is a key feature of bit flags: each flag differs from the others only in the position of its single 1 bit. Thus, the value of a given flag can be extracted from the packed value by ANDing the packed value with the value of the enumeration.

     00000111 // Packed value decimal 7
AND  00000100 // Underline enum decimal 4
     --------
     00000100 // Result - show underlines

This operation shows that the value of the underline flag is true, because the result is non-zero. If the packed value were decimal 3 instead of 7, the operation would play out as shown below, resulting in the value 0 for the underline flag.

     00000011 // Packed value decimal 3
AND  00000100 // Underline enum decimal 4
     --------
     00000000 // Result - hide underlines

All that is needed then is to convert the result into a boolean (non-zero means true) and apply it wherever required. In our example above, however, the constructor of the Font class requires the values packed together anyway, as a FontStyle enum. To do this, the packed value is ANDed with each enum value in turn, then the results are combined again using an OR operation. The resulting value is cast to a FontStyle before being passed to the constructor.
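
The unpacking step can also be sketched without a Windows Forms dependency. The TextStyle enum below is a hypothetical stand-in for System.Drawing.FontStyle that copies only the four values from the table above, and FormatUnpacker with its method names is likewise invented for this sketch.

```csharp
using System;

// Hypothetical stand-in for System.Drawing.FontStyle, limited to the
// values listed in the table above.
[Flags]
public enum TextStyle : byte
{
    Regular = 0,
    Bold = 1,
    Italic = 2,
    Underline = 4
}

public static class FormatUnpacker
{
    // True when the bit belonging to `style` is set in the packed value.
    // Regular is 0, so testing for it this way always returns false.
    public static bool IsSet(byte packed, TextStyle style)
    {
        return (packed & (byte)style) != 0;
    }

    // Keeps only the three known flag bits and reinterprets them as a style.
    public static TextStyle ToStyle(byte packed)
    {
        int mask = (int)(TextStyle.Bold | TextStyle.Italic | TextStyle.Underline);
        return (TextStyle)(packed & mask);
    }
}
```

IsSet(7, TextStyle.Underline) is true and IsSet(3, TextStyle.Underline) is false, mirroring the two AND examples. ToStyle(5) yields Bold combined with Underline. Masking with all three flags at once is equivalent to ANDing with each flag separately and ORing the results.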

Reading Time on a Binary Clock

Binary clocks are probably one of the epitomes of geek cred. Everybody can read an analog or digital clock that represents numbers in base 10. But it takes a geek to read the time from a gadget that uses an obscure and cryptic number system. Call it the hipsterdom of technology.

Understanding Number Systems

Modern number systems are remarkably similar to each other conceptually. The only difference is their applicability in different scenarios. The decimal system is in common use every day all over the world. Many fundamental concepts that are carried forward into other systems were refined using base 10 numerals. The most essential of these are naming unique digits, and positional notation.

Unique Digits

Numbers are a strange beast in that they have no end. The most primitive counting systems used scratches in the dust or pebbles to keep count. It became easier to represent larger values with the advent of separate symbols to identify different numbers. Roman numerals had special symbols for many numbers such as 5 (V), 10 (X) and 50 (L). While this made representation of larger values more compact, it still wasn’t perfect. It took the Indians, and later the Arabs, to finally come up with an extensible, yet concise number system that could represent any imaginable value.

Since it is impossible to have unique representations for every number when they are essentially infinite, the Hindu-Arabic numeral system instead has 10 unique symbols in base 10 to represent the digits from 0 to 9. By applying positional notation, all possible numerals can be represented by using these 10 symbols. Numbers greater than 9 are represented by stringing digits together. The leftmost digit has a greater magnitude than the one to its right, and the value of the numeral is a sum of its digits multiplied by their magnitudes.

Positional Notation

The magnitude itself is a power of the base. In the decimal system, the base is 10. Hence, the magnitude is 10 raised to a power that increases by one for every leftward shift. The rightmost digit is multiplied by the 0th power of 10 and represents the ones position. The digit to its immediate left is multiplied by 10 raised to 1, the next by 10 raised to 2, and so on.

Let us take the numeral 256 to illustrate this.

256 = 2 × 10² + 5 × 10¹ + 6 × 10⁰
    = 2 × 100 + 5 × 10 + 6 × 1
    = 200 + 50 + 6
    = 256

Binary uses the same concept to represent values. The only differences are that a binary digit rolls over after 1, since the system has only two digits (0 and 1), and that each digit is multiplied by a power of 2 instead of a power of 10. Binary is also more verbose than decimal. Even a relatively small value like 25 requires 5 digits in binary: 11001.

11001 = 1 × 2⁴ + 1 × 2³ + 0 × 2² + 0 × 2¹ + 1 × 2⁰
      = 1 × 16 + 1 × 8 + 0 × 4 + 0 × 2 + 1 × 1
      = 16 + 8 + 0 + 0 + 1
      = 25
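
The same calculation works for any base, which a short sketch can make concrete. The class and method names here (Positional, FromDigits) are invented for illustration, and the sketch assumes bases from 2 to 10 so that every digit is one of the characters 0 to 9.

```csharp
using System;

public static class Positional
{
    // Sums each digit multiplied by its magnitude, working from the
    // rightmost digit (base^0) towards the left. Assumes 2 <= base <= 10.
    public static int FromDigits(string digits, int numberBase)
    {
        int value = 0;
        int magnitude = 1; // numberBase raised to the 0th power

        for (int i = digits.Length - 1; i >= 0; i--)
        {
            int digit = digits[i] - '0';
            if (digit < 0 || digit >= numberBase)
            {
                throw new ArgumentException("Invalid digit for this base: " + digits[i]);
            }

            value += digit * magnitude;
            magnitude *= numberBase; // next position to the left
        }

        return value;
    }
}
```

FromDigits("256", 10) returns 256 and FromDigits("11001", 2) returns 25, the two values worked through above.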

In this sense, number systems behave like odometers. When the rightmost digit reaches its maximum value, the next increment resets it to zero and the digit to its immediate left increases by one. If that second column is also at the largest digit, then it too resets to zero and the increment is carried over to the third column.
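
The rollover behaviour can be written as a short loop. This is only a sketch: the Odometer class is hypothetical, and it assumes a fixed number of columns stored in an array with the leftmost (most significant) digit first.

```csharp
public static class Odometer
{
    // Adds one to a fixed-width number held as an array of digits,
    // where digits[0] is the leftmost column.
    public static void Increment(int[] digits, int numberBase)
    {
        for (int i = digits.Length - 1; i >= 0; i--)
        {
            if (digits[i] < numberBase - 1)
            {
                digits[i]++; // room left in this column, no carry needed
                return;
            }

            digits[i] = 0; // column rolls over, carry moves one column left
        }
        // Every column rolled over: the reading wraps around to all zeros,
        // just as a mechanical odometer would.
    }
}
```

Incrementing the binary digits 0, 1, 1 gives 1, 0, 0 (3 becomes 4), and incrementing the decimal digits 1, 9, 9 gives 2, 0, 0.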

Being able to read binary representations is obviously an essential requirement to read the time in a binary clock.

Structure of a Binary Clock Face

There are two types of binary clocks possible – ones that use binary coded decimals, and true binary clocks.

A clock face that uses binary-coded decimal notation

A BCD clock face is divided into six columns, two for each component of the time. Each column contains up to four rows, one for each power of two from 2⁰ to 2³. The leftmost two columns represent the hour, the middle two are minute columns and the last two represent seconds. Each column represents a single base 10 digit of the time. For example, if the value of column one is 1 and that of column two is 0, then the hour is 10. Similarly, if the values in columns three and four are 3 and 2 respectively, the clock is in the 32nd minute of the current hour.
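
A BCD face can be modelled by splitting each time component into its tens and ones digits and writing each digit as four bits. The sketch below assumes a 24-hour clock and uses a hypothetical BcdClock class. It says nothing about how a physical clock orders its rows; each string simply lists the bits from 2³ down to 2⁰.

```csharp
using System;

public static class BcdClock
{
    // Returns six 4-character bit strings, one per column, in the order
    // hour tens, hour ones, minute tens, minute ones, second tens, second ones.
    public static string[] Columns(int hour, int minute, int second)
    {
        int[] digits =
        {
            hour / 10, hour % 10,
            minute / 10, minute % 10,
            second / 10, second % 10
        };

        string[] columns = new string[digits.Length];
        for (int i = 0; i < digits.Length; i++)
        {
            // Each decimal digit (0 to 9) fits in four bits.
            columns[i] = Convert.ToString(digits[i], 2).PadLeft(4, '0');
        }

        return columns;
    }
}
```

For 10:32:00, Columns(10, 32, 0) returns 0001, 0000, 0011, 0010, 0000 and 0000, which are the digits 1, 0, 3, 2, 0 and 0 from the example above.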

A clock face that represents time components in pure binary notation

True binary clocks represent each time component in a single column. Such clocks require only three columns, and up to six rows in each column to adequately cover all required values. Each column represents the absolute value of the component in binary encoding. For example, 0b001010 in the hour column represents 10 (1 × 2³ + 1 × 2¹). 0b100000 in the minutes column represents the 32nd minute of the hour. Together, the two values indicate the time as 10:32 am.
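
A true binary face is simpler to model, since each component is converted as a whole. In the sketch below the BinaryClock class and its method names are hypothetical, and six bits per column are assumed because the largest value, 59, needs six.

```csharp
using System;

public static class BinaryClock
{
    // Returns three 6-character bit strings: hour, minute, second.
    public static string[] Columns(int hour, int minute, int second)
    {
        return new string[]
        {
            ToSixBits(hour),
            ToSixBits(minute),
            ToSixBits(second)
        };
    }

    // Reads a column back into its decimal value.
    public static int Read(string column)
    {
        return Convert.ToInt32(column, 2);
    }

    private static string ToSixBits(int value)
    {
        return Convert.ToString(value, 2).PadLeft(6, '0');
    }
}
```

Columns(10, 32, 0) returns 001010, 100000 and 000000, and Read("001010") returns 10, matching the hour and minute values in the example above.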

Even if you are not very practiced at converting from binary to decimal, only a small set of values is needed to display the time in binary. Most people can memorize the light patterns and the values they represent after a few days of practice.
