Working with bits is fun. They’re useful and easy to understand, especially when working with combinations. What’s not so fun is reading them incorrectly from a file.
I was assigned to investigate a problem at work where the information shown for a particular piece of data seemed off. I started by outputting the raw values our application read and recalculating them manually to see if anything was wrong in the calculations. Once I confirmed that the raw values our program received were being calculated correctly, I then checked whether those raw values were even being obtained correctly.
So I went to that wonderful place called the internet to search for the definition of the format this file was in, and I wrote a small, separate program in Python to read the file’s raw contents, so I could check whether our application was reading the values wrongly without having to go through all the confusing code in our giant solution.
And, after checking my code against the format standard and recomputing values by hand many times, I found out that it was, in fact, the data that was invalid (i.e. it made no sense)! The format standard was pretty clear on that point. Depending on how my small program and our application handled that [supposedly nonexistent] case, the values came out different, and I guess that can’t be helped.
According to the standard, the data carries information on how to read the actual values it contains. So I read that information first and found I had to read 16 bits at a time, and that each value would be represented in 16 bits. Okay, that just means I have to use all 16 bits I read, I thought. But here comes the contradiction. The data also says the most significant bit is at position 12! B-but but– if I start from position 12, I won’t have 16 bits to use. I’ll only have 13!
For the data to be valid, the most significant bit should have been at position 15. Or the representation of the data should have used only 13 bits or fewer.
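The mismatch boils down to a bit of arithmetic: with a 0-indexed MSB position, positions `msb_position` down to 0 give you `msb_position + 1` bits. Here is a minimal sketch of that consistency check in Python (the function and parameter names are my own, not from the actual format standard):

```python
def is_consistent(msb_position: int, bits_used: int) -> bool:
    """Check whether the header's stated MSB position can supply
    the stated number of bits (0-indexed MSB position)."""
    available = msb_position + 1  # positions msb_position..0
    return available >= bits_used

print(is_consistent(15, 16))  # the valid case: 16 bits available, 16 needed -> True
print(is_consistent(12, 16))  # the broken data: only 13 bits available -> False
```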
Now, the program I created assumed that reading from the MSB position would always yield enough bits to fill the required count, and it simply dropped any extra bits. Our application, on the other hand, assumed all data is valid, so it shifted the bits left to put the MSB at the leftmost position, then shifted back right by 16 minus the number of bits used.
Take, for example, the value 0xFB73, or 0b1111101101110011. The program I created would find the MSB position and take all the bits from there (in this case, a total of thirteen bits), then read that value as two’s complement binary.
1111 1011 0111 0011
1111 1011 0111 0011 (take from the MSB at position 12)
-> 1 1011 0111 0011 (drop 0 bits from the right, since 13 - 16 = -3, clamped to 0)
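The steps above can be sketched in Python. This is a reconstruction of my program’s approach, not its actual code; the names are made up:

```python
def read_from_msb(value: int, msb_position: int, bits_used: int) -> int:
    """Take all bits from the stated MSB position down to bit 0,
    drop any extras from the right, and read the result as
    two's complement."""
    available = msb_position + 1
    # Keep only bits msb_position..0.
    bits = value & ((1 << available) - 1)
    # Drop extra bits from the right if we have more than needed;
    # max() clamps the negative result in the broken case to 0.
    drop = max(available - bits_used, 0)
    bits >>= drop
    width = available - drop
    # Interpret the remaining bits as two's complement.
    if bits & (1 << (width - 1)):
        bits -= 1 << width
    return bits

print(read_from_msb(0xFB73, 12, 16))  # -> -1165 (0b1101101110011 as 13-bit two's complement)
```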
Using the same value, our application would shift the bits so as to honor both the stated MSB position and the stated number of bits.
1111 1011 0111 0011
-> 1101 1011 1001 1000 (shift left 3 times, 15 - 12)
-> 1101 1011 1001 1000 (shift right 0 times, 16 - 16)
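Our application’s shifting approach can be sketched the same way. Again, this is my reconstruction with made-up names, not the actual application code:

```python
def read_with_shifts(value: int, msb_position: int, bits_used: int,
                     word_size: int = 16) -> int:
    """Shift left so the stated MSB lands at the leftmost bit of the
    word (discarding bits shifted out), shift back right by
    word_size - bits_used, then read as two's complement."""
    left = (word_size - 1) - msb_position
    value = (value << left) & ((1 << word_size) - 1)
    value >>= word_size - bits_used
    # Interpret as two's complement of bits_used bits.
    if value & (1 << (bits_used - 1)):
        value -= 1 << bits_used
    return value

print(read_with_shifts(0xFB73, 12, 16))  # -> -9320 (0xDB98 as 16-bit two's complement)
```

For the invalid data above, this gives -9320, while the drop-extra-bits approach gives a different number entirely, which is exactly why the two programs disagreed.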