Java’s signed byte type is a mistake

The Java programming language has a signed byte type, but not an unsigned one. I think this design is a terrible mistake. From a wide range of experience, I find that the unsigned byte type has far more use cases and is easier to work with than the signed byte type.

In Java, it is not uncommon to convert an int (signed 32-bit integer) to and from an array of bytes manually. When packing signed bytes, each byte needs to be masked with 0xFF because it is sign-extended, not zero-extended, to 32 bits when promoted to int. Consider the example of packing an array b of 4 bytes into an int x in little-endian order (b[0] is the least significant byte), and compare the amount of code when the byte type is signed (actual) versus unsigned (hypothetical):

int x = ((b[3] & 0xFF) << 24) | ((b[2] & 0xFF) << 16) | ((b[1] & 0xFF) << 8) | ((b[0] & 0xFF) << 0);  // Signed bytes, clear code
int x = b[3] << 24 | (b[2] & 0xFF) << 16 | (b[1] & 0xFF) << 8 | b[0] & 0xFF;                          // Signed bytes, minimal code

int x = (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | (b[0] << 0);  // Unsigned bytes, clear code
int x = b[3] << 24 | b[2] << 16 | b[1] << 8 | b[0];               // Unsigned bytes, minimal code
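
To make the sign-extension problem concrete, here is a small sketch that runs the masked expression next to the unmasked one on real (signed) Java bytes. The class and method names are made up for illustration, and the input values are arbitrary.

```java
class BytePacking {
    // Packs 4 bytes into an int, with b[0] as the least significant byte.
    // Each byte is masked so that sign extension cannot leak into higher bits.
    static int packMasked(byte[] b) {
        return (b[3] & 0xFF) << 24
             | (b[2] & 0xFF) << 16
             | (b[1] & 0xFF) <<  8
             | (b[0] & 0xFF);
    }

    // Same shape without masks. This would be correct if byte were unsigned,
    // but with signed bytes any value >= 0x80 is sign-extended and sets
    // all the bits above it.
    static int packUnmasked(byte[] b) {
        return b[3] << 24 | b[2] << 16 | b[1] << 8 | b[0];
    }

    public static void main(String[] args) {
        byte[] b = {(byte)0x80, 0x01, 0x02, 0x03};
        System.out.printf("%08X%n", packMasked(b));    // prints 03020180
        System.out.printf("%08X%n", packUnmasked(b));  // prints FFFFFF80
    }
}
```

The unmasked version goes wrong as soon as any byte other than b[3] has its top bit set; b[3] is safe only because the shift by 24 pushes its extended sign bits out of the int.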

For reading or writing file formats and for implementing cryptographic algorithms, sometimes you need to declare a byte constant or a byte array constant. Often some of these values are greater than 127, which exceeds the signed range. So you would need to either write the constant in its natural unsigned form and cast it (verbose), or convert the constant to its signed interpretation and write that in the code (hurts readability). Examples:

byte[] b = {0xFF, ...};                         // Compile-time error
byte[] b = {(byte)0xFF, ...};                   // OK, but ugly
byte[] b = {-1, ...};                           // Obscures its unsigned value
byte[] b = intsToBytes(new int[]{0xFF, ...});   // Baroque[0]
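
The last line above relies on a helper named intsToBytes whose body is not shown here. One plausible way to write it is sketched below; the range check and the class name are my assumptions, not necessarily how the original helper behaves, and the sample values are arbitrary.

```java
import java.util.Arrays;

class ByteConstants {
    // Hypothetical helper: converts each int in [0, 255] to the byte
    // that has the same low 8 bits. Rejecting out-of-range values is an
    // assumption made for this sketch.
    static byte[] intsToBytes(int[] vals) {
        byte[] result = new byte[vals.length];
        for (int i = 0; i < vals.length; i++) {
            if (vals[i] < 0 || vals[i] > 0xFF)
                throw new IllegalArgumentException("Not an unsigned byte value: " + vals[i]);
            result[i] = (byte) vals[i];  // Narrowing cast keeps the low 8 bits
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] b = intsToBytes(new int[]{0xFF, 0xD8, 0x7F});
        System.out.println(Arrays.toString(b));  // prints [-1, -40, 127]
    }
}
```

This keeps the constants readable in their unsigned form at the cost of an extra array allocation and a copy at run time.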

When comparing a byte value to an int value, the byte is sign-extended to an int and then this value is compared to the other int. One likely mistake[1] is to test whether a byte value equals an unsigned int constant – for example, in file format checking – in a way that is always false:

byte[] b = (...);
if (b[0] == 0xFF) ...  // b[0] is in the range [-128, 127], not containing 255
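
A short sketch of the pitfall and of three common ways around it follows. The class and method names are invented for illustration; Byte.toUnsignedInt is a standard library method available since Java 8.

```java
class ByteComparison {
    // The mistaken test: b is sign-extended to an int in [-128, 127],
    // so it can never equal a value in [128, 255].
    static boolean naiveEquals(byte b, int unsignedValue) {
        return b == unsignedValue;
    }

    // Fix 1: mask the byte so it is compared as a value in [0, 255].
    static boolean maskedEquals(byte b, int unsignedValue) {
        return (b & 0xFF) == unsignedValue;
    }

    // Fix 2: narrow the constant to a byte, so both sides are sign-extended alike.
    static boolean castEquals(byte b, int unsignedValue) {
        return b == (byte) unsignedValue;
    }

    // Fix 3 (Java 8+): use the library conversion, equivalent to masking.
    static boolean libraryEquals(byte b, int unsignedValue) {
        return Byte.toUnsignedInt(b) == unsignedValue;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(naiveEquals(b, 0xFF));    // prints false
        System.out.println(maskedEquals(b, 0xFF));   // prints true
        System.out.println(castEquals(b, 0xFF));     // prints true
        System.out.println(libraryEquals(b, 0xFF));  // prints true
    }
}
```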

As an aside, the C programming language has signed and unsigned versions of every integer type that is supported by the language. But this opens up a whole different can of worms, like casting between signed and unsigned types, comparing signed and unsigned values, mixed-type arithmetic, and more. Add to that the fact that the integer types have implementation-dependent (i.e. compiler- and platform-dependent) bit widths, and that is why I hesitate to write code in C that involves absolutely precise and correct reasoning about value ranges, overflow, exact storage requirements, etc.

In practice, unsigned bytes are used as a native storage type for many things: 8-bit color channels in typical 24-bit RGB images, 8-bit extended ASCII characters (but that’s obsolete thanks to UTF-8), and opcodes for microprocessors. In contrast, I can only think of one real application for signed bytes: PCM audio samples. But 8-bit audio sounds bad, so nobody prefers to use it anyway.

For the sake of completeness, I should mention that I have no problems with short, int, and long being signed types. It’s just byte that irritates me. But in light of this, we can see that having byte as a signed type preserves the consistency[2] of having signed integer types in Java.

So there you have it: the set of reasons why I think Java’s signed byte type is a mistake and should have been designed as an unsigned type instead.

Notes


