Tuesday, April 09, 2019

A teeny bug in jkstat char handling

While messing about with illuminate, I noticed an interesting oddity in the disk display:



See on the end of the product string is that "Revision"? That shouldn't be there, and iostat -En doesn't show it. This comes from my JKstat code, so where have I gone wrong?

This comes from the sderr kstat, which is a named kstat of the device_error class.

A named kstat is just a map of keys and values. The key is a string, the value is a union so you need to know what type the data is in order to be able to interpret the bits (and the data type of each entry is stored in the kstat, so that's fine).

For that Product field, it's initialized like this:

kstat_named_init(&stp->sd_pid, "Product", KSTAT_DATA_CHAR);

OK, so it's of type KSTAT_DATA_CHAR. The relevant entry in the union here is value.c, which is actually defined as a char c[16] - the field is 16 characters in size, long enough to hold up to 128-bit ints - most of the numerical data doesn't take that much space.

(For longer, arbitrary length strings, you can stick a pointer to the string in that union instead.)

Back to iostat data. For a SCSI device (something using the sd driver), the device properties are set up in the sd_set_errstats() function in the sd driver. This does a SCSI enquiry, and then copies the Product ID straight out of the right part of the SCSI enquiry string:

strncpy(stp->sd_pid.value.c, un->un_sd->sd_inq->inq_pid, 16);

(If you're interested, you can see the structure of the SCSI enquiry string in the /usr/include/sys/scsi/generic/inquiry.h header file. The inq_pid comes from bytes 16-31, and is 16 bytes long.)

You can see the problem. The strncpy() just copies 16 bytes into a character array that's 16 bytes long. It fits nicely, but there's a snag - because it fits exactly, there's no trailing null!

The problem with JKstat here is that it is (or was, anyway) using NewStringUTF() to convert the C string into a java String, And that doesn't have any concept of length associated with it. So it starts from the pointer to the beginning of the c[] array, and keeps going until it finds the null to terminate the string.

And if you look at the sd driver, the Revision entry comes straight after the product entry in memory, so what JNI is doing here is reading past the end of the Product value, and keeps going until it find the null at the end of the next name "Revision", and takes the whole lot. It is, I suppose, fortunate that there is something vaguely sensible for it to find.

There doesn't appear to be a way of doing the right thing in JNI itself, the fix has to be to copy the correct amount of the value into a temporary string that does have the trailing null added.

(And all the system tools written in C are fine, because they do have a way to just limit the read to 16 characters.)

No comments: