Friday, October 14, 2011

Optimizing mysql with DTrace

I've got a mysql server that's a bit busy, and I idly wondered why. Now, this is mysql 4, so it's a bit old and doesn't have a lot of built-in diagnostics.

(In case you're wondering, we have an awful lot of legacy client code that issues queries using a join syntax that isn't supported by later versions of mysql, so simply upgrading mysql isn't an option.)

This is running on Solaris, and a quick look with iostat indicates that the server isn't doing any physical I/O. That's good. A quick look with fsstat indicates that we're seeing quite a lot of read activity - all out of memory, as nothing goes to disk. (We're using zfs, which makes it trivially easy to split the storage up to give mysql its own file system, so we can monitor just the mysql traffic using fsstat.)
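If you want to watch this as it happens, fsstat can be pointed at just that file system. Something like the following (assuming the mysql data lives on its own zfs file system mounted at /mysql/data, as the file paths further down suggest) prints a report every 5 seconds, and the read operations and bytes-read columns are the ones to watch:

fsstat /mysql/data 5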

As an aside, when reading MyISAM tables the server relies on the OS to buffer table data in RAM, so you actually see reads of tables as reads of the underlying files; those reads show up in fsstat, which makes the following analysis trivial.

But, which tables are actually being read? You might guess, by looking at the queries ("show full processlist" is your friend), but that simply tells you which tables are being accessed, not how much data is being read from each one.
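If you just want a quick snapshot rather than an interactive session, the mysql client will run it directly - add whatever -u and -p options your setup needs:

mysql -e "SHOW FULL PROCESSLIST"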

Given that we're on Solaris, it's trivial to use DTrace to simply count both the read operations and the bytes read, per file. The following one-liners from the DTrace Book are all we need. First, to count reads per file:

dtrace -n 'syscall::read:entry /execname == "mysqld"/ { @[fds[arg0].fi_pathname] = count(); }'

and bytes per file:

dtrace -n 'fsinfo:::read /execname == "mysqld"/ { @[args[0]->fi_pathname] = sum(arg1); }'

With DTrace, aggregation is built in so there's no need to post-process the data.

The latter is the interesting one here. Running it for a short while and looking at the last three lines, which are the most heavily read files:

  /mysql/data/foobar/subscriptions_table.MYD         42585701
  /mysql/data/foobar/concurrent_accesses.MYD         83717726
  /mysql/data/foobar/databases_table.MYD          177066629

That last table accounts for well over a third of the total bytes read. A quick look indicates that it's a very small table (39 kbytes, whereas the other tables are quite a bit larger).
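Seeing the sizes is just a matter of looking at the underlying files - one of the benefits of MyISAM keeping each table in its own set of files:

ls -l /mysql/data/foobar/*.MYD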

A quick look in mysql with DESCRIBE showed that this table had no indexed columns, so the problem was that every query was doing a full table scan. I added a quick index on the most likely looking column, and the bytes read dropped to almost nothing, with a little speedup and a corresponding reduction in load on the server.
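In this case that amounted to something like the following pair of one-liners - the column name here is just a placeholder, as the right choice is whichever column the queries actually filter or join on, and again add -u and -p options as needed:

mysql -e "SHOW INDEX FROM databases_table" foobar
mysql -e "ALTER TABLE databases_table ADD INDEX (some_column)" foobar

The first simply confirms there's no existing index; the second adds one.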

I've used this exact technique quite a few times now - fsstat showing huge reads, then a DTrace one-liner to identify the errant table. Many times, users and developers set up a simple mysql instance, and it's reasonably quick to start with, so they don't bother thinking about an index. Later (possibly years later) the data and usage grow and performance drops as a result. And often they're just doing a very simple SELECT.

It's not always that simple. The other two most heavily read tables are also interesting. One of them is quite large, gets a lot of accesses, and ends up being quite sparse. Running OPTIMIZE TABLE to compact out the gaps due to deleted rows helped a lot there. The other one is actually the one used by the funky join query that's blocking the mysql version upgrade. No index I create makes any difference, and I suspect that biting the bullet and rewriting the query (or restructuring the tables somewhat more sanely) is what's going to be necessary to make any improvements.
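For reference, the compaction step is also a one-liner from the shell; I'm assuming here that the sparse table is the concurrent_accesses one from the fsinfo output, in the foobar database:

mysql -e "OPTIMIZE TABLE concurrent_accesses" foobar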

Saturday, October 08, 2011

JKstat 0.60, handling kstat chain updates correctly

I've just updated JKstat, now up to version 0.60.

The key change this time, and the reason for the jump in version number (the previous release was 0.53), is that I've changed the way that updates to the kstat chain are handled.

Now, libkstat has a kstat_chain_update() function, which you call to synchronize your idea of what kstats exist with the current view held by the kernel. And you can look at the return value to see if anything has changed.

This only works if you're running in a single thread, of course. If you have multiple threads, it's possible that only one of them will detect a change; it's even worse if you have a server with multiple clients. So the only reliable way for any consumer to detect whether the chain has been updated is to retrieve the current kstat chain ID and compare it with the one it holds.

This has largely been hidden because I've usually used the KstatSet class to track updates, and it does the right thing: it checks the kstat chain ID rather than blindly trusting the return code from kstat_chain_update(). (And it handles subsets of kstats, so it will only notify its consumers of relevant changes.)

So what I've finally done, and should have done long ago, is eliminate the notion of explicitly calling kstat_chain_update(). The native code still calls it internally, to make sure it's correctly synchronized with the kernel, but all consumers now need to track the kstat chain ID themselves.

This change actually helps client-server operation, as it means we only need one call to see if anything has changed rather than the two that were needed before.