Wednesday, March 22, 2006

poking around with dtrace

Having released jfsstat, I've been using it on my test system just to see what would come up.

I left it running looking at file accesses - lookup, access, pathconf, readlink - and started to notice some odd patterns.

The first was that /proc was getting a very steady 25 lookups and 12.5 accesses per second. OK, that's easy - I had top running (refreshing every 5 seconds) and this system has 60 processes on it, so every refresh top is doing 1 access and 2 lookup per process.

The next was that / (and this machine just has the single filesystem) was seeing about 600 lookup and 10 access requests every 10 seconds.

The third part of the pattern was that roughly every minute there was a veritable storm of operations on the / filesystem.

Now, one of the longer term aims of my writing tools like jfsstat and jkstat is that you see some strange activity, and you can click on the activity there and then and drill down deeper - probably with dtrace. I'm not there yet, but I thought I would practice my dtrace skills and do it by hand.

OK, so what I'm after is ufs lookups. What can dtrace give me?

% dtrace -l | grep ufs| grep lookup
536 vtrace ufs ufs_lookup TR_UFS_LOOKUP_END
537 vtrace ufs ufs_lookup TR_UFS_LOOKUP_START
17836 fbt ufs ufs_lookup entry
17837 fbt ufs ufs_lookup return

Right. Let's have a look at what's generating those lookup requests:

% dtrace -n 'fbt:ufs:ufs_lookup:entry{@[execname]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

java 612

So, that 10 second repeating burst of 600 lookups is java. In fact, I know this to be tomcat.

Now I run the script again to catch the massive burst of activity that happens every minute:

% dtrace -n 'fbt:ufs:ufs_lookup:entry{@[execname]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

sched 24
pwd 28
sh 75
fping 91
server.sh 112
uname 150
mysqld 162
ping.sh 238
nscd 491
cron 498
init 988
load.sh 1014
rup 1539
java 1836
perl 2120
awk 2394
mysql 63666

I've hit the tomcat activity burst 3 times, but the once a minute is coming from something launched by cron - a system monitoring script that runs fping and rup and pokes the results back into a mysql database. But what on earth is mysql doing making 63666 lookup requests?

(The first question I asked was - is this one instance of mysql or many? If I aggregate on pid as well as execname then I see that I'm running a lot of copies of mysql, each of which generates 786 lookup requests on the filesystem.)

Next question: what are the pathnames that are being used in the lookup request? To get this, I need to understand the ufs_lookup call itself in a little more detail. So the source tells us that the 4th argument is a struct pathname, and the string I'm after is the pn_buf member. So let's see what pathnames are being looked up. First the little tomcat burst:

dtrace -n 'fbt:ufs:ufs_lookup:entry{@[stringof(args[3]->pn_buf)]=count();}'
dtrace: description 'fbt:ufs:ufs_lookup:entry' matched 1 probe
^C

/opt/XSload/tomcat/conf/Catalina/localhost 6
/opt/XSload/tomcat/webapps/ROOT/META-INF/context.xml 6
/opt/XSload/tomcat/webapps/jsp-examples/META-INF/context.xml 6
...
/opt/XSload/tomcat/webapps/xsload/WEB-INF 18
/opt/XSload/tomcat/webapps/xsload.war 20
/opt/XSload/tomcat/conf/Catalina/localhost/host-manager.xml 35
/opt/XSload/tomcat/conf/Catalina/localhost/manager.xml 35
/opt/XSload/tomcat/webapps/balancer/META-INF/context.xml 35
/opt/XSload/tomcat/conf/context.xml 90

Pretty clear, really - every 10 seconds tomcat goes round checking to see if you've gone and modified anything.

Now for the mysql burst. What are the pathnames here? There's quite a lot of output, so I've trimmed it a bit:

...
/usr/ccs/lib/libc.so.1 432
/usr/ccs/lib/libcrypt_i.so.1 432
/usr/ccs/lib/libcurses.so.1 432
/usr/ccs/lib/libgen.so.1 432
/usr/ccs/lib/libm.so.1 432
/usr/ccs/lib/libnsl.so.1 432
/usr/ccs/lib/librt.so.1 432
/usr/ccs/lib/libsocket.so.1 432
/usr/ccs/lib/libthread.so.1 432
/usr/ccs/lib/libw.so.1 432
/usr/ccs/lib/libz.so.1 432
/lib/ld.so.1 504
/opt/SUNWspro/lib/rw7/libCrun.so.1 540
/opt/SUNWspro/lib/rw7/libCstd.so.1 540
/opt/SUNWspro/lib/rw7/libc.so.1 540
/opt/SUNWspro/lib/rw7/libcrypt_i.so.1 540
/opt/SUNWspro/lib/rw7/libcurses.so.1 540
/opt/SUNWspro/lib/rw7/libgen.so.1 540
/opt/SUNWspro/lib/rw7/libm.so.1 540
/opt/SUNWspro/lib/rw7/libnsl.so.1 540
/opt/SUNWspro/lib/rw7/librt.so.1 540
/opt/SUNWspro/lib/rw7/libsocket.so.1 540
/opt/SUNWspro/lib/rw7/libthread.so.1 540
/opt/SUNWspro/lib/rw7/libw.so.1 540
/opt/SUNWspro/lib/rw7/libz.so.1 540
/opt/SUNWspro/lib/v8/libCrun.so.1 540
/opt/SUNWspro/lib/v8/libCstd.so.1 540
/usr/local/mysql/data/my.cnf 540
...
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus+vis/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus+vis2/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9+vis/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9+vis2/libCstd_isa.so.1 756
/opt/SUNWspro/prod/usr/lib/cpu/sparcv9/libCstd_isa.so.1 756
...
/opt/SUNWspro/lib/libthread.so.1 864
/opt/SUNWspro/lib/libw.so.1 864
/opt/SUNWspro/lib/libz.so.1 864
/lib/libm.so.2 888
/usr/lib/libc.so.1 972
/usr/lib/libcrypt_i.so.1 972
...
/opt/SUNWspro/lib/v8/libgen.so.1 1080
/opt/SUNWspro/lib/v8/libm.so.1 1080
/opt/SUNWspro/lib/v8/libnsl.so.1 1080
/opt/SUNWspro/lib/v8/librt.so.1 1080
/opt/SUNWspro/lib/v8/libsocket.so.1 1080
/opt/SUNWspro/lib/v8/libthread.so.1 1080
...
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus/../../libCrun.so.1 2160
/opt/SUNWspro/prod/usr/lib/cpu/sparcv8plus/libCstd_isa.so.1 2592
/platform/SUNW,Sun-Blade-1000/lib/../../sun4u-us3/lib/libc_psr.so.1 3360
/platform/SUNW,Sun-Blade-1000/lib/libc_psr.so.1 4480

What on earth?

So, most of these lookup operations are the mysql binary looking for shared libraries when it starts. And in some less than obvious places too! So why is this? I go and have a look at the binary:

% dump -Lv mysql
[INDEX] Tag Value
[1] NEEDED libcurses.so.1
[2] NEEDED libz.so.1
[3] NEEDED librt.so.1
[4] NEEDED libcrypt_i.so.1
[5] NEEDED libgen.so.1
[6] NEEDED libsocket.so.1
[7] NEEDED libnsl.so.1
[8] NEEDED libm.so.1
[9] NEEDED libCstd.so.1
[10] NEEDED libCrun.so.1
[11] NEEDED libw.so.1
[12] NEEDED libthread.so.1
[13] NEEDED libc.so.1
[16] RUNPATH /opt/SUNWspro/lib/rw7:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/usr/ccs/lib:/usr/lib
[17] RPATH /opt/SUNWspro/lib/rw7:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/opt/SUNWspro/lib/v8:/opt/SUNWspro/lib:/usr/ccs/lib:/usr/lib

The list of libraries matches what we see being looked for, so that makes sense. The problem is that the compiled in library search path contains places that it shouldn't (and some repeated), so it needlessly searches those locations (and multiple times at that) when mysql starts up.

So, problem solved. And, as I said earlier, the idea is that you be able to see this anomalous activity in one of the jkstat tools and click on it to drill down into the system to see what's going on, so that all the dtrace is done automatically on the fly for you.

It won't be that easy, of course. I'm relying on the fbt provider, so that I need to pretty well write the dtrace script by hand for each function I wish to investigate. (There isn't even a consistent naming or calling scheme - you can't just replace ufs by procfs and expect it to work.) But fortunately we have the OpenSolaris source to look at to see what's actually going on underneath the covers.

Tuesday, March 21, 2006

jfsstat

I've just updated jkstat.

The notable new feature in this version is the jfsstat utility:



This uses the new kstats introduced in Nevada recently - the kstats used by fsstat. So you need to be running Nevada build 35 or newer to see these. (ON downloads - you want SXCR build 35 or later.)

The other change that's part of this is that I've started to use JTable to display things. I don't know why I've avoided this in the past - I guess that JTable seemed rather complex, but once I had got the hang of it it turns out to be very easy. I'm also using TableSorter from the Java Swing tutorial, which gives me the ability to sort the columns for free.

Enjoy!

Friday, March 17, 2006

Home Networked

I'm writing this blog entry using my W2100z workstation upstairs in the spare room.

We've spent the last couple of weekends decorating Amanda's room, and I took the opportunity to run some cat-5 cable in the upstairs rooms. The next step was to run cable down the stairs to connect my little ethernet switch to the broadband router, and enable dhcp. Hey presto! It all works.

The girls each have a SunBlade 150. Well, I've got them, and it means I don't have to worry about them getting virus infestations. So a surfing we can go! (And it means that the main computer is free instead of being taken up by someone trying to do their homework half the evening.)

I used an all-in-one kit I got from Maplins. 50m of cable, connectors, boxes, faceplates, tools, and 4 patch leads. A great buy and a real bargain.

Wednesday, March 08, 2006

T2000 problem

I just tried to buy a pair of Sun's T2000 machines, and then discovered (very late) that the system specification doesn't match my requirements.

It's not the system itself that's the problem, but the peripheral connectivity - or rather lack of such.

In particular, the plan was to hook up a couple of 3120 SCSI arrays. Not everything wants a fancy raid array, and FC arrays (and associated HBAs) are pretty pricy anyway. In this case, I'm looking at raw spindles for database access, but also if you look at ZFS it wants plain drives - raid hardware just gets in the way. So the plan was to have 2 SCSI channels and mirror them.

This won't work in a T2000. There's only 1 free PCI-X slot (the internal SAS controller takes the other one), and the only supported SCSI HBA is single channel anyway. So at the present time you simply can't have more than a single SCSI chain on a T2000.

To say that this is annoying is an understatement. It also limits the usefulness of the T2000 for several other projects I have in the pipeline.

Now, I could use FC storage, because the T2000 does have PCI-E slots, and there are PCI-E fibre HBAs, so it would work, but you're looking at a 50% or so increase in cost, which I regard as unacceptable. (The cost differential is particularly bad in small configurations - as you push up it becomes much less pronounced.)

Ho hum. Time to construct a plan B.

Friday, March 03, 2006

OpenSolaris Installation and Packaging

It's here! Dave Miner made the announcement that the Installation and Packaging community, and the SVR4 packaging project are now up and running.

This is important stuff. Being able to install and manage software is a critical component of an operating system, and is something I do a lot of.

And the process - particularly the performance (or lack thereof) of the tools - could stand some improvement. Yes, they work, and they're solid, but with a bit of extra work they could be made better.

Wednesday, March 01, 2006

fsstat in latest OpenSolaris

I've updated my test machine to the latest OpenSolaris build.

I used the bfu archives this time. Normally I wait for it to make it into Solaris Express (SXCR) and jumpstart, but this release has two feature I wanted to investigate, and I want to try them out as soon as I can.

The two features in question are the Java binding to DTrace, and the new fsstat utility and associated kstat support. More on DTrace once I get jkstat updated to integrate with it, but I wanted to mention a little more about fsstat.

Now, I'm just a user, but I've played with fsstat a little and looked at the code, so I've got a little bit of understanding of what it's doing.

On its own, fsstat will give filesystem activity for the various filesystem types:

new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
1.22K 606 63 1.99M 569 10.2M 37.8K 993K 502M 216K 152M ufs
0 0 0 7.73K 0 14.8K 722 8.65K 3.18M 290 6.91K proc
0 0 0 1 0 10 0 0 0 0 0 nfs
0 0 0 0 0 0 0 0 0 0 0 zfs
0 0 0 87.1K 0 0 0 0 0 0 0 lofs
14.6K 2.95K 11.5K 30.8K 37 43.5K 26 32.9K 32.9M 45.9K 34.5M tmpfs
0 0 0 3.05K 0 0 0 32 5.25K 0 0 mntfs
0 0 0 0 0 0 0 0 0 0 0 nfs3
0 0 0 0 0 0 0 0 0 0 0 nfs4
0 0 0 6 0 0 0 0 0 0 0 autofs


If you give it a filesystem as an argument, it will show that:

# fsstat /tmp /var/run /etc/svc/volatile
new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
251 233 2 1.10K 21 719 26 29 34.9K 60 36.7K /tmp
20 0 0 44 8 16.8K 0 0 0 6 67 /var/run
14.5K 2.72K 11.6K 29.8K 8 26.4K 0 32.9K 32.9M 46.0K 34.5M /etc/svc/volatile

and it takes the normal interval and count arguments.

These numbers are all backed by kstats. For example:

# kstat unix:0:vopstats_ufs
module: unix instance: 0
name: vopstats_ufs class: misc
crtime 76.4978722
naccess 559104
naddmap 1280864
nclose 514400
ncmp 5179654
ncreate 1222
ndelmap 1276280
ndispose 1788458
ndump 0
ndumpctl 0
nfid 0
nfrlock 6953
nfsync 1641
ngetattr 1550291
ngetpage 5904478
ngetsecattr 2882
ninactive 30404
nioctl 114402
nlink 18
nlookup 10861279
nmap 1024824
nmkdir 17
nopen 479509
npageio 14
npathconf 70555
npoll 477742
nputpage 39301
nread 1031796
nreaddir 38829
nreadlink 785043
nrealvp 65352
nremove 602
nrename 29
nrmdir 4
nrwlock 1292387
nrwunlock 1293855
nseek 213405
nsetattr 531
nsetfl 179084
nsetsecattr 26
nshrlock 0
nspace 12
nsymlink 16
nvnevent 0
nwrite 223174
read_bytes 530866166
readdir_bytes 18340192
snaptime 9588.8833902
write_bytes 159609739

Now, that's a lot of data about what's going on.

The kstat names aren't spectacularly obvious. They all start with unix:0:vopstats_ with the unique identifier tagged onto the end. This can either be a filesystem type (ufs, lofs, nfs, zfs are some examples) or the id of a filesystem mountpoint. So in /etc/mnttab you might see

/dev/dsk/c0t1d0s0 / ufs rw,intr,...,dev=1d80008

(I've trimmed this for brevity). So the kstat you need to query for statistics on the root filesystem in this case is called unix:0:vopstats_1d80008.

And that's pretty much it. Now to get some more tools to make sense of all this new information!