Wednesday, November 22, 2006

T2000 vs bison

This wasn't a benchmark, or even meant as a test, but it was an interesting number nonetheless.

I installed bison today. As I have 3 different families of systems (Sparc Solaris 8; Sparc Solaris 10; x86 Solaris 10) I have 3 copies of /usr/local, so I build on 3 different systems. Anyway, after building bison (the T2000 allows a very parallel make, which some builds can take advantage of) I ran a make check.

The results were quite interesting. My sub-$1000 Opteron 146 system took 107s. An old V880 (750MHz US-III) took 529s. The T2000 took a whopping 794s. This really isn't looking good.
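To put those three numbers side by side, here is a quick sketch of the arithmetic. It uses only the elapsed times quoted above; the dictionary and the slowdown helper are names made up for illustration.

```python
# Elapsed "make check" times in seconds, as reported in the post.
times = {
    "Opteron 146": 107,
    "V880 (750MHz US-III)": 529,
    "T2000": 794,
}


def slowdown(system, baseline="Opteron 146"):
    """Return how many times longer `system` took than `baseline`."""
    return times[system] / times[baseline]


for name, seconds in times.items():
    print(f"{name}: {seconds}s, {slowdown(name):.2f}x the Opteron time")
```

By this measure the T2000 took roughly 7.4 times as long as the Opteron and about 1.5 times as long as the V880.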

7 comments:

Anonymous said...

Bison is single-threaded. I would have expected the results you got. Run a script that spawns off 1,000 jobs in parallel, each of which runs a dozen bison invocations. Capture the wall-clock time from start to finish, and you'll see why folks buy T2000s: for the throughput. I suspect you knew this, but I know it's difficult for us to distance ourselves from the standard single-threaded benchmark, going way back to those Dhrystone and LINPACK days.
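A rough sketch of what such a throughput test might look like follows. It is only an illustration, not something that was run for this post: run_batch and its parameters are hypothetical names, the concurrency is capped at a worker count rather than literally launching every job at once, and the command in the demo is a do-nothing placeholder. To try it with bison you would pass something like ["bison", "grammar.y"], where grammar.y stands in for a grammar file of your own.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(command, jobs, runs_per_job=12, workers=32):
    """Run `jobs` jobs, each invoking `command` `runs_per_job` times in a row.

    Up to `workers` jobs run concurrently. Returns a tuple of
    (wall-clock seconds for the whole batch, number of failed invocations).
    """
    def one_job(job_index):
        failures = 0
        for _ in range(runs_per_job):
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            if result.returncode != 0:
                failures += 1
        return failures

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        failures = sum(pool.map(one_job, range(jobs)))
    elapsed = time.perf_counter() - start
    return elapsed, failures


if __name__ == "__main__":
    # Placeholder command that does nothing; swap in a real workload.
    placeholder = [sys.executable, "-c", "pass"]
    elapsed, failed = run_batch(placeholder, jobs=8, runs_per_job=2, workers=4)
    print(f"batch finished in {elapsed:.2f}s with {failed} failed invocations")
```

Running the same batch on each machine and comparing the elapsed times would give an aggregate throughput figure to set against the single-threaded one.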

Giri Mandalika said...

By any chance did you measure the CPU utilization during the test run?

You should always compare various things like clock speed, CPU% along with the elapsed time.

I have no experience with Bison. But from the other comment, I'm assuming that it is a single-threaded application. In that case the CPU utilization would have been ~[100/(#cores * 4)]%.
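As a worked instance of that formula, here is a small sketch. The function name is made up for illustration, and the 8-core figure is an assumption (the post does not say how many cores this particular T2000 has); the 4 threads per core comes from the formula above.

```python
def single_thread_utilization(cores, threads_per_core=4):
    """Percent of total CPU that one busy thread accounts for."""
    hardware_threads = cores * threads_per_core
    return 100.0 / hardware_threads


# Assuming an 8-core box: 8 * 4 = 32 hardware threads.
print(f"{single_thread_utilization(8):.3f}%")
```

On that assumption a single busy thread would show up as about 3% utilization, so the machine would look almost idle while the check runs.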

Unknown said...

What happens if you run "make -j 32 check"?

Unknown said...

Or to be clear, "/usr/sfw/bin/gmake -j 32 check".

gisburn said...

Is it possible that "bison" uses floating-point math anywhere (for example, to pass numeric parser results around)?

Even small amounts of floating-point math can bring Niagara1 machines down. For example, running the ksh93 script http://svn.genunix.org/repos/on/branches/ksh93/gisburn/prototype004/usr/src/lib/libshell/common/fun/mandelbrotset1.ksh with many worker jobs (>= 2000) can bring a T2000 box to a grinding halt (mainly because ksh93 always represents math as |long double| internally).

Peter Tribble said...

Following up on the comments in order:

Yes, this was single-threaded. But serving more customers isn't any good if each customer gets significantly worse response. And, to be honest, while the throughput on this box is pretty good, it's not exceptional. And the single-threaded performance is not simply unremarkable - it's dire.

While CPU utilization may be interesting from a benchmarking perspective, what users and customers are interested in is how long it takes to get the answer back.

As for running it in parallel, that makes no difference. The check is essentially serial.

And yes, I suspect there must be some floating point in there. (I have heard also that perl does pretty badly for the same reason.) Certainly the scaling here is worse than most things I've tested.

Anonymous said...

I hear a diesel train at this very moment a few miles away from me. When I drive my car alongside these trains, I am always going faster than them, even at a lowly 72 kilometers per hour.
If I left the transportation of material goods to you, you'd pick a fleet of Porsches to handle your country's coal and lumber shipping, simply because you got big-eyed over (and blogged about) the 4X kmph single-threaded performance of a sports car vs. that locomotive engine. Which financial officer pays more to get their goods moved in unit time across the country, yours or the train company? Sure, laugh at this example, but ask yourself what the diesel train equivalent is in our industry, and be sure to consider power and cooling costs, and also floor space in your thinking.

While your hotshot Woodcrest/Opteron/whatever chip is spinning *wasted* hot cycles waiting for a memory fetch, the T2000 is happily giving the cycles to one of the threads, in one of the cores, that actually has some work to do. This is why I was wondering what your aggregate throughput might look like. I don't think it's quite a slam-dunk conclusion that, just because it behaves poorly in a single-threaded invocation, it will also fail miserably in a throughput test.

Peter, I've been reading your stuff for a long time and enjoy your material. If bison is indeed greater than 1% floating point, I'd hope you would point this out and mention that this particular parser-generator might be out of range right now for the T2000, and is not the right tool for the job. If this were true, I'd go further and wonder (a) what interesting use of FP operations is bison doing, and (b) is there any way to avoid them.

Keep it coming!