JBQ's spot on the Wild Wild Web
The musings of a French mathematician living in the heart of the American technology industry

Benchmarks return - with a vengeance

I had lots of fun reading the thread on OSNews about the article I wrote this morning. And greetings to the people who took the time to e-mail me so politely!

Let's see. Let's imagine that computers are cars. In the real world, I drive a Camaro Z28. For the Europeans out there, it's a 300 hp nice little monster, 4.9 meters long, 1.9 meters wide, low, cheap, tires 25cm wide, and fast (0-100kph in 5.5 seconds). Mine is bright red. No kidding. It's also dirty and dusty, because I'm too lazy to wash it.

Reading computer benchmarks is like reading the speed specs of a car. 0-60 in 5.1s - 60-0 in 114ft. 0.87 lateral g. 311 fps at Quake. Gaussian blur in 2.64s. MP3 encoding at 37.2 times real-time. But in reality, who cares? It's fast enough!

Comparing computer benchmarks is like comparing specs of cars. If I want to buy a car to go run on Laguna Seca (for the Europeans out there, a world-class race track about 100 miles south of San Francisco - you should play more Gran Turismo), I'll compare the specs. A Camaro that does 5.1/114/0.87 is likely to get better lap times than a Metro that does 11.5/153/0.73. A computer that scores 900 CINT2000 is likely to be faster than one that scores 300. I know, my home machine barely reaches 200, my work one hops around 650, and I wouldn't want to compile on my home machine. But in the real world, it doesn't matter. My dual-450 is fast enough to run Outlook and IE. Sometimes I run Access to keep track of the current PS2 games. It can sort the whole 300 records of the database faster than I can release the mouse button (the left mouse button, that is). Heck, even Warcraft III flies.

So, the G4 benchmark was using gcc and it could have used Metrowerks? Go fill up your Metro with premium gas and we'll run that quarter-mile again.

So, a G4 uses 10W but a P4 uses 40W? Yeah. And a pair of 21" monitors use 300W. If you want to conserve electricity, buy LCDs. Your Metro gets 30mpg but my Camaro only gets 21? Well, I'll put in an extra teaspoon of gas before we start that quarter-mile, but I'll still toast you.

So, MacOSX has a better UI? Yeah, well, you like it better and it's fine but I like Windows better. My Camaro has all the controls on a single stalk? I love that, I have a manual gearbox, I can leave my right hand on the stick at all times and still use my turn signals and cruise control without having to move either hand.

Now, seriously.

SPEC isn't a real-world benchmark. Well Doh, obviously it isn't. Want someone to go get the usage patterns of 100 million users to see what real-world is? Guess the winner is the one who optimizes MineSweeper better. Or the screen saver. Or the JPEG decoder because people watch porn. There's no such thing as a real-world benchmark.

As a developer, I find SPEC interesting, because it shows me what speed I should expect when I port an application from Windows to MacOS. Because I wrote my application for Windows first, because the market there is more than one order of magnitude bigger, and I didn't have to order a detailed and expensive market study to know that. Obviously, I won't spend time optimizing for MacOS. Because it's not worth it. MacOS only has a few percent of the market. My sales there are going to be disappointing, no matter what. I'll be lucky if I even break even. I'll do the smallest possible amount of work to do the port, but I won't spend a penny optimizing. Either gcc gets the job done well, or the application will be slow. There's no middle ground. (And, no, I didn't actually write any application for Windows, I spend enough time at work writing code and dealing with code, I don't want to do that at home as well, Eugenia would become entirely mad at me).

As a developer, I find SPEC interesting, because it tests the compiler/CPU combination. Not just the compiler, not just the CPU, both of them. How fast a program goes when you compile it and you run it. If someone writes me an incredibly fast C++ compiler for Z80, I'm not gonna start writing a ray-tracer. If I have a CPU with an amazing peak power but my compiler only gets me 1% of that peak power, I'm not gonna re-write my ray-tracer in assembly for an obscure CPU that is only used by a few hobbyists who don't buy software. If I was a ray-tracer developer, with thousands of lines of C++ code, I'd go check the CFP2000 scores for my target. I'd probably even run the test myself if there are no published results. I'd pay special attention to the results of test 177 (mesa - a 3D graphics library for the few people who never heard of it). And I'd know if I should expect my MacOS version to be twice as fast or half as fast as my Wintel version.
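For the curious, here is a rough sketch of the arithmetic behind that last comparison. As far as I understand it, a SPEC CPU2000 ratio is the reference machine's run time divided by yours, times 100, and the overall score is the geometric mean of those ratios. The function names are mine, and every run time below (including the reference time) is invented for illustration, so don't read anything into the numbers.

```python
from math import prod


def spec_ratio(reference_seconds, measured_seconds):
    """SPEC CPU2000-style ratio: 100 * reference time / measured time."""
    return 100.0 * reference_seconds / measured_seconds


def overall_score(ratios):
    """Geometric mean of the per-benchmark ratios."""
    return prod(ratios) ** (1.0 / len(ratios))


def expected_speedup(target_ratio, baseline_ratio):
    """Above 1.0: the target should run this kind of code faster than the baseline."""
    return target_ratio / baseline_ratio


# Hypothetical run times in seconds, invented for illustration only.
REFERENCE_SECONDS = 1400.0
wintel_mesa = spec_ratio(REFERENCE_SECONDS, measured_seconds=350.0)
mac_mesa = spec_ratio(REFERENCE_SECONDS, measured_seconds=700.0)

print(f"Wintel mesa ratio: {wintel_mesa:.0f}")
print(f"Mac mesa ratio: {mac_mesa:.0f}")
print(f"Mac speed relative to Wintel: {expected_speedup(mac_mesa, wintel_mesa):.2f}x")
print(f"Overall score of two made-up ratios: {overall_score([400.0, 100.0]):.0f}")
```

The point of looking at one test's ratio rather than the overall score is that the geometric mean hides exactly the detail a ray-tracer developer cares about: a machine can post a fine overall number and still be slow on the one test that looks like your code.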

Is SPEC fair to AMD? No, it definitely isn't. SPEC assumes that you can recompile your applications with specific compiler flags. How do I recompile Office for Pentium4? Hell, how do I recompile my compiler for Pentium4 (because my compiler is the one I wait for most often when I'm at work)? Answer: you can't, you have to deal with the fact that you have binaries that aren't optimized for the latest and greatest CPU. And in that case, the Athlon deserves its 2200+ rating. Or, the Athlon that's in the machine that I had my company buy for me deserves its 1800+ rating, and that's why I don't care about Pentium4 optimizations, because I have an AthlonXP. And the real reason why I have the fastest machine on the block is because I didn't let IT install their real-time virus scanner on my machine.

What does that all mean? That I'm a sucky writer and that you can make benchmarks say anything you want? No! Uh, actually, yes!... What this really means is that no benchmark is fair, no matter how hard you try. I had a lot of fun bashing AMD's and Apple's marketing people, because I hate what they say to me, I hate the way they try to confuse me. Your CPUs aren't clocked as high as your competitors'? Why do you constantly remind me? Why is this an Intel 2200 and that an AMD 2200+? What's with the plus? Oh, you mean, it's not really a 2200, but actually an 1800? You tell me it's 100% compatible, but how do I know that it's not actually 82% compatible? Ah, yes, your 82% compatibility works as well as their 100% compatibility. Seriously, AMD, you put floor salesmen in very difficult situations.

Oh, and my Camaro has a medium-dark-gray interior. Sorry, it wasn't available in transparent blue.

Aren't you confused yet? Well, I guess I'll have to try harder next time!

Gosh I'm getting tired, and I should wait until tomorrow to put this article online. I'll put it online now, and I'll let people laugh at my expense. And I'll have deserved it.

Posted on Jul 23 2002