Gauging the performance and price/performance of servers using industry-developed, yet open, benchmarks is such an accepted part of the server comparison and acquisition process that many of us take such open benchmarks for granted. Now attempts to precompile parts of certain existing benchmark tests are driving the development of more complex randomized benchmarks…
There is a cat and mouse game going on between server makers and benchmark administrators, the former always trying to stretch the limits and the latter trying to rein in vendors, while adapting to changes in technology.
Benchmarks are in many ways a reflection of the data processing needs of customers – expressed in the broadest of terms, of course – and it is probably safe to say that competitive benchmarks have driven various processor, storage, operating system, and database technologies forward as much as real-world customer requirements have. The best-known example is the suite of benchmarks developed by the non-profit industry consortium the Transaction Processing Performance Council (TPC), which rolled out its TPC-A benchmark in late 1989.
If benchmarks reflect and drive server technology, they also give clever vendors – and they are all clever, by the way – an opportunity to game the benchmark tests. This is why benchmarks are tweaked from time to time.
Some of the most egregious gaming on the popular TPC-C online transaction processing benchmark test surrounds not the performance of the machines but the pricing that vendors use on the gear used in the system under test. In the early years of the TPC tests, vendors were allowed to simulate end user transactions using programs running on a server, but vendors had to add in the cost of the physical terminals that real users would have to sit at if they were processing the transactions. It didn’t take long before vendors created special low-cost terminals that were surprisingly inexpensive.
In recent years, vendors have given discounts on large systems of as much as 40 to 50%. Such discounts, which typical users cannot command except in the most competitive situations, obviously have a dramatically good – and certainly unrealistic – effect on the price/performance metrics that are published in the official TPC-C results.
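To see why such discounts matter, here is a rough sketch of the arithmetic in Python. It assumes the price/performance figure is simply the system price divided by throughput (dollars per tpmC); the list price and throughput used here are made-up numbers for illustration, not figures from any published TPC-C result, and the function name is hypothetical.

```python
def price_per_tpmc(list_price, discount, tpmc):
    """Return dollars per tpmC after applying a fractional discount.

    list_price: undiscounted price of the whole system under test
    discount:   fraction taken off the list price (0.40 means 40%)
    tpmc:       measured throughput in transactions per minute
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be a fraction in [0, 1)")
    if tpmc <= 0:
        raise ValueError("throughput must be positive")
    return list_price * (1 - discount) / tpmc


# Hypothetical system: $5m list price, 500,000 tpmC (illustrative only).
LIST_PRICE = 5_000_000
THROUGHPUT = 500_000

for discount in (0.0, 0.40, 0.50):
    metric = price_per_tpmc(LIST_PRICE, discount, THROUGHPUT)
    print(f"{discount:.0%} discount: ${metric:.2f} per tpmC")
```

With these invented figures, a 40% discount moves the metric from $10.00 to $6.00 per tpmC without the machine processing a single extra transaction.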
The TPC, which is made up of suppliers of servers, operating systems, and databases, has cracked down on the pricing shenanigans in recent months, says Mike Mulloy, the current chairman of the TPC, who sits on the board from server maker Dell. The members of the TPC have agreed that vendors must show the discounts on individual system components in benchmarks like the TPC-C test, and they must price components at the single purchase price.
Moreover, vendors have to be more explicit about whether this is channel or vendor pricing, since channel prices can be substantially different from direct vendor pricing. Mr Mulloy recently told ComputerWire that the new pricing specs for the TPC tests were supposed to be ratified by the vendors sometime in the summer – the IT summer runs until October 31 – and to take effect by the end of the year. This new pricing scheme will standardize pricing across the full suite of TPC tests, which have minor differences in the way that they price such features as maintenance or deal with discounts.
Mr Mulloy also says that the TPC will be putting out a revamped version of its TPC-W Web transaction processing benchmark, which is currently being tweaked and is expected to be ratified by the end of the year. The original TPC-W test showed the performance of a database clustered to Web application servers; with Version 2 of the TPC-W test, the TPC measures only the performance of the Web application server. This mirrors what the SPECjAppServer benchmarks from the Standard Performance Evaluation Corporation (SPEC) do.
Ironically, many of these SPEC Java benchmarks are loosely based on the transaction workload behind the TPC-C test, even though the two testing organizations are not related in any way except for philosophy.
The new TPC-W test includes the ability to add in Web caching servers and load balancers, which were fairly new when the TPC-W test was introduced four years ago. The new test also introduces XML as a programming language and makes use of a different simulated e-commerce site that looks a bit like Amazon.com. Companies will be able to do a Java or .NET implementation of the TPC-W test. In the TPC-W V1 spec, the Web and database interaction was done with custom TPC code, since the test predated .NET and the widespread commercialization of Java.
In addition to these tweaks, the TPC members are working on two new benchmarks. One is called TPC-E, and it is a new online transaction processing benchmark that will simulate the transaction processing associated with online stock trading, which has rigorous transaction processing and security requirements. The initial TPC-E spec has been created by IBM, Microsoft and Sun Microsystems, and Hewlett-Packard has just joined the TPC-E committee. Unlike the TPC-C test, the ratio of users to database size is not locked in the TPC-E test.
Rather, the database scaling has been set up in the TPC-E test so that a database has to scale as more processors are added to the central database server. The disk requirements per user for the TPC-C test, which seemed reasonable back in 1992 when that test was first put in the field, have in recent years meant that very powerful servers have had to have tens of terabytes of disk storage, which is impractical compared with real-world practice.
Mr Mulloy says that the TPC wants to cut the disk requirements in half for the TPC-E test. On the TPC-C test right now, disk storage typically accounts for around half of the cost of the whole TPC-C setup, and this obviously skews the price/performance metrics and the overall cost of a system under test. The TPC-E spec is in a preliminary stage right now and is not expected to be ratified until some time in 2005.
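As a back-of-the-envelope illustration of what halving the disk requirement could mean, the Python sketch below assumes that disk cost scales linearly with the capacity required, which is a simplification and not something the TPC has stated. The total system cost is a made-up figure, and the function name is hypothetical.

```python
def cost_after_disk_cut(total_cost, disk_share, disk_cut):
    """Estimate total system cost after reducing the disk requirement.

    total_cost: cost of the whole system under test
    disk_share: fraction of total_cost spent on disk storage
    disk_cut:   fraction by which the disk requirement is reduced
    Assumes disk cost is proportional to required capacity.
    """
    for name, value in (("disk_share", disk_share), ("disk_cut", disk_cut)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be a fraction in [0, 1]")
    disk_cost = total_cost * disk_share
    other_cost = total_cost - disk_cost
    return other_cost + disk_cost * (1 - disk_cut)


# Hypothetical $1m configuration where disks are half the bill.
before = 1_000_000
after = cost_after_disk_cut(before, disk_share=0.5, disk_cut=0.5)
print(f"before: ${before:,.0f}  after: ${after:,.0f}  "
      f"saving: {1 - after / before:.0%}")
```

Under that linearity assumption, halving a disk bill that makes up half the configuration trims the total system cost by a quarter, and the price/performance figure improves by the same proportion if throughput is unchanged.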
In the decision support area, several years ago the TPC-D spec was split in two, yielding the TPC-H and TPC-R specs, because of a schism between TPC member companies over how to implement the test. Some TPC-D players were precompiling certain aspects of the tests, which let them boost their query performance. However, the main point of decision support is to do ad hoc queries, which the TPC-H test did. The TPC-R test, which allows such shenanigans, has been shunned by the industry.
The TPC members want to have a single decision support benchmark that they will all use, so they are now working on the new TPC-DS test. Mr Mulloy will not say exactly what this test will be, except that, unlike the TPC-D, TPC-H, and TPC-R tests, which had a fixed set of transactions, the TPC-DS test will have a randomized set of transactions. Whether this will prevent the kind of precompiling games that torpedoed the TPC-D and TPC-R tests is not clear. Mr Mulloy says that the TPC-DS test could be ratified in late 2005 and put into production in early 2006.