Fully hardware-redundant fault-tolerant systems have gone somewhat out of fashion. They are expensive and difficult to build, and only Tandem Computers Inc and Stratus Computer Inc ever made a success of selling them. Now Boxborough, Massachusetts-based Marathon Technologies Corp reckons it has found a new approach to hardware fault tolerance, built from commodity parts, that could change things. Endurance 4000, sold as a kit to resellers and OEM customers, enables any four standard Pentium or Pentium Pro systems to be linked together to form a single fault-tolerant server with no single point of failure or repair, so that failures are transparent to the user. User applications, and the Windows NT operating system Marathon uses, need no modification. And, using Marathon’s SplitSite facility, the redundant halves of the system can be located up to a mile apart. The Endurance 4000 kit includes software, PCI cards and cabling, and can also be used to add fault tolerance to an existing cluster through Marathon’s ClusterPlusFT technology, which can be made to work with Microsoft’s WolfPack or other clustering software. It’s out now, with complete kits starting at $25,000.
Compute elements and input-output processors
So how does it work? The reseller or OEM customer takes four personal computers and ties them together using the Marathon cabling and software. Two are used as compute elements, and need no disks, input-output devices, mouse, screen or keyboard. Each compute element runs a copy of Windows NT, with the two copies executing in parallel and fully synchronized. This gives the system fully fault-tolerant CPU and memory. The other two units act as input-output processors. Below the level of Windows NT, Marathon software replicates the input-output subsystem, mirrors disks, adds redundant Ethernet connections, and makes everything seem like a single server. The four systems become the equivalent of a single, though fully fault-tolerant, system. It is possible to use the input-output processors for other applications as well, but software running in this way won’t get the benefit of the fault tolerance.
Marathon admits its PCI cards and cabling, which act as a high-speed, 25Mb/sec backbone with data integrity, are proprietary. But Joost Verbofstad, the company’s vice president of business development, says that the interconnect uses commodity parts and is therefore very cheap, unlike alternatives such as Asynchronous Transfer Mode, Gigabit Ethernet or Tandem’s ServerNet.
Marathon intends to work mostly through channels, selling its kit to others who put the actual systems together. Early takers include Perot Systems Inc, General Automation Inc and, as an OEM customer, Intergraph Corp. Others are on the way, says Verbofstad. Endurance 4000 currently supports only single-processor systems, though work is under way to support symmetrical multiprocessors, which will improve price-performance. Marathon was formed by a number of refugees from Digital Equipment Corp’s VaxFT fault-tolerant business when it was closed down, including company founder Bob Glorioso.
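The division of labor described in this section, two compute elements doing the same work and two input-output processors holding mirrored disks, can be illustrated with a toy model. This is only a sketch of the general idea, not Marathon's implementation: the class names, the `healthy` flag, and the dictionary standing in for a disk are all hypothetical, and nothing here reflects how Endurance 4000 actually synchronizes machines or detects faults.

```python
from dataclasses import dataclass, field


@dataclass
class IOProcessor:
    """Hypothetical stand-in for one input-output processor and its disk."""
    name: str
    healthy: bool = True
    disk: dict = field(default_factory=dict)


@dataclass
class ComputeElement:
    """Hypothetical stand-in for one diskless compute element."""
    name: str
    healthy: bool = True

    def run(self, func, *args):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return func(*args)


class FaultTolerantServer:
    """Toy model: two compute elements plus two mirrored I/O processors."""

    def __init__(self):
        self.compute = [ComputeElement("CE-A"), ComputeElement("CE-B")]
        self.io = [IOProcessor("IOP-A"), IOProcessor("IOP-B")]

    def execute(self, func, *args):
        # Run the same deterministic work on every healthy compute element.
        results = [ce.run(func, *args) for ce in self.compute if ce.healthy]
        if not results:
            raise RuntimeError("no compute element available")
        return results[0]

    def write(self, key, value):
        # Mirror every write to all healthy I/O processors.
        targets = [iop for iop in self.io if iop.healthy]
        if not targets:
            raise RuntimeError("no I/O processor available")
        for iop in targets:
            iop.disk[key] = value

    def read(self, key):
        # Serve the read from the first healthy mirror.
        for iop in self.io:
            if iop.healthy:
                return iop.disk[key]
        raise RuntimeError("no I/O processor available")


server = FaultTolerantServer()
server.write("balance", server.execute(sum, [100, 250]))
server.compute[0].healthy = False   # simulate a compute element failure
server.io[0].healthy = False        # simulate an I/O processor failure
total = server.execute(sum, [server.read("balance"), 50])
print(total)
```

Even with one unit of each kind marked as failed, the caller still sees a single server that returns the stored value and completes the calculation. The sketch deliberately leaves out everything hard about the real problem, such as keeping two running operating systems in step and resynchronizing a repaired mirror.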