If all had gone according to plan, IBM’s RIOS workstation, the successor to the RT, would have been released last Tuesday. Instead, all we have is an IBM White Paper on the technology at the heart of the new machine – IBM’s second generation RISC processor architecture. John Abbott has the details.
The 801 minicomputer project at IBM Research in Yorktown Heights, New York, during the late 1970s and early 1980s is credited with pioneering many of the architectural concepts that have since become popular in the explosion of Reduced Instruction Set Computing processors now on the market. But the 801 minicomputer never appeared, and IBM’s first RISC computer turned out to be the RT workstation in 1985 – not the company’s most successful product. That same year, however, the original 801 development team turned its hand to designing a second generation RISC processor, now set to appear in the replacement RT workstations due out early next year. Codenamed America, the project was transferred to IBM’s Advanced Workstation Division in 1986, and has since had additional input from IBM’s Burlington and Toronto development labs.
Although IBM pulled back the introduction of its workstation from the original October 16 launch date until next year, it has gone ahead and published a technical white paper describing the architecture of the new chip. IBM admits that some features on the new chip are common to earlier RISC processors – it uses a register-oriented instruction set, hardwired CPU and pipelined implementation – but goes on to point out the new features, such as separate instruction and data caches, zero cycle branches, multiple instruction dispatch, and simultaneous execution of fixed and floating point instructions.
And while it is careful not to discuss actual chip performance, IBM does say that it has designed the chip to sustain an execution rate close to one instruction per cycle. To do this the chip has to be capable of a peak execution rate of more than one instruction per cycle – and IBM claims its second generation RISC is capable of executing up to five instructions per cycle, namely a branch, a condition register, a fixed point and two floating point instructions.
In order to achieve the high levels of concurrency needed for this, IBM has taken a rather different course from the more fashionable, highly integrated approach of the merchant microprocessor manufacturers such as Intel and Motorola. IBM’s architecture is actually a complex of nine semi-custom chips, including three independent functional units: a combined branch processor and instruction cache unit, a fixed point processor, and a full 64-bit floating point processor. The central electronics complex also includes four data cache units, a storage control unit, an input-output interface unit, and a clock chip. The three main units were designed to work together with maximum concurrency and have the instruction set divided amongst them.
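The gap between a peak of five instructions per cycle and a sustained rate near one can be illustrated with a toy model. The sketch below is not IBM's dispatch logic: it assumes only the per-cycle slot limits implied by the peak figure quoted above, packs an instruction stream in order, and ignores data dependencies, cache misses and branch behaviour. The names `SLOTS_PER_CYCLE`, `count_cycles` and `instructions_per_cycle` are hypothetical.

```python
# Toy model only. Slot limits per cycle follow the article's peak figure:
# one branch, one condition register, one fixed point, two floating point.
# Dependencies, cache misses and real dispatch rules are NOT modelled.
SLOTS_PER_CYCLE = {"branch": 1, "cr": 1, "fx": 1, "fp": 2}


def count_cycles(stream):
    """Pack an in-order instruction stream into cycles, greedily."""
    cycles = 0
    free = {}  # slots still available in the current cycle
    for kind in stream:
        if kind not in SLOTS_PER_CYCLE:
            raise ValueError(f"unknown instruction class: {kind}")
        if free.get(kind, 0) == 0:
            # No slot left for this class, so a new cycle begins.
            cycles += 1
            free = dict(SLOTS_PER_CYCLE)
        free[kind] -= 1
    return cycles


def instructions_per_cycle(stream):
    """Average instructions per cycle for the stream under the toy model."""
    if not stream:
        raise ValueError("stream must not be empty")
    return len(stream) / count_cycles(stream)


if __name__ == "__main__":
    ideal_mix = ["branch", "cr", "fx", "fp", "fp"]
    fixed_point_only = ["fx"] * 4
    print(instructions_per_cycle(ideal_mix))
    print(instructions_per_cycle(fixed_point_only))
```

Under these assumptions, only a stream with exactly the right mix reaches the peak; a run of fixed point instructions alone falls back to one per cycle, which is why the peak has to sit well above the sustained target.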
The branch processor takes the incoming instruction stream from its integrated instruction cache and provides a steady instruction flow to the fixed point and floating point processors. It is the branch processor, which includes six specialised registers, that minimises the delays normally associated with branching instructions by processing in advance all interrupts, branch and condition register instructions, allowing the cycles required for handling branches to be completely overlapped. This results in zero cycle branching for large sequences of meaningful code, according to IBM.
The FX, or fixed point processor, has 32 general purpose registers and five special registers to support all of the fixed point arithmetic and logical operations, as well as all of the data reference instructions. The FP, or floating point processor, has 32 64-bit floating point registers and a floating point status and control register.
The four way set associative 64Kb data cache is divided across the four identical DCU chips of 16Kb each, and IBM has implemented cache reload and store back buffers to boost performance beyond that of simpler cache implementations.
Communication between the three CPU units, main memory and input-output is arbitrated by the storage control unit, to which the processor units are directly hooked via the so-called P-Bus. A separate System Input-Output bus interfaces to the input-output unit, which has an input-output channel controller that generates an enhanced Micro Channel interface, speeding up the transfer of data between system memory and adaptors on the Micro Channel bus that – as widely reported – will be an integral part of the new workstation. The channel controller supports both DMA bus masters and slaves, and will allow the Micro Channel to operate in streaming data mode, which is claimed to double the performance for large data bursts.
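The data cache figures quoted above can be turned into a rough geometry. This is a sketch under stated assumptions: "Kb" is read as kilobytes, and the line size is an assumed value, since the white paper details reported here do not give one. The set mapping is the generic textbook one and is not claimed to be IBM's. The names `cache_geometry` and `set_index` are hypothetical.

```python
KB = 1024  # ASSUMPTION: the article's "Kb" means kilobytes

CACHE_BYTES = 64 * KB  # total data cache size, from the article
WAYS = 4               # four way set associative, from the article
DCU_CHIPS = 4          # four identical DCU chips, from the article
LINE_BYTES = 128       # ASSUMPTION: the article gives no line size


def cache_geometry(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Derive line and set counts for a set associative cache."""
    lines = cache_bytes // line_bytes
    return {
        "lines": lines,
        "sets": lines // ways,
        "bytes_per_way": cache_bytes // ways,
        "bytes_per_chip": cache_bytes // DCU_CHIPS,
    }


def set_index(address, sets, line_bytes=LINE_BYTES):
    """Generic mapping: drop the line offset, then wrap by the set count."""
    return (address // line_bytes) % sets


if __name__ == "__main__":
    geometry = cache_geometry()
    print(geometry)
    print(set_index(0x12345, geometry["sets"]))
```

Changing `LINE_BYTES` changes the number of sets but not the 16Kb per chip, which follows directly from the article's figures.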
Enhanced Micro Channel
The enhanced Micro Channel will be fully compatible with current Micro Channel implementations, enabling standard MCA adaptors to be attached to the enhanced Micro Channel and vice versa.
For virtual memory, IBM has extended the approach found in the 801 and RT, providing for a 4 Petabyte virtual address space and 4 Gigabyte real address space in 4Kb pages. The chip’s Special Segment architecture is also an extension of that found in the 801. More revolutionary is the cache architecture, which includes the concept of instruction and data caches that are visible to the software, a factor said both to simplify the cache implementation and to increase parallelism between the branch and fixed point processors, as well as between input-output devices and these processors.
According to IBM, the goal for the new 1 micron CMOS design was a high performance, balanced machine that avoids bottlenecks in the CPU, caches, memory interfaces and input-output subsystem. But IBM was also looking for a design that could spawn a family of processors with varying cost and performance. Hence it will allow a range of configurations, without sacrificing overall yield. The low-end model requires only one memory card, while the high-end unit needs a minimum of two, supporting either 1Mb or 4Mb DRAMs.
But while the technology sounds exciting and innovative, the impact of the workstations in which it will appear depends very much on how soon they can be brought to market. With MIPS Computer Systems Inc threatening to unleash a 60 MIPS machine in the near future, and DEC and Sun Microsystems battling it out in the marketplace today, a launch by IBM in the first quarter next year, with deliveries not till the second quarter, might be a little too late.
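For readers who want to check the virtual memory figures quoted earlier, the arithmetic can be sketched as follows. It assumes binary prefixes (a Petabyte as 2^50 bytes, a Gigabyte as 2^30 bytes, a 4Kb page as 4,096 bytes); the helper name `address_bits` is hypothetical.

```python
# ASSUMPTION: binary prefixes throughout (Kb = 2**10, Gb = 2**30, Pb = 2**50 bytes).
VIRTUAL_BYTES = 4 * 2**50  # 4 Petabyte virtual address space, from the article
REAL_BYTES = 4 * 2**30     # 4 Gigabyte real address space, from the article
PAGE_BYTES = 4 * 2**10     # 4Kb pages, from the article


def address_bits(size_bytes):
    """Bits needed to address size_bytes bytes; size must be a power of two."""
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("size must be a positive power of two")
    return size_bytes.bit_length() - 1


if __name__ == "__main__":
    print("virtual address bits:", address_bits(VIRTUAL_BYTES))
    print("real address bits:", address_bits(REAL_BYTES))
    print("page offset bits:", address_bits(PAGE_BYTES))
    print("virtual pages:", VIRTUAL_BYTES // PAGE_BYTES)
    print("real page frames:", REAL_BYTES // PAGE_BYTES)
```

On these assumptions the figures correspond to a 52-bit virtual address, a 32-bit real address and a 12-bit page offset.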