When Motorola Inc decided to go into the RISC business it did so in its own inimitable way, scorning the basic RISC tenet of simplicity (it stands for Reduced Instruction Set Computer, remember) by producing one of the most complex pieces of silicon yet developed, containing everything including six optional kitchen sinks. And to hold the cache-memory management system that it could not fit on the processor, Motorola developed two separate chips, each containing 750,000 transistors. The 88100 processor runs at 20MHz and is rated at 17 MIPS, while the twin 88200 cache-memory management chips are used to implement a full four-port Harvard architecture: there are completely separate input-output units for data and for instructions, and each has a full 32-bit path for both addresses and data.
Brute strength
Earlier RISC processors were designed in parallel with their compilers, with the two optimised for each other. Changes in one caused changes in the other. Much of their performance came from this tight coupling, with the processor executing the compiler's code very efficiently. For example, instructions were rearranged to keep the pipeline busy during the several cycles it took for data to arrive from memory for an earlier instruction. The Motorola part, however, gets its performance from brute strength and complex techniques taken from supercomputers. Apart from the inclusion in the 88100's instruction set of those instructions used most frequently by the C compiler, compiler design played no part in the design of the 88000.
The heart of the chip is a high-speed register file containing 32 32-bit registers. A data unit and an instruction unit on the chip transfer data and instructions between the register file and the separate off-chip caches. The designers were able to get away with such a small register file by borrowing a feature from Control Data's Cyberplus supercomputer: scoreboarding. (Control Data developed the technique to set up the crossbar interconnecting 14 processors, five floating-point units and various memories, and to initiate an operation on each of the units every machine cycle. It also helps to control the interprocess communications between the 64 such groups, each rated at 640 MIPS and 98 MFLOPS, and the Cyber 170/800 host that go to make up the Cyberplus supercomputer.)
The 88100 uses a simpler version of the scoreboard. The built-in, hard-wired circuitry has two functions: keeping the processing units busy and speeding up context switching. The scoreboard keeps track of the state of every register and triggers the execution of an operation on the appropriate functional unit as soon as all the information that operation needs has arrived.
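To make the idea concrete, here is a minimal sketch of register scoreboarding in Python. It is a textbook-style simplification, not a description of the 88100's actual circuitry: the names `Op`, `Scoreboard` and `run`, the one-instruction-per-cycle in-order issue rule, and the latencies in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    dest: int        # destination register number
    srcs: tuple      # source register numbers
    latency: int     # cycles until the result is written back (assumed values)


class Scoreboard:
    """One 'result pending' bit per register, as in a simple scoreboard."""

    def __init__(self, num_regs=32):
        self.busy = [False] * num_regs
        self.in_flight = []  # [cycles_left, dest_register] pairs

    def can_issue(self, op):
        # An operation may start only when none of its registers is awaiting a result.
        return not self.busy[op.dest] and not any(self.busy[r] for r in op.srcs)

    def issue(self, op):
        self.busy[op.dest] = True
        self.in_flight.append([op.latency, op.dest])

    def tick(self):
        # Advance one clock cycle and release registers whose results have arrived.
        for entry in self.in_flight:
            entry[0] -= 1
        for cycles_left, dest in self.in_flight:
            if cycles_left == 0:
                self.busy[dest] = False
        self.in_flight = [e for e in self.in_flight if e[0] > 0]


def run(program):
    """Issue operations in order, at most one per cycle; return each issue cycle."""
    sb = Scoreboard()
    issue_cycle = {}
    pending = list(program)
    cycle = 0
    while pending:
        op = pending[0]
        if sb.can_issue(op):
            sb.issue(op)
            issue_cycle[op.name] = cycle
            pending.pop(0)
        sb.tick()
        cycle += 1
    return issue_cycle


program = [
    Op("load", dest=1, srcs=(), latency=3),     # slow memory access
    Op("add", dest=2, srcs=(3, 4), latency=1),  # independent, so it issues at once
    Op("mul", dest=5, srcs=(1, 2), latency=1),  # must wait for the load
]
print(run(program))  # {'load': 0, 'add': 1, 'mul': 3}
```

In this toy model the add proceeds while the load is still outstanding, and only the multiply, which genuinely needs the loaded value, is held back. That is the sense in which a scoreboard keeps the functional units busy without relying on the compiler to schedule around memory delays.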
The scoreboard also maintains a complete set of central-processing-unit registers for each subroutine. This means that when the computer switches tasks, it doesn’t have to save all the registers and reload them from memory – it simply stops using one set of registers and begins using another, all in one clock cycle. The register file is connected to the concurrently operating functional units by an on-chip silicon bus. These functional units, which are effectively separate processors running in parallel, have their own address on the bus and can directly access the data and instruction caches as well as the register file. The first unit performs all integer and bit-field operations, while the second is a floating-point unit with a separate adder and multiplier that work in parallel.
Up to six further special function units, each with 255 reserved operation codes, can be added to the silicon bus. As the whole chip is silicon-compiled, new units can be defined and implemented by recompiling the chip. The reserved op-codes will ensure that binary compatibility is maintained. Users can design a special-function unit themselves by defining the unit and the instructions for Motorola to implement in hardware. But first they can emulate it in software to try it out, because if a program uses a reserved op-code not yet implemented in hardware, the chip traps it to software control.
With the 68000 family, Motorola was plagued by the memory management unit. Other manufacturers such as Weitek seemed to make their incompatible memory managers available before Motorola, and they seemed to work more efficiently. Even when memory management was integrated with the processor, a significant number of users continued to use a separate memory manager. So this time, Motorola has tightly integrated the memory management chips with the processor right from the start, hoping to shut out the competition.
The 88200 contains 16Kb of fast static random access memory cache per chip, plus a complex memory bus control section. A bank of up to four cache-memory management chips can be connected to each of the data and instruction ports to increase the size of the cache. The limit is imposed by the CMOS/TTL circuitry, and when the chip is implemented in other technologies the number can be increased, theoretically to 128. The cache is four-way set-associative, which means that when the logical address is fed into the cache, it returns four candidate entries. In parallel, the memory manager translates the logical address to the physical address, which is then used to pick the matching candidate out of the four, if there is one.
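The lookup just described can be sketched in a few lines of Python. This is an illustrative model only: it assumes the quoted 16Kb means 16 kilobytes, and the 16-byte line size, 4,096-byte page size, first-in-first-out replacement and the names `FourWayCache`, `lookup` and `fill` are assumptions, not details taken from Motorola. With those figures the set index comes entirely from the untranslated low-order address bits, which is what lets the cache read-out and the address translation run side by side.

```python
LINE_BYTES = 16     # assumed line size
WAYS = 4            # four-way set-associative, as described above
NUM_SETS = 256      # 16 KB / 4 ways / 16-byte lines (assumes "16Kb" = 16 kilobytes)
PAGE_BYTES = 4096   # assumed page size; equals NUM_SETS * LINE_BYTES


class FourWayCache:
    def __init__(self, page_table):
        # page_table maps logical page number -> physical page number (toy MMU)
        self.page_table = page_table
        # each set holds up to four (physical_tag, data) candidates
        self.sets = [[] for _ in range(NUM_SETS)]

    def _set_index(self, logical_addr):
        # Uses only bits inside the page offset, so no translation is needed.
        return (logical_addr // LINE_BYTES) % NUM_SETS

    def _physical_tag(self, logical_addr):
        # Conceptually done in parallel with the set read-out.
        return self.page_table[logical_addr // PAGE_BYTES]

    def lookup(self, logical_addr):
        candidates = self.sets[self._set_index(logical_addr)]  # up to four entries
        tag = self._physical_tag(logical_addr)
        for cand_tag, data in candidates:  # physical address picks the winner
            if cand_tag == tag:
                return data
        return None  # miss

    def fill(self, logical_addr, data):
        ways = self.sets[self._set_index(logical_addr)]
        tag = self._physical_tag(logical_addr)
        ways[:] = [w for w in ways if w[0] != tag]  # replace any stale copy
        if len(ways) == WAYS:
            ways.pop(0)  # evict the oldest entry (FIFO is an assumption)
        ways.append((tag, data))


cache = FourWayCache(page_table={page: 100 + page for page in range(6)})
cache.fill(0x0010, "A")
print(cache.lookup(0x0010))  # hit
print(cache.lookup(0x1010))  # same set, different page: miss
```

Filling a fifth line that maps to the same set pushes out the oldest of the four, which is the practical cost of limiting each set to four candidates.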
Steamroller
A separate processor on the 88200, the memory bus controller, takes care of cache coherency by watching the bus to keep track of what changes are made to locations in the memory and the different caches – if the data in the memory is changed, the copy of the old value in the cache must be invalidated. Motorola claims that up to 11 operations can be performed concurrently in the three-chip set: two cache subsystems will perform cache coherency control, check for cache hits and misses, and do address translation. At the same time on the processor, the data unit and instruction unit will store data in the register file, the integer unit will execute an instruction, and the floating-point unit will perform two operations, addition and multiplication. And when several processors, each with six special function units operating concurrently and connected to two banks of cache-memory management chips, are working together, the number of concurrent operations rises dramatically.
Already, Motorola and others are offering modules with several processors and cache-memory management chips in various configurations, including a fault-tolerant one in which every chip is paired with a checker, so that a pair can be disconnected from the rest if the two don't agree. Motorola is putting a lot of muscle behind the 88000 for rapid development – for example signing Unisoft to do the Unix port, AT&T for an applications binary interface, Data General for an ECL version to follow the 25MHz version, and half the world to join the 88open consortium. It may not dominate the RISC market in the way the 68000 family has ruled the workstation world, but the Motorola steamroller, plus the 88000's undoubted attractions, such as support for multiprocessing, will guarantee it a significant part of the market.