Taking Remote Control

September 2006

Remote IT infrastructure management is not merely a pipe dream. In this article produced by CBR in association with Raritan, we look at the challenges of IT data centre management and the latest technologies that can help companies get nearer to that Holy Grail of 'lights out' data centre management.

For some, administering IT systems is harder than for others. Upstream Petroleum, for instance, which designs, mounts, drills, and manages oil rigs throughout the Asia-Pacific region, used to have to spend $8,000 on a helicopter just to get its engineers out to its rigs when it needed to administer its IT infrastructure.

But even for those whose data centres are just down the hall, there are often significant challenges in managing the IT infrastructure, and data centres in particular, which is why the idea of the 'lights out' data centre is so appealing.

Different names, same goal?
Over the years, vendors have come up with lots of different names for this concept. IBM called it 'autonomic computing', for example, while Sun Microsystems preferred 'preventative services.' But the truly 'lights out' environment is still in relatively short supply.

IBM's research indicates that about 42% of companies still have completely manual data centres with a further 30% operating what it calls 'rudimentary' managed environments: so-called 'dim data centres' rather than actually 'dark'.

"There are a number of problems that organisations face," says Andrew Gibson, UK country manager for Raritan. "Data centres have a wide range of equipment with all sorts of things that require some degree of interaction. This leads to a large amount of foot traffic with people in and out all day. It can also lead to problems with change control and change management."

The statistics speak for themselves. Some analysts estimate that up to 40% of outages in data centres result from operator errors, while between 25% and 50% of data centre operators' time is spent doing problem determination and resolution.

"The feeling is growing that it's better to keep people out," Gibson says. "The drive is towards darkness. There is a whole list of issues that need to be addressed. Why were all those people going in to the data centre in the first place? Were they all needed? What were they all doing? Were they all qualified? Did they all follow procedure? If you don't put the right control mechanisms in place, it's hard to audit.

"For example, most pieces of equipment still have a hard reset," Gibson continues. "One of the reasons that people go into computer rooms is to use the hard reset button, but are they sure when they go in to reset a piece of hardware that it's the right piece that they're resetting? There can be misunderstandings, there can be basic mistakes," he says.

Heterogeneity rules, OK?
While IBM, Microsoft, Oracle et al would implore customers to opt for the one-stop shop and only source their technology from a single vendor, the reality is that most IT shops are a mish-mash of technologies, all of which need to be addressed, managed and supported on a day-to-day basis.

Technologies that are able to keep logs of errors and other alerts have been around for some time, and they certainly have their uses. But as data centre complexity has grown and human resources are in shorter and shorter supply, simply logging alerts is no longer enough for many companies.

"We can now provide technology and management systems that can do more than log and audit," says Gibson. "With compliance and the need for audit trails firmly on the corporate agenda, there is a need for secure systems that validate individuals so that you can catalogue who it was that switched off a particular machine. Previously you would be able to audit down to the door-entry system, but when the machine went down you would still have had five or six people in the room and be unable to tell who switched it off."

People can also be in short supply. The seemingly perpetual skills shortage in the IT industry means that trained systems administrators can be hard to come by, which is not a healthy position to be in when server farms are getting larger and larger. On-demand computing infrastructures are only likely to increase this problem.

It would be so much better if the data centre could run itself. But that requires hardware, software and networks that can run themselves to a large degree, and can effectively mimic the self-healing and self-managing capabilities found in the human body. That would enable lights-out operations of data centres, in which systems identify, diagnose and correct problems with little or no human intervention. What human involvement is necessary could be handled remotely.

Chasing the 'lights out' Holy Grail
So has the lights out Holy Grail diverted attention from a more attainable immediate goal? "It's less a dark data centre issue and more an access and control point of view," says Paul Leonard, data centre marketing manager at Sun Microsystems. "We offer remote monitoring and access tools that enable organisations to keep the people content of data centres to a minimum. For most customers, regardless of their business, there are common elements to the way their data centres operate. Those are the elements that you try to automate.

"You don't try to get too involved in a customer's specific business processes. You try to align the IT functionality and data centre operations with the business goals. There's always the question of how much control the customer needs or wishes to keep over the running of the data centre," Leonard adds.

"As a rule, all organisations are under pressure to keep their costs under control," Leonard continues, "so there is now a greater willingness to try to implement technology that can solve a problem rather than throw another person at it. If you're implementing technology, then it implies that you have a proper game plan, whereas throwing a person at it is more of a reactive, fire-brigade approach. As a result, there are many new technologies that are coming to the fore."

"I don't think there are that many engineers who want to go to Dartford at 2am to fix a fault that could be repaired remotely from their bedside," agrees Hugh Jenkins, enterprise marketing manager for Dell UK. "That said, a lights out data centre isn't an entirely dehumanised zone. People do have to go in occasionally to carry out certain tasks, but you can manage the bulk of applications and operations remotely."

If the IT infrastructure that is being administered is located hundreds of miles out to sea, the benefit of being able to administer the all-important IT assets without actually setting foot in the data centre is obvious. Upstream Petroleum was understandably frustrated at having to make helicopter journeys to remote rigs, and even to moving ships, whenever the systems located there required some sort of administration.

But by installing Raritan's CommandCenter NOC on its network, it was able to locate and inventory the company's desktops, servers, network switches, printers, routers and other equipment.

Science of the appliance
Raritan says its CommandCenter NOC (or CC-NOC) is a network, systems and applications monitoring and management appliance. It features a dashboard that can be accessed using virtually any Internet browser, and provides fault and performance data and alerts. It can identify and inventory hardware and software assets, as well as monitor for security vulnerabilities and intrusions.

Using the appliance, Upstream Petroleum found more than 250 devices -- it says devices on ships and oil rigs appeared on CC-NOC's dashboard just as quickly as those down the hall. Upstream now has a centralized, real-time view of the company's IT operations and health, and as a result says it is eliminating expensive site visits and costly equipment downtime.

Raritan, which describes itself as a "leading provider of solutions to simplify IT operations", recently added a number of enhancements to its CC-NOC product for remote monitoring and management of IT assets, and also launched a new model in its CommandCenter Secure Gateway (CC-SG) line.

The CC-SG is a management appliance that provides unified, secure access to Keyboard Video Mouse (KVM), serial, and power ports of data centre devices via a web browser. It is said to provide centralized policy and security management of users and devices connected to Raritan's products, other embedded solutions like HP iLO/RILOE, IPMI, and in-band software solutions.

The latest version of CC-NOC, meanwhile, integrates with CC-SG, Raritan says. This offers single sign-on and click-through via embedded links for mission-critical in- and out-of-band access, improving the incident management process and reducing mean time to recovery.

The updated CC-NOC also offers historical and real-time equipment inventory reports, and the ability to configure performance thresholds at the device level as well as by category. Organizations can also set threshold limits that automatically trigger with changing business conditions, Raritan says, while another nifty new feature is the ability to do on-demand discovery of a single IP address.
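
To make the idea of device-level and category-level thresholds concrete, here is a minimal sketch in Python. It is not Raritan's implementation: the class, the device names and the limit values are all hypothetical, and the rule that a device-level limit overrides its category's limit is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ThresholdPolicy:
    """Hypothetical threshold store: limits per category, with per-device overrides."""
    category_limits: Dict[str, float] = field(default_factory=dict)
    device_limits: Dict[str, float] = field(default_factory=dict)

    def limit_for(self, device: str, category: str) -> Optional[float]:
        # Assumption: a limit set on the device itself wins over its category's limit.
        if device in self.device_limits:
            return self.device_limits[device]
        return self.category_limits.get(category)

    def breached(self, device: str, category: str, value: float) -> bool:
        # No configured limit means nothing to breach.
        limit = self.limit_for(device, category)
        return limit is not None and value > limit


# Illustrative values only: CPU utilisation limits in percent.
policy = ThresholdPolicy(
    category_limits={"server": 85.0, "switch": 70.0},
    device_limits={"mail-01": 60.0},
)
```

With these made-up numbers, a reading of 65% would breach the limit on the mail server, which has its own tighter threshold, but not on any other server that falls back to the category limit.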

Harmonious Integration?
Raritan says that the latest integration enhancements to the two product lines mean both appliance families are able to work together seamlessly. So, for example, a company that uses Raritan's CC-NOC appliance to monitor the health and security of its IT equipment can use CC-SG to expedite the repair of a server or device.

"We offer customers simple, cost-effective and secure tools to increase service availability and performance, improve the productivity of their staffs and reduce security risks," explains Sev Onyshkevych, Raritan's VP of marketing. "IT organizations can now pinpoint potential problems before they have an impact, and fix them more quickly, easily and cost effectively using a single integrated solution. Our management appliances -- whether standing alone or working together -- help customers identify, diagnose and resolve problems rapidly, such as failed e-mail systems, server outages and security breaches."

An e-mail alert notification sent from a CC-NOC includes an embedded link, allowing an administrator to access the device in distress in a single click -- greatly improving the mean time to recovery. With a single sign-on and integrated authentication and authorization, users are able to move between their management systems and gain the information needed to resolve incidents more quickly.

In addition, should users need to reboot servers to get services back online quickly, they can use CC-SG's built-in support for Intelligent Platform Management Interface (IPMI) -- an industry standard which is embedded in servers by major manufacturers.
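
For readers unfamiliar with IPMI, the sketch below shows roughly what a remote power reset looks like when issued by hand. It does not describe how CC-SG talks to servers internally. It assumes the open-source ipmitool command-line utility as a stand-in, and the host address and user name are hypothetical. The function only builds the command; running it is left to the administrator.

```python
from typing import List

# Power actions this sketch is willing to build a command for.
ALLOWED_ACTIONS = {"status", "reset", "cycle"}


def ipmi_power_command(host: str, user: str, action: str = "reset") -> List[str]:
    """Build an ipmitool command line for a chassis power action.

    The -E flag tells ipmitool to read the password from the
    IPMI_PASSWORD environment variable, so it never appears in argv.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over the LAN
        "-H", host,        # address of the server's management controller
        "-U", user,
        "-E",
        "chassis", "power", action,
    ]


# Hypothetical address and account, for illustration only.
command = ipmi_power_command("10.0.0.5", "admin")
```

The resulting list could be passed to subprocess.run once IPMI_PASSWORD is set in the environment. Restricting the action to a small allowed set is a design choice of this sketch, intended to make an accidental power-off harder.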

To improve security and provide audit trails for compliance reporting, Raritan says it has added in-band network access to CC-SG. From CC-SG's centralized management dashboard, users can use traditional in-band network-access tools -- including Windows Remote Desktop Protocol (RDP), Windows Terminal Server, Secure Shell (SSH) and Virtual Network Computing (VNC) -- should the operating system be up and running.

If users need BIOS-level access for mission-critical troubleshooting and remediation, they can, from the same dashboard, connect via out-of-band direct access to the device through KVM- and serial-over-IP-based networks, according to the company.
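
The choice between in-band and out-of-band access described above can be summarised as a simple decision rule. The Python sketch below is an illustration under stated assumptions, not Raritan's logic: the function, its parameters and the preference for KVM over serial are all hypothetical.

```python
from enum import Enum


class AccessPath(Enum):
    IN_BAND = "in-band (RDP, SSH or VNC)"
    OUT_OF_BAND_KVM = "out-of-band (KVM-over-IP)"
    OUT_OF_BAND_SERIAL = "out-of-band (serial-over-IP)"


def choose_access_path(os_responding: bool, needs_bios: bool,
                       has_kvm: bool = True) -> AccessPath:
    """Pick an access path for a device.

    In-band tools depend on a running operating system, so they are used
    only when the OS responds and BIOS-level access is not required.
    """
    if os_responding and not needs_bios:
        return AccessPath.IN_BAND
    # Assumption: prefer KVM when the device has it, otherwise fall back to serial.
    return AccessPath.OUT_OF_BAND_KVM if has_kvm else AccessPath.OUT_OF_BAND_SERIAL
```

A hung server, for example, would be reached out-of-band even if no BIOS work were planned, because the in-band tools have nothing to connect to.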

But while technologies like these are increasingly sophisticated and offer more and more remote monitoring and administration capabilities, not everyone has seen the light. "The main inhibitors are usually internal," Raritan's Gibson concludes. "No-one that we've been talking to disagrees with the thrust of what we're saying. The question is whether you're prepared to spend the money and put in the time on the processes and procedures. The securest data centre is dark. No-one disagrees with that."

CBR Opinion
The data centre that can run itself has long been the dream of end users. And while it is unlikely that we will ever get to the stage where no one need ever set foot in the data centre, the latest remote auditing, monitoring and management appliances will reduce the number of direct user interventions required and indeed could also reduce the mean time to recovery -- which will come as welcome news to those with data centres in the middle of nowhere, as well as those whose data centres are just down the hall.

For further information on Raritan's CC-NOC and CC-SG, please visit www.raritan.co.uk/uptime.