Break Free From Storage Chaos

Server consolidation built on a storage area network (SAN) helped the U.S. Air Force 45th Space Wing cut backup times by 83% and give users 600% more storage.

How scary are the details in this storage management picture? More than 80 servers. Not scary at all, right? Managed by a nine-member IT staff. Challenging, but not impossible, you say?

OK, let's add a few more details. All servers relying on direct attached storage and most backing up to direct attached tape drives. That seems cumbersome, you're now thinking. Spread across 12 physically separate facilities. Hmm, that's less than one IT person per site. With the farthest remote operations site located 5,000 miles from headquarters. Ouch.

Here's a scarier thought: How many of those details are eerily similar to the storage management picture at your organization? The explosion of servers? The direct attached storage? The decentralized infrastructure? All of the above?

Unfortunately, the scenario outlined above is a real one - or was, anyway. It recently existed at the U.S. Air Force 45th Space Wing (the 45th), which operates the U.S. Air Force Eastern Range, used for unmanned rocket launches. Fortunately, except for the staff size and the distances between sites, the details of the storage management picture at the 45th have all changed for the better. The number of servers has been reduced, the direct attached devices are mostly gone, and the data centers are fewer in number.

How did the 45th gain management control of its server and storage infrastructure? It committed to a server consolidation project and, along the way, migrated storage resources to SANs (storage area networks).

You Can't Support Apps Without Storage Support
The 45th Space Wing comprises Florida's Patrick Air Force Base and nearby Cape Canaveral Air Force Station, as well as additional smaller facilities across the launch range. (Among the purposes for unmanned rockets is sending satellites into space. You probably have the 45th to thank, in part, for your cellular phone connections.) Headquartered at Patrick AFB and Cape Canaveral, the 45th manages launch-related operations stretching from Florida to islands in the Caribbean and South Atlantic Ocean. The facility on Ascension Island in the South Atlantic is 5,000 nautical miles from Patrick AFB.

While the IT staff at the base doesn't maintain the actual launch systems, it does provide users of those systems and other launch support personnel with an application and storage infrastructure. Says Sheryl Glore, chief of implementation and standards for the 45th Space Communications Squadron, "The IT department provides the infrastructure to make everything run on the base - everything from finance to personnel to launching the rockets." Key applications requiring storage include the suite of Microsoft Office apps and the core financial system, which handles, for instance, the parts requisitioning process for equipment maintenance. Launch support personnel create volumes of what the IT team refers to as "work aids": Word documents, Excel spreadsheets, Access databases, and SQL databases. Those files and databases include mission-critical data such as weather information, radar readings, telemetry analyses, and range schedules.

Uncontrolled Capacity Can Clog Your Backups
Before the server consolidation and SAN migration projects, the 45th was facing a serious server and storage management crisis. With servers and storage at 12 facilities, its nine-member IT staff was spread thin, traveling from site to site to deal with server crashes and problems with localized, departmental backup. At Cape Canaveral alone, there were servers at seven different locations, some as far as 15 miles apart. And, of course, the team couldn't afford to have one of its staff members permanently assigned to the extremely remote Ascension Island. While remote server management tools from Computer Associates' Unicenter suite helped the team handle daily monitoring and software upgrades, server meltdowns were not as easily accommodated. Says Glenn Exline, manager of advanced technology for the 45th Space Wing, "If we had an issue with a server and had to send someone 15 miles or sometimes 5,000 miles to deal with it, we had excessive downtime." Furthermore, the number of servers that needed to be managed and backed with storage was steadily increasing. "We had fallen into the direct attached trap. When departments or user groups needed more storage, they would go off and buy more hardware and turn it over to us to manage," Exline explains.

In addition to the labor problem, the 45th was struggling to keep up with its backup operations. While some departments shared tape devices, the servers for many users and user groups were backed up to locally attached tape drives. This highly decentralized approach was stretching the backup window to the breaking point and beyond. "Our nightly backups were taking at least 12 hours, so we ran them from 6 p.m. to 6 a.m. With nine people, we were constantly running on the edge," says Exline. "There were days when we didn't get the backups completed before users began coming back online."

The extended backup process was also wreaking havoc with storage utilization. Like most organizations with large, inflexible storage environments, the 45th had underutilized capacity on some servers. And some of that capacity was being left idle deliberately. Exline admits, "Because we couldn't get everything backed up, we had situations where we told users not to use all of the primary storage on a particular server." And those users weren't exactly judicious about how much data they were storing. "We discovered lots of users copying data back and forth across group share servers - some at Cape Canaveral and some at Patrick - just so it would be available when and where they needed it," says Exline. "A single user might be replicating as much as 600 MB across 30 different servers."

Streamline Your Servers, Size Up Your SANs
To cut into the server sprawl, the IT staff committed to consolidating its servers. Even before the move from direct attached storage to the SANs, the team eliminated servers that were so underutilized as to be unnecessary or so old as to be not worth replacing. That first step chopped the overall number of servers from 80 to 50.

Next up was the SAN migration. First, the 45th reduced the number of server-housing facilities. At Cape Canaveral Air Force Station, for example, the seven data centers were consolidated into three. Then, the IT staff built two SANs riding on Brocade switch fabrics, one supporting the four data centers at Patrick AFB and one supporting Cape Canaveral. (Given the geographic location of the launch range - "in hurricane alley and the lightning capital of the world," as Exline puts it - the base isn't confident putting all of its IT resources in one centralized site. Also, the off-shore sites will still house their own data centers, remotely managed from Patrick AFB.)

In those data centers, servers have been moved into clustered configurations to provide shared access to Dell and EMC servers and storage units. The clustered design provides redundant failover protection in case of server crashes. While each data center has its own storage array, a server can also access storage from an array in a different data center. For example, a server in a Patrick AFB data center can access the array in its own data center or in any of the other three data centers at the base. (Connections between the data centers are made via dedicated fiber cabling using native Fibre Channel protocols, with the longest distance currently at 16 kilometers. Soon, Patrick AFB and Cape Canaveral will be connected over a 40-kilometer link.) Helping the IT staff monitor and manage SAN traffic and utilization is BrightStor SAN Manager software from Computer Associates.

By increasing the ability to share storage resources, the SAN rollout enabled additional server reductions. For example, of the 30 servers left at Patrick AFB after the initial 80-to-50 server downsizing, only 14 servers (primarily used for file and print purposes) remain. In addition to consolidating primary server storage, the IT team was able to streamline backup operations. Except for local backup done at the remote island facilities, all backup processes are handled by two large automated tape libraries managed by Computer Associates' BrightStor Enterprise Backup software. One Exabyte X80 Fibre Channel library is shared by the data centers at Patrick AFB; the other, by the data centers at Cape Canaveral.

The shift to SAN-based backup has greatly increased backup speeds and, hence, reduced backup windows by 83% - from 12 hours to 2 hours. And those numbers reflect a nightly process that currently backs up 3.5 TB from the SANs. "The reductions in servers and in space wasted by replicated data were important factors," says Exline. "But throughput was also a major factor. When we moved from IP [Internet Protocol]-based backups to Fibre Channel-based backups, our speeds increased from a maximum of 100 MB per minute per tape drive to a GB per minute per tape drive." Increased backup speeds have also allowed the IT team to lift the restrictions on storage usage previously imposed because of strained backup windows. "The available storage capacity we can give users has increased by 600%," Exline says.
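
For readers who want to check the arithmetic behind those figures, here is a rough sketch in Python. The function names are illustrative, and two assumptions are not from the 45th: decimal units (1 GB = 1,000 MB; 1 TB = 1,000 GB) and the full 3.5 TB moving inside the 2-hour window.

```python
# Back-of-the-envelope check of the backup figures quoted in the article.
# Assumptions (not stated by the 45th): decimal units (1 GB = 1000 MB,
# 1 TB = 1000 GB) and the full 3.5 TB moving inside the 2-hour window.

def percent_reduction(before, after):
    """Percentage by which a value shrank from `before` to `after`."""
    return (before - after) / before * 100

def speedup(old_rate, new_rate):
    """How many times faster the new rate is than the old one."""
    return new_rate / old_rate

def aggregate_rate_gb_per_min(total_tb, window_hours):
    """Average rate in GB/min needed to move `total_tb` in `window_hours`."""
    return total_tb * 1000 / (window_hours * 60)

window_cut = percent_reduction(12, 2)             # 12-hour window cut to 2 hours
drive_speedup = speedup(100, 1000)                # 100 MB/min to 1 GB/min per drive
implied_rate = aggregate_rate_gb_per_min(3.5, 2)  # 3.5 TB in 2 hours

print(f"Backup window reduction: {window_cut:.0f}%")
print(f"Per-drive speedup: {drive_speedup:.0f}x")
print(f"Implied aggregate rate: {implied_rate:.1f} GB/min")
```

The aggregate rate is only an average over the whole window. The article doesn't say how many tape drives run in parallel or how the 3.5 TB is split between the two libraries, so treat it as a sanity check rather than a sizing figure.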

The redundancy that comes with the SAN-based server infrastructure has also brought dramatic improvements in systems uptime, something Exline particularly appreciates as he recalls the pre-SAN era. "When we were continually adding servers to the picture, we began to see diminishing returns," he says. "We had reached a point where so many of our servers were nearing the end of their usable life that we might have a server down for a half day every three weeks or so. That meant that as many as 400 users would be without their data for half a day at a time." Now that clustered servers take on the load when their cluster partners suffer downtime, those nightmares are clearly in the past. Says Exline, "Since we brought in the SAN, there hasn't been a single server outage that users have actually felt."