‘Who ya gonna call?’ Shared Infrastructure relocates OH-TECH data center

Former Director of Shared Infrastructure, Ohio Supercomputer Center
Wednesday, June 24, 2015 - 9:11am
Tangled wires beneath raised floor.

Over the weekend of June 12-13, 2015, staff members of the Ohio Technology Consortium’s (OH-TECH) Shared Infrastructure (SI) unit moved systems and storage from the Kinnear Road Center (KRC) data center at The Ohio State University to the State of Ohio Computer Center (SOCC). While such moves are not common, this was not the first time I had been responsible for moving equipment from KRC to the SOCC.

First, a little background. Various entities within or related to OH-TECH (the Ohio Supercomputer Center, OARnet, OhioLINK, the Ohio Board of Regents and the Ohio Tuition Trust Authority) have had systems and storage at the KRC data center since 1987. This is quite a long commitment for a facility that was never intended or originally designed to be a data center.

Previous Data Center Moves

Moving equipment from one data center to another is never an easy process, and I have been through this process several times in my career. Each time, however, it gets a little bit easier.

Wiring at the SOCC
Newly laid wiring in cable trays at the SOCC stands in stark contrast to the clutter of discarded wiring left behind under the floor at KRC.

My initial data center equipment moves occurred while I was with Cray Research. At that time, the Cray supercomputing equipment first would be set up at a test bay in Chippewa Falls, Wisconsin, and tested extensively for several weeks. The equipment then would be packed up and moved to the customer site to be installed—a process that would take 4 to 10 days, depending on the system complexity and the site requirements.

During one of these installations, I had the idea to connect the disks that the customer would be using to the system at the test bay. We then installed the operating system during test time and moved the disk drives to the customer site. This saved us from reading in 15 to 20 tapes and rebuilding the OS on-site, which reduced the on-site installation time by one to two days (or more). This was the first opportunity to reduce the time needed to move equipment from data center to data center.

During the late 1990s, as we were reviewing the data-center capabilities of KRC and the future supercomputer requirements, it became very clear that KRC would not be able to handle the immense power and cooling needs; it was time to look for a new data center. The State of Ohio Computer Center had the power and cooling capabilities, and after some negotiations, we were able to secure some space at the SOCC.

In September 2001, OSC moved its first systems to some shared space in the SOCC provided by the State of Ohio Department of Administrative Services. Moving four racks of equipment into the SOCC and setting them back up took five days. It actually took an extra couple of days because of the events in New York City and Washington, D.C., on 9/11/2001, as we needed to leave the building as a security precaution.

Yet we still kept some equipment at KRC, and running in this split data center mode was difficult. By the spring of 2002, OSC had obtained additional space on the fourth floor of the SOCC. OhioLINK and OARnet kept their presence at KRC. To move the 15 racks of OSC equipment from KRC to the SOCC, we planned on an outage of five days. We beat our expectations by beginning production services on the fourth day.

Cray Y-MP move in 1989
As seen in this photo of the Cray Y-MP installation in 1989, the moving methods for computer racks haven't changed much in the last quarter century!

But these data center equipment moves were fairly basic, involving only one to five systems and storage at most. There were lots of racks to move, but the system use and overall infrastructure were pretty simple to transport and reset.

Since 2004, the computing environment has become much more integrated, with many system and data dependencies. As OH-TECH was formed from the component organizations, the integration of the various services required individualized equipment moves to the KRC facility from various locations around the area. Most of the time, we could provide new hardware for a service to run on, which made the transition to the new location much easier.

About the same time, system virtualization became a much more viable option. We could increase the utilization of the systems, improve service availability and supportability, and provide better Business Continuity and Disaster Recovery options. Virtualization, and the addition of shared, high-availability storage, would provide better service support for OH-TECH consortium members and allow for a much different data center move in the future.

Racks moving out of KRC
The racks started moving out of Ohio State's Kinnear Road Center about 2.5 hours after the June 2015 shutdown began.

2015 Data Center Move

In the summer of 2014, when Ohio State informed us of plans to shut down KRC and move all systems there to a new location, OH-TECH began planning. With more than 300 active virtual compute servers, 1.5 petabytes of storage and over 30 different networks to manage, this was going to be a large undertaking for the SI group. We looked at the data-center move as an opportunity to upgrade hardware, reduce complexity and limit the impact on customers and end users. Thankfully, we were able to get sufficient funding through the Ohio Board of Regents to support these goals.

While using project-management processes to help track requirements, issues, status and activities, the one thing we had to keep in mind was that we needed to stay agile. In a project this large and complex, change was constant; whether it was the timelines for ordering equipment and receiving it or the exact time when our new space would be available, something was always changing.

Since we had the luxury of a little bit of time, we moved equipment from data center to data center in phases. This helped meet our overall goal of reducing user impact. We were able to install a virtual machine server at the SOCC and then extend the Storage Area Network across the WAN to include both KRC and the SOCC. We then allowed virtual systems to migrate between the two locations, while still maintaining access to the data stored at KRC. These server migrations took place during maintenance windows, with little user impact.

Racks positioned at the SOCC
A short time after leaving KRC, the OH-TECH racks were wheeled into place at the State of Ohio Computer Center.

After making a series of these simple changes, it was time to move all the data and some remaining fragile systems. A couple of these “fragile” systems were more than 15 years old, and while they run well, they are difficult to repair if they fail, since parts and expertise are hard to come by. A lot of time was spent on preparing several levels of contingency plans to ensure restoration of services after the move. All of this effort and planning had to be communicated to a wide user base, so weekly updates about the transfer went out starting three months before the final move.

That brings us back to the events of the weekend of June 12. Six racks of equipment, containing 1.5 PB of data and several fragile, highly visible systems, were turned off, packed onto a truck and moved down the road. Approximately 12 hours later, the reinstalled systems were powered up for internal testing, and, by noon on Saturday, most services were available for outside testing.

By 2:30 Saturday afternoon, all production services were restored. It is amazing to me that we were able to move so many services and so much data and restore all services in less than 24 hours. Sure, we had several issues to deal with after most of the move was completed, but most of these were small and handled quickly as we discovered them.

Retrospective

Equipment moves between data centers have come a long way in the past 25 years. In many ways, they are easier to carry out than in the past. In other ways, the number of systems and the complexity and interdependencies, as well as the amount and value of the data involved, make them much more challenging. But there are several facets that have not changed. The following are still important for any major project like an equipment move:

  • Proper planning
  • Good communications
  • Skilled people working on the move

Almost every member of the Shared Infrastructure team was involved with the move in one way or another, and everyone did a great job. There was, however, a core set of people who made the SI data center move successful. They should be recognized for their diligent efforts leading up to Friday night and throughout the weekend. These include:

  • Jim Jacob
  • Alan Edmonds
  • Jeff Smith
  • Todd Meade
  • Travis Julian
  • Roman Rudy
  • Matt Soter
  • Terry Montgomery (Juniper Contractor)
  • Tom Sales and Brady Dodson (Ohio State)