This article was originally published by AVTECH President & COO Richard Grundy on LinkedIn Pulse. The original article can be found by following this link.
We lost power last night.
This may not seem unusual to some people. In fact, just a day earlier, a large section of Manhattan was plunged into a blackout, and the cause is still under investigation. For me, however, it started to ring alarm bells. This is now the third power outage we’ve endured at AVTECH in the last several months. Each one has lasted more than an hour, which means no UPS could keep us running through it. Last night’s outage lasted over six hours.
This could have been a perfect storm of conditions, with the potential to cause significant damage to our computers and servers and extended downtime. After one of the hottest days of the year so far, we started seeing high temperature warnings from the Room Alert installed in our data center. Soon thereafter, the power went out. For many data centers, this is a worst-case scenario: the air conditioning turns off, but the servers keep running and pumping out heat for another 30 to 45 minutes on UPS power. With temperatures already close to safe operating limits, this could quickly have caused hard drive and CPU failures that might have taken days to recover from while repairs were made. Fortunately, our free air cooling solution kept cool air from under our building moving through the data center, and Room Alert automatically shut down non-essential systems and equipment to minimize heat load and maximize UPS runtime.
Is the electrical grid being overworked because of the high temperatures we’re seeing? Is generation capacity strained since the region’s largest (and dirtiest) coal plant shut down last year? Has the local infrastructure failed to keep pace with the growing number of businesses and the computers and equipment that come with them? Are above-ground power lines at risk from trees and vehicles? In reality, a little of each is probably making power outages more frequent. Last night’s event appears to have been caused by a tree that fell on power lines and caught fire.
From a business perspective, the actual cause of a power outage doesn’t matter, since there’s little we can do to prevent it. What matters is how we respond to unexpected environmental issues, including power outages, to minimize downtime and the impact on our customers.
Thankfully, we use Room Alert to monitor for high temperature, high and low humidity, power loss, water and flooding, and doors opening on our data center or on the individual racks within it. In yesterday’s example, we started receiving high temperature warnings late in the afternoon. We have configured Room Alert to use its built-in relay outputs to turn on a backup air conditioner automatically when the temperature gets too high. This means we don’t need staff driving into the office on a Sunday evening; Room Alert does the work for us.
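The relay behavior described above amounts to a simple threshold controller. The sketch below is not Room Alert's configuration or API; it only illustrates the decision logic, assuming a hypothetical BackupCoolingRelay class and illustrative thresholds of 85 °F (turn on) and 80 °F (turn off). The gap between the two thresholds is a hysteresis band, an assumption added here so the relay does not cycle rapidly when the temperature hovers near a single set point.

```python
from dataclasses import dataclass


@dataclass
class BackupCoolingRelay:
    """Hypothetical controller for a relay wired to a backup air conditioner.

    The thresholds are illustrative assumptions, not Room Alert defaults.
    Using two thresholds (hysteresis) keeps the relay from chattering on and
    off when the temperature sits near one set point.
    """
    on_above_f: float = 85.0   # energize the relay at or above this temperature
    off_below_f: float = 80.0  # release the relay at or below this temperature
    relay_on: bool = False

    def update(self, temp_f: float) -> bool:
        # Turn the backup AC on once the room climbs past the upper threshold.
        if not self.relay_on and temp_f >= self.on_above_f:
            self.relay_on = True
        # Turn it off only after the room has cooled below the lower threshold.
        elif self.relay_on and temp_f <= self.off_below_f:
            self.relay_on = False
        return self.relay_on


if __name__ == "__main__":
    relay = BackupCoolingRelay()
    readings = [78, 84, 86, 83, 79]
    print([relay.update(t) for t in readings])
    # prints [False, False, True, True, False]
```

Note that the 83 °F reading leaves the relay on: once the backup unit starts, it keeps running until the room is clearly back under the lower threshold.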
When we received notifications of the power outage a short time later, Room Alert again sprang into action. It ran scripts that performed orderly shutdowns of non-essential servers and equipment, preserving as much backup battery time as possible for essential systems. Later, when UPS power was nearly exhausted, Room Alert shut down the remaining systems to maintain data integrity. Room Alert did all of this automatically, following the business continuity plan we established, and sent notifications along the way so our team remained fully informed.
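The two-stage shutdown described above can be sketched as a small decision function. This is a hypothetical illustration, not the scripts AVTECH actually runs: the Host inventory, the host names, and the 10-minute final-stage cutoff are all assumptions, and the function only decides which hosts to stop. Issuing the actual shutdown commands is left out.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical inventory entry; "essential" mirrors the business continuity plan.
    name: str
    essential: bool
    running: bool = True


def hosts_to_shut_down(hosts, on_utility_power: bool, ups_minutes_left: float,
                       final_stage_minutes: float = 10.0):
    """Return the names of running hosts that should be shut down now.

    Stage 1: once utility power is lost, stop non-essential hosts to cut heat
    load and stretch UPS runtime.
    Stage 2: when the estimated UPS runtime drops to final_stage_minutes or
    less (an assumed cutoff), stop everything still running so it goes down cleanly.
    """
    if on_utility_power:
        return []
    final_stage = ups_minutes_left <= final_stage_minutes
    return [h.name for h in hosts if h.running and (final_stage or not h.essential)]


if __name__ == "__main__":
    hosts = [Host("db-01", True), Host("build-02", False), Host("file-03", True)]
    print(hosts_to_shut_down(hosts, on_utility_power=False, ups_minutes_left=40))
    # prints ['build-02']
    hosts[1].running = False  # build-02 has finished shutting down
    print(hosts_to_shut_down(hosts, on_utility_power=False, ups_minutes_left=8))
    # prints ['db-01', 'file-03']
```

Keeping the decision separate from the shutdown mechanism makes the plan easy to review and test before an outage, whatever tool ultimately sends the commands.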
When power was restored at about 12:30 AM, Room Alert sent notifications that the alarms had cleared, and staff returned to the office to power equipment back on. Although we had some downtime, we suffered no damage and were back to normal operations as soon as local power was restored. With our plans to add generator capacity later this year, we’ll be able to stay online through events like these, with Room Alert still in place to take automatic action if both main and generator power fail.
It’s important for all businesses to consider the impact of power outages and high temperatures, and the simple, affordable steps they can take with Room Alert to minimize it. With high temperatures taxing the power grid, and the first hurricane having already made landfall in the USA, now is the time to think about “business continuity” rather than face the prospect of “disaster recovery”.