Mike Carter

The Case For Preparedness

Updated: Jul 18, 2020


Are you prepared? Are you sure?

In the early minutes of a Saturday morning in April 2011, a water main broke underneath a street in Columbia, SC.

A $30B lending provider located just feet from the break, with over 1,200 servers (95% of them virtualized) in its basement data center, experienced a massive subterranean flood that destroyed the entire electrical subsystem beneath the building, leaving the data center without power and effectively useless.

There was no physical damage to the computing systems themselves, which were safely above the flood waters; however, since the entire electrical distribution for the building was submerged under feet of water, neither commercial grid power nor UPS power could be leveraged.

An entire data center lay asleep…in the dark…with no ability to turn it on.

eGroup received the call shortly after 12:45am Saturday morning, requesting assistance of any kind to get critical line-of-business systems up and operational before Monday. (As a lending institution, its systems and processes certainly ran 24x7x365; however, the nationwide mob with torches wouldn’t show up until Monday morning.) We sprang into action, mobilizing an entire team that delivered a portable data center rack – complete with storage, compute, and networking, pre-configured and ready to accept workload restoration – to a colocation facility in Alpharetta, Georgia, by Sunday afternoon.

A lot of things went right with that situation – the timing of the calamity, the industry of the business affected, no loss of human life, no collateral damage to critical assets. It was just a complete loss of power, with a 48-hour opportunity to get ahead of it.

In the days, weeks, and months that followed this event, much work was done to fortify their disaster preparedness posture, a term that was used more than disaster recovery, because “who wants to recover from a disaster” when you can be “prepared for any disaster”? Recovery was a slow, intensive, expensive process. Future preparedness was even slower, more expensive, and complicated. Think in terms of “years” and “millions”. The building was declared uninhabitable – for months.

If you had asked this team a few days before the flood if they felt “prepared” for an event, they would have said “yes,” and they would have been right – but only for the events they had prepared for. Clearly, this scenario wasn’t on the threat list ahead of time. They had prepared for the expected – not the unexpected.

It was John Lennon who most famously sang “life is what happens to you while you’re busy making other plans,” and I believe this story illustrates that principle from an IT perspective.

Preparedness means being “operationally ready for the unexpected” to ensure availability.

I’m certain that, as IT veterans, we all have stories like this – many of them without the happy ending.

It reminds me of the prime concepts of a (fascinating) survival book I read several years ago by Laurence Gonzales called “Deep Survival: Who Lives, Who Dies, and Why,” where the key themes of accident investigation are:

  • Things that have never happened before, happen every day (for example, the NASA Challenger booster explosion caused by cold weather making the seals brittle, even though the shuttle had launched successfully in cold weather many times before)

  • Accidents are almost always a result of a series of seemingly benign individual events that when taken together create an unexpected, and compounded, calamity

But, if things that have never happened before, happen every day, why do many organizations play the IT game as if nothing is about to happen?

Most folks invest only in the minimum – data protection based on point-in-time backups – but they rarely test them. Most don’t know how long it will take to completely restore a system. We’ve seen groups (not working with eGroup, by the way) invest large sums in “fast backups” with no regard for “fast recovery,” not understanding that a backup is only as good as your ability to recover from it – and quickly.

Even larger-scale Disaster Recovery capabilities are rarely tested.

Compounding this shortfall, oftentimes the IT team hasn’t even been checked out on the specific steps (or timelines) of a partial or full recovery.

How many businesses will pay the ransom or the consequences for their lack of preparedness?

eGroup believes that Disaster Preparedness is really an exercise in discipline over your Disaster Recovery process. You must “inspect what you expect” from it.

Drilling (practicing) with the right tools regularly – often enough that everyone on the team knows what to do and when to do it, while feeling comfortable and confident – is key. Frequent execution of the plan, with the ability to improve and automate aspects of it, ensures fluidity and “muscle memory” when pressed into action when the unexpected happens.

Furthermore, the most prepared teams embrace a philosophy of “you fight like you train” or “you play like you practice” and live in a constant state of preparedness by living in a constant state of recovery. These teams operate regularly in a state of expectation that something is offline.

We’ve seen this concept executed in superb fashion through the likes of the Netflix Chaos Monkey and, more locally, a Federal Credit Union client who routinely fails their systems over, runs for a month or two, and then fails back again…rinse and repeat.

However, the fundamental challenge for most organizations is the cost – both funding and effort – of being prepared. Historically, IT has not been “core” to the business, as it might be with Netflix, so disaster preparedness, or more appropriately “IT Resilience”, has often been sidelined in favor of more profit-generating or cost-saving initiatives.

But with the advent of the “digital transformation” (a discussion for another day), IT is not just a convenience but, rather, an essential component of an organization’s value. As Alec Ross, author of “The Industries of the Future,” says: “Land was the raw material of the agricultural age. Iron was the raw material of the industrial age. Data is the raw material of the information age.”

Data (and access to it) is so vital that it is more valuable than oil.

In a world where every business will be naturally selected – at the will of its clientele – based on its ability to access its data, downtime – and the inability to avoid it – will spell certain death.

Data Availability = Survival!

Why are the solutions of last year no longer appropriate? Where can you adapt to win?

Storage industry veteran Paul Zeiter, currently president at Zerto, said recently that “One of the big challenges facing companies today is the need to change the way they think about coping with change. Traditional approaches to managing change and disruption were predicated on working with physical systems. But those models don’t work optimally with highly virtualized environments,” and, more simply, “In 2017, outages are no longer tolerable.”

How right Paul is!

And if “IT Resilience” is the new normal in a world where business success is based on intolerance of outages, I would add that “IT Complexity can no longer be IT Job Security.”

From eGroup’s perspective, shaped by our many experiences delivering speed and certainty with cloud and data center solutions, IT Resilience is achieved when a solution delivers:

  • Speed

  • Certainty of success

  • Simplicity

  • Affordability

And, while continuing to build sophisticated, privately-hosted recovery infrastructures, eGroup is increasingly leveraging the power of the Microsoft Azure cloud as the target for replicated on-premises systems – saying “goodbye” to concerns about unpatched systems, non-compliant platforms, and slow turn-ups with middlemen – while saying “hello” to automated and standardized activation, world-class security, and global platform access with directed data locality to stay compliant with sovereignty interests.

eGroup delivers fast and effective “IT Resilience” that is built with commoditized and standardized services to keep costs low and simplicity high.

By layering Zerto resilience software on top, we deliver RPOs in seconds and RTOs in minutes, with up to 30 days of journaled changes, bi-directionally to, or from, the Microsoft Azure cloud – or your own on-premises private cloud.
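For readers less familiar with the terms: RPO (recovery point objective) is how much data, measured in time, you can afford to lose, and RTO (recovery time objective) is how long you can afford to be down. A minimal Python sketch of how a team might score a failover drill against those targets follows. The function names and timestamps are hypothetical and made up for illustration; nothing here calls a Zerto or Azure API.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_checkpoint: datetime, failure_time: datetime) -> timedelta:
    """Data-loss window: time between the last replicated checkpoint and the failure."""
    return failure_time - last_checkpoint


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Downtime window: time between the failure and service being restored."""
    return service_restored - failure_time


def within_journal(restore_point: datetime, now: datetime, journal_days: int = 30) -> bool:
    """True if the requested restore point is still inside the journal retention window."""
    return timedelta(0) <= now - restore_point <= timedelta(days=journal_days)


# Hypothetical drill timestamps (illustrative only, not taken from the flood story).
failure = datetime(2017, 6, 3, 0, 45, 0)
last_checkpoint = datetime(2017, 6, 3, 0, 44, 52)
restored = datetime(2017, 6, 3, 0, 57, 0)

rpo = achieved_rpo(last_checkpoint, failure)
rto = achieved_rto(failure, restored)

# Hypothetical targets: lose under a minute of data, be back within an hour.
rpo_met = rpo <= timedelta(minutes=1)
rto_met = rto <= timedelta(hours=1)

print(f"RPO achieved: {rpo.total_seconds():.0f} s (target met: {rpo_met})")
print(f"RTO achieved: {rto.total_seconds() / 60:.0f} min (target met: {rto_met})")
```

Recording these two numbers after every drill is one simple way to “inspect what you expect” from a recovery process over time.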

If you’re not familiar with Zerto, they make replication and orchestration software for virtualized infrastructures that enables private and public cloud resources to be leveraged quickly and easily for disaster recovery and backup.

More simply put, Zerto’s software, when combined with the Microsoft Azure cloud, delivers certainty in uncertain times.

The power and cost advantages of cloud-enabled IT Resilience are extraordinary, and eGroup is convinced that forward-thinking businesses will embrace the cloud as the way to address tomorrow’s data availability needs today.

Why wait?

Get your IT Resilience score at: www.eGroupcloud.com/resilience

