Data Center Communications 101 (aka, How to survive a disaster, part 2)
IT folks were using technologies like message boards, SMS messages, email, chat protocols, web logs, RSS, and micro-blogging long before they went mainstream. How can the data center manager put these communications channels to use as disaster recovery tools? The simple answer is “use them.” Using them successfully, however, is not that simple. This is part two of the column “Communication: A neglected key to surviving a data center disaster.”
“How can you tell an engineer is an extrovert?”
“They look at your shoes instead of their own when they talk to you.”
Yes, it is a funny stereotype, and like all stereotypes it has a kernel of truth. Engineers, technical people, geeks, whatever term you want to apply to us (I say “us” because I certainly fall on this side of the dividing line) share that particular trait to a certain degree. We tend to be thinkers and analyzers, drawn to interacting with machines. When we have something to say it is frequently seen as blunt. However, that does not mean that technical people are not capable of being great communicators.
Quite the opposite, in fact: we tend not to speak until we are certain of what we say. Additionally, we embrace communications technologies and put them to use long before they become mainstream. Think about the history and adoption curve of these recent technologies: message boards, SMS messages, email, chat protocols, web logs, RSS, and micro-blogging. What sort of people used these technologies before they were widely adopted? That’s right… us.
How can the data center manager put these communications channels to use as disaster recovery tools? The simple answer is “use them.” Using them successfully, however, is not that simple.
First you have to choose the appropriate tool for this external communications channel. In some instances email may be the best, in others a web log with an RSS feed may be the better choice. It may not even involve a “high tech” solution, as it can also be something as simple as a welcome/status message on your phone system, or even a “scoreboard” hanging above your cubicle. The key is to pick something that works within the context of your organization, and most importantly, will be effective in touching the greatest number of your users or customers with the least amount of effort.
Everything you weigh when designing a critical system has to be taken into account when you make this decision. The channel must function in a highly reliable fashion, even during an outage. It must also have a secondary, and even a tertiary, backup so that it keeps working in an emergency that disables your primary system.
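To make the primary/secondary/tertiary idea concrete, here is a minimal sketch of a status publisher that tries each channel in priority order and falls through to the next one when a channel fails. Everything named here is an assumption for illustration: `publish_status`, the channel names, and the three sender functions are hypothetical stand-ins, not part of any real product or library. In practice each sender would wrap whatever your organization actually uses.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a (name, send function) pair. The send function should
# raise an exception if it cannot deliver the message.
Channel = Tuple[str, Callable[[str], None]]


def publish_status(message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in priority order.

    Returns the name of the first channel that accepted the message,
    or None if every channel failed.
    """
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception as exc:
            # Note the failure and fall through to the next backup.
            print(f"{name} failed ({exc}); trying the next channel")
    return None


# Hypothetical senders, for illustration only.
def post_to_status_blog(message: str) -> None:
    # Simulate the primary channel being caught in the outage.
    raise ConnectionError("web server unreachable")


def send_status_email(message: str) -> None:
    print(f"[email] {message}")


def update_phone_greeting(message: str) -> None:
    print(f"[phone] {message}")


if __name__ == "__main__":
    used = publish_status(
        "Scheduled maintenance on core switch begins at 22:00.",
        [
            ("status blog", post_to_status_blog),     # primary
            ("email list", send_status_email),        # secondary
            ("phone greeting", update_phone_greeting),  # tertiary
        ],
    )
    print(f"Delivered via: {used}")
```

The point of the ordering is that the backups should not share a failure mode with the primary: if the status blog lives in the same facility that just lost power, the email list or phone greeting should live somewhere else.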
Next, and this is the critical step, you have to use it. Do not make it only the harbinger of doom; also make it the teller of your tales, the bulletin of the boring, the messenger of the mundane. Use it constantly to report what is happening within your facility, your network, your systems, even your staff. Announce every scheduled maintenance interval, every successful circuit installation, the delivery of new equipment, everything.
This accomplishes two things. First, it gets you and your staff in the habit of updating your external communications channel. Something happening? It gets communicated. It is critical that this becomes an ingrained habit. Something is happening? Relay the status as well as investigate. Making progress? Update the status, and keep working on it. A staff member who is not directly involved in the activity can be the one who updates the external communications channel.
Secondly, and most importantly, this trains your users and customers to look to the channel for all information concerning your data center status. Once you have set an expectation among your users and customers about how they learn what is going on within your realm, you can keep them informed when the going gets tough, whether you are dealing with an outage, executing a major facility migration, or handling whatever else circumstances throw your way. Informed users are satisfied users. Remember that paradigm-shifting conclusion from part one: Awareness is more important than uptime.
Your users and customers will tolerate incidents of downtime, but ONLY IF they are aware of them, and kept informed as to what is happening, and what is being done to restore things to an up state. By keeping them in a state of constant communication, even of mundane day-to-day operational matters, you build their trust while you keep them aware. The payoff comes when you have that big project or that unplanned outage and rather than being assaulted from multiple directions via multiple channels asking you for status, your users and customers refer to your now well-established external communications channel.
In the event of a large project (a data center migration, for example) you can use your communication channel to provide advance warning of what is going to happen. You can provide details of what is going to be done and how it will be done. You can elaborate on schedules, fall-back plans, contingencies, and expected results. You can announce reminders in the days and hours leading up to the start of each segment. You can post after-action reports of the successful, and perhaps unsuccessful, moves. You can announce the ultimate completion of the project.
In the case of an outage, you can post updates and ETAs. Afterward, you can use your channel to inform your users and customers of the root cause and what has been done to prevent the issue from recurring.
The purpose of all this is to keep your users and customers informed. By remaining informed, they build trust in you. That trust is built through awareness of not only the critical, but also the mundane and day-to-day. When your users stay aware, they perceive issues within your realm in a more positive light, and things that would have looked like full-blown train wrecks had they been unaware are now likely to be seen as mere speed bumps.
