10 extra tips on what to do when your website is down – Operational view
Ten extra tips and tricks on what to do when disaster strikes
I am happy to present part 2, another set of useful pointers on staying in tip-top operational condition for any contingency or emergency that may befall your website (which we all hope will not happen).
- Plan for live & dynamic data transition – this one can be a real headache, depending on the scale and complexity of your site. If your website mainly provides information, this may not be much of a worry for lucky you. For the rest of us, there are a few things to think about and plan for. Do you need to handle partially completed data transactions? Can these partially completed flows be completed later, or will they get stuck and cause a problem for the user? Consider the critical paths in your application, from both the technical standpoint and the user-flow perspective. Can they be resumed without problems, or must they be handled explicitly? Consider aspects like shopping carts, payment gateways, back-end jobs, and APIs or web services that communicate with other parties.
- Communication – Consider the need to tell your users about the issue: keep them in the know, seek their understanding and forgiveness, and let them know your team is working hard to rectify it. Are there other stakeholders you should inform early? If your site has high-availability demands, certain decisions or preparations may need to be made.
- Compensate your users – think about how you want to make it up to your users for this stumble or boo-boo. I once found out that my debit card details had been stolen when my bank notified me of several charges (around $500) made in the odd hours after midnight. I then realized I had recently made an online purchase. The site sent an email apologizing for a security breach in their payment gateway, explained that they were rectifying the issue by changing the payment service, and offered a discount coupon for my next purchase. No doubt they were candid about the issue, but the impression had already been made and the money charged. Still, it certainly helps to react quickly and be honest about the issue. Customer loyalty and trust are hard to earn but easy to break.
- Investigate issue – look for the root cause. Under time pressure there is a temptation to rely on quick fixes and workarounds. If there is not enough time to find the root cause right away, decide on a temporary fix to apply so that operations are not disrupted, while the team continues working in the background to determine the root cause and solve it. This helps ease the pressure so that the actual root cause and the best solution can be found. Of course, remember to apply the final fix at the end, before getting distracted by other ‘operational fires’. (Optional read: 4 types of website errors and ways to diagnose and solve, A general strategy)
- Emergency team ready – do you have a team ready to handle contingency scenarios? Are they immediately contactable and available? Are they familiar with the procedures? Assigning roles and doing some dry runs beforehand goes a long way in preparation. That is why we conduct those ‘silly’ fire drills every few months: they help. It is best to keep the procedure plan simple and straightforward. Simple does not mean easy; it takes extra effort to create a plan that is relevant, realistic, and, most importantly, effective.
- Think about what can be improved next time – just like your business, your recovery process needs continuous iteration for better effectiveness and results. Learn and build from each operational slip. Small ones give a great opportunity to practise and to highlight any gaps in the recovery flow, so that your team gains the confidence and familiarity to perform well when the big one unfortunately arrives. (Check out my other post: Why continuous iteration and improvement is vital like gym work)
- Check database, hosting, or domain – It’s good to go back to the basic elements of a modern website and check that the bread-and-butter fundamentals are all still up and running. It’s often the case that an electrical device stops working and, after wasting lots of time, we finally “repair” it by simply re-seating the power plug or something similar. It’s certainly useful to use monitoring tools like Pingdom so that you are notified early whenever something goes awry. This is far better than relying on your users to tell you that they could not make a purchase because your site is not working, because that could end up being your last contact with them.
- Clustered/dedicated hosting – you can consider clustered hosting to ensure high availability. If any one node goes down, traffic fails over to the other nodes, so your website stays up and running. Dedicated hosting, on the other hand, means you do not share important resources such as bandwidth, disk space, CPU time and memory with other parties. Such services cost more, of course, so you will need to weigh the trade-offs against the benefits.
- Have a maintenance page/plugin ready – should you need to activate one, it will be much faster if you already have one prepared, rather than scrambling (or making your developer scramble) to prepare one while you scribble your PR speech. Also consider the scenario where your database or application server is down. In that case, your maintenance page will probably need to be a static HTML page.
- Get your support center on standby – If you have a support or call center, they should have their scripts ready and be prepared for a bombardment of queries from all relevant channels. Aggregator and social media monitoring tools will certainly come in handy here to help manage the social media front, as these sites may become the main alternative channel for user engagement and complaints when your main site is down.
- And lastly, as a bonus point: if you are the person in charge or in a position of responsibility, stay calm and think. Being on top of the situation is crucial at times like these. You will not be perfect, you will not solve everything, and you will not make everyone happy. Just do the best you can.
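For the tip on checking database, hosting, or domain, here is a minimal sketch of what a home-grown check might look like if you do not have a monitoring service yet. It is only an illustration in Python using the standard library: the function names (`fetch_status`, `diagnose`, `check_site`) and the wording of the diagnoses are my own assumptions, not part of any monitoring product, and a real setup would also need scheduling and alerting.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection or timeout: nothing answered at all.
        return None


def diagnose(status):
    """Map a status code to a rough first guess at where to look (hypothetical wording)."""
    if status is None:
        return "unreachable: check domain, DNS and hosting"
    if status >= 500:
        return "server error: check application and database"
    if status >= 400:
        return "client error: check the URL being monitored"
    return "up"


def check_site(url, fetch=fetch_status):
    """Run one check; fetch can be swapped out so the logic can be tried offline."""
    return diagnose(fetch(url))
```

You could call `check_site("https://example.com")` from a scheduled job and send yourself a message whenever the result is anything other than `"up"`. Splitting the network call from the diagnosis is a deliberate choice, so the decision logic can be tested without your site actually being down.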
Thank you for reading. Do post if you have extra tips to share!
You can also read the first section on 10 tips when website down…