Testing Times for Top Telco

Tue, 21 Nov 2023

How thoroughly do you test your software upgrades? Or “10 million customers can’t be wronged”.
 

If you’d just deployed a software change without fully testing it, but the support desk phones weren’t ringing, you might think you’d dodged a bullet.

But what if the phones were silent because you’d just taken every one of your 10 million customers off the air?

That was the situation Optus, Australia’s second-largest telco, found itself in earlier this month. Close to a third of the population awoke to find their mobile and fixed-line phones, plus their internet connections, had died around 4am. They wouldn’t come back to life for about 12 hours.

Whoops.

The outage also brought trains to a halt, took hospitals offline, forced retailers to shut their doors, and stopped hundreds of calls to emergency services from getting through.

Double whoops.

Following twelve days of “time for some personal reflection”, the Optus CEO has succumbed to the pressure and fallen on her sword. Her fate was sealed not just by the failure itself, but by criticism of her lack of communication during the event.

In her defence, it is hard to communicate effectively when your phone and internet connections are down!

The root cause is highly technical, and highly contested given what is at stake, but early reports suggested a “routine software upgrade” had gone horribly wrong, with the impacts cascading around the continent as routers disconnected themselves from the network.

If all this sounds familiar to our Canadian readers, that will be because of the striking similarities to the July 2022 Rogers outage. In that case, a “seemingly routine maintenance upgrade” also took more than 10 million customers, or about a quarter of the Canadian population, offline for the best part of a day.

Is it time you reviewed your testing regime?

Treating any software deployment as ‘routine’ can lead to complacency, which dramatically increases the risk of human error.

And small errors can have big consequences. Just ask an impacted Optus customer. Or perhaps the former Optus CEO, who may now have more time for a chat.

Here at Orchid, we count our customers in the thousands rather than millions, but we fully understand how much they rely on our software to run their businesses. That’s why we take testing very seriously – a topic covered in our recent Anatomy of a Version Update article.
