In my previous job with a GSM operator, we once faced a major crisis.
Ours was a fledgling GSM operation with a metro license, and we had a well-dimensioned point of interconnect (POI) with DoT (now BSNL). The problem was that we had only one POI. This was not due to any lack of engineering acumen on our part but rather to DoT's reluctance to spare capacity at a second POI. In several discussions we were told that they did not have spare capacity; in those days, we had to discuss and reach an agreement first, and only then could we apply for the connectivity.
Then the unthinkable happened…
The exchange building that housed our POI caught fire. Two exchanges, along with all associated connectivity to the central business district, including our POI, went up in flames. Fortunately, there were no casualties. In those days only about 5% of mobile traffic terminated on another mobile; nearly 95% had to take the PSTN route, so losing our only POI meant that almost all of our subscribers' calls could not be completed!
Five hours later, as soon as we were able to reach the senior-most DoT executive in the city, we put forth an urgent request to re-establish connectivity through any other exchange in the vicinity. He directed his staff to restore connectivity for both cellular operators on high priority.
Two of us from our company went to the exchange that had been identified and started chasing up the work. Cable after cable was taken up, and E1 after E1 was put through to the distribution frame in the exchange building. The building housed four different exchanges, so getting connectivity inside the building was becoming a problem. Fortunately, our competitor enjoyed a great relationship with DoT's transmission team, and we agreed to cooperate to get connectivity established for both operators. In return, since we had a better relationship with the exchange administration, we would help get the routing set up for both companies at the same time.
Twenty-four hours after the fire, the physical connectivity was complete. After a quick trip back to the NOC to create the routing scripts at the MSC, and then back to the exchange, we had fully restored the connections within twenty-nine hours!
What is more… for the next month, we made a killing selling mobile connections to every establishment in the central business district, because their landlines had not been restored. We also got an immediate go-ahead from DoT for a second POI!
Why am I writing about all this? Frankly, after seeing the global BlackBerry outage fiasco, I feel like boasting.
True, the scale was different… but do not discount the fact that we are talking about an era with far less automation in network management, with physical cables supporting a TDM setup, and so on.
RIM's explanation is that a core switching element failed, and a failover mechanism that they had tested regularly also failed! I have heard many people say, “The Internet can never be cut off because it is a packet-switched network and has no single point of failure.” Now I wonder whether this is really true. Or was the network design so poor that, despite using a packet-switched network, there was still a single point of failure?
RIM's BBM service provided a free, SMS-like messaging mechanism. Within the closed user group of BlackBerry users, people could send and receive messages without a carrier's SMS service being involved. Were the design principles for the network carrying these messages as robust as those of a carrier network? And again, an important communication medium failed completely and was not restored quickly enough. Was the quality of service so poor because it was a free service? Given that some business activities would have suffered or switched to a costlier means of communication, was the service really free, after all?
As a well-wisher of RIM, and having finally settled on a BlackBerry after a long search for a mobile phone that suits my purposes (see also: http://wp.me/pAb4X-H), I really hope they sort out this problem quickly and regain the reliability they are famous for. Some day, we will also know more of the answers to the questions above.