
404 Skype Not Found

Centralised architectures can always cause trouble. Not that this is a point in distributed systems’ favour, necessarily; look what just happened to Skype, which has suffered a whole day’s outage.

We at Telco 2.0, as you may know, are actually a group intellect, structured rather like the brain of a large cephalopod. Rather than one single brain, there is a node for each tentacle, the whole being interconnected by the highest-bandwidth nerve fibres known in nature. Unlike the squid, the Telco 2.0 team uses Skype quite heavily in order to maintain coherence among its multiple cerebellums (cerebella?), so we may be forgiven for feeling a little sporky. We’ve been reduced to using Google Talk for much of the day.

Telco 2.0 in its natural habitat

So all day, access to Skype has been to all intents and purposes impossible, starting around 1000 hours GMT. The pathology takes the following form: on start-up, the Skype client successfully registers on the network (often with considerable delay), but rapidly logs off again and struggles to reconnect. During the brief intervals of successful operation, the number of logged-in users is very low: between 100,000 and 320,000 according to our own observations.

What was up? Surely the nature of a peer-to-peer network means that there is no single point of failure? Well, everyone speculated, so why not us too?

Skype’s architecture is supposed to eliminate single points of failure

Skype is one of the most decentralised of decentralised systems. Much is secret about its workings, but it is well known that some fraction of end-users act as “supernodes”, each of which carries part of a distributed directory of Skype names and their current IP addresses. These also act as proxies for users behind firewalls who can’t connect directly. The problem of finding a supernode out there is solved by hard-coding the IP addresses of seven “super-supernodes” into the Skype client. As all supernodes know the locations of all other supernodes, once the client has contacted one of the seven, it can be handed off to a topologically handy node.
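For readers who think in code, the bootstrap step described above can be sketched roughly as follows. This is purely illustrative: Skype's protocol is proprietary, so the `Supernode` class, the `bootstrap` function, the `region` field standing in for "topologically handy", and the documentation-range IP addresses are all assumptions of ours, not anything taken from the real client.

```python
from dataclasses import dataclass, field


@dataclass
class Supernode:
    """Hypothetical model of a supernode; not Skype's real data structure."""
    address: str
    region: str          # crude stand-in for network topology (assumption)
    online: bool = True
    known_peers: list = field(default_factory=list)

    def nearest_peer(self, client_region):
        """Return an online known peer in the client's region, else this node."""
        for peer in self.known_peers:
            if peer.online and peer.region == client_region:
                return peer
        return self


def bootstrap(bootstrap_nodes, client_region):
    """Try each hard-coded bootstrap node in order.

    The first reachable one hands the client off to a nearby supernode.
    Returns None if no bootstrap node is reachable.
    """
    for node in bootstrap_nodes:
        if node.online:
            return node.nearest_peer(client_region)
    return None


# Example: the first hard-coded node is down, the second hands us off.
seeds = [
    Supernode("192.0.2.1", "eu", online=False),
    Supernode("192.0.2.2", "us"),
]
local = Supernode("198.51.100.7", "eu")
seeds[1].known_peers.append(local)
chosen = bootstrap(seeds, "eu")
```

Note what the sketch makes obvious: if every entry in the hard-coded list is unreachable, `bootstrap` returns `None` and a freshly started client has nowhere to go, however decentralised the rest of the network may be.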

Skype user names are issued by a central server. This generates various cryptographic keys used to authenticate users to each other and to supernodes, as well as to encrypt bearer traffic. Everything is always encrypted. When a Skype client starts up, it tries to contact a super-supernode and presents its credentials. What happens then is not entirely clear; it is suggested that the supernode then carries out some sort of logon process with a central server. As the login details go first to the supernode, and this has the crypto necessary to authenticate the user, one wonders why this would be so.

So what have we learned?

“Don’t be religious about any particular technology” sounds like a good lesson. IMS may be horribly over-centralised, but Skype may just have some similar pathologies. And the king of centralised telco engineering, the PSTN, is still the world standard for reliability. As long as you solve the user’s problem, nobody will care what technology you use. Until it breaks…

Update: As many people suspected at the time, it was the Windows patch whatdunnit.

Comments

I saw this some time ago - it is more than a year old, but it describes how Skype is built.

I can't vouch for its accuracy, but I can vouch for its jaw-dropping complexity.

"Surely the nature of a peer-to-peer network means that there is no single point of failure?"

If by "failure" you mean a hardware failure.

But many failures in highly interconnected networks are due to software faults. And "single point of failure" doesn't apply to software faults - the software is the same everywhere.

I don't know if this had anything to do with the current Skype problems, but common source software faults are just as much a problem in distributed networks (peer-to-peer or otherwise) as in centralized networks.

You win the Telco 2.0 squid spear, DG Lewis. According to F-Secure, there's an exploit out that could make the clients DDoS their supernodes (or login servers).

The last statement must be read with some personal introspection by many in the industry. After all, till not long ago it was fashionable for many in the industry to trash PSTN with apocalyptic predictions. How I wish that the thought leaders then uttered that last sentence and provided guidance on how one could marry PSTN and IP (that too at the end points). Oh well.

Surely you should be using a service of your own making? With freely available, reliable and feature-rich software available (Asterisk, OpenSER, Linux), it is cost-effective to build your own solution that allows you to be mostly in control of your own destiny.

Whilst Skype and GoogleTalk are great services, having them as a backup rather than a primary communications channel makes more sense?

It turned out to be a combination of Windows Patch Tuesday and a bug in the Skype client software that caused this outage.

Good illustration of why we need a multi-vendor, multi-OS telecom network, which (by definition) is based on open standards.
