BitTorrent's uTP: The Art of Getting Out Of The Way


Media vs P2P vs Telcos: The Internet's Civil War

At the 8th Telco 2.0 Executive Brainstorm in Orlando last week, Eric Klinker, CEO of Bittorrent.com, had some fascinating things to say about technical solutions to the interlocking intellectual property and bandwidth issues we're constantly debating around online video. (He also remarked that the whole debate about P2P, piracy, and intellectual property had begun to remind him of the US Civil War - by 1863, it was clear that the South could never win, but the war went on anyway, and the majority of the casualties died pointlessly between then and 1865.)

He said that both the telecoms and media industries hated BitTorrent, but that this was partly a reflection of their own mutual distrust; BitTorrent was a very small company being ground between two huge interest blocks. Despite that, it remains global - the only country with no BitTorrent applications running is North Korea. BitTorrent.com has 66% of the market, and the monthly peak throughput of the BitTorrent network is 4 terabits per second.

Congestion, not Traffic, Drives Cost

ISPs tend to be concerned about BitTorrent because they see it as a bandwidth hog. Klinker pointed out that he had himself been an ISP engineer and that he therefore understood their concerns. He remarked that traffic was not, in fact, a driver of cost - congestion was.

[Image: klinker2.png]

Specifically, costs are incurred when one of several stages in the end-to-end path becomes congested - usually the CMTS in cable networks or the DSLAM in telco networks, or else the transit or peering links. For the first two, costs arise when congestion degrades the user experience and an upgrade becomes necessary; for the latter two, they arise when the link fills up and more capacity has to be bought. For example, the impact of the iPlayer video surge on UK ISPs was driven by the fact that BT IPStream backhaul was sold in 155Mbit/s minimum commits, so when a link filled up, the cost of that link doubled at a stroke.
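
To make the step-function nature of that cost concrete, here is a minimal sketch, assuming a hypothetical flat price per pipe and the 155Mbit/s commit size mentioned above; capacity can only be bought in whole pipes, so the bill jumps the moment peak demand crosses a pipe boundary.

```python
import math

# Hypothetical illustration only: backhaul is bought in 155Mbit/s minimum
# commits, each at an assumed flat monthly price (not a real BT quote).
PIPE_MBPS = 155
PRICE_PER_PIPE = 10_000   # assumed monthly cost per pipe

def monthly_backhaul_cost(peak_demand_mbps: float) -> int:
    """Cost steps up in whole pipes: 156Mbit/s costs as much as 310."""
    pipes_needed = math.ceil(peak_demand_mbps / PIPE_MBPS)
    return pipes_needed * PRICE_PER_PIPE

for demand in (150, 155, 156, 310, 311):
    print(f"{demand} Mbit/s peak -> {monthly_backhaul_cost(demand)} per month")
```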

A Technical Solution for the Video Surge

Originally, BitTorrent the technology (as opposed to the company) relied on TCP's native congestion control. TCP is designed to provide reliable delivery over the stateless, best-effort underlying IP network. It does this by requiring the receiver to acknowledge the data it receives; if an acknowledgement does not arrive before a retransmission timeout (derived from the measured round-trip time) expires, the sender retransmits, and keeps retransmitting until the data is acknowledged or the connection is finally abandoned. In the late 1980s, the nascent Internet experienced what is known as congestion collapse: congested links dropped data packets and the ACK messages coming back, so the TCP stacks connected to the Internet retransmitted more and more packets, stuffing yet more data into the already congested links.
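
The mechanism is easy to caricature in code. This is a toy stop-and-wait sketch, not real TCP (which uses sliding windows and cumulative acknowledgements); the loss rate and retry limit are invented for illustration.

```python
import random

# Toy stop-and-wait sender: send a packet, wait for the acknowledgement,
# resend on timeout. The point is the behaviour described above - an
# unacknowledged packet is simply sent again, which under congestion adds
# yet more traffic to an already full link.
random.seed(42)
LOSS_RATE = 0.3      # assumed probability a packet or its ACK is lost
MAX_RETRIES = 8      # give up eventually rather than retrying forever

def send_with_retransmit(seq: int) -> int:
    """Return how many transmissions it took to get packet `seq` acknowledged."""
    for attempt in range(1, MAX_RETRIES + 1):
        acked = random.random() > LOSS_RATE   # pretend delivery plus ACK
        if acked:
            return attempt
        # timeout expired: retransmit
    raise TimeoutError(f"packet {seq} never acknowledged")

total = sum(send_with_retransmit(seq) for seq in range(10))
print("transmissions needed to deliver 10 packets:", total)
```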

In an all-out emergency effort, the Internet engineering community (notably Van Jacobson, the future chief scientist at Cisco) rushed out a fix to the problem. Since then, TCP stacks have progressively increased the amount of unacknowledged data they allow in flight - the congestion window, which all other things being equal governs the transfer rate - until packet loss is detected or the path's capacity is reached, at which point they back off by cutting the window. After a waiting period, they probe back up towards the highest rate at which packets are not being lost. This prevents congestion collapse and should also result in a roughly fair division of the available bandwidth.
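
The resulting behaviour is usually described as additive increase, multiplicative decrease (AIMD). A rough sketch of its shape, leaving out slow start, round-trip timing, and the other refinements of real TCP stacks:

```python
# Simplified additive-increase/multiplicative-decrease (AIMD), the loss-based
# behaviour described above. Real TCP stacks are far more elaborate; this
# just shows the sawtooth shape of the congestion window.
def aimd(window: float, loss_detected: bool,
         max_window: float = 64.0, increase: float = 1.0,
         decrease_factor: float = 0.5) -> float:
    """Return the next congestion window, in packets."""
    if loss_detected:
        return max(1.0, window * decrease_factor)   # back off sharply
    return min(max_window, window + increase)       # probe for more capacity

# Example: grow until a loss at round 20, then halve and start growing again.
w = 1.0
for rnd in range(30):
    w = aimd(w, loss_detected=(rnd == 20))
print("window after 30 rounds:", w)
```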

However, a common critique of BitTorrent from the network point of view is that, because it communicates with many peers at once, it creates large numbers of TCP connections - it must, after all, have at least as many connections as it has peers and trackers. Since TCP's congestion control operates per connection, BitTorrent traffic can end up with more than its fair share of the capacity: TCP divides a congested link roughly equally between flows, not between hosts or applications, so an application holding 10 separate TCP connections gets roughly 10 shares of the link.
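
A back-of-the-envelope illustration, assuming the idealised case where a bottleneck is shared exactly equally between flows, with made-up link and flow counts:

```python
# Idealised per-flow fairness: a bottleneck shared equally between TCP
# connections, not between applications. With 10 connections against 1,
# the 10-connection application ends up with roughly 91% of the link.
LINK_MBPS = 11.0
bittorrent_flows = 10
browser_flows = 1

per_flow = LINK_MBPS / (bittorrent_flows + browser_flows)
print("BitTorrent share:", per_flow * bittorrent_flows, "Mbit/s")  # 10.0
print("Browser share:  ", per_flow * browser_flows, "Mbit/s")      # 1.0
```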

Klinker introduced a revision to the BitTorrent transport protocol intended to fix this issue. The new protocol (uTP) runs over UDP and implements its own congestion control at the application layer: the BitTorrent client sets a target delay for each transfer, and if the measured delay begins to approach that target - a sign that queues are building up somewhere along the path - it backs off and reduces its transmission rate across all its transfers until the link is no longer congested.
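
As a very rough sketch of the idea - not the actual uTP algorithm, whose delay measurements, target value, and gains are defined by the protocol itself - a delay-based rate controller might look something like this, with all constants invented for illustration:

```python
# A simplified delay-based controller in the spirit of uTP: keep a target
# queueing delay, grow the send rate while measured delay is below target,
# and back off as it approaches or exceeds it. The constants, units, and the
# use of a simple proportional step are illustrative assumptions only.
TARGET_DELAY_MS = 100.0               # assumed target delay
GAIN = 0.1                            # assumed controller gain
MIN_RATE, MAX_RATE = 50.0, 10_000.0   # kbit/s, arbitrary bounds

def next_rate(rate_kbps: float, measured_delay_ms: float) -> float:
    """Scale the send rate in proportion to how far delay is from target."""
    off_target = (TARGET_DELAY_MS - measured_delay_ms) / TARGET_DELAY_MS
    new_rate = rate_kbps * (1.0 + GAIN * off_target)
    return min(MAX_RATE, max(MIN_RATE, new_rate))

# Example: delay creeping up as a queue builds, then draining again.
rate = 1000.0
for delay in (20, 40, 80, 120, 160, 120, 60, 20):
    rate = next_rate(rate, delay)
    print(f"delay={delay}ms -> rate={rate:.0f} kbit/s")
```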

[Image: klinker.png]

Essentially, this creates a scavenger class of traffic, which expands to fill empty capacity when it is available but voluntarily backs off when it encounters congestion. It's therefore possible to run the network at much higher levels of capacity utilisation, because the BitTorrent traffic is essentially a spinning reserve - if VoIP, SSL, or whatever needs to get through, it gets the extra capacity from scaling down the BitTorrent flows. It's rather like demand response on the power grid.

5.5 Gigawatts of TV

And the power grid is the right comparison. Klinker introduced data on the current traffic passing through the BitTorrent peer network. With around one exabyte a month passing through, he said, this was equivalent to 292,000 years of television streaming at 1Mbit/s. He pointed out that serving such volumes of traffic from a traditional client/server architecture would require very large amounts of electricity, and that even delivering it from a globally distributed CDN would require the creation of a second CDN the size of Akamai Technologies' existing infrastructure.
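
For what it's worth, the arithmetic checks out if "exabyte" is read in the binary sense (2^60 bytes):

```python
# Rough sanity check on the "292,000 years of 1 Mbit/s television" figure,
# reading one exabyte in the binary sense (2**60 bytes).
bits_per_month = 2**60 * 8          # one exabyte, in bits
stream_bps = 1_000_000              # 1 Mbit/s
seconds = bits_per_month / stream_bps
years = seconds / (365.25 * 24 * 3600)
print(f"{years:,.0f} years")        # roughly 292,000
```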

[Image: klinker1.png]

It would only get worse, he said, as more and more television was distributed over the Internet. US viewers consume about 220GB a month of television; 300 million broadband subscribers doing that over the network would generate 66 exabytes a month of traffic, or $5bn worth of CDN service at current prices, and would require about 5.5 gigawatts of electricity to power the servers - roughly six typical nuclear power stations.
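
The traffic figure and the power-station comparison can be reproduced from the numbers quoted; the CDN price and the 5.5GW server estimate are Klinker's own figures and are taken as given here:

```python
# Quick check on the scale numbers quoted above. Only the traffic volume and
# the per-station comparison can be rederived from the text; the CDN price
# and total power figure are Klinker's.
gb_per_viewer_month = 220
subscribers = 300_000_000

total_gb = gb_per_viewer_month * subscribers
print(total_gb / 1e9, "exabytes per month")      # 66.0 (decimal exabytes)

gw_needed = 5.5
gw_per_plant = gw_needed / 6
print(round(gw_per_plant, 2), "GW per plant")    # ~0.92, about one large reactor
```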