vLine Blog - WebRTC Video Chat - Tunneling WebRTC over TCP (and why it matters)

A couple of weeks ago, we quietly turned on support for dual-sided TCP tunneling in the vLine Cloud, becoming the first WebRTC infrastructure provider to support connecting through firewalls that block UDP. This may not sound interesting or important, but it actually makes the difference between having a service that “usually connects” and one that “just works”. Let us explain:

One of the many great things about WebRTC is that it’s relatively easy to get started. Bring up an instance of the apprtc backend for signaling, copy and paste some JavaScript, and, voila, you’re making video calls in your app (actually, it’s a little harder than that, but a good web developer can easily have demoable video chat up and running in a day or two).

Unfortunately, the road from demo to production-grade service can be more challenging than you might expect (and more expensive!). Here’s how it usually goes:

Level 1: STUN

You start by making your first few calls over a local area network, and everything works great. Hooray! Then you try to make a call to someone outside your firewall, and one of two things will happen.

1) If you happened to copy and paste the address of Google’s STUN server from the apprtc source code, your call will go through, and you’ll be a happy camper (though you may have some lingering doubts about whether it’s OK to use an undocumented service that Google has not given third-party developers explicit permission to use; note the silence from Google on this thread).

2) If you don’t have a STUN server configured, your call will fail. A little research will reveal that STUN is a protocol that the browser uses to determine its public IP address and poke a hole in the firewall. So, if you want to connect through a firewall, you’ll need a STUN server. A few hours later you have an open source server up and running on EC2. A small instance should do just fine ($43.92 per month), but you’ll probably want to run at least two of them for availability, preferably in different regions (make that $87.84 per month).
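To make the STUN step concrete, here is a minimal sketch of what pointing the browser at your own STUN servers might look like. The hostnames are hypothetical placeholders, and the sketch uses the `urls` key from the current WebRTC specification (older browser builds used `url` and a prefixed constructor), so treat it as an illustration rather than copy-and-paste code.

```javascript
// Hypothetical hostnames: replace with the STUN servers you actually run.
const STUN_HOSTS = ["stun-us.example.com", "stun-eu.example.com"];

// Build the configuration object that RTCPeerConnection accepts.
// 3478 is the standard STUN port.
function buildStunConfig(hosts, port = 3478) {
  return {
    iceServers: hosts.map((host) => ({ urls: `stun:${host}:${port}` })),
  };
}

const stunConfig = buildStunConfig(STUN_HOSTS);

// Only create the peer connection where the API exists (a browser).
// Elsewhere this sketch just builds the config object.
if (typeof RTCPeerConnection !== "undefined") {
  const pc = new RTCPeerConnection(stunConfig);
  pc.onicecandidate = (event) => {
    // Each candidate is one address the browser thinks it can be reached at.
    if (event.candidate) console.log(event.candidate.candidate);
  };
}
```

Listing two servers in different regions mirrors the availability setup described above: the browser can still discover its public address if one of them is down.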

Level 2: TURN

You do a few more test calls, and they all work. Things are looking good. Then you try to make a call between two corporate networks, and it fails. Grrr. While you were researching STUN, you read about another protocol called TURN that’s used to relay data in cases when the browser can’t establish a peer-to-peer connection. You weren’t sure if it was strictly necessary, but some more research reveals that STUN is only sufficient for connecting about 80% of calls. If that’s not enough for you (and it probably isn’t), you’ll need a TURN server.
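If you want to see which of these paths a given call is actually using, the ICE candidate lines the browser emits tell you. The sketch below parses the standard `candidate:` attribute and maps its type to plain English. The sample lines use made-up addresses from the documentation ranges, and the function names are ours, not part of any library.

```javascript
// Parse one ICE candidate line of the form:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseCandidate(line) {
  const fields = line
    .trim()
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .split(/\s+/);
  const typIndex = fields.indexOf("typ");
  if (fields.length < 8 || typIndex === -1 || typIndex + 1 >= fields.length) {
    return null; // not a candidate line we understand
  }
  return {
    transport: fields[2].toLowerCase(),
    address: fields[4],
    port: Number(fields[5]),
    type: fields[typIndex + 1],
  };
}

// Translate the candidate type into the terms used in this post.
function describeCandidate(candidate) {
  switch (candidate.type) {
    case "host":
      return "local address (works on the same network)";
    case "srflx":
      return "public address discovered via STUN";
    case "prflx":
      return "public address learned from the peer during connectivity checks";
    case "relay":
      return "address on a TURN relay";
    default:
      return "unknown candidate type";
  }
}

// Made-up sample lines for illustration.
const samples = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 203.0.113.7 46154 typ srflx raddr 192.0.2.10 rport 54321",
  "candidate:3 1 udp 41885439 198.51.100.20 61000 typ relay raddr 203.0.113.7 rport 46154",
];

for (const line of samples) {
  const parsed = parseCandidate(line);
  console.log(`${parsed.transport} ${parsed.address}:${parsed.port} -> ${describeCandidate(parsed)}`);
}
```

If a failing call never produces a `relay` candidate, that is a hint that no TURN server is configured or reachable.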

A few mailing list threads later, and you’ve got a TURN server up and running on your EC2 instance. Actually, the network throughput on a small instance can be pretty unpredictable if anyone else is using your shared network interface, so you should think about getting a bigger instance. A medium instance ($87.84 per month) works pretty well, but for the most predictability and lowest jitter, you’ll want an extra large ($351.36 per month), which will get you “high network performance”. Actually, make that two ($702.72 per month), for availability.

Of course, since you’re relaying video, you’ll need to factor in the bandwidth costs as well. Base pricing on EC2 is $0.12 per GB. As you’re running the numbers on this, you may start to wonder what prevents someone else from using that public server you just set up and running up your bandwidth bills. Here’s a good mailing list thread on the subject. Summary: there’s not a great way to prevent this given the way the TURN protocol works and that the TURN credentials have to be present in your JavaScript, where anyone can find them.
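For a rough feel of the numbers, here is a back-of-the-envelope sketch. The instance price and the per-GB price come from the figures above. The 1 Mbps bitrate and one-hour call length are hypothetical, and the sketch assumes only traffic leaving the relay is billed, with the relay forwarding one stream to each of the two participants.

```javascript
// Figures from the post: monthly price of one extra large instance
// and the base EC2 bandwidth price per GB.
const XL_INSTANCE_PER_MONTH = 351.36;
const PRICE_PER_GB = 0.12;

// Fixed monthly cost of running several identical instances.
function monthlyInstanceCost(pricePerInstance, instanceCount) {
  return pricePerInstance * instanceCount;
}

// Bandwidth cost of one relayed two-party call.
// Assumption: only outbound traffic from the relay is billed, and the
// relay sends one stream to each of the two participants.
function relayedCallCost(bitrateMbps, minutes, pricePerGb) {
  const bytesPerStream = (bitrateMbps * 1e6 * minutes * 60) / 8;
  const gigabytesOut = (2 * bytesPerStream) / 1e9; // decimal gigabytes
  return gigabytesOut * pricePerGb;
}

const fixed = monthlyInstanceCost(XL_INSTANCE_PER_MONTH, 2);
const perCall = relayedCallCost(1, 60, PRICE_PER_GB); // hypothetical 1 Mbps, one hour

console.log(`Two extra large instances: $${fixed.toFixed(2)} per month`);
console.log(`One relayed hour-long call: about $${perCall.toFixed(3)}`);
```

The per-call figure looks small, but it scales linearly with every relayed minute, including minutes used by anyone who lifts your TURN credentials.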

But let’s not get caught up in the dollars and cents. You can now make calls to your friends at other tech companies. Awesome! Then you try to make a call to someone at a big, corporate, non-tech company, and it fails. Dad-gum. You thought TURN had you covered.

Twenty minutes later, after a little more research, you discover that Chrome’s TURN allocation implementation only supports relaying UDP packets. Chrome 28 will add support for making a TURN allocation over TCP, but the packets will still be relayed over UDP. Whoops, that still doesn’t solve your problem when the firewall blocks UDP traffic.
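For reference, this is roughly what asking the browser to reach a TURN server over TCP looks like. It is a sketch under assumptions: the hostname and credentials are placeholders, the `?transport=tcp` suffix is the standard TURN URI parameter, and `iceTransportPolicy: "relay"` is a standard option that forces relayed candidates only, which is handy for testing. Whether a given browser version honors all of this is exactly the caveat described above.

```javascript
// Build one TURN entry for the iceServers list.
// The transport parameter controls how the browser talks to the TURN server,
// not how the relay talks to the far side.
function buildTurnServer({ host, port = 3478, transport = "udp", username, credential }) {
  if (transport !== "udp" && transport !== "tcp") {
    throw new Error(`unsupported transport: ${transport}`);
  }
  return {
    urls: `turn:${host}:${port}?transport=${transport}`,
    username,
    credential,
  };
}

// Placeholder host and credentials, for illustration only.
const turnOverTcp = buildTurnServer({
  host: "turn.example.com",
  transport: "tcp",
  username: "demo-user",
  credential: "demo-secret",
});

// Forcing "relay" makes the browser skip direct and STUN candidates,
// so a test call exercises only the TURN path.
const relayOnlyConfig = {
  iceServers: [turnOverTcp],
  iceTransportPolicy: "relay",
};

console.log(JSON.stringify(relayOnlyConfig, null, 2));
```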

Level 3: vLine Cloud

This is when our new TCP-tunneling support comes into play. It doesn’t rely on Chrome’s TURN implementation, so it works in Chrome today. Furthermore, it works even if both parties are behind firewalls that block UDP. All that’s required is access to the internet over port 443 (the HTTPS port), which the vast majority of firewalls allow.

You don’t need to do anything special to enable TCP tunneling in your vLine service. Just use vline.js to build your app, and we’ll connect using the best available method for any given call. We run a highly available global network of servers, so we’ll provide the best possible call quality to all of your users, anywhere in the world, even behind firewalls that block everything except TCP traffic over the HTTPS port. In case you’re wondering, we’re still doing end-to-end DTLS, so our servers never see your unencrypted media streams.

Our goal is a 100% connect rate. If you have a network where calls aren’t connecting, please let us know.

Note 1: If you want to test this yourself by blocking UDP on your firewall, remember to leave the DNS port (53) open.

Note 2: Some ultra-restrictive firewalls that do deep packet inspection may still block connections because, even though the browser is using the HTTPS port, it isn’t doing SSL/TLS (we’ve never encountered a firewall like this in the wild, but they do exist). Chrome will soon support making WebRTC connections over TLS, at which point we will work through these firewalls as well.