Today’s Signal release for Android and iOS includes beta support for video calls.
This represents an entirely new calling infrastructure for Signal, and should increase voice call quality as well. We think it’s a big improvement, but we’re rolling it out in stages to collect feedback from people with different devices, networks, and regions in order to ensure there are no surprises when it’s enabled for everyone by default.
To help us test on Android and iOS, toggle Signal Settings -> Advanced -> Video calling beta.
Only calls between Signal users who have both enabled the video calling beta will use the new calling system.
Enabling the new calling features on iOS will also allow you to take advantage of CallKit in iOS 10. This allows incoming Signal calls to be answered with one touch, directly from the lock screen. Keep in mind, this only takes effect when both parties are using the new calling system.
CallKit offers a native calling experience to VoIP apps like Signal. As well as being able to answer calls directly from your lock screen, you’ll also see Signal calls in the system’s “Recent Calls” list. This is because iOS treats CallKit calls like any other call, however that also means some information will be synced to iCloud if enabled. This information includes who you called and how long you talked.
If you decide that’s not for you, you can opt-out of the CallKit features at any time in Settings > Advanced > Use CallKit, while continuing to use the rest of the new calling system.
Signal used to be two different applications, one for texting and one for real-time voice calls. Until now, the code for text and voice has remained relatively separate, even after both pieces were merged into one application.
At a high level, most VoIP applications consist of two parts: signaling and streaming. There is a “signaling” pathway that sets up a call and negotiates some media streams, and then a “streaming” pathway that actually transmits the audio, video, or data once the call has “begun.”
Traditionally, the signaling pathway in VoIP applications used protocols like SIP, while the streaming pathway used RTP or SRTP.
We originally developed voice calling in the early days of mobile devices, when everyone was still figuring out that world, and when projects like WebRTC didn’t yet exist. We immediately realized that protocols like SIP, which traditionally required holding open long-lived connections in order to receive incoming calls, were not going to be compatible with the mobile environment.
Instead we built our own simple REST-based signaling protocol (seems hard to believe now, but not using SIP was controversial at the time!), and used push notifications instead of long-lived connections to notify the client of incoming calls. Actual push notifications hadn’t been invented yet, though, so we created our own push infrastructure by sending encoded SMS messages that the app would silently intercept and interpret instead.
Over time, we switched to push notifications when they were created by Google and Apple, and have also slowly carved out parts of the original custom media streaming code to replace it with components from WebRTC as that project has developed.
This update completes the transition by using all of WebRTC and fully migrating the call-signaling pathway to use the Signal Protocol messaging channel for call setup.
Securing VoIP requires two high-level considerations: how data is encrypted, and how data is encoded.
A VoIP call is a synchronous interaction with a relatively ephemeral lifetime, so VoIP encryption is fairly straightforward. Clients can do a public key exchange, negotiate a shared secret, and use that to initiate SRTP streams.
Of primary concern is how the key exchange is authenticated. If it isn’t authenticated properly or at all, the call can be intercepted with a MITM, and shouldn’t be considered secure. Signal has traditionally used ZRTP for negotiating and authenticating the keys used to encrypt the call. ZRTP key exchanges are authenticated by two words (called a “short authentication string,” or SAS) that users can read over the phone to each other in order to ensure that the call was not intercepted.
ZRTP is a clever protocol, but the SAS has always felt a little awkward and bolted on, since Signal already has a separate authenticated Signal Protocol channel used for text messages. The SAS is fairly easy (and sometimes fun), but users shouldn’t have to verify an extra thing.
The new Signal voice and video beta functionality eliminates the need for ZRTP. The “signaling” messages used to set up the voice/video beta calls (offer/answer SDPs, ICE candidates, etc.) are transmitted over the normal Signal Protocol messaging channel, which binds the security of the call to that existing secure channel. It is no longer necessary to verify an additional SAS, which simplifies the calling experience.
The other high-level consideration is data encoding. Even if audio packets are encrypted, variable bitrate codecs can create a side channel that reveals spoken words through observed packet sizes and other features. The new voice and video beta upgrades the audio codec from Speex to Opus, but Signal is still careful to avoid information leaks by using Opus as a CBR rather than VBR codec and minimizing other RTP header metadata.
You can check out any of these changes in more detail (or get involved with development) at our GitHub repositories.