Expanding Signal GIF search

jlund on 01 Nov 2017

Signal for iOS displaying GIF search results

Today’s Signal beta for iOS includes support for animated GIF search. Signal iOS has long supported sending and receiving GIFs, but today’s beta adds support for browsing and searching popular GIFs from within Signal.

We previously announced experimental support for animated GIF search in Signal Android, which we’re now bringing to iOS, along with some privacy updates to the process.

A brief replay on GIFs and privacy

GIF search engines like GIPHY provide network APIs that let an app expose trending and search functionality for GIFs. For instance, if someone messages you with an invitation, you might want to reply with a message that says “I’m excited.” With integrated GIF search, you could search for “I’m excited” and send one of the resulting GIFs instead.

Of course, as you type your search, each partial query is transmitted over the network to the GIF search engine, one request per keystroke:

http://api.giphy.com/v1/gifs/search?q=I&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+e&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+ex&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+exc&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+exci&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+excit&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+excite&api_key=dc6zaTOxFJmzC
http://api.giphy.com/v1/gifs/search?q=Im+excited&api_key=dc6zaTOxFJmzC
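Any search-as-you-type client produces this pattern: one request per keystroke, each carrying the full prefix typed so far. The POSIX sh sketch below regenerates the list above from the phrase. The function name keystroke_queries is hypothetical, and the sketch assumes the apostrophe has already been dropped (as in the URLs above) and that spaces are the only characters needing encoding; a real client would percent-encode the whole query.

```shell
#!/bin/sh
# Print the request URL a search-as-you-type client would send after each
# keystroke of PHRASE. Assumption: spaces are the only characters needing
# encoding (as '+'); a real client would percent-encode the whole query.
keystroke_queries() {  # keystroke_queries PHRASE API_KEY
  local encoded i
  encoded=$(printf '%s' "$1" | tr ' ' '+')
  i=1
  while [ "$i" -le "${#encoded}" ]; do
    # Each request carries the full prefix typed so far.
    printf 'http://api.giphy.com/v1/gifs/search?q=%s&api_key=%s\n' \
      "$(printf '%s\n' "$encoded" | cut -c "1-$i")" "$2"
    i=$((i + 1))
  done
}

keystroke_queries "Im excited" dc6zaTOxFJmzC
```

Since encoding here maps one character to one character, the prefix of the encoded phrase is the same as the encoding of each typed prefix, which keeps the loop simple.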

To keep GIPHY from learning who you are, and to keep the Signal service from learning what you search for, the Signal service acts as a privacy-preserving proxy.

When querying GIPHY:

The Signal app opens a TCP connection to the Signal service.
The Signal service opens a TCP connection to the GIPHY HTTPS API endpoint and relays bytes between the app and GIPHY.
The Signal app negotiates TLS through the proxied TCP connection all the way to the GIPHY HTTPS API endpoint.
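The post does not say which tunneling protocol the relay speaks. A common way to build this kind of relay is HTTP CONNECT (RFC 7231, section 4.3.6), and the POSIX sh sketch below assumes it: it prints the only plaintext request such a proxy would receive before the TLS handshake begins. The function name connect_preamble is hypothetical.

```shell
#!/bin/sh
# Assumption: the relay is an HTTP proxy that supports CONNECT. This prints
# the plaintext request a client would send to open the tunnel. It names
# only the destination host and port, never the URL or search term.
connect_preamble() {  # connect_preamble HOST [PORT]
  local host port
  host=$1 port=${2:-443}
  printf 'CONNECT %s:%s HTTP/1.1\r\n' "$host" "$port"
  printf 'Host: %s:%s\r\n' "$host" "$port"
  printf '\r\n'
  # Once the proxy replies "HTTP/1.1 200", the client starts TLS with HOST
  # over the same TCP connection; from then on the HTTP requests and
  # responses the proxy relays are TLS-encrypted.
}

connect_preamble api.giphy.com 443
```

With curl, pointing --proxy at an HTTP proxy makes it tunnel https:// requests with CONNECT, and --proxytunnel forces tunneling explicitly. In this example proxy.example.org:8080 is a placeholder, not a real Signal endpoint:

$ curl --proxy http://proxy.example.org:8080 --proxytunnel 'https://api.giphy.com/v1/gifs/search?q=excited&api_key=dc6zaTOxFJmzC'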

Because TLS runs all the way to GIPHY, the Signal service never sees the plaintext contents of what is transmitted or received. And because the TCP connection is proxied through the Signal service, GIPHY sees requests arriving from Signal’s servers rather than from your device, so it doesn’t know who issued them.

The Signal service essentially acts as a VPN for GIPHY traffic: the Signal service knows who you are, but not what you’re searching for or selecting. The GIPHY API service sees the search term, but not who you are.

Looping back

This has worked well, but we have also been thinking about ways to improve resistance to traffic analysis. If the Signal service were malicious, it could measure the amount of data flowing through the tunnel and compare it against the sizes of known GIFs to infer which ones were retrieved from GIPHY.

The most common way to mitigate an attack like that is plaintext padding. Appending a random amount of padding to each GIF would make it more difficult for the Signal service to correlate the amount of data it sees being transmitted with a known GIF.

The problem, however, is that we don’t control the content. How can you pad plaintext content that you don’t control?

In range of the solution

RFC 7233 allows HTTP clients to request only a portion of a file from a remote server. The client passes a Range header in its request, and the server responds with 206 (Partial Content) carrying just the bytes in that range. Among other things, this functionality allows your browser to resume interrupted downloads, begin displaying large documents immediately, and quickly seek to a given position within long videos.
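Byte positions in a Range header are zero-based and inclusive, and a spec with no first position is a suffix range that counts back from the end of the file. The POSIX sh sketch below shows how a single range spec resolves against a file's length, using the 1965425-byte GIF from the walkthrough later in this post. resolve_range is a hypothetical helper, and it deliberately ignores multi-range specs.

```shell
#!/bin/sh
# Resolve one RFC 7233 byte-range-spec against a file of TOTAL bytes and
# print "first last length" (zero-based, inclusive positions). Multi-range
# specs such as "0-9,20-29" are out of scope for this sketch.
resolve_range() {  # resolve_range SPEC TOTAL
  local spec total first last n
  spec=$1 total=$2
  case $spec in
    ''|-) return 2 ;;  # not a usable range spec
    -*)                # suffix range "-N": the last N bytes
      n=${spec#-}
      if [ "$n" -gt "$total" ]; then n=$total; fi  # whole file
      first=$((total - n)) last=$((total - 1)) ;;
    *-*)               # "FIRST-LAST" or open-ended "FIRST-"
      first=${spec%%-*} last=${spec#*-}
      if [ -z "$last" ] || [ "$last" -ge "$total" ]; then
        last=$((total - 1))  # clamp to the end of the file
      fi ;;
    *) return 2 ;;
  esac
  # Invalid (first > last) or unsatisfiable (first past the end; a server
  # would answer 416 Range Not Satisfiable).
  if [ "$first" -ge "$total" ] || [ "$first" -gt "$last" ]; then return 1; fi
  echo "$first $last $((last - first + 1))"
}

resolve_range 0-1048575 1965425  # -> 0 1048575 1048576
resolve_range -1048576 1965425   # -> 916849 1965424 1048576
```

Both ranges are exactly 1048576 bytes long, but the suffix range starts at byte 916849, well before the first range ends at byte 1048575. That overlap is what the padding trick relies on.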

We can also abuse range requests to simulate padding on content we don’t control.

A diagram of bytes being requested with overlapping range requests

In the diagram above, a client wishes to download a 13-byte file. However, the client doesn’t wish to reveal to the network that it has retrieved exactly 13 bytes.

Instead of making a normal request, it picks a block size (in this case 6 bytes) and issues sequential range requests of exactly that size: bytes 0-5, then bytes 6-11. For the third and final request, only 1 byte (byte 12) remains to be retrieved, but the client instead makes an overlapping request for the final 6 bytes (bytes 7-12) and discards the first 5 bytes of that response.

The client has just successfully “padded” this 13-byte piece of content by 5 bytes: a network observer sees three 6-byte transfers, which would look the same for any file from 13 to 18 bytes long, making it more difficult to determine the true length of what was retrieved.
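The scheme in the diagram can be written down as a small planner. This POSIX sh sketch (plan_ranges is a hypothetical name) prints one inclusive byte range per request, plus how many leading bytes of that response to discard. Files smaller than one block cannot be padded this way and fall back to a single short request.

```shell
#!/bin/sh
# Plan fixed-size range requests for a file of SIZE bytes using BLOCK-byte
# blocks. Every request asks for exactly BLOCK bytes; if the file does not
# end on a block boundary, the last request overlaps the previous one.
# Prints one "first-last discard=N" line per request (inclusive positions).
plan_ranges() {  # plan_ranges SIZE BLOCK
  local size block offset start
  size=$1 block=$2 offset=0
  if [ "$size" -le 0 ] || [ "$block" -le 0 ]; then return 1; fi
  if [ "$size" -lt "$block" ]; then
    # Too small to pad with this trick: one request returns the whole
    # file, so its true size is still visible on the wire.
    echo "0-$((size - 1)) discard=0"
    return 0
  fi
  while [ $((offset + block)) -le "$size" ]; do
    echo "$offset-$((offset + block - 1)) discard=0"
    offset=$((offset + block))
  done
  if [ "$offset" -lt "$size" ]; then
    # Re-request the final BLOCK bytes; the first (offset - start) of them
    # were already received in the previous block.
    start=$((size - block))
    echo "$start-$((size - 1)) discard=$((offset - start))"
  fi
}

plan_ranges 13 6
# 0-5 discard=0
# 6-11 discard=0
# 7-12 discard=5
```

For the 1965425-byte GIF used below with 1 MiB blocks, plan_ranges 1965425 1048576 prints 0-1048575 discard=0 and 916849-1965424 discard=131727, the same two requests and the same trim that the walkthrough performs by hand.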

Giving it a spin

Feel free to follow along in your terminal as we try this strategy on the GIF at https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif:

First, we’ll determine the size of the target file and verify that the server supports range requests:

$ curl -s -I 'https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif' | grep -iE 'content-length|accept-ranges'
Content-Length: 1965425
Accept-Ranges: bytes

We download the first segment of the file using a 1 MiB block size. curl’s --range takes inclusive byte positions, so 0-1048575 requests exactly the first 1048576 bytes:

$ curl -o giphy.gif.part01 --range 0-1048575 'https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif'

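Next we’ll download the second segment of the file, also 1 MiB. A range with no starting position, such as -1048576, is a suffix range meaning “the last 1048576 bytes” (here, bytes 916849 through 1965424), so this segment partially overlaps with the first: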

$ curl -o giphy.gif.part02 --range -1048576 'https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif'

We need to discard the overlapping bytes in the second segment. The overlap is the combined byte size of the two segments minus the byte size of the original file: 2 × 1048576 - 1965425 = 131727 bytes, which need to be trimmed from the start of the second segment:

$ dd bs=131727 skip=1 if=giphy.gif.part02 of=giphy.gif.part02-trimmed

Now we are ready to combine our two file segments together:

$ cat giphy.gif.part01 giphy.gif.part02-trimmed > giphy-combined.gif

Finally, we can verify that the combined file is the same as the original; the two SHA-256 digests printed should be identical:

$ curl -s -o giphy-original.gif https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif
$ shasum -a 256 giphy-original.gif giphy-combined.gif 
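The manual steps above generalize to any number of blocks. The POSIX sh sketch below automates them; the names fetch_range, remote_size and padded_fetch are hypothetical and not part of Signal. It trims with tail -c instead of dd (both skip the first N bytes), and it assumes the server honors Range requests, as the Accept-Ranges: bytes header above indicated. A server that ignored Range and returned the whole file would break the reassembly.

```shell
#!/bin/sh
# Sketch: fetch SRC (SIZE bytes) as a series of BLOCK-byte range requests,
# letting the last request overlap the previous one, and reassemble OUT.

# The only networked step; kept separate so it can be swapped out.
fetch_range() {  # fetch_range SRC FIRST LAST DEST
  curl -sSf --range "$2-$3" -o "$4" "$1"
}

# Content-Length from a HEAD request. Header names are matched
# case-insensitively because HTTP/2 responses use lowercase names.
remote_size() {  # remote_size URL
  curl -sSfI "$1" | tr -d '\r' |
    awk 'tolower($1) == "content-length:" { n = $2 } END { print n }'
}

padded_fetch() {  # padded_fetch SRC SIZE BLOCK OUT
  local src size block out offset start part
  src=$1 size=$2 block=$3 out=$4 offset=0
  part=$(mktemp) || return 1
  : > "$out"
  while [ "$offset" -lt "$size" ]; do
    if [ $((offset + block)) -le "$size" ]; then
      # A full block that overlaps nothing already received.
      fetch_range "$src" "$offset" $((offset + block - 1)) "$part" || break
      cat "$part" >> "$out"
      offset=$((offset + block))
    else
      # Final partial block: re-request the last BLOCK bytes (or the whole
      # file if it is shorter than one block) and append only the new tail.
      if [ "$size" -gt "$block" ]; then start=$((size - block)); else start=0; fi
      fetch_range "$src" "$start" $((size - 1)) "$part" || break
      tail -c +$((offset - start + 1)) "$part" >> "$out"
      offset=$size
    fi
  done
  rm -f "$part"
  [ "$offset" -eq "$size" ]  # non-zero status if a request failed
}
```

Assuming the functions are saved in a file (padded-fetch.sh is a hypothetical name), the walkthrough becomes:

$ . ./padded-fetch.sh
$ url='https://media.giphy.com/media/k9gFJo5DMijbW/giphy.gif'
$ padded_fetch "$url" "$(remote_size "$url")" 1048576 giphy-padded.gif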

When these TLS-encrypted requests are sent through the Signal service’s tunnel, we’re replacing a single transfer of 1965425 bytes with two identically sized transfers of 1 MiB (1048576 bytes) each.

As a result, the Signal proxy service only sees a series of block-sized transfers when routing traffic, which should make it more difficult to identify the content of that traffic: counting blocks reveals a file’s size only to within one block size. We’re continuing to look at additional measures like randomizing the order of results and randomizing requests striped across multiple downloads.

Fighting austerity with a little excess

We don’t believe that privacy is about austerity. Communication should be expressive and fun. We want you to avoid sending plaintext, but that doesn’t mean that your texts should always be so plain. Send an animated GIF today, and let us know what you think.