Referrer URLs and Privacy Risks

The Wall Street Journal’s recent article in the “What They Know” series discussed the problem of Facebook IDs being passed to ad networks. This is a serious potential privacy risk, and most Facebook applications are affected by it.

The underlying issue lies in a piece of the HTTP header called the referrer URL. We recognize that referrer URLs are a major, industry-wide weakness in the structure of internet security, so Rapleaf has taken extra steps to strip identifying information out of referrer URLs.

When we discovered that Facebook IDs were being passed to ad networks by applications that we work with, we immediately researched the cause and implemented a solution to stop the transmissions. As of last week, no Facebook IDs are being transmitted to ad networks in conjunction with the use of any Rapleaf service. The transmissions, when they occurred, were not the result of any purposefully engineered process by Rapleaf. Instead, as the article discusses, they were due to broader issues concerning referrer URLs, which are managed by the sites themselves and by ad networks.

We are committed to working with the industry to fix these issues, and any issues that may emerge in the future from this complex ecosystem. Our mission is to make it possible for everyone to have a personalized experience on the web that is safe and anonymous, and we will continue to work hard to make this a reality.

Below are more details about referrer URLs and steps the industry should take to eliminate the privacy risk.

Referrer URLs have been core to the web since its creation in the early 1990s. They have a number of useful functions, such as helping a website administrator understand which other websites link to her site and how her visitors are finding her. But referrer URLs come with a cost to user privacy, a cost that is not widely acknowledged and is generally underestimated.

What are “Referrer URLs”?

When you visit the new frozen yogurt shop in town, the owner might ask you how you heard about the store, and you might reply that you read about it in Bob Friedman’s column in the County Chronicle. This referral information is valuable to the yogurt shop owner because it helps her understand which of her marketing efforts are successful (in this case, soliciting Bob’s column).

A similar process is at work on the web. If you click a link on the countychronicle.com website that takes you to tastyfroyo.com, your browser will typically tell the tastyfroyo.com web server that you are coming from a particular page at countychronicle.com. More specifically, your browser makes an HTTP request for a page on the tastyfroyo.com server, and within this HTTP request is a field named “Referer” (yes, that is how it is spelled, or misspelled) that contains the URL of the linking County Chronicle page.
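
To make this concrete, here is a minimal Python sketch of the raw HTTP/1.1 request a browser might send when you follow that link. The column's path (/columns/bob-friedman) is hypothetical, and real browsers send many more headers than the three shown here.

```python
from typing import Optional
from urllib.parse import urlsplit


def build_request(target_url: str, referrer_url: Optional[str]) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for target_url.

    If referrer_url is given, it goes in the (misspelled) Referer header,
    which is what a browser typically does when you follow a link.
    """
    parts = urlsplit(target_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"GET {path} HTTP/1.1", f"Host: {parts.netloc}"]
    if referrer_url is not None:
        lines.append(f"Referer: {referrer_url}")
    # Header lines end with CRLF, and an empty line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "http://tastyfroyo.com/",
    "http://countychronicle.com/columns/bob-friedman",  # hypothetical page
)
print(request)
```

Passing None for the referrer models a browser that has referrer transmission turned off: the request is otherwise identical, but the Referer line is simply absent.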

What’s the problem?

If you didn’t want to tell the frozen yogurt owner that you learned about her business from Bob Friedman’s column, you could have chosen not to tell her (you might not want her knowing that you’re the kind of person who reads the County Chronicle). But on the web, your browser will automatically send the referrer URL. (Some browsers allow you to disable the transmission of the referrer URL, but this is rarely done.) This automatic referrer transmission leads to two types of privacy problems.

One problem is that if you visit a site anonymously (that is, without providing your identity), the site could potentially discover your identity from information passed along in the referrer URL, breaching the principle of presumed anonymity. This problem has been discussed in a number of places. The HTTP/1.1 specification acknowledges it as well, stating: “Although [the Referer URL] can be very useful, its power can be abused if user details are not separated from the information contained in the Referer.”
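
As an illustration of how a referrer can undo anonymity, the sketch below shows how a receiving site could pull identifiers out of the query string of an incoming Referer value. The parameter names in IDENTIFYING_PARAMS and the example URL are hypothetical; real sites encode identifiers in many different ways, including in the URL path, which this sketch does not inspect.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical query-parameter names that a linking site might use to carry
# a user identifier; illustrative only, not taken from any real site.
IDENTIFYING_PARAMS = {"uid", "user_id", "email"}


def identifiers_in_referrer(referrer: str) -> dict[str, list[str]]:
    """Return any identifying query parameters found in a Referer value."""
    query = parse_qs(urlsplit(referrer).query)
    return {name: values for name, values in query.items()
            if name.lower() in IDENTIFYING_PARAMS}


leaked = identifiers_in_referrer(
    "http://social.example.com/app/canvas?uid=100001234&page=2"
)
print(leaked)  # {'uid': ['100001234']}
```

Nothing here requires cooperation from the visitor: the identifier arrives in a header the browser sent automatically.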

A second problem is less discussed. If you are visiting a site that knows your identity (i.e., any site you are logged into), then this site may receive the referrer URLs of other pages on the web that you have visited. For example, you may visit a web page about a particular medical condition, click a link on that page to a site that knows your identity, and now that site can associate your identity with having visited that particular medical web page.
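
This second problem needs nothing more than an ordinary server log. In the hypothetical sketch below, a site that already knows who is logged in records each incoming Referer next to the session's user; the user name and the medical page URL are made up for illustration.

```python
from typing import Mapping, Optional


def record_visit(log: list[tuple[str, str]], session_user: Optional[str],
                 headers: Mapping[str, str]) -> None:
    """Append (user, referrer) to log when both are known.

    session_user is whoever the site's own login identifies; the referrer
    comes from the request's Referer header, if the browser sent one.
    A plain dict is used for headers, so the lookup is case-sensitive;
    real web frameworks normalize header names.
    """
    referrer = headers.get("Referer")
    if session_user is not None and referrer is not None:
        log.append((session_user, referrer))


visits: list[tuple[str, str]] = []
record_visit(visits, "alice",
             {"Referer": "http://health.example.org/conditions/some-condition"})
print(visits)
```

The join between "who you are" and "where you came from" is a single line of code, which is why this risk is easy to underestimate.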

These problems are compounded by the fact that most web pages are built not from a single HTTP request but from dozens of them (every image on a web page, for example, is fetched with a separate HTTP request). Once iframes and redirects are factored in, even the prescribed behavior of referrer URLs is not always clear.
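
To see why the many requests behind one page matter, the sketch below takes a page URL and the URLs of the images and scripts it embeds, and lists which other hosts would receive the page's address as a Referer. It assumes the simple, traditional browser behavior of sending the full page URL with every subresource request; iframes, redirects, and browser-specific policies are ignored, and all URLs are hypothetical.

```python
from urllib.parse import urlsplit


def third_party_referrer_leaks(page_url: str,
                               subresources: list[str]) -> dict[str, str]:
    """Map each third-party host to the Referer value it would receive.

    Assumes every subresource request carries the full page URL as its
    Referer, which is a simplification of real browser behavior.
    """
    page_host = urlsplit(page_url).hostname
    leaks: dict[str, str] = {}
    for resource in subresources:
        host = urlsplit(resource).hostname
        if host is not None and host != page_host:
            leaks[host] = page_url
    return leaks


page = "http://apps.example.com/quiz?uid=100001234"
embedded = [
    "http://apps.example.com/static/logo.png",  # same site
    "http://ads.example.net/banner.js",         # third party (ad network)
    "http://tracker.example.org/pixel.gif",     # third party
]
for host, referer in third_party_referrer_leaks(page, embedded).items():
    print(host, "receives", referer)
```

If the page URL itself carries an identifier, as in this example, every third-party image or script on the page receives that identifier too, without the user clicking anything.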

What should be done?

First, web sites must take care, when linking to external web sites, not to include personally identifying information that may end up in referrer URLs. Many major sites do a good job with this, though it is not always easy to be comprehensive.
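
One possible approach is to keep identifying parameters out of the URLs of pages that link elsewhere, for example by computing a scrubbed version of the page URL and redirecting to it. The sketch below does the scrubbing for a hypothetical list of parameter names. It is an illustration under those assumptions, not Rapleaf's implementation, and it does not catch identifiers embedded in the URL path.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that carry user identity.
IDENTIFYING_PARAMS = {"uid", "user_id", "email"}


def strip_identifying_params(url: str) -> str:
    """Return url with identifying query parameters removed.

    Other parameters are kept in their original order. Identifiers that
    live in the path (e.g. /users/12345) are not handled.
    """
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name.lower() not in IDENTIFYING_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_identifying_params(
    "http://apps.example.com/quiz?uid=100001234&page=2"
))  # http://apps.example.com/quiz?page=2
```

A deny-list like this is easy to get wrong as new parameters are added; an allow-list of known-safe parameters is the more conservative variant of the same idea.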

Second, we need to give deeper thought to whether the privacy risks associated with referrer URLs can be adequately managed. Referrer URLs are used by most web sites for constructive purposes (e.g., link statistics, or preventing hotlink bandwidth theft). If the privacy risks cannot be managed, then privacy-centric browsers may decide to turn off referrer URLs entirely.
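
As an example of one of those constructive uses, here is a minimal sketch of a hotlink check: an image server serves a file only if the Referer is absent or points at the site's own host. The allowed host names are hypothetical. Note the design choice of allowing requests with no Referer at all; a check that rejected them would break for users whose browsers do not send referrers, which is exactly what a privacy-centric browser that turns them off would do.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host names that are allowed to embed this site's images.
ALLOWED_HOSTS = {"tastyfroyo.com", "www.tastyfroyo.com"}


def allow_image_request(referer: Optional[str]) -> bool:
    """Decide whether to serve an image based on its Referer header.

    A missing or empty Referer is allowed, so users whose browsers
    suppress referrers are not locked out.
    """
    if not referer:
        return True
    return urlsplit(referer).hostname in ALLOWED_HOSTS


print(allow_image_request("http://www.tastyfroyo.com/menu"))     # True
print(allow_image_request("http://hotlinker.example.com/page"))  # False
print(allow_image_request(None))                                 # True
```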
