A Critique of P3P : Privacy on the Web
School of Computer Science
Carnegie Mellon University
Aug 23, 2000
The World Wide Web (W3) consortium brought the Hypertext Transfer Protocol, HTTP, that allows Browsers to talk to Web Servers. It brought the Hypertext Markup Language, HTML, that lets Browsers show what they hear from the Web Servers. It has recently brought a lot more. The eXtensible Markup Language, XML, provides a framework for automated content communication between Browsers and Web Servers. XML is widely used in merging data flows through Web Servers, and anybody who has encountered Windows 2000 has seen XML content in lots of files that need automated content processing. XML has now naturally set the stage for automated privacy protection.
The privacy assurance proposal is called the Platform for Privacy Preferences or P3P. This serious and excellent effort by the W3 is defined authoritatively at http://www.w3.org/p3p. Today, enormous amounts of information are being collected by many thousands of web sites. While an effective technology, called SSL (Secure Sockets Layer), exists for protecting the privacy of the transaction between a Browser and a Web Server, there is no protection once the information is on the Server and in the hands of the company or organization that lured you to them.
Because P3P is an outstanding work, it deserves serious critique. It is essential to know what it does, and what it does not do. For a period of time, P3P will be a work in progress. There is opportunity to hone the edge on this knife so beautifully made.
The present critique will cover most of the facets of the platform, examining both the assumptions and the implementation. It will be seen that P3P is dangerously myopic and needs substantial enhancement. The five areas of critical need are:
(a) more specificity in declaring the purpose behind taking information,
(b) a means to establish a negotiated contract that goes beyond W3's APPEL (A P3P Preference Exchange Language),
(c) a means in the law for policing the contracts obtained,
(d) a means for transitivity and universality of the protection on information, and
(e) an IETF (Internet Engineering Task Force) definition that doesn't require the web (specifically, the HTTP protocol).
It is irrelevant, in this paper, whether the W3 technical committee or the U.S. Congress should address problems with P3P. As many people as possible should deeply understand the Internet privacy debate. It is every individual's and every organization's privacy that is being negotiated through this debate.
So in P3P, the Browser, at the very beginning, exposes itself to a minimum of two invasions of privacy. The first is the first request to a Web Server page. The second is the request to the PolicyRef page specified in the first response by the Web Server. In theory, the second such request is supposed to be in a safe zone. A safe zone is simply a voluntary agreement by the Web Server not to record anything significant from the Browser that is making the request. Furthermore, if a Browser wants to be safe about the first request, it can issue a HEAD request that simply returns the Message Header from the site that contains the policy reference. This HEAD response is supposed, also, to be an action for a safe zone. Because the Web depends on client Browsers making first contact with servers, it is not clear how to avoid this potential attack on privacy by Web Servers that choose not to have these recommended safe zones. However, we can attend to this problem in a different setting at the end of this article.
HTTP defines a communication from a Browser to a Server and from a Server to a Browser. These communications each have three parts. The Browser-to-Server request in essence (1) asks the Web Server to do something, (2) explains how it wants it done, and then (3) provides additional data to do it with. The explanation part is the HTTP Message Header information. When a Web Server talks back to the Browser, it (1) tells the Browser if it did what was asked, (2) explains how it is doing it, and (3) provides the data that does it. Again, the second part is the HTTP Message Header information. The response Message Header provides the first P3P information, the web address of the PolicyRef page.
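The three-part structure of a response, and the place where the policy reference sits, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header name `P3P` and the `policyref="..."` form follow the later P3P 1.0 Recommendation and may differ from the working draft discussed here, and the function names, the sample response, and the example.com address are made up for the sketch.

```python
import re


def split_http_message(raw):
    """Split a raw HTTP message into (first line, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    first_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return first_line, headers, body


def policy_ref(headers):
    """Return the policy reference address from the P3P header, or None."""
    match = re.search(r'policyref="([^"]*)"', headers.get("p3p", ""))
    return match.group(1) if match else None


# Hypothetical response to a HEAD request: a status line and the
# Message Header, with no data part.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    'P3P: policyref="http://example.com/p3p/policyref.xml"\r\n'
    "\r\n"
)

status, headers, body = split_http_message(SAMPLE_RESPONSE)
print(status)
print(policy_ref(headers))
```

Because a HEAD response carries only the first two parts, a cautious Browser could learn where the PolicyRef page lives without asking for any page content.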
Notice there is no P3P information in the communication from the Browser to the Server. P3P is a completely one-sided service. The Server tells the Browser its Privacy Policies, and the Browser is now on its own. It tells the Browser its Privacy Policies by giving the Browser the PolicyRef page to go and fetch. The Browser can choose to do this, but it remains on its own.
This policy can say many things. It can isolate things like name and address and stipulate that they will be used one way, perhaps solely to authorize payment, while other things like email address might be used for marketing follow up. The categories of information that the web site may deal with in different ways are specified in the following list:
The above tags are not statements of the purpose for obtaining the information. They are simply referred to as hints about the purpose. Here is the list of purposes quoted exactly from the specification:
P3P clearly provides a way to stipulate the purpose to which the user's information disclosure is put. This is highly commendable. Perhaps the choice of particular purposes is not so good.
This Overall Privacy Statement verifies that IBM is a member of the TRUSTe program and is in compliance with TRUSTe privacy principles. This statement discloses the privacy practices for the IBM Web (ibm.com). TRUSTe is an independent, non-profit initiative whose mission is to build users' trust and confidence in the Internet by promoting the principles of disclosure and informed consent. Because this site wants to demonstrate its commitment to your privacy, it has agreed to disclose its information practices and have its privacy practices reviewed and audited for compliance by TRUSTe. When you visit a Web site displaying the TRUSTe mark, you can expect to be notified of:
What information is gathered/tracked
How the information is used
Who information is shared with
Questions regarding this statement should be directed to email@example.com or TRUSTe for clarification.
We know that you are concerned about your privacy; so is IBM. If you provide IBM with information about yourself, such as name, postal address, e-mail address, or other personal data, we may add it to our records. From time to time you may receive information about our products, services, activities, or contacts for other business purposes, unless you request otherwise by selecting the appropriate button on the data collection page.
IBM is a global organization with legal entities operating components of our Web site worldwide. Because of the global scope of our Web, we may transfer your personal information to countries of the world which provide various levels of legal protection. Please realize that when you give us personal information, IBM will handle it in the manner we describe here. To learn more, you can read about IBM's general Internet privacy practices. Our privacy practices are designed to provide a high level of protection for your personal data, all over the world.
This Web site is maintained by the International Business Machines Corporation.
You can reach us by telephone by calling +1-416-383-9224; within North America you can reach us at 1 800-426-7777. You can also send us a message at firstname.lastname@example.org.
Please use the Back button on your Browser to return to the page where you
Yes, they said they were going to disclose your personal information to "countries of the world which provide various levels of legal protection." (Did you get that far?) But don't worry, "when you give us personal information, IBM will handle it in the manner we describe here." Note that there is nothing in P3P that provides an automatic indication that your personal information will escape the laws of the United States.
Back to the technical issues, it might be better to have very concrete, in addition to very abstract, purposes, and let people know these concrete ones are possible. So, for example, in addition to the <current/> tag that just caused some heartburn, we might have (these are made up and not in the specification):
And so forth. We have hundreds of HTTP types (the typing of data legal in HTTP data messages), so it would seem we could have hundreds of very specific purposes. For people who know about the science of human intentionality, it makes sense to be able to list many specific purposes.
Before thinking that P3P is just not worth anything, it needs to be recognized that the writers of the 1.0 working draft specification are openly soliciting comments, have disclosed this specification, and have created a specification that covers all the bases that need to be covered in a basic privacy specification.
Not only do they allow the use of different kinds of information to be different, they understand that a purpose or intent is actually a simple thing to state and evaluate. They also provide explicit tags for many other contingencies such as tags that tell who the ultimate recipients of the data will be, and tags that tell the user what penalty the web site is willing to pay for misusing the data! These are all very good things to have for automated negotiation. They are laid out in a fashion that makes machine interpretation possible, and, in fact, reasonable.
The writers also explicitly say that P3P 1.0 lacks the following desirable characteristics:
In effect, P3P 1.0 lacks the ability to negotiate with the Web Server on a contract, and to make a contract with the Web Server that could be legally binding. All of this is fundamentally because the Web Server simply provides an ultimatum to the Browser. Recalling the 1960s slogan "love it or leave it," perhaps the Browser can leave the country if he doesn't want to live there, but he can't talk back.
P3P 1.0 is likely, I think, to create some unpleasant behavior for users. The user is simply warned that this web site is going to use his information for marketing purposes and will report the data to a third party. But let's say this is his stockbroker. What does he do then? Call the Chief Counsel on the telephone to negotiate a better deal? This unpleasant behavior may be as damaging to the P3P effort as anything else. It seems certain that the working group wanted to introduce P3P in steps, but this might harm acceptance if the steps are the wrong ones.
A sane mechanism would be for his Browser to start negotiating with the Web Server to tell it what he is willing to do. The server can then decide whether it wants this person's business. Yes, it is true that the protection of privacy would now become a point of competitive advantage for companies. This willingness to protect privacy to gain business has to be balanced against their desire to grab as much information as they can get.
The P3P group clearly understands that such negotiation is going to be important in future versions of the specification. In fact, there is an affiliated group called APPEL (A P3P Preference Exchange Language, see the P3P site) that has proposed a rule-based reasoning system for privacy that is meant to go hand-in-hand with P3P. However, there is still not a mechanism in P3P, or APPEL, for the Browser to talk back to the Server about privacy, so this rule system has only limited utility at present.
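To make the idea of talking back concrete, here is a minimal sketch of a rule-based check followed by a counter-offer. It is not APPEL syntax and not part of P3P: the `Statement` and `Rule` classes, the `"accept"`/`"reject"` behaviors, and the `payment`/`marketing` labels are all hypothetical names chosen for illustration, and treating unmatched statements as rejected is simply a cautious assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One line of a server policy: a kind of data and its stated purpose."""
    data: str
    purpose: str


@dataclass(frozen=True)
class Rule:
    """One user preference; "*" matches any data kind or any purpose."""
    data: str
    purpose: str
    behavior: str  # "accept" or "reject" (hypothetical labels)

    def matches(self, statement):
        return (self.data in ("*", statement.data)
                and self.purpose in ("*", statement.purpose))


def rejected_statements(rules, policy):
    """Apply the first matching rule to each statement; no match means reject."""
    rejected = []
    for statement in policy:
        behavior = next(
            (rule.behavior for rule in rules if rule.matches(statement)),
            "reject",
        )
        if behavior == "reject":
            rejected.append(statement)
    return rejected


# The server's ultimatum: name for payment, email for marketing.
server_policy = [Statement("name", "payment"), Statement("email", "marketing")]

# The user's rules: no marketing use of email, anything for payment is fine.
user_rules = [Rule("email", "marketing", "reject"), Rule("*", "payment", "accept")]

rejected = rejected_statements(user_rules, server_policy)
# The talk-back step that P3P 1.0 lacks: propose the policy minus what was refused.
counter_offer = [s for s in server_policy if s not in rejected]
print(rejected)
print(counter_offer)
```

Today the Browser can only compute the first list and warn the user. With a return channel, the second list is what it would send back for the Server to accept or decline.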
It seems pretty obvious that P3P needs a means to establish a negotiated contract that goes beyond W3's APPEL. But it also needs a means in the law for policing the privacy contracts.
Chances are that the P3P group is pretty skittish about suggesting that the law get involved in any of this stuff. However, as an old mentor of mine once said, "The trick is to understand when the technology ends and the law begins."
The mechanism of non-repudiation mentioned as a future task for P3P provides a signed contract between the user and the server. The agreement cannot be repudiated as not having happened. This makes the contract a contract. However, there is almost too much information in this contract. Ideally all you want to know is that the Web Server has used the information illegally. You should not have to disclose all the users that visited the Sex site and gave up their credit card information. It would seem to me that along with non-repudiation, you want to have anonymous users. This cannot happen in a non-mediated transaction as has been proposed for the direct Browser-to-Server P3P interactions. Exactly how to create this scenario around data that contains names and addresses is going to be a technically interesting challenge, but one idea will be worth considering in a moment.
The last of the serious criticisms is that P3P fails to provide a means for transitivity and universality of the protection of information. This is actually several things.
The transitivity problem is how to protect your privacy after the information is handed to somebody else.
If a violation of privacy is generally a misuse of information about you or information that you provide (e.g., a trade secret, a confidential comment to a webmaster), then there must be a way in the privacy protocol to indicate that a privacy directive is essentially non-negotiable, or negotiable only back to the original owner, and this needs to be passed on to the next possessor of the information.
Accomplishing this would be technically fairly simple unless the information changes and becomes derivative information. If something is learned and a conclusion is drawn, is the information that caused the learning binding its privacy directives on the conclusion? This is a hard problem, not an easy one.
One solution is to create directives on derivative information. Essentially the directive says that the purpose of the Web Server is to record information so that conclusions or derivative observations can be obtained. This information now becomes the property of the owner of the Web Server.
Conversely, a User may say to the Web Server that it can use the information to clear a credit card or to give the user a registration account, but that any derivative information must be restricted to just this. A Web Server taking this information could not pass it on or provide it as the basis for developing new knowledge.
These two cases, actually, are potentially handled with existing tags in P3P. What is not is the case where the requirement is that the information can be passed along but that the new owner must preserve the privacy conditions. There is no mechanism in P3P for preserving the integrity of the use to which the information can be put. A particularly useful case of this might be the case where your personal information can be passed along but only non-identifiable summaries used for marketing purposes.
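The missing case, where information may be passed along but the conditions must travel with it, can be sketched as data that carries its own directives. This is an illustrative sketch only: `ProtectedInfo`, its fields, and the host names are hypothetical, and the check works only among parties that choose to run it, which is exactly why the essay argues for contracts and law behind the technology.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ProtectedInfo:
    """A piece of personal information bundled with its privacy conditions."""
    value: str
    original_owner: str          # the only party who may renegotiate
    holder: str                  # whoever currently possesses the data
    allowed_purposes: frozenset  # uses the original owner agreed to

    def pass_along(self, new_holder):
        """Hand the data to a new holder; the conditions are copied unchanged."""
        return replace(self, holder=new_holder)

    def use_for(self, purpose):
        """Release the value only for a purpose the original owner allowed."""
        if purpose not in self.allowed_purposes:
            raise PermissionError(
                f"{self.holder} may not use this information for {purpose}"
            )
        return self.value


at_shop = ProtectedInfo(
    value="123 Main Street",
    original_owner="user",
    holder="shop.example",
    allowed_purposes=frozenset({"payment"}),
)
at_partner = at_shop.pass_along("partner.example")

print(at_partner.use_for("payment"))
try:
    at_partner.use_for("marketing")
except PermissionError as err:
    print("refused:", err)
```

The new holder inherits the same `allowed_purposes` as the first one, so the conditions survive the hand-off. Derivative information, such as non-identifiable summaries, would need its own directive and is not modeled here.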
The universality of the protection of information is yet another problem with the current specification. The way P3P is set up, the user has to set up each Browser or User Agent that he uses. If he works on ten machines at the office and at home he has to take care to make sure all the machines utilize the same privacy policies, or he might as well have no privacy policies at all.
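One modest way to notice when machines have drifted apart is to compare a fingerprint of each machine's preference settings. The sketch below assumes, purely for illustration, that preferences can be written as a dictionary from data kinds to allowed purposes; P3P defines no such format, and the `fingerprint` function is a made-up helper.

```python
import hashlib
import json


def fingerprint(preferences):
    """Hash a canonical form of the preferences so key order does not matter."""
    canonical = json.dumps(preferences, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical settings on three of the user's machines.
office = {"name": ["payment"], "email": ["payment"]}
home = {"email": ["payment"], "name": ["payment"]}                # same, reordered
laptop = {"name": ["payment"], "email": ["payment", "marketing"]}  # drifted

print(fingerprint(office) == fingerprint(home))
print(fingerprint(office) == fingerprint(laptop))
```

A check like this only detects the inconsistency; it does nothing to keep ten machines in step, which is the deeper problem.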
A final criticism of P3P is that it is a web-only solution to privacy when we know that the Internet involves much more than simply the web. For example, email moves by other protocols (mainly SMTP), not HTTP. There is no way in P3P to say that the mail you send to a company contains information that should only be used in ways that you restrict. Most lawyers would not want to tell you how little legal sting there is in their messages asking you to treat an email as confidential. They would have to show that you agreed to treat the email as confidential. It is easy to argue that you accidentally read the mail before seeing the confidentiality statement. Without negotiated agreement, P3P is completely ill suited to mail routing. However, a strong case can be made, I think, that P3P describes precise and essential building blocks for a solution to privacy in mail and many other Internet protocols. P3P should not be thrown away; it should be built upon. But perhaps one place that should support P3P would be a configuration of non-HTTP servers that serve up information packages, such as credit card purchases, names and addresses, and, perhaps, even passwords. This would be an effort for the IETF, I think, because it would act like the Domain Name System in resolving information requests for all other communications protocols on the Internet.
I have no doubt that many members of the W3 P3P working group have thought through many if not most of the concerns expressed in this article. However, these are not the people who are likely to talk about such concerns, since their main interest is in getting P3P accepted. Pointing out flaws like the ones described above doesn't, on the surface, look like help in getting P3P accepted. But my argument is the opposite. It is probably better for a third party to speak out on these and to invite more vigorous public discussion. This is precisely because P3P takes us in the right direction. It deserves to be supported and added to. P3P clearly represents a good start. People in all aspects of the Internet socio-economic-political system need to sit up and think this through for themselves. Privacy will have a widespread and deep influence on the economic vitality of cyberspace. Information is power, and privacy management is the control, and thereby the economic unleashing, of that power.
A common misconception about Negotiated P3P is that it is too complicated to negotiate. It does seem true that a simplistic negotiation strategy that negotiates every XML tag setting will create a search space that is much too large. Negotiation should take a microsecond and be invisible to the user and easy to manage for the web site guys. How can this be done? I wrote another paper, http://drachma.ecom.cmu.edu/pspnote/, that describes how this can plausibly be done, though it hasn't been read by many people (unlike this paper, which has been read by thousands). That paper has a bunch of math and doesn't just come out and say what I was thinking, so here goes: