A Critique of P3P : Privacy on the Web

 

Robert Thibadeau, Ph.D.

School of Computer Science

Carnegie Mellon University

Pittsburgh PA

Aug 23, 2000

 

Postscript: 2004

 

The World Wide Web (W3) Consortium brought the Hypertext Transfer Protocol, HTTP, that allows Browsers to talk to Web Servers.  It brought the Hypertext Markup Language, HTML, that lets Browsers show what they hear from the Web Servers.  It has recently brought a lot more.  The eXtensible Markup Language, XML, provides a framework for automated content communication between Browsers and Web Servers.  XML is widely used in merging data flows through Web Servers, and anybody who has encountered Windows 2000 has seen XML content in lots of files that need automated content processing.  XML has now naturally set the stage for automated privacy protection.

 

The privacy assurance proposal is called the Platform for Privacy Preferences or P3P. This serious and excellent effort by the W3 is defined authoritatively at http://www.w3.org/p3p.   Today, enormous amounts of information are being collected by many thousands of web sites.  While an effective technology, called SSL (Secure Sockets Layer), exists for protecting the privacy of the transaction between a Browser and a Web Server, there is no protection once the information is on the Server and in the hands of the company or organization that ‘lured’ you to them.

 

Because P3P is an outstanding work, it deserves serious critique.  It is essential to know what it does, and what it does not do.  For a period of time, P3P will be a work in progress.  There is opportunity to hone the edge on this knife so beautifully made. 

 

The present critique will cover most of the facets of the platform, examining both the assumptions and implementation.  It will be seen that P3P is dangerously myopic, and it needs substantial enhancement.  The five areas of critical need are

 

(a)    more specificity in declaring the purpose behind taking information,

(b)   a means to establish a negotiated contract that goes beyond W3’s APPEL (A P3P Preference Exchange Language),

(c)    a means in the law for policing the contracts obtained,

(d)   a means for transitivity and universality of the protection on information, and

(e)    an IETF (Internet Engineering Task Force) definition that doesn’t require the web (specifically, the HTTP protocol). 

 

It is irrelevant, in this paper, whether the W3 technical committee or the U.S. Congress should address problems with P3P.  As many people as possible should deeply understand the Internet privacy debate.  This is every individual’s and every organization’s privacy that is being negotiated through this debate.

 

P3P works as a series of HTTP communications.   The first is a Browser request to a Web Server for a file or an action.  In this communication, the Browser says nothing about privacy to the Web Server.  However, the Web Server responds to the Browser with whatever the Browser asked for, plus a special reference to a Privacy Policy Reference page.   The Browser, or the person operating it, can now determine what to do with the Web Server’s response based on the Privacy Policy Reference page, which is obtained by a second HTTP request.  The Browser reads the PolicyRef page and decides what to do.  This PolicyRef page is written in XML.  It has many very definite things it can say.  A PolicyRef page is very special and can be used to determine whether the Browser should ever come back to that Web Server again, and whether information from a form on a web page should be sent to that Web Server.

 

So in P3P, the Browser, at the very beginning, exposes itself to a minimum of two invasions of privacy.  The first is the first request to a Web Server page.  The second is the request to the PolicyRef page specified in the first response by the Web Server.   In theory, the second such request is supposed to be in a “safe zone”.  A safe zone is simply a voluntary agreement by the Web Server not to record anything significant from the Browser that is making the request.   Furthermore, if a Browser wants to be safe about the first request, it can issue a “HEAD” request that simply returns the Message Header from the site that contains the policy reference.  This HEAD response is supposed, also, to be an action for a safe zone.  Because the Web depends on client Browsers making first contact with servers, it is not clear how to avoid this potential attack on privacy by Web Servers that choose not to have these recommended safe zones.  However, we can attend to this problem in a different setting at the end of this article.

 

HTTP defines a communication from a Browser to a Server and from a Server to a Browser.  These communications each have three parts.  The Browser-to-Server request in essence (1) asks the Web Server to do something, (2) explains how it wants it done, and then (3) provides additional data to do it with.   The explanation part is the “HTTP Message Header” information.  When a Web Server talks back to the Browser, it  (1) tells the Browser if it did what was asked, (2) explains how it is doing it, and (3) provides the data that does it.  Again, the second part is the “HTTP Message Header” information. The response Message Header provides the first P3P information, the web address of the PolicyRef page. 
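
To make the three-part structure concrete, here is a minimal sketch that splits a raw server response into its status line, Message Header, and data, and then pulls the PolicyRef address out of the header.  The header name `P3P` and its `policyref="..."` form are assumptions borrowed from later versions of the specification, and the server address is hypothetical; the draft discussed here may spell the header differently.

```python
import re


def split_http_response(raw):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]          # part 1: did the server do what was asked?
    headers = {}                    # part 2: the Message Header
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body   # part 3: the data itself


def find_policy_ref(headers):
    """Return the PolicyRef address from the (assumed) P3P header, or None."""
    match = re.search(r'policyref="([^"]*)"', headers.get("p3p", ""))
    return match.group(1) if match else None


# Hypothetical response from a hypothetical server.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    'P3P: policyref="http://www.example.com/w3c/p3p.xml"\r\n'
    "\r\n"
    "<html>...</html>"
)

status_line, headers, body = split_http_response(raw)
policy_ref = find_policy_ref(headers)
print(status_line)
print(policy_ref)
```

Note that nothing in the sketch sends privacy information toward the server; the Browser only reads what it is given.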

 

Notice there is no P3P information in the communication from the Browser to the Server.  P3P is a completely one-sided service.  The Server tells the Browser its Privacy Policies, and the Browser is now on its own.   It tells the Browser its Privacy Policies by giving the Browser the PolicyRef page to go and fetch.  The Browser can choose to do this, but it remains on its own.

 

The Browser fetches the PolicyRef page to decide what to do.  Here the P3P information is in the content of the page itself, and it is encoded in an elaborate XML language as well as, possibly, an HTML presentation for the benefit of a human being who wants to read the privacy policy.   The Browser, or a program in the Browser called the “User Agent”, decides unilaterally whether to accept the privacy policy presented to it. 

 

This policy can say many things.  It can isolate things like name and address and stipulate that they will be used one way, perhaps solely to authorize payment, while other things like email address might be used for marketing follow up.   The categories of information that the web site may deal with in different ways are specified in the following list:

 

  1. <physical/>  Physical Contact Information
  2. <online/> Online Contact Information
  3. <uniqueid/> Unique Identifiers
  4. <purchase/> Purchase Information
  5. <financial/> Financial Information
  6. <computer/> Computer Information
  7. <navigation/> Navigation and Click-stream Data
  8. <interactive/> Interactive Data
  9. <demographic/> Demographic and Socioeconomic Data
  10. <content/> Content
  11. <state/> State Management Mechanisms
  12. <political/> Political Information
  13. <health/> Health Information
  14. <preference/> Preference Data
  15. <other/> Other

 

The above tags are not statements of the purpose for obtaining the information.  They are simply referred to as “hints” about the purpose.  Here is the list of purposes quoted exactly from the specification:

 

  1. “<current/>   Completion and Support of Current Activity: Information may be used by the service provider to complete the activity for which it was provided, such as the provision of information, communications, or interactive services -- for example to return the results from a Web search, to forward email, or place an order.
  2. <admin/>  Web Site and System Administration: Information may be used for the technical support of the Web site and its computer system. This would include processing computer account information, and information used in the course of securing and maintaining the site.
  3. <develop/> Research and Development: Information may be used to enhance, evaluate, or otherwise review the site, service, product, or market. This does not include personal information used to tailor or modify the content to the specific individual nor information used to evaluate, target, profile or contact the individual.
  4. <customization/> Affirmative Customization: Information may be used to tailor or modify the content or design of the site only to specifications affirmatively selected by the particular individual during a single visit or multiple visits to the site. For example, a financial site that lets users select several stocks whose current prices are displayed whenever the user visits.
  5. <tailoring/> One-time Tailoring: Information may be used to tailor or modify content or design of the site not affirmatively selected by the particular individual where the information is used only for a single visit to the site and not used for any kind of future customization. For example, an online store that suggests other items a visitor may wish to purchase based on the items he has already placed in his shopping basket.
  6. <pseudonym/>  Pseudononymous Profiling: Information may be used to create or build a record of a particular individual or computer that is tied to a pseudononymous identifier, without tying personally-identifiable information (such as name, address, phone number, email address, or IP address) to the record. This profile will be used to determine the habits, interests, or other characteristics of individuals, but it will not be used to attempt to identify specific individuals.
  7. <profiling/>  Individual Profiling: Information may be used to create or build a record on the particular individual or computer for the purpose of compiling habits or personally identifiable information of that individual or computer. For example, an online store that suggests items a visitor may wish to purchase based on items he has purchased during previous visits to the web site.
  8. <contact/> Contacting Visitors for Marketing of Services or Products: Information may be used to contact the individual for the promotion of a product or service. This includes notifying visitors about updates to the Web site.
  9. <other-purpose> string </other-purpose> Other Uses: Information may be used in other ways not captured by the above definitions. (A human readable explanation should be provided in these instances).”

 

P3P clearly provides a way to stipulate the purpose to which the user’s information disclosure is put.  This is highly commendable.  Perhaps the choice of particular purposes is not so good.
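
As an illustration of how a User Agent might make its unilateral decision from the category and purpose tags listed above, here is a minimal sketch.  The policy layout (`POLICY`, `STATEMENT`, `PURPOSE`, `CATEGORIES`) is a simplified assumption and not the exact schema of the specification, and the user preferences are invented for the example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical policy; the element layout is an assumption.
POLICY_XML = """
<POLICY>
  <STATEMENT>
    <PURPOSE><current/></PURPOSE>
    <CATEGORIES><physical/><purchase/></CATEGORIES>
  </STATEMENT>
  <STATEMENT>
    <PURPOSE><contact/><profiling/></PURPOSE>
    <CATEGORIES><online/></CATEGORIES>
  </STATEMENT>
</POLICY>
"""

# Hypothetical user preferences: purposes the user tolerates per category.
USER_PREFERENCES = {
    "physical": {"current", "admin"},
    "purchase": {"current"},
    "online": {"current", "contact"},
}


def objections(policy_xml, preferences):
    """List every (category, purpose) pair the user has not agreed to."""
    root = ET.fromstring(policy_xml)
    problems = []
    for statement in root.findall("STATEMENT"):
        purposes = [element.tag for element in statement.find("PURPOSE")]
        categories = [element.tag for element in statement.find("CATEGORIES")]
        for category in categories:
            allowed = preferences.get(category, set())
            for purpose in purposes:
                if purpose not in allowed:
                    problems.append((category, purpose))
    return problems


problems = objections(POLICY_XML, USER_PREFERENCES)
accept = not problems   # the only choices are to accept or to walk away
print(problems, accept)
```

The sketch can only answer yes or no.  It has no way to tell the Web Server which pair caused the refusal.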

 

As one example of this in action, let us take the case of giving your name, credit card, and address information for an order.  Basically, the site that wants you to feel safe can say that this information will be used for its current purpose as explained on the page you saw.  Yes.  So, for example, if I print in very fine print at the bottom of the page that my current purpose is to give your credit card number to the first thief I can find, I have fulfilled my obligation.  I might even declare that this information is of the type “purchase” but that is supposed to only be a “hint” as to how it might be used.  If you happen to read the fine print, you know what is going to happen to your credit card.  A lawyer might argue otherwise, but the fact is that the only thing in writing from the Web Server is that the purpose is stipulated to be written on the page and the page says that the purpose of taking the credit card information is to hand it to a thief (as well as, probably, to make a payment, without ambiguity).  I might even create a “TrustUS” symbol and put it at the top of my purchase page and on my privacy policy page.

 

If you don’t think companies will try to use ploys to get you to trust them, read the IBM privacy policy on the IBM P3P Editor site:

 

“This Overall Privacy Statement verifies that IBM is a member of the TRUSTe  program and is in compliance with TRUSTe privacy principles. This statement discloses the privacy practices for the IBM Web (ibm.com).  TRUSTe is an  independent, non-profit initiative whose mission is to build users' trust and confidence in the Internet by promoting the principles of disclosure and informed consent. Because this site wants to demonstrate its commitment to your privacy,  it has agreed to disclose its information practices and have its privacy practices reviewed and audited for compliance by TRUSTe. When you visit a Web site displaying the TRUSTe mark, you can expect to be notified of:

 

        What information is gathered/tracked

        How the information is used

        Who information is shared with

 

 

Questions regarding this statement should be directed to askibm@vnet.ibm.com or TRUSTe for clarification.

 

 

We know that you are concerned about your privacy; so is IBM. If you provide IBM with information about yourself, such as name, postal address, e-mail address, or other personal data, we may add it to our records. From time to time you may receive information about our products, services, activities, or contacts for other business purposes, unless you request otherwise by selecting the appropriate button on the data collection page.

 

 

IBM is a global organization with legal entities operating components of our Web site worldwide. Because of the global scope of our Web, we may transfer your  personal information to countries of the world which provide various levels of legal protection.  Please realize that when you give us personal information, IBM will handle it in the manner we describe here. To learn more, you can read about IBM's general Internet privacy practices. Our privacy practices are designed to provide a high level of protection for your personal data, all over the world.

 

 

 

 

 

This Web site is maintained by the International Business Machines Corporation.

 

You can reach us by telephone by calling +1-416-383-9224; within North America you can reach us at 1 800-426-7777. You can also send us a message at askibm@vnet.ibm.com.

 

 

Please use the Back button on your Browser to return to the page where you were.”

 

 

Yes, they said they were going to disclose your personal information to countries of the world that provide “various levels of legal protection.” (Did you get that far?) But don’t worry, “when you give us personal information, IBM will handle it in the manner we describe here.”  Note, there is nothing in P3P that provides an automatic confirmation that your personal information will escape the laws of the United States.

 

I have used the IBM privacy policy as an example precisely because I also believe firmly that IBM can be trusted well above most other firms and entities.  I do think they could re-word their policy particularly by throwing out TRUSTe, or making that a side note.   I am not going to trust TRUSTe more than IBM.  I certainly don’t know who these people are and really don’t care.  Anybody calling themselves “trust” can’t be trusted by definition.  (You earn trust, you don’t declare it, and IBM has earned a great deal of trust).  Anyway, the purpose of this example was to show that even IBM is engaging in privacy policy behavior that does not, in itself, help the privacy issue along very far.

 

Back to the technical issues, it might be better to have very concrete, in addition to very abstract, purposes, and let people know these concrete ones are possible.  So, for example, in addition to the <current/> tag that just caused some heart burn, we might have (these are made up and not in the specification):

  1. <payment/> The binding purpose is to obtain payment for the order.
  2. <delivery/> The binding purpose is to deliver the order to the address.
  3. <web_search/> The binding purpose is to perform the current web search.
  4. <export/> The binding purpose is to export the data to the authority of another country.

And so forth.  We have hundreds of “HTTP types” (the typing of data legal in HTTP data messages), so it would seem we could have hundreds of very specific purposes.  For people who know about the science of human intentionality, it makes sense to be able to list many specific purposes.

 

Before thinking that P3P is just not worth anything, it needs to be recognized that the writers of the 1.0 working draft specification are openly soliciting comments, have disclosed this specification, and have created a specification that covers all the bases that need to be covered in a basic privacy specification. 

 

Not only do they allow the use of different kinds of information to be different, they understand that a purpose or intent is actually a simple thing to state and evaluate.  They also provide explicit tags for many other contingencies such as tags that tell who the ultimate recipients of the data will be, and tags that tell the user what penalty the web site is willing to pay for misusing the data!  These are all very good things to have for automated negotiation.  They are laid out in a fashion that makes machine interpretation possible, and, in fact, reasonable.

 

The writers also explicitly say that P3P 1.0 lacks the following desirable characteristics:

 

 

In effect, P3P 1.0 lacks the ability to negotiate with the Web Server on a contract, and to make a contract with the Web Server that could be legally binding.  All of this is fundamentally because the Web Server simply provides an ultimatum to the Browser.  Recalling the 1960’s “love it or leave it,” perhaps the Browser can leave the country if he doesn’t want to live there, but he can’t talk back.

 

P3P 1.0 is likely, I think, to create some unpleasant behavior for users.  The user is simply warned that this web site is going to use his information for marketing purposes and will report the data to a third party.  But, let’s say this is his stockbroker.  What does he do then?  Call the Chief Counsel on the telephone to negotiate a better deal?  This unpleasant behavior may be as damaging to the P3P effort as anything else.  It seems certain that the working group wanted to introduce P3P in steps, but this might harm acceptance if the steps are the wrong ones.

 

A sane mechanism would be for his Browser to start negotiating with the Web Server to tell it what he is willing to do.  The server can then decide whether it wants this person’s business.  Yes, it is true that the protection of privacy would now become a point of competitive advantage for companies.   This willingness to protect privacy to gain business has to be balanced against their desire to grab as much information as they can get.

 

The P3P group clearly understands that such negotiation is going to be important in future versions of the specification.  In fact, there is an affiliated group called APPEL (A P3P Preference Exchange Language, see the P3P site) that has proposed a rule-based reasoning system for privacy that is meant to go hand-in-hand with P3P.  However, there is still not a mechanism in P3P, or APPEL, for the Browser to talk back to the Server about privacy, so this rule system has only limited utility at present.

 

It seems pretty obvious that P3P needs a means to establish a negotiated contract that goes beyond W3’s APPEL.  But it also needs a means in the law for policing the privacy contracts.

 

Chances are that the P3P group is pretty skittish about suggesting that the law get involved in any of this stuff.  However, as an old mentor of mine once said, “The trick is to understand when the technology ends and the law begins.” 

 

The mechanism of non-repudiation mentioned as a future task for P3P provides a “signed” contract between the user and the server.   The agreement cannot be repudiated as not having happened.  This makes the contract a contract.  However, there is almost too much information in this contract.  Ideally all you want to know is that the Web Server has used the information illegally.  You should not have to disclose all the users that visited the sex site and gave up their credit card information.  It would seem to me that along with non-repudiation, you want to have anonymous users.  This cannot happen in a non-mediated transaction as has been proposed for the direct Browser-to-Server P3P interactions.  Exactly how to create this scenario around data that contains names and addresses is going to be a technically interesting challenge, but one idea will be worth considering in a moment.

 

The last of the serious criticisms is that P3P fails to provide a means for transitivity and universality of the protection of information.  This is actually several things.

 

The transitivity problem is how to protect your privacy after the information is handed to somebody else. 

 

If a violation of privacy is generally a misuse of information about you or information that you provide (e.g., a trade secret, a confidential comment to a webmaster), then there must be a way in the privacy protocol to indicate that a privacy directive is essentially non-negotiable, or negotiable only back to the original owner, and this needs to be passed on to the next possessor of the information. 

 

Accomplishing this would be technically fairly simple unless the information changes and becomes derivative information.  If something is learned and a conclusion is drawn, is the information that caused the learning binding its privacy directives on the conclusion?  This is a hard problem, not an easy one. 

 

One solution is to create directives on derivative information.  Essentially the directive says that the purpose of the Web Server is to record information so that conclusions or derivative observations can be obtained.  This information now becomes the property of the owner of the Web Server. 

 

Conversely, a User may say to the Web Server that it can use the information to clear a credit card or to give the user a registration account, but that any derivative information must be restricted to just this.  A Web Server taking this information could not pass it on or provide it as the basis for developing new knowledge. 

 

These two cases, actually, are potentially handled with existing tags in P3P.  What is not is the case where the requirement is that the information can be passed along but that the new owner must preserve the privacy conditions.  There is no mechanism in P3P for preserving the integrity of the use to which the information can be put.  A particularly useful case of this might be the case where your personal information can be passed along but only non-identifiable summaries used for marketing purposes.
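
The missing case, where information may be passed along only if the new owner preserves the original conditions, can be sketched as follows.  This is a minimal illustration under stated assumptions: the `ProtectedData` record and the `forward` function are hypothetical names invented here, and nothing like them exists in the P3P specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectedData:
    """A piece of information that carries its privacy conditions with it."""
    value: str
    allowed_purposes: frozenset
    may_forward: bool


def forward(data, recipient_purposes):
    """Hand data to a new holder only if the original conditions survive."""
    if not data.may_forward:
        raise PermissionError("the owner did not permit forwarding")
    recipient_purposes = frozenset(recipient_purposes)
    if not recipient_purposes <= data.allowed_purposes:
        extra = sorted(recipient_purposes - data.allowed_purposes)
        raise PermissionError(f"recipient wants unpermitted purposes: {extra}")
    # The copy carries the same conditions, so they bind the next holder too.
    return ProtectedData(data.value, data.allowed_purposes, data.may_forward)


# Hypothetical example: an address usable for the current activity and for
# research, but not for marketing contact.
address = ProtectedData("1 Example Street", frozenset({"current", "develop"}), True)
copy_for_partner = forward(address, {"develop"})

try:
    forward(address, {"contact"})
    marketing_refused = False
except PermissionError:
    marketing_refused = True
```

The sketch deliberately says nothing about derivative information.  Deciding whether a conclusion drawn from the address inherits these conditions is the hard problem described above.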

 

The universality of the protection of information is yet another problem with the current specification.  The way P3P is set up, the user has to set up each Browser or “User Agent” that he uses.  If he works on ten machines at the office and at home he has to take care to make sure all the machines utilize the same privacy policies, or he might as well have no privacy policies at all. 

 

It should be possible to have a location on the Internet where you have your name-and-address information.  Exclusively you can invoke this bundle of information, and it has a shell of a hard privacy policy surrounding it.  Thus, you can insert your name-and-address in any web Browser or in any email message and the privacy policy is negotiated with the recipient.  Furthermore, because there is a “third party” proxy for the bundle, it is readily possible to create anonymous transactions.  This general technique of having proxy sites for you could solve the problem mentioned at the beginning of the paper with having to trust the Web Server on the first two of your hits to its site. 

 

Accomplishing a system such as this does not strike me as much more complex than the existing Domain Name Service that works well on the Internet.  P3P, in this view, makes the conceptual error of thinking that privacy is intransitive, and that it is not necessary to describe information being sent by the user with a privacy policy.  Just as the Web Server privacy policy has a unique web address (the PolicyRef page), the user could have a privacy policy with a unique web address (his PolicyRef) and the two could negotiate and transfer data in a uniform and universal fashion.  The system could be engineered, I think, to even handle the problems with having to remember a lot of passwords as well as the privacy problem of providing a universal acceptance criterion for the user’s name and address.
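
A minimal sketch of such a user-side PolicyRef follows, with an in-memory dictionary standing in for the DNS-like service.  Every name here (the `privacy://` address, the `BUNDLES` table, the `resolve` function) is hypothetical and serves only to show the idea that the policy travels with the information rather than with any one Browser.

```python
# In-memory stand-in for a DNS-like lookup service; all entries are invented.
BUNDLES = {
    "privacy://example.net/alice": {
        "data": {"name": "Alice Example", "address": "1 Example Street"},
        "allowed_purposes": {"current", "admin"},
    }
}


def resolve(user_policy_ref, requested_fields, stated_purposes):
    """Release requested fields only if every stated purpose is allowed."""
    bundle = BUNDLES.get(user_policy_ref)
    if bundle is None:
        raise KeyError(f"unknown PolicyRef: {user_policy_ref}")
    if not set(stated_purposes) <= bundle["allowed_purposes"]:
        return {}   # the recipient's purposes exceed the user's policy
    return {
        field: bundle["data"][field]
        for field in requested_fields
        if field in bundle["data"]
    }


order_data = resolve("privacy://example.net/alice", ["name", "address"], {"current"})
marketing_data = resolve("privacy://example.net/alice", ["name"], {"contact"})
```

Because the user hands out only the address of the bundle, the same policy applies from any machine, and the service in the middle is a natural place to mediate anonymous transactions.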

 

A final criticism of P3P is that it is a web-only solution to privacy when we know that the Internet involves much more than simply the web.  For example, email moves by other protocols (mainly SMTP), not HTTP.  There is no way in P3P to say that the mail you send to a company contains information that should only be used in ways that you restrict.  Most lawyers would not want to tell you how little legal sting there is in their messages asking you to treat an email as confidential.  They would have to show that you agreed to treat the email as confidential.  It is easy to argue that you “accidentally read” the mail before seeing the “confidentiality statement.”  Without negotiated agreement, P3P is completely ill suited to mail routing.  However, a strong case can be made, I think, that P3P describes precise and essential building blocks for a solution to privacy in mail and many other Internet protocols.   P3P should not be thrown away; it should be built upon.  But perhaps one place that should support P3P would be a configuration of non-HTTP servers that serve up information packages, such as credit card purchases, names and addresses, and, perhaps, even passwords.  This would be an effort for the IETF, I think, because it would act like the Domain Name System in resolving information requests for all other communications protocols on the Internet.

 

I have no doubt that many members of the W3 P3P working group have thought through many if not most of the concerns expressed in this article.  However, these are not the people who are likely to talk about such concerns since their main interest is in getting P3P accepted.   Pointing out flaws like the ones described above doesn’t, on the surface, look like it helps in getting P3P accepted.  But my argument is the opposite.   It is probably better for a third party to speak out on these and to invite more vigorous public discussion.  This is precisely because P3P takes us in the right direction.  It deserves to be supported and added to.  P3P clearly represents a good start.   People in all aspects of the Internet socio-economic-political system need to sit up and think this through for themselves.  Privacy will have a widespread and deep influence on the economic vitality of cyberspace.  Information is power, and privacy management is the control, and thereby the economic unleashing, of that power. 

 

 

Post Script, April 20, 2004

 

A common misconception about Negotiated P3P is that it is too complicated to negotiate.  It does seem true that a simplistic negotiation strategy that negotiates every XML tag setting will create a search space that is much too large.  Negotiation should take a microsecond and be invisible to the user and easy to manage for the web site guys.  How can this be done?  I wrote another paper, http://drachma.ecom.cmu.edu/pspnote/,  that describes how this can plausibly be done which isn't read by many people (unlike this paper which has been read by thousands).  That paper has a bunch of math and doesn't just come out and say what I was thinking so here goes:

 

Basically, suppose a company, let's take Amazon, publishes the Amazon Consumer Privacy Policy, and other companies do the same.  Then your 'negotiation engine' only has to go to Barnes and Noble, or PCConnection, and say "I'll accept the Amazon Policy," and all the complexity that is tied into the XML language of the Amazon policy is accepted at once, in a microsecond.  If we assume that people go with popular policies, then the math works out that a web site, like Barnes and Noble, need only keep track of a few different privacy policies (e.g., 7) to capture 99% of the people.  Barnes and Noble need only order their preferences, and when a customer requests a policy, they can respond with another.  As long as Barnes and Noble and the user eventually cave in to popular policies, the system works.  There is a whole culture in popular privacy policies that could arise from this, but it takes a few big companies with vision, or a Government, to get this one kick-started.  But it does solve the problem in a way that feels pretty comfortable.
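
Here is a minimal sketch of negotiation by policy name rather than by individual XML tag.  The policy names and the `negotiate` function are hypothetical; the point is only that each step is a lookup of a name, not a comparison of policy contents.

```python
def negotiate(site_ranked, user_acceptable):
    """Return the first named policy in the site's ranking the user accepts."""
    for name in site_ranked:
        if name in user_acceptable:
            return name
    return None   # no popular policy in common, so no deal


# Hypothetical names of published, well-known policies.
site_ranked = ["BigStore Marketing Policy", "Amazon Consumer Privacy Policy",
               "Strict No-Sharing Policy"]
user_acceptable = {"Amazon Consumer Privacy Policy", "Strict No-Sharing Policy"}

agreed = negotiate(site_ranked, user_acceptable)
print(agreed)
```

With only a handful of names on the site's list, the loop finishes almost immediately, which is what makes a negotiation that is invisible to the user plausible.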