Memento Response 2009-11-24


This note is in response to Pete Johnston's blog post about Memento: http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html

1. Enumeration of URIs of Archived Resources

First, we'll address your suggestion that a "Link" header could be used to achieve what we propose doing by using datetime content negotiation, and in doing so we ignore for the moment that the Link header seems to be stuck in a perpetual draft status. The difficulty with listing the URIs of archived versions and their associated metadata in a document that is pointed at by a URI in a Link response header is that it is based on an implicit assumption that the number of options (i.e. archived resources) is small. The same assumption underlies the Alternates header in RFC 2295, which expects there to be no more than 10 or so options. Indeed, there just aren't that many media types, languages, encodings and charsets, and hence the associated 4D matrix of options for a resource is generally very sparse. Obviously, this is not true for the time dimension. Looking forward, there will be a countably infinite number of options. Looking backward, there is a finite number of options, bounded by roughly 20 * 365 * 24 * 60 * 60 (assuming the first web server was running in 1989, and given that the RFC 1123 date format only supports second granularity).
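To give a feel for the size of that backward-looking bound, the arithmetic can be worked out in a few lines. This is only a sketch using the same rough figures as the text (20 years of 365 days, leap days ignored); the variable names are ours.

```python
# Rough upper bound on the number of distinct second-granularity
# datetimes since 1989, using the approximation from the text
# (20 years of 365 days each; leap days are ignored).
YEARS = 20
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

upper_bound = YEARS * DAYS_PER_YEAR * SECONDS_PER_DAY
print(f"{upper_bound:,}")  # 630,720,000
```

Compare this with the 10 or so options that the Alternates header in RFC 2295 expects.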

Obviously, most resources will not have that many archived resources associated with them. But, as Wikipedia version pages show, they might easily have several thousand. While it would be possible to dereference "/qotd/" (possibly with a HEAD) and parse the extensive list that is pointed at by a URI in the "Link" response header in search of an archived resource that meets the client's datetime preference, this could turn out to be rather challenging. And even where it is possible, it would certainly not be an efficient (or arguably even elegant) approach. Rather, if the client knows it wants a 2005-12-13 version of "/qotd/", there is no reason not to make that desire known at request time. This is what happens in the existing 4 dimensions of CN, using Accept-* headers. We note that we do propose the use of the Link header, as a means to try and honor the mandate expressed in RFC 2295 that an Alternates header must list all available options. Given the number of options, this is not possible for the datetime dimension. Hence, our approach to resolving this problem is to list in the "Alternates" response header only a "few" archived resources that are temporal neighbors to the datetime preference expressed by the client (allowing the server some leeway in how many are a "few"), and to provide a "Link" header with a URI of a TimeMap (an ORE ReM listing all available archived resources). The full list thus remains available via the "Link" header. But for the reasons explained above, we do not think the "Link" header is the most appealing way to allow a client to figure out which resource meets its preference.
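The "few temporal neighbors" idea can be sketched as follows. This is a minimal illustration, not a description of any actual server: the function name temporal_neighbors, the parameter few and the sample datetimes are all hypothetical, and the rule used here (up to "few" archived datetimes before the preference and up to "few" at or after it) is just one way a server might use its leeway.

```python
from bisect import bisect_left
from datetime import datetime, timezone

def temporal_neighbors(archived, preferred, few=2):
    """Return the archived datetimes closest to `preferred`.

    `archived` must be sorted in ascending order. The result holds up
    to `few` datetimes before `preferred` and up to `few` datetimes at
    or after it (hypothetical policy; a server may choose differently).
    """
    i = bisect_left(archived, preferred)  # first index with value >= preferred
    return archived[max(0, i - few):i + few]

utc = timezone.utc
# Hypothetical archive of "/qotd/" with six archived versions.
archived = [
    datetime(2005, 1, 1, tzinfo=utc),
    datetime(2005, 4, 1, tzinfo=utc),
    datetime(2005, 7, 1, tzinfo=utc),
    datetime(2005, 10, 1, tzinfo=utc),
    datetime(2005, 12, 1, tzinfo=utc),
    datetime(2006, 3, 1, tzinfo=utc),
]
preferred = datetime(2005, 12, 13, tzinfo=utc)

result = temporal_neighbors(archived, preferred)
for dt in result:
    print(dt.date())  # 2005-10-01, 2005-12-01, 2006-03-01
```

Only these neighbors would be candidates for the "Alternates" header; the complete list would stay in the TimeMap.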

2. The resource that delivers the archived representation.

Just to make sure there is no misunderstanding about this (as there has been on another forum) we want to state that in our proposed datetime content negotiation scheme, the archived representation will never be delivered by the original resource. Rather it is requested via the original resource but delivered by another resource, i.e. an archived resource that has its own URI. See http://www.mementoweb.org/guide/http/local/ and http://www.mementoweb.org/guide/http/remote/ .

3. The "current" state issue

It has been suggested here, and in another forum, that it might be inappropriate to request an archived version of a resource via the original resource, because doing so would be in conflict with the definitive documents (W3C Web Architecture, RFC 2616) and with Roy Fielding's Dissertation (REST).

On the Linked Data list, Mark Baker formulates the problem as follows:

Quote: My claim is simply that all HTTP requests, no matter the headers, are requests upon the current state of the resource identified by the Request-URI, and therefore, a request for a representation of the state of "Resource X at time T" needs to be directed at the URI for "Resource X at time T", not "Resource X".

We can understand this concern, and will try to show that the definitive documents are considerably less firm regarding the "current state" issue than Mark Baker is in the above. But before doing so, we would like to point out that it seems rather logical (and even essential) to us to involve the original resource in the attempt to get to its prior versions. After all, it is the URI of the original resource by which the resource has been known as it evolved over time. Hence, it makes sense to be able to use that URI to try and get to its past versions. And by "get", we don't mean search for it, but rather use the network to get there. After all, we all go by the same name irrespective of the day you talk to us. Or we have the same Linked Data URI irrespective of the day it is dereferenced. Why would we suddenly need a new URI when we want to see what the Linked Data description for any of us was, say, a year ago? Why must we prevent this same URI from helping us get to prior versions?

But back to the authoritative documents. It is our impression that neither RFC 2616 nor the W3C Web Arch document really defines or enforces the notion of *current* state when it comes to the representation that is returned in response to a GET on a resource.

3.1 W3C Web Architecture

The W3C Web Arch document is agnostic about "current" state. See bullets 3 and 4 from http://www.w3.org/TR/webarch/#dereference-details

Quote:


   Precisely which representation(s) are retrieved depends on a number of factors, including:
    1. Whether the URI owner makes available any representations at
       all;
    2. Whether the agent making the request has access privileges for
       those representations (see the section on linking and access
       control (§3.5.2));
    3. If the URI owner has provided more than one representation (in
       different formats such as HTML, PNG, or RDF; in different    
       languages such as English and Spanish; or transformed       
       dynamically according to the hardware or software capabilities  
       of the recipient), the resulting representation may depend on 
       negotiation between the user agent and server.
    4. The time of the request; the world changes over time, so       
       representations of resources are also likely to change over
       time.

  Assuming that a representation has been successfully retrieved, the
   expressive power of the representation's format will affect how
   precisely the representation provider communicates resource state.

3.2 RFC 2616

RFC 2616 is pretty open-ended about choosing a representation for a resource (emphasis ours): ...


   resource
      A network data object or service that can be identified by a URI,
      as defined in section 3.2. Resources may be available in multiple
      representations (e.g. multiple languages, data formats, size, and
      resolutions) or **vary in other ways.**

   content negotiation
      The mechanism for selecting the **appropriate representation**
      when servicing a request, as described in section 12. The
      representation of entities in any response can be negotiated
      (including error responses).
...

 12 Content Negotiation

   Most HTTP responses include an entity which contains information for
   interpretation by a human user. Naturally, it is desirable to supply
   the user with the "best available" entity corresponding to the
   request. Unfortunately for servers and caches, not all users have the
   same preferences for what is "best," and not all user agents are
   equally capable of rendering all entity types. For that reason, HTTP
   has provisions for several mechanisms for "content negotiation" --
   ** the process of selecting the best representation for a given response
   when there are multiple representations available.**

...

   However, an origin server is not limited to these dimensions and
   MAY vary the response based on any aspect of the request,
   including information outside the request-header fields or
   **within extension header fields not defined by this specification.**

3.3 Fielding's Dissertation

Returning to Fielding's dissertation, it admittedly depends on how you read it, but we think that at the very least it does not preclude Memento. Re-quoting some of the relevant bits:

... a resource R is a temporally varying membership function M_R(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers.

and:

If the value set of a resource at a given time consists of multiple representations, content negotiation may be used to select the best representation for inclusion in a given message.

Returning to the "/qotd/" resource: If you view "/qotd/" *as* the string "From each..." @ t1, and then *as* the string "Those who..." @ t2, then you're not going to like the proposed Memento approach. If you view "/qotd/" more abstractly -- say as "pithy quotations from left-wing German philosophers" -- then you'll have no problem with "/qotd/" negotiating to different strings @ t1, t2, etc. So *if* you subscribe to this notion of abstractness of the resource, then negotiating in the established 4 dimensions as well as in a fifth dimension, time, should be acceptable.

We feel that the abstract perspective of a resource is supported rather strongly by Fielding's perspective, when he states:

The key abstraction of information in REST is a resource. Any information that can be named can be a resource: a document or image, a temporal service (e.g. "today's weather in Los Angeles"), a collection of other resources, a non-virtual object (e.g. a person), and so on. In other words, any concept that might be the target of an author's hypertext reference must fit within the definition of a resource. A resource is a conceptual mapping to a set of entities, not the entity that corresponds to the mapping at any particular point in time.

And later on:

This abstract definition of a resource enables key features of the Web architecture. First, it provides generality by encompassing many sources of information without artificially distinguishing them by type or implementation. Second, it allows late binding of the reference to a representation, enabling content negotiation to take place based on characteristics of the request. Finally, it allows an author to reference the concept rather than some singular representation of that concept, thus removing the need to change all existing links whenever the representation changes (assuming the author used the right identifier).

So, assuming you buy into the 5 content negotiation dimensions, "/qotd/" will negotiate to these representations (each of which is a stand-alone resource in its own right):


/qotd/index.html.de.20090101
/qotd/index.html.de.20090102
/qotd/index.html.en.20090101
/qotd/index.html.en.20090102
/qotd/index.pdf.de.20090101
/qotd/index.pdf.de.20090102
/qotd/index.pdf.en.20090101
/qotd/index.pdf.en.20090102

(Note that the above notions of time (i.e., in the "X-Accept-Datetime" dimension) are different from the value that might be expressed in "Last-Modified". For example, we could fix a misspelling in a .20090101 version today, thus changing its modification time w/o changing its "X-Accept-Datetime" value.)

Which one to return when dereferencing "/qotd/"? Most people's browsers probably make this explicit, but even if they didn't you'd probably get the .html.en version as the default. We'll go further and say that for datetime you should get the most current version, .html.en.20091124 for example, as the default value when no datetime preference is specified. That is, in the absence of Memento headers, everything should operate as it normally does. In the future this should be formalized in a variant selection algorithm, such as those at:

http://www.ietf.org/rfc/rfc2296.txt
http://httpd.apache.org/docs/2.3/content-negotiation.html
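As a rough illustration of how such defaults could behave, here is a minimal sketch over the eight variants listed above. It is not one of the algorithms referenced here: the function names, the defaults (html, en, most recent date) and the "closest date wins" rule for an explicit datetime preference are assumptions made for the example.

```python
from datetime import date
from itertools import product

# The eight variants listed in the text, as (format, language, day) tuples.
VARIANTS = list(product(("html", "pdf"),
                        ("de", "en"),
                        (date(2009, 1, 1), date(2009, 1, 2))))

def variant_uri(fmt, lang, day):
    """Build the variant URI in the naming scheme used in the text."""
    return f"/qotd/index.{fmt}.{lang}.{day:%Y%m%d}"

def select_variant(fmt="html", lang="en", preferred_day=None):
    """Pick one variant URI, or None if format/language cannot be met.

    Assumed rules: with no datetime preference the most recent variant
    is chosen; otherwise the variant whose day is closest to
    `preferred_day` is chosen.
    """
    candidates = [v for v in VARIANTS if v[0] == fmt and v[1] == lang]
    if not candidates:
        return None
    if preferred_day is None:
        chosen = max(candidates, key=lambda v: v[2])
    else:
        chosen = min(candidates, key=lambda v: abs(v[2] - preferred_day))
    return variant_uri(*chosen)

print(select_variant())                             # /qotd/index.html.en.20090102
print(select_variant("pdf", "de", date(2009, 1, 1)))  # /qotd/index.pdf.de.20090101
```

With no preferences at all, the sketch falls back to the current html/en variant, which mirrors the point that everything should operate as it normally does in the absence of Memento headers.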

On the other hand, if you intended to get another representation of that resource (e.g., .pdf.de.20090531), then you need to be explicit about your preferences in the request headers.
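Being explicit might look like the following on the client side. This is a sketch under stated assumptions: the helper name negotiation_headers is hypothetical, the media type and language values are ordinary Accept and Accept-Language values, and the "X-Accept-Datetime" value is assumed to be an RFC 1123 style date as mentioned earlier in this note.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def negotiation_headers(when=None, media_type=None, language=None):
    """Build request headers expressing the client's preferences.

    Any preference left as None is simply omitted, so the server
    applies its default for that dimension.
    """
    headers = {}
    if media_type is not None:
        headers["Accept"] = media_type
    if language is not None:
        headers["Accept-Language"] = language
    if when is not None:
        # usegmt=True requires a timezone-aware UTC datetime and
        # yields a date ending in "GMT".
        headers["X-Accept-Datetime"] = format_datetime(when, usegmt=True)
    return headers

# Preferences matching the .pdf.de.20090531 example in the text.
headers = negotiation_headers(
    when=datetime(2009, 5, 31, tzinfo=timezone.utc),
    media_type="application/pdf",
    language="de",
)
print(headers["X-Accept-Datetime"])  # Sun, 31 May 2009 00:00:00 GMT
```

The resulting dictionary can be handed to any HTTP client when requesting "/qotd/"; an empty dictionary corresponds to the default case described above.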

We believe the above section also addresses Mark Baker's concerns about what "Accept-" headers negotiate to:

http://lists.w3.org/Archives/Public/public-lod/2009Nov/0138.html
http://lists.w3.org/Archives/Public/public-lod/2009Nov/0140.html

To reiterate, in transparent content negotiation (i.e., RFC 2295), "Accept-" headers always end up negotiating from one URI to another URI. This is the purpose of the "Alternates" header: to enumerate and make transparent these URIs. While it is possible to implement content negotiation non-transparently (with anonymous representations, e.g. a cgi script that chooses a representation depending on Accept- headers), this is 1) not considered good practice (it is dangerously close to "cloaking"!), and 2) it limits content negotiation to representations available on the same server (i.e., you can't negotiate from URI1 on server1 to URI2 on server2); this is not a limitation with content negotiation done with 302s.

We note that the first quote from Fielding's thesis is honored (i.e. not violated) by each of the resources that actually deliver representations in this scheme. And the second quote applies to the resource that is being negotiated on. In cases where that negotiable resource does not deliver any representations itself (the case of TimeGates for servers without internal archival capabilities), the first quote of Fielding applies in the sense that the set of values is (and always will be) empty, as there are no representations. In cases where that negotiable resource does deliver a representation (the case of the TimeGate that coincides with the original resource for servers with internal archival capabilities), the representation that is served is always that of the current state, and hence the first quote of Fielding is honored by a set that contains this representation.

It helps to think of it like this: instead of "/qotd/" negotiating to "secondary" URIs such as:


/qotd/index.html.en.20090102
/qotd/index.pdf.de.20090101

it is better to think of the above URIs as primary, and "/qotd/" as the URI that is introduced to glue those more explicit URIs together. In a sense, "/qotd/" doesn't exist with its own representation, it just negotiates to another URI that does have a representation (although this could depend on how "/qotd/" is implemented).

In this sense, "/qotd/" perhaps has more in common with Linked Data "non-information resources" than with conventional document resources. If viewed on a continuum:

resource
transparently negotiable resource *
non-information resource

Although perhaps this is a discussion for a different time...

* also called "content-type-generic" in http://www.w3.org/Provider/Style/URI

4. Summary

So ultimately, Memento will hinge on your comfort level with the notion of time as a negotiable dimension and with the nature of resources. Of course, documents like the W3C Web Arch and RFC 2616 don't explicitly address this (otherwise we would not have had to introduce it), but we don't think they explicitly deny it either. As such, we think Datetime works nicely as a fifth dimension for Content Negotiation.

Michael, Herbert & Rob.