I am a Ph.D. student and researcher at Old Dominion University majoring in Computer Science. My emphasis is Web Preservation with a focus on the temporal quality of existing holdings of web archives such as The Internet Archive. I work under the supervision of Dr. Michael Nelson.

Prior to my current academic studies, I served in the U.S. Navy and worked for several decades as a successful software engineer. I enjoy mentoring and teaching. In 2006, I decided to return to school to further my education and formally prepare for a possible future teaching Computer Science.

Temporal Coherence Patterns in Composite Mementos

Informally, a composite memento is an archived web page together with all the archived resources (images, stylesheets, etc.) required to render the page in a web browser. A composite memento can be considered a tree, as shown in figure 2 below. The root resource, normally a web page, embeds other resources, which can in turn embed further resources. When a composite memento is viewed in a web browser (e.g., the January 2, 1997 ODU CS Home Page), a single datetime is displayed even though the root and each embedded resource were captured at different times.

Figure 2. Composite Memento Tree
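The tree view can be made concrete with a small sketch. The class name, field names, URIs, and capture datetimes below are all hypothetical and chosen only for illustration; they are not taken from any archive or from the technical report. The sketch shows how a single displayed datetime can hide a spread of capture times across the tree.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Iterator, List


@dataclass
class Memento:
    """One archived resource: a URI, its capture datetime, and what it embeds."""
    uri: str
    captured: datetime
    embedded: List["Memento"] = field(default_factory=list)

    def walk(self) -> Iterator["Memento"]:
        """Yield this memento, then every memento below it (depth-first)."""
        yield self
        for child in self.embedded:
            yield from child.walk()


def capture_spread(root: Memento) -> timedelta:
    """Time between the earliest and latest capture in the composite memento."""
    times = [m.captured for m in root.walk()]
    return max(times) - min(times)


# Hypothetical composite memento: a page, a stylesheet that embeds a
# background image, and a logo. All values are invented for illustration.
background = Memento("http://example.org/bg.gif", datetime(1997, 3, 15))
stylesheet = Memento("http://example.org/style.css", datetime(1996, 12, 20), [background])
logo = Memento("http://example.org/logo.gif", datetime(1997, 1, 5))
root = Memento("http://example.org/", datetime(1997, 1, 2), [stylesheet, logo])

print(len(list(root.walk())))  # number of mementos in the tree
print(capture_spread(root))    # spread hidden behind the single displayed datetime
```

In this invented example the browser would display only the root's datetime (January 2, 1997), while the four mementos were captured over a span of 85 days.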

When evaluating the temporal coherence of embedded resources, the relationships between the root and embedded mementos form a discrete set of patterns. The example diagrammed in figure 3 shows an embedded memento (blue) captured after the root (red), but with a Last-Modified datetime before the root's capture datetime. This is a temporally coherent state.

Figure 3. Prima Facie Coherence

On the other hand, figure 4 shows an embedded memento that was both captured after the root and has a Last-Modified datetime after the root's capture datetime. This is a temporal violation.

Figure 4. Prima Facie Violation
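The two patterns described above can be expressed as a pair of datetime comparisons. The following is a minimal sketch, not the classification used in the technical report: the function name and return labels are hypothetical, and any case other than the two described here (for example, an embedded memento captured before the root, or one with no Last-Modified header) is simply reported as not covered.

```python
from datetime import datetime
from typing import Optional


def classify(root_captured: datetime,
             embedded_captured: datetime,
             embedded_last_modified: Optional[datetime]) -> str:
    """Classify an embedded memento against its root (two patterns only)."""
    # Only the "captured after the root" patterns are handled in this sketch.
    if embedded_last_modified is None or embedded_captured <= root_captured:
        return "not covered"
    if embedded_last_modified < root_captured:
        # Unchanged since before the root was captured, so the later
        # capture still reflects what existed at the root's capture time.
        return "prima facie coherent"
    if embedded_last_modified > root_captured:
        # Modified after the root was captured, so the captured content
        # did not exist when the root was captured.
        return "prima facie violation"
    return "not covered"


# Hypothetical datetimes, for illustration only.
root_time = datetime(2010, 6, 1)
print(classify(root_time, datetime(2010, 6, 10), datetime(2010, 5, 1)))
print(classify(root_time, datetime(2010, 6, 10), datetime(2010, 6, 5)))
```

With these invented values, the first call reports the coherent pattern and the second reports the violation pattern.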

Temporal coherence of composite mementos is ongoing research. A comprehensive technical report covering the patterns in greater depth is available: arXiv:1402.0928.

Temporal Drift While Browsing Web Archives

When a user views archived web pages through an archive's user interface, the archive attempts to simulate the familiar experience of browsing the Web. The user begins by specifying a URI and selecting a datetime to view from a list; this is the target datetime. The archived web page is then displayed. As the user follows links, each click changes the target datetime to the archive datetime of the selected page. The result is a nearly silent drift away from the datetime originally selected. (Try it at the Internet Archive.)

There are many circumstances where this drift is undesirable, but until recently there was no convenient method to control it. With the Memento API and MementoFox, this changes. Two target datetime policies were identified and tested: Sliding and Sticky. The Sliding Policy models existing archive browsing interfaces, allowing the target datetime to change with each link followed. The Sticky Policy fixes the target datetime to the first datetime selected.
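The difference between the two policies can be sketched as a short simulation. This is an illustration under stated assumptions, not the experimental setup of the study: the function names are hypothetical, the capture datetimes are invented, and the archive is assumed to show the memento whose capture datetime is closest to the current target datetime.

```python
from datetime import datetime, timedelta
from typing import List


def nearest(target: datetime, captures: List[datetime]) -> datetime:
    """Assumed archive behavior: pick the capture closest to the target."""
    return min(captures, key=lambda c: abs(c - target))


def browse(first_target: datetime,
           captures_per_step: List[List[datetime]],
           policy: str) -> List[timedelta]:
    """Follow one link per step; return drift from the first target at each step."""
    target = first_target
    drift = []
    for captures in captures_per_step:
        shown = nearest(target, captures)
        drift.append(abs(shown - first_target))
        if policy == "sliding":
            # Sliding: the displayed memento's datetime becomes the new target.
            target = shown
        # Sticky: the target stays fixed at the first datetime selected.
    return drift


# Hypothetical walk: the captures available for each page visited.
start = datetime(2005, 1, 1)
steps = [
    [datetime(2004, 6, 1), datetime(2005, 1, 10)],
    [datetime(2004, 12, 28), datetime(2005, 1, 20)],
    [datetime(2005, 1, 3), datetime(2005, 2, 5)],
]

print([d.days for d in browse(start, steps, "sticky")])
print([d.days for d in browse(start, steps, "sliding")])
```

For this invented walk, the sticky run stays within 9 days of the original target (9, 4, and 2 days), while the sliding run moves steadily away from it (9, 19, and 35 days).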

Figure 1. Median Temporal Drift by Step

As shown in figure 1 above, we found that the Sticky Policy controlled drift, keeping it to just under 14 days. The Sliding Policy, on the other hand, allowed drift to increase continually as the number of pages viewed increased. Details were published at JCDL 2013 (nominated for best student paper). A preprint is available on arXiv. An extended version will be available in a special edition of IJDL in late 2013 or early 2014.


Page last modified on February 18, 2014, at 07:09 PM