Current Research Projects
My research has been funded by the Andrew W. Mellon Foundation,
Library of Congress, NASA and NSF. Since 2001, I have been
PI or Co-PI on 11 grants totaling more than $2.5M. The list
below covers the current major projects; see the publication list and the Web Science and Digital Libraries
blog for details.
- The Memento Project
defines "datetime" as a fifth dimension for HTTP content negotiation.
This lets the browser and server negotiate for the desired
version of a web page. For example, if you
want to view cnn.com as it existed on November 4, 2008, you could
check the Internet
Archive Wayback Machine, but the desired copy might not
be there yet, or additional copies might exist in other archives
such as Archive-It,
CDL Web Archives, and
WebCite.
Memento provides an HTTP-level mechanism for
integrating the holdings of these archives. See the ODU
news release, WS-DL
blog or our techreport
for more information. This project is joint work with
the Los Alamos National Laboratory Research Library; Scott
Ainsworth is the lead student.
- We are investigating the theory and implementation of handling
"hard" and "soft" 404 web pages and the real-time discovery of the same
(or very similar) page at a new URI. We are analyzing how content moves
from page to page in the live and archived web over time. To discover
moved and similar content, we are experimenting with combinations of
lexical signatures, titles, tags and link neighborhoods. Soon we will
release "Synchronicity", a Firefox extension that embodies our theoretical
findings and assists users in (re-)discovering lost pages.
Martin Klein is the lead
student on this project.
- My NSF
CAREER Award is funding research into building mobile digital objects
that can live longer than the people or organizations that created them.
We have built a simulator that can create large networks of linked digital
objects according to different linking parameters. We have shown that
unsupervised objects can build linkage networks with small-world graph
properties. In this project we are also investigating digital object
archival practices of college students. Charles Cartledge is the lead
student on this project.
- The Open Archives
Initiative Object Reuse and Exchange (ORE) will develop specifications
that allow distributed repositories to exchange information about their
constituent digital objects. These specifications will include approaches
for representing digital objects and repository services that facilitate
access and ingest of these representations. The specifications will enable
a new generation of cross-repository services that leverage the intrinsic
value of digital objects beyond the borders of hosting repositories.
- The mod_oai project integrates
OAI-PMH semantics directly into the Apache webserver. Using the notion
of resource harvesting, mod_oai allows the entire web resource to be
harvested, not just the descriptive metadata. Initial results accessing
a departmental web site using both web crawling and mod_oai harvesting
techniques show that harvesting provides comparable performance to
crawling when accessing a web site for the first time, and significant
speed increases when updates are considered. We are also exploring the
use of MPEG-21 DIDLs to disseminate "preservation ready" representations
of web resources. This project is joint work with the LANL Research Library
and is funded by the Andrew W. Mellon Foundation. Some test links for this
research: http://oducrate.gotdns.com/index.html (defunct),
http://crate.gotdns.com/index.html (defunct),
http://blanche-00.cs.odu.edu/, and
http://blanche-02.cs.odu.edu/.
Joan A. Smith (PhD, 2008) was
the lead student on this project.
- Lazy
Preservation uses the Web Infrastructure (commercial
search engines, Internet Archive and research projects) to reconstruct lost web sites.
The purpose of lazy preservation is not to replace backup strategies
and disaster planning, but it does offer a surprisingly good safety net
for recovering sites after a catastrophic event. We are investigating
descriptive models ("how much do you get if you do nothing?") as
well as prescriptive and predictive models. This research is
funded by the NSF. We have set up a test repository, called the Monarch Repository, to test
certain features of lazy preservation.
Frank McCown (PhD, 2007) was
the lead student on this project.
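
As an illustration of the datetime negotiation described for the Memento Project above, here is a minimal sketch in Python. It assumes the request header is named Accept-Datetime (the name used in the Memento specification, RFC 7089, not stated on this page) and that the value is an HTTP-date in GMT. The function name accept_datetime_header is hypothetical, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Return request headers asking for a resource as it existed at `when`.

    Assumption: the header is called Accept-Datetime, as in RFC 7089.
    """
    # Treat a naive datetime as UTC rather than guessing a local zone.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # HTTP dates are always expressed in GMT.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": stamp}


# The cnn.com example from the text: November 4, 2008.
headers = accept_datetime_header(datetime(2008, 11, 4, tzinfo=timezone.utc))
print(headers)
```

A client would attach these headers to an ordinary GET for the page it wants; which server answers that request, and how it locates the archived copy, is outside the scope of this sketch.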
Annual Reports