Computer Science Department Past Features
Featured Defense - July 2011
Martin Klein PhD, Old Dominion University
Bio
Martin Klein received his Diploma in Applied Computer Science in 2002 from the University of Applied Sciences in Berlin, Germany. For the following two years he worked as a full-time research assistant at the same institution in the areas of e-learning and mobile computing.
Martin joined the Ph.D. program at Old Dominion University in 2005. He passed the Diagnostic Exam in 2006 and his Proposal/Candidacy Exam in 2008. As a member of Dr. Michael L. Nelson's Web Science and Digital Libraries Group, he works mainly in the areas of digital preservation, information retrieval and temporal aspects of web resources. He is also interested in the fields of data mining, linked data, natural language processing and the semantic web. Since 2005 Martin has published more than 20 scholarly articles in international conference proceedings, in journals and as book chapters. He received the Old Dominion University Outstanding Research Assistant Award in Computer Science in 2009, the College of Sciences Dissertation Fellowship in 2008/2009, and various ACM SIGWEB student awards.
Abstract
Title: Using the web infrastructure for real time recovery of missing web pages
Given the dynamic nature of the World Wide Web, missing web pages, or "404 Page not Found" responses, are part of our web browsing experience. Intuitively, information on the web is rarely completely lost; it is just missing. In whole or in part, content often moves from one URI to another and hence just needs to be (re-)discovered. We evaluate several methods for a "just-in-time" approach to web page preservation. We investigate the suitability of lexical signatures and web page titles to rediscover missing content. Because web pages change over time, the performance of these two methods depends on their age. We therefore conduct a temporal study of their decay and estimate their half-life. We further propose the use of tags that users have created to annotate pages, as well as the most salient terms derived from a page's link neighborhood. We utilize the Memento framework to discover previous versions of web pages and to execute the above-mentioned methods. We provide a workflow, including a set of parameters, that is most promising for the (re-)discovery of missing web pages.
We introduce Synchronicity, a web browser add-on that implements this workflow. It works while the user is browsing and detects the occurrence of 404 errors automatically. When activated by the user, Synchronicity offers a total of six methods to either rediscover the missing page at its new URI or discover an alternative page that satisfies the user's information need. Synchronicity depends on user interaction, which enables it to provide results in real time.
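To illustrate the idea of a lexical signature mentioned in the abstract, the following is a minimal sketch that picks the top-ranked terms of a page by a TF-IDF score. It is not the dissertation's implementation: the function names `tokenize` and `lexical_signature`, the smoothed IDF formula, and the use of a small local corpus to estimate document frequencies are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus_texts, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    Assumption: document frequencies come from a small local corpus,
    and the IDF is smoothed so unseen terms do not divide by zero.
    """
    tf = Counter(tokenize(page_text))
    n_docs = len(corpus_texts)
    doc_term_sets = [set(tokenize(t)) for t in corpus_texts]

    scores = {}
    for term, count in tf.items():
        # Number of corpus documents that contain the term.
        df = sum(1 for terms in doc_term_sets if term in terms)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = count * idf

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


if __name__ == "__main__":
    page = "memento memento archive web web web"
    corpus = ["web page web", "web browser", "archive of news"]
    print(lexical_signature(page, corpus, k=2))
```

The resulting terms could then be submitted as a search engine query in the hope that the moved page, or a close substitute, appears among the top results.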