Computer Science Department Events
Ahmed AlSum will have his PhD candidacy exam on Wednesday, October 24 at 11:00 AM.
Wednesday, October 24, 2012
E&CS Building Auditorium (1st Floor)
TIME: 11:00 AM
TITLE: A Services Framework for Tighter Integration between the Past and Present Web
SPEAKER: Ahmed AlSum
Abstract:
Web archives hold many years of the Web's cultural history, but they still offer limited capabilities for access. Most web archiving research has focused on crawling and preservation, with little attention to delivery methods. Current access methods may be tightly coupled to the web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all of the users' needs.
In this proposal, we focus on access methods for archived web data that enable users, third-party developers, researchers, and others to gain knowledge from the web archive. The vision behind the proposal is to integrate the past and the present web, in addition to integrating web archives with one another to provide a complete and consistent view of the past web.
We build a new service framework that enables the global community of web archive users to benefit from archived web data. The proposal introduces a novel categorization technique that divides the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces.
The highest level in the web archiving service framework pyramid is the archive level. At this level, we define a web archive by the characteristics of its corpus. The archive level is used to find the archives that fulfill a specific request. The second level is the URI level, which focuses on the URI and its snapshots. The third level is the metadata level; we extract the metadata from the archived web data and make it available to users through APIs and graphical interfaces. Finally, the content level extracts the content from the archived web data; this service is supported by different filters that facilitate a broad range of services.