ainsworth-jcdl11

Summary

Ainsworth, Scott G., Alsum, Ahmed, SalahEldeen, Hany, Weigle, Michele C. and Nelson, Michael L., "How Much of the Web Is Archived?" In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL'11). Ottawa, Canada, June 2011.

Abstract

The Memento Project’s archive access additions to HTTP have enabled the development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is “How much of the Web is archived?” This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring the number of archive copies available in various public web archives. The results indicate that 35%–90% of URIs have at least one archived copy, 17%–49% have two to five copies, 1%–8% have six to ten copies, and 8%–63% have more than ten copies. The number of URI copies varies as a function of time, but only 14.6%–31.3% of URIs are archived more than once per month.

Bibtex entry

@INPROCEEDINGS { ainsworth-jcdl11,
    ABSTRACT = { The Memento Project’s archive access additions to HTTP have enabled the development of new web archive access user interfaces. After experiencing this web time travel, the inevitable question that comes to mind is “How much of the Web is archived?” This question is studied by approximating the Web via sampling URIs from DMOZ, Delicious, Bitly, and search engine indexes and measuring the number of archive copies available in various public web archives. The results indicate that 35%--90% of URIs have at least one archived copy, 17%--49% have two to five copies, 1%--8% have six to ten copies, and 8%--63% have more than ten copies. The number of URI copies varies as a function of time, but only 14.6%--31.3% of URIs are archived more than once per month. },
    ADDRESS = { Ottawa, Canada },
    AUTHOR = { Ainsworth, Scott G. and Alsum, Ahmed and SalahEldeen, Hany and Weigle, Michele C. and Nelson, Michael L. },
    BOOKTITLE = { Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL'11) },
    MONTH = { jun },
    TITLE = { How Much of the Web Is Archived? },
    YEAR = { 2011 },
    PUBDATE = { 201106 },
    PDF = { ainsworth-jcdl11.pdf },
}

Page last modified on March 01, 2014, at 11:00 AM