TITLE: The Internet Archive: Our Collections, Programs and Research Initiatives

ABSTRACT

Kris Carpenter Negulescu, Director of the Web Group at the Internet Archive, will present an overview of the Archive's current holdings, review research and development underway at IA, and introduce other major initiatives that IA has undertaken or plans to undertake in 2011-2012. Her talk will touch briefly on the following topics:

• Overview of the Internet Archive: Data Repository, Books, TV, Other Special Collections, Web
• Recent Web Archive Statistics and Reports, Changes to Ingest/Update Cycles
• Special Projects:
    o Data Mining & Extraction via Hadoop/Pig, etc.
    o Generating Link Graphs of an entire domain from 1996-2010 (e.g. .uk)
    o WebWide crawling and HBase
    o Automated QA of web data at scale
    o ISC SIE and other data collaboratives
    o Dynamic, On Demand, Archiving of video, annotations, etc.
    o Semantic Data extraction and IA's TV archives
    o IA Data clusters, Cloud computing, VMs
    o Digital Archive Services & planned bulk APIs

Kris Carpenter Negulescu
Director, Web Group
Internet Archive

The Internet Archive (IA) is an entrepreneurial and technologically innovative nonprofit. IA's web archive, launched in 1996, contains over 4 petabytes of compressed data and more than 175 billion publicly accessible web captures (www.waybackmachine.org), including content from every top-level domain, more than 200 million web sites, and over 60 languages.

Kris leads a team responsible for:

• Developing the Heritrix open source web crawler and the Wayback Machine, as well as other tools used to search, mine & replay archived web content
• Providing expertise and services in web archiving, data mining and access to national and state libraries & archives, universities, museums, research institutes, and many other institutions around the globe
• Collaborating with memory institutions and research teams from around the globe to promote best practices in web publishing, harvesting, preservation, and bulk research

Kris represents IA on the International Internet Preservation Consortium steering committee and is co-chair of the Access Working Group. Kris received a Bachelor of Arts in Political Science and a Master of Arts in Organizational Behavior (an interdisciplinary program in Sociology/Industrial Engineering/Business) from Stanford University.