Deriving Recommendations from User Behavior in Digital Libraries

Johan Bollen
Los Alamos National Laboratory

Abstract: The proliferation of distributed knowledge systems (e.g. the World Wide Web (WWW), Digital Libraries and genome databases) has increased the need for reliable and valid methods of information linking. Such linking is required both for efficient human retrieval and for the study of the structural properties of a given domain. Existing Information Retrieval methodologies for information linking (e.g. vector space models) rely largely on text content or metadata, and can therefore only be applied to traditional document collections. They are furthermore sensitive to synonymy, polysemy and the use of different languages. Other approaches are limited to the analysis of the graph-theoretical properties of existing author-defined document networks (e.g. citation graphs and WWW hyperlink structure). I present a methodology that derives document relations from an analysis of user retrieval sequences and therefore operates independently of document text content or author-based design. Three Hebbian-based learning rules modify link weights according to sequences of user retrieval requests and gradually adapt existing hypertext networks or document relations to changes in user preferences. The generated networks have been shown to reliably and validly represent the preferences of their community of users.

The presentation consists of three parts. First, I discuss a prototype of a system that generates hyperlinks for a collection of reduced hypertext pages based on the retrieval sequences registered for a large group of users. The reliability and validity of network development have been evaluated with a test-retest methodology using a simulation of user retrieval behavior. I briefly discuss the computational model of user retrieval behavior used in these simulations.
A recent modification of this model using an associative optimization heuristic has been shown to yield accurate predictions of user navigation paths in a large web site.

Second, I present an application of the discussed system to the generation of information links in Digital Library systems. A prototype has been implemented which generates large networks of document and journal relations from user retrieval sequences as registered in the Los Alamos National Laboratory (LANL) Research Library (RL) server logs. One such generated network was compared to a citation network for the same set of journals. The latter was derived from the Institute for Scientific Information (ISI) 1998 journal citation data. Results indicate large discrepancies between the graph-theoretical properties of the user-determined network and the ISI citation network, and raise questions regarding the validity of ISI Impact Factors.

Third, I provide a demonstration of a prototype of a Spreading Activation journal recommendation system operating on the generated journal networks. The system allows users to expand initial keyterm queries by honing lists of recommended journals and will be connected to the LANL RL MyLibrary personalization service.

Biography: Johan Bollen is a Postdoctoral Research Associate in the Computer and Computational Science Division at the Los Alamos National Laboratory, where he works on the Active Recommendation Project. He received his BA and MS in Experimental Psychology (specialization in Cognitive Science and AI) from the Vrije Universiteit Brussel with Distinction in 1994. His MS thesis concerned the implementation of neural behavior systems for autonomous robotics and was awarded Highest Distinction. He successfully defended his PhD dissertation in July 2001 and will be awarded a PhD degree in Psychology in October 2001 from the Vrije Universiteit Brussel.
His research focuses on systems for automated information linking, human factors in web design and navigation, Distributed Knowledge Systems, Information Retrieval and Digital Library recommendation systems. He has authored and co-authored numerous articles in peer-reviewed conference proceedings and journals, co-edited an anthology entitled "The Evolution of Complexity" and co-organized several international symposia and workshops. His research on distributed knowledge systems for the WWW has received widespread attention in the scientific and popular press.

BIBTEX:

@ARTICLE{system:bollen1998,
  author = {Johan Bollen and Francis Heylighen},
  year = 1998,
  title = {A system to restructure hypertext networks into valid user models},
  journal = {The New Review of Hypermedia and Multimedia},
  volume = 4,
  pages = {189--213},
}

@INPROCEEDINGS{groupu:bollen2000,
  author = {Johan Bollen},
  title = {Group user models for personalized hyperlink recommendation},
  year = 2000,
  publisher = {Springer Verlag},
  address = {Trento},
  month = {August},
  booktitle = {LNCS 1892 - International Conference on Adaptive Hypermedia and Adaptive Web-based Systems (AH2000)},
  pages = {39--50},
}

@INPROCEEDINGS{adapti:bollen2000,
  author = {Johan Bollen and Luis M. Rocha},
  title = {An adaptive systems approach to the implementation and evaluation of digital library recommendation systems},
  year = 2000,
  publisher = {Springer Verlag},
  address = {Lisbon},
  month = {September},
  booktitle = {LNCS - Fourth European Conference on Research and Advanced Technology for Digital Libraries (ECDL2000)},
}
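
Illustration: the abstract states that three Hebbian-based learning rules modify link weights according to sequences of user retrieval requests, but it does not spell the rules out. The following is only a minimal sketch of how such an update might look, assuming the three rules reinforce (1) the link actually followed, (2) its reverse, and (3) a shortcut across two consecutive steps. The rule names, the increment values, the function names `update_link_weights` and `recommend`, and the sample sessions are all hypothetical and are not taken from the system described in the talk.

```python
from collections import defaultdict

# Hypothetical reinforcement increments; the abstract gives no values.
FREQUENCY_BONUS = 1.0      # assumed rule 1: reinforce the link a -> b that was followed
SYMMETRY_BONUS = 0.5       # assumed rule 2: reinforce the reverse link b -> a
TRANSITIVITY_BONUS = 0.3   # assumed rule 3: reinforce the shortcut a -> c after a -> b -> c


def update_link_weights(weights, path):
    """Reinforce link weights in place from one retrieval sequence."""
    for i in range(len(path) - 1):
        a, b = path[i], path[i + 1]
        if a == b:
            continue  # a repeated request carries no linking information
        weights[(a, b)] += FREQUENCY_BONUS
        weights[(b, a)] += SYMMETRY_BONUS
        if i + 2 < len(path):
            c = path[i + 2]
            if c != a and c != b:
                weights[(a, c)] += TRANSITIVITY_BONUS
    return weights


def recommend(weights, doc, k=3):
    """Return up to k documents most strongly linked from doc."""
    outgoing = [(target, w) for (source, target), w in weights.items()
                if source == doc]
    # Strongest link first; ties are broken alphabetically for determinism.
    outgoing.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return [target for target, _ in outgoing[:k]]


# Made-up retrieval sequences standing in for registered user sessions.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]

weights = defaultdict(float)
for session in sessions:
    update_link_weights(weights, session)

print(recommend(weights, "A"))
```

Because the weights depend only on the order of requests, a sketch of this kind needs no access to document text or author-defined links, which is the property the abstract emphasizes. Weight decay or normalization, which a system adapting to changing preferences would plausibly need, is omitted here.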