Dr. Slobodan Vucetic, Temple University
TITLE: Memory-Constrained Data Mining
ABSTRACT: Data mining concentrates on the development of computationally efficient and accurate tools for knowledge discovery from very large data sets. Many successful data mining algorithms scale linearly or even better with data size and can learn with high accuracy from millions of examples on a regular PC. However, linear or sub-linear scaling is often not enough in today's world, with ever-increasing data collections in science, medicine, engineering, and business at one end and the growing use of microcontrollers with highly limited memory and processing power at the other. A critical question in this new world is how to optimize learning from data that exceed the available computational resources by orders of magnitude. This talk will demonstrate that highly accurate learning can be achieved using seemingly inferior computational devices. To accomplish this, a computer is treated as a reservoir that sequentially observes a large data stream and, at any given moment, maintains a data summary and a prediction model that describes the data. After each observed stream example, the reservoir content is updated so that accuracy increases while the memory constraint remains satisfied. The proposed approach favors the examples on which the current predictor is most uncertain, and it maintains a summary of the data instead of the individual examples. In this talk I will present an online support vector machine algorithm that requires constant memory and achieves sublinear runtime. The algorithm was tested on a number of large data sets from various domains (some containing more than 1 million examples), under the assumption that memory can hold information about only 100 examples. In most cases, the achieved accuracy was near the upper bound achievable with unlimited resources.
Moreover, the accuracy was substantially higher than when 100 randomly selected examples were used. These results provide support for the hypothesis that innovative data mining methods can enable high-quality learning from large data sets using resource-constrained computing devices.
BIO: Slobodan Vucetic received B.S. and M.S. degrees from the University of Novi Sad, Serbia, in 1994 and 1997, respectively, and a Ph.D. degree in electrical engineering from Washington State University in 2001. He is an assistant professor in the Department of Computer and Information Sciences at Temple University. His research interests include bioinformatics, knowledge discovery from sequence, temporal, and spatial data, scientific data mining, text and web mining, machine learning, and data compression.
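The abstract describes the reservoir idea only at a high level. The Python sketch below illustrates the general pattern of budgeted online learning it refers to: a kernel classifier that stores an incoming example only when the current model is uncertain about it, and, once a fixed memory budget is exceeded, discards the stored example it is most confident about. This is not the online SVM presented in the talk. The class name, the RBF kernel, the `margin` threshold, and the removal rule are all assumptions chosen for illustration, and the sketch keeps raw examples rather than the data summary the abstract mentions.

```python
import math
import random


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class BudgetedKernelPerceptron:
    """Hypothetical sketch: an online kernel classifier that never stores more than `budget` examples."""

    def __init__(self, budget=100, margin=1.0, gamma=1.0):
        self.budget = budget  # memory constraint (the talk's experiments assume 100 examples)
        self.margin = margin  # assumed threshold: y * f(x) below this counts as "uncertain"
        self.gamma = gamma
        self.support = []     # stored (x, y) pairs; each contributes y * k(x, .) to f

    def decision(self, x):
        """f(x) = sum over stored examples of y_i * k(x_i, x)."""
        return sum(y * rbf(sx, x, self.gamma) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.decision(x) >= 0 else -1

    def partial_fit(self, x, y):
        """Observe one stream example (x, y) with y in {-1, +1}."""
        if y * self.decision(x) >= self.margin:
            return  # confidently correct: nothing is stored
        self.support.append((x, y))
        if len(self.support) > self.budget:
            # Keep memory constant: drop the stored example the model is most certain about,
            # i.e. the one with the largest margin y_i * f(x_i) under the current model.
            margins = [sy * self.decision(sx) for sx, sy in self.support]
            self.support.pop(margins.index(max(margins)))


# Usage: a small synthetic stream with a tiny budget so the example runs quickly.
random.seed(0)
model = BudgetedKernelPerceptron(budget=20)
for _ in range(1000):
    point = (random.uniform(-1, 1), random.uniform(-1, 1))
    model.partial_fit(point, 1 if point[0] + point[1] > 0 else -1)
```

Whatever the stream length, memory stays bounded by `budget` stored examples; each update costs O(budget) kernel evaluations, plus O(budget²) when an example must be evicted. Evicting the most confidently classified example mirrors the abstract's preference for examples on which the predictor is uncertain, but other eviction rules (oldest first, merging examples into a summary) are equally plausible choices.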