Date: April 8, Tuesday 10-12
Title: Schema matching for large-scale data integration
Dr. Wensheng Wu, IBM Research Center, San Jose, CA

Abstract: A data integration system provides uniform access to a set of data sources, which greatly facilitates the retrieval of information from multiple sources. An important task in building a data integration system is schema matching, that is, discovering semantic correspondences or mappings among the elements of different schemas. While schema matching has been studied for decades, the scalability issue did not receive much attention until recently. In particular, scale has two dimensions: the number of elements in a schema and the number of schemas to be matched. In this talk, I will first present a solution for discovering the topical structures of databases to support a divide-and-conquer approach to matching large schemas. I will then describe a clustering-based algorithm for matching a large number of schemas. I will also briefly describe a new search-driven matching paradigm for focused integration when a complete integration is not required. Finally, I will discuss future directions and conclude the talk.

Biodata: Wensheng Wu is currently a postdoctoral fellow at IBM Research Center. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2006. His general research interests include databases, data mining, information retrieval, and Web technology. His current research focuses on large-scale data integration.