Mar. 29, 12:30-1:30. Part of the ACM distinguished lecturer series.
Data Science: Recent Developments and Future Trends
Speaker: Li Chen (ACM Distinguished Speaker)
Abstract
Data contains science. The way data is handled today differs greatly from the
classical mathematical approach of fitting models to the data. Today, the goal
is to find rules and properties within a data set and, sometimes, across
different data sets.
In this talk, we will explain data science and its relationship to BigData,
cloud computing, and data mining. We will also discuss current research
problems in data science and raise concerns related to the data science
industry. The talk will connect computer science, mathematics, information
technology, social science, and other application areas in order to give a
comprehensive view of the future of data science. Emphasizing the bridge
between computer science and mathematics, we will explain why data science can
serve as a powerful engine for the development of new computing and
mathematical theories.
Data science is the study of: (1) the science of data, (2) knowledge
extraction from massive data sets (BigData), mainly using machine learning, (3)
relations among data and data sets, (4) BigData processing, including tools
such as Hadoop and Spark running on cloud platforms, and (5) visualization of
massive data and human-computer interaction.
In this talk, we give an overview of data mining and machine learning methods
such as kNN, k-means, SVM, decision trees, PCA, and other popular methods. We
also introduce timely research problems, including smart search (also called
the matrix completion problem or the Netflix problem), the subspace problem,
financial data recovery, video tracking, and persistent data processing.
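As a small illustration of one of the methods listed above, the following is a
minimal sketch of k-means (Lloyd's algorithm) on one-dimensional data. It is
not taken from the talk: the function name kmeans_1d, its parameters, and the
fixed iteration count are assumptions made for this example only.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data (illustrative sketch, not from the talk)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

Calling kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0]) separates the
low values from the high values. A practical implementation would also check
for convergence instead of running a fixed number of iterations.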
As future research problems, we discuss computation and algorithm design for
various MapReduce-based models. As an application, we present a simple case
study in image segmentation using MapReduce, with detailed algorithm analysis.
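To give a flavor of how image segmentation can be phrased in MapReduce terms,
here is a minimal single-machine sketch. It is not the algorithm from the case
study, which this abstract does not specify. It assumes a simple global-mean
threshold, an image already split into non-empty tiles, and hypothetical
function names (map_tile, reduce_stats, segment).

```python
from functools import reduce

def map_tile(tile):
    """Map step: summarize one tile (a list of pixel rows) as (sum, count)."""
    pixels = [v for row in tile for v in row]
    return (sum(pixels), len(pixels))

def reduce_stats(a, b):
    """Reduce step: merge two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])

def segment(tiles):
    """Label each pixel 1 (foreground) or 0 (background) by the global mean."""
    # Assumption: at least one non-empty tile, so count is nonzero.
    total, count = reduce(reduce_stats, map(map_tile, tiles), (0, 0))
    threshold = total / count
    # Second map pass: each tile is labeled independently given the threshold.
    return [[[1 if v > threshold else 0 for v in row] for row in tile]
            for tile in tiles]
```

Because each tile is summarized and labeled independently, both passes could be
distributed across workers; only the small (sum, count) pairs need to be
combined in the reduce step.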