Active Research Projects
Dr. Yaohang Li, Department of Computer Science, Old Dominion University
Novel Sampling Approaches in Protein Structure Modeling
Accurately modeling the structure of a protein or protein complex is a grand challenge with broad economic and scientific impact. One of the key obstacles is the absence of a reliable sampling method that can efficiently explore the tremendously large protein conformation space. This project investigates efficient sampling approaches that can lead to the prediction of high-resolution protein structures with an accuracy and reliability not currently achievable in computational protein modeling. The rationale is to integrate various physics- and knowledge-based scoring functions via multi-scoring functions sampling to explore the complex protein conformation space.
The research work includes 1) establishing computational models for multi-scoring functions sampling in protein structure modeling with theoretically and mathematically rigorous justification; 2) designing novel sampling algorithms to efficiently explore the large protein conformation space; 3) applying the sampling algorithms to important protein modeling applications, including ab initio protein folding and protein-protein docking; and 4) developing a resource-efficient protein modeling programming paradigm.
The following figure shows the structure-score distribution of 849 non-dominated solutions obtained for an 11-residue protein loop, 153l(154:164), using our sampling algorithm. The multi-scoring functions sampling has led to diversified, non-dominated solutions satisfying the three statistics- or physics-based scoring functions. Among these solutions, a solution cluster close to the native conformation (< 0.5 Å) emerges. As expected, the near-native solutions do not yield the minimum score in any of the scoring functions, but lie on the Pareto-optimal front of the multi-scoring function space.
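To make the notion of a non-dominated solution concrete, here is a minimal sketch of a Pareto filter. It assumes that lower scores are better under every scoring function. The function names and the score triples are hypothetical illustrations, not the project's sampling code or data.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a dominates b (lower is better).

    a dominates b when a is no worse than b in every score
    and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores of five conformations under three scoring functions.
decoys = [
    (1.0, 5.0, 3.0),  # best in scores 1 and 3
    (2.0, 2.0, 3.2),  # best in no single score, yet non-dominated
    (3.0, 1.0, 4.0),  # best in score 2
    (2.5, 2.5, 3.5),  # dominated by (2.0, 2.0, 3.2)
    (1.0, 5.0, 3.5),  # dominated by (1.0, 5.0, 3.0)
]
front = non_dominated(decoys)
print(front)
```

The second conformation stays on the front even though it is not the minimum of any individual score, which is the situation described above for near-native solutions.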
Protein Loop Structure Prediction
Accurate protein loop structure modeling is important
in structural biology for its wide applications, including determining
the surface loop regions in homology modeling, defining segments in NMR
spectroscopy experiments, designing antibodies, and modeling ion
channels. Our ultimate goal is to obtain computational loop models with
experimental resolution. In nearly 70% of 87 long-loop benchmark targets (10 to 13 residues), the top-ranked decoys predicted by our current server are within subangstrom resolution. In more than 80% of these targets, at least one of the top five ranked decoys is within subangstrom resolution. The following figures show our prediction results for two protein loops.
[Figures: 1CNV(110:122) and 1RCF(122:132)]
We are collaborating with Dr. Eric Jakobsson and Ionel Rata of the National Center for Supercomputing Applications (NCSA) at the University of Illinois, Urbana-Champaign, on this project.
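As an illustration of the subangstrom criterion, the sketch below computes the root-mean-square deviation (RMSD) between a predicted loop and a reference loop. It assumes both coordinate sets are already expressed in the same frame and list the same atoms in the same order; no superposition is performed. The coordinates and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Coordinate RMSD in angstroms between two equally sized atom lists.

    Assumes the two structures share one coordinate frame (no fitting).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


def is_subangstrom(a: Sequence[Point], b: Sequence[Point]) -> bool:
    """True when the RMSD between the two structures is below 1 angstrom."""
    return rmsd(a, b) < 1.0


# Hypothetical three-atom example: the decoy is shifted 0.3 angstroms in y.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
decoy = [(0.0, 0.3, 0.0), (1.5, 0.3, 0.0), (3.0, 0.3, 0.0)]
print(rmsd(native, decoy), is_subangstrom(native, decoy))
```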
High Resolution ab initio Protein Folding
Successful ab initio protein structure prediction depends on three
efforts: (1) formulating an accurate and sensitive scoring function that
can lead the search process to the global minimum in the protein folding
energy landscape; (2) devising efficient moves (conformation changes)
toward the native conformation; and (3) developing a global optimization
algorithm that can efficiently escape from deep local minima and
converge to the global energy minimum. Among these three efforts,
building an accurate and sensitive scoring function is the most
important. However, as in many other computational biology
problems, developing a sensitive and accurate scoring function is a very
difficult, even formidable, task. Although many scoring
functions based on various criteria, such as energy, statistics,
secondary structure, loops, or contact pairs, have been proposed,
there is currently no reliable and general scoring function
that can always drive a search to a native fold, and there is no
reliable and general global optimization method under these scoring
functions that can sample the conformation space adequately to guarantee
a significant fraction of near-natives (< 3.0 Å RMSD from the experimental
structure).
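The three ingredients listed above (a scoring function, a move set, and a global optimizer that can escape local minima) can be sketched on a toy problem. The example below uses simulated annealing with the Metropolis acceptance rule as a generic stand-in; it is not the optimization method of this project. The one-dimensional energy function, the move, and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, Tuple


def toy_energy(x: float) -> float:
    """Hypothetical rugged landscape: a shallow bowl with sinusoidal ripples."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(
    energy: Callable[[float], float],
    start: float,
    steps: int = 5000,
    t_start: float = 2.0,
    t_end: float = 0.01,
    step_size: float = 0.5,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the best (state, energy) seen during an annealing run."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        # Move: a small random perturbation of the current state.
        cand = x + rng.uniform(-step_size, step_size)
        e_cand = energy(cand)
        # Metropolis rule: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / t) so the walk can leave a
        # local minimum while the temperature is still high.
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = simulated_annealing(toy_energy, start=4.0)
print(best_x, best_e)
```

A plain downhill search started at the same point would stop in the nearest ripple; the occasional uphill acceptance is what lets the walk cross barriers. Nothing here guarantees that the global minimum is reached.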
Many different types of sensors are employed to monitor the different
environmental attributes contributing to climate change. For
example, seismic sensors are deployed to monitor seismic activity
under the ocean, while temperature sensors and sea level sensors monitor
changes in ocean temperature and sea level. While individual
sensors provide some insight into ongoing events, it is very
important to consider the signals from different sensors collectively
when detecting climate change events. Patterns detected from
individual sensors may look normal in isolation. However, the
temporal relations among them across geospatially distributed sensors
may better indicate an important class of global events that may
not have been apparent from the individual stream analysis. Mining individual
stream data has been a subject of a large number of studies. However,
studies on mining spatio-temporal patterns across multivariate stream
data are very limited. In this project, we intend to address the
following three issues:
Markov Chain Monte Carlo
Grid-based Monte Carlo and Quasi-Monte Carlo
Monte Carlo applications are widely perceived as computationally intensive but naturally parallel. Therefore, they can be executed effectively on the grid using the dynamic bag-of-work model. We improve the efficiency of the subtask-scheduling scheme by using an N-out-of-M strategy, and we develop a Monte Carlo-specific lightweight checkpoint technique; together these improve the performance of Monte Carlo grid computing. We also enhance the trustworthiness of Monte Carlo grid-computing applications by exploiting the statistical nature of Monte Carlo and by cryptographically validating intermediate results using the random number generator already in use in the Monte Carlo application. All these techniques lead to a high-performance grid-computing infrastructure that is capable of providing trustworthy Monte Carlo computation services. These techniques can also be extended to quasi-Monte Carlo applications. Dr. Michael Mascagni is my key collaborator on this project.
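The sketch below illustrates one reading of the N-out-of-M idea: the job is split into M independent subtasks and is treated as complete once any N of them have returned, so a few slow or failed grid nodes do not hold up the result. This interpretation, the function names, and the simulated finish times are assumptions made for illustration; the project's actual scheduler is not shown here.

```python
import random
from typing import List, Tuple


def pi_subtask(seed: int, samples: int) -> float:
    """One independent Monte Carlo subtask: estimate pi by dart throwing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / samples


def n_out_of_m(results: List[Tuple[float, float]], n_needed: int) -> Tuple[float, float]:
    """Combine the first n_needed subtasks to finish.

    results holds one (finish_time, estimate) pair per subtask. Returns
    the time at which the n_needed-th result arrives and the mean of
    those estimates. Assumes finish time is independent of the estimate,
    so keeping only the earliest finishers does not bias the mean.
    """
    if not 1 <= n_needed <= len(results):
        raise ValueError("n_needed must be between 1 and the number of subtasks")
    first = sorted(results)[:n_needed]
    done_at = first[-1][0]
    estimate = sum(value for _, value in first) / n_needed
    return done_at, estimate


# Hypothetical finish times for M = 10 subtasks; two nodes are stragglers.
finish_times = [1.0, 1.2, 0.9, 1.1, 9.5, 1.3, 1.0, 0.8, 7.0, 1.1]
results = [(t, pi_subtask(seed=i, samples=20000)) for i, t in enumerate(finish_times)]

done_at, estimate = n_out_of_m(results, n_needed=8)      # N = 8 of M = 10
wait_all, _ = n_out_of_m(results, n_needed=len(results))  # wait for every subtask
print(done_at, wait_all, estimate)
```

With these made-up timings the job completes as soon as the eighth result arrives instead of waiting for the two stragglers. The price is that two subtasks' worth of samples go unused, so M has to be chosen somewhat larger than the N that the target accuracy requires.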
Biological systems are remarkably adaptive and robust in complex real-world environments. For this reason, in this project we are exploring biologically inspired approaches to system adaptation, fault tolerance, and reconfiguration. Our approach is both theory- and application-driven. On the theoretical side, we will explore the phenomena of self-adaptation and reconfiguration/organization in natural biological systems, and then develop a theoretical framework for self-reconfigurable systems. On the application side, we will design and evaluate biomimetic mechanisms and algorithms for future metamorphic autonomous systems. This project is funded by the National Science Foundation, RISE program. More information can be found at the cooperative research center website.