Modeling Protein Loop Structure using Multiple Knowledge- and Physics-based Scoring Functions

Yaohang Li
Department of Computer Science
Old Dominion University

The value of computer-generated protein structural models in biological research and practice depends critically on their accuracy. However, although significant advances have been made in the past ten years, developing high-resolution computational approaches that reliably produce protein structural models at or close to experimental quality remains an unsolved problem. The main difficulties are the tremendously large and complex protein conformation space and, more importantly, the absence of scoring functions with satisfactory accuracy and sensitivity. This project seeks to answer a challenging question: can one still model protein structures with high accuracy using existing scoring functions, which are potentially insensitive and inaccurate?

Unlike the common approach of globally optimizing a single scoring function that describes the conformational energy, we are exploring a new direction: modeling protein structures by efficiently sampling the low-score regions shared by multiple carefully selected knowledge-based, physics-based, or regression-based scoring functions. This approach addresses the scoring function insensitivity problem under the assumption that native or native-like conformations should satisfy most good existing scoring functions by yielding low score values. Sampling multiple scoring functions tolerates the insensitivity and deficiencies of individual functions and identifies the conformations that best satisfy most of them, which will eventually lead to a significant improvement in resolution. We are verifying this sampling strategy on a proof-of-concept problem: ab initio prediction of protein loop structures. The following figure shows our prediction result for 1rge(57:68). Our work integrates multiple scoring functions, including a triplet torsion angle score, physical energy, a distance-based potential, a loop closure score, and others, into the sampling scheme, with the goal of reliably predicting loop backbone structures at near-experimental resolution. The computational tools for loop structure prediction are being made available to the protein modeling research community as a software package.
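As a minimal illustration of how conformations that satisfy most scoring functions can be singled out, the Python sketch below ranks candidate loop conformations by Pareto dominance over their score vectors, assuming lower is better for every scoring function. It is a generic non-dominated sorting sketch, not the project's implementation; the function names `dominates` and `pareto_layers` and the example score values are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (lower is better):
    a is no worse than b on every scoring function and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_layers(scores: List[Sequence[float]]) -> List[int]:
    """Assign each conformation a Pareto layer index (0 = non-dominated front).

    scores[i] holds the values of all scoring functions for conformation i.
    The current non-dominated set is repeatedly peeled off and given the next layer index.
    """
    n = len(scores)
    layer = [-1] * n
    remaining = set(range(n))
    current = 0
    while remaining:
        # Conformations not dominated by any other conformation still in play.
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        for i in front:
            layer[i] = current
        remaining -= set(front)
        current += 1
    return layer


if __name__ == "__main__":
    # Hypothetical scores of four decoys under three scoring functions.
    scores = [
        [1.0, 2.0, 3.0],
        [2.0, 1.0, 3.0],
        [2.0, 3.0, 4.0],
        [3.0, 3.0, 5.0],
    ]
    print(pareto_layers(scores))  # → [0, 0, 1, 2]
```

Conformations in layer 0 are those that no other candidate beats on every scoring function at once, so a single insensitive function cannot by itself push a reasonable candidate out of the front. How the published consensus method orders candidates within a layer is not shown here.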

We are collaborating on this project with Dr. Eric Jakobsson and Ionel Rata of the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.


Software Packages:

Protein Loop Structure Modeling using Multiple Scoring Functions (656M)

Y. Li, I. Rata, S. Chiu, E. Jakobsson, “Improving Predicted Protein Loop Structure Ranking using a Pareto-Optimality Consensus Method,” submitted to BMC Structural Biology, 2009.

I. Rata, Y. Li, E. Jakobsson, “Backbone statistical potential from local sequence-structure interactions in protein loops,” submitted to Journal of Physical Chemistry B, 2009.

Y. Li, I. Rata, E. Jakobsson, “Multi-Scoring Functions Sampling in Protein Loop Structure Prediction,” submitted to Applied Mathematics and Computation, 2009.

Decentralized Hybrid Parallel Tempering and Simulated Annealing Program

Y. Li, V. A. Protopopescu, N. Arnold, X. Zhang, A. Gorin, “Hybrid Parallel Tempering/Simulated Annealing Method,” Applied Mathematics and Computation, 212:216-228, 2009.
Y. Li, M. Mascagni, A. Gorin, “A Decentralized Parallel Implementation for Parallel Tempering Algorithm,” Parallel Computing, 35(5):269-283, 2009.

This work is funded by the National Science Foundation (NSF) under Computing and Communication Foundations award CCF-0829382, “SGER: A Novel Multi-Scoring Functions Sampling Approach to Improve Protein Modeling Resolution and Its Applications in Protein Loop Structure Prediction.”