Active Research Projects -- Yaohang Li

Dr. Yaohang Li

Department of Computer Science

Old Dominion University

Machine Learning and Artificial Intelligence in Nuclear Physics
Randomized Algorithms for Numerical Linear Algebra
Systems Biology
Protein Structure Modeling

Machine Learning and Artificial Intelligence in Nuclear Physics

Through collaboration with the Center for Theoretical and Computational Physics at Jefferson Lab, I have become interested in developing machine learning and data analytics algorithms for nuclear physics. My research currently focuses on two projects.

1) Physical Event Generator: We are using Generative Adversarial Networks (GANs) to build AI-based Monte Carlo event generators (MCEGs) capable of faithfully generating final-state particle phase space. In many GAN applications, such as generating realistic, sharp-looking images, agreement between the distribution of GAN-generated samples and the true distribution is often not strictly enforced. GANs for generating physical events, by contrast, must model the distributions of event features and their correlations precisely enough to faithfully replicate the nature of particle reactions. Moreover, events generated by GANs should not violate well-known physical laws, such as baryon number and momentum conservation. We address these issues by incorporating physics into the design of the GAN architectures. The following two preprints summarize our recent work on physical event generators.

Y. Alanazi, P. Ambrozewicz, M. P. Kuchera, Yaohang Li, T. Liu, R. E. McClellan, W. Melnitchouk, E. Pritchard, M. Robertson, N. Sato, R. Strauss, L. Velasco, “AI-based Monte Carlo event generator for electron-proton scattering,” arXiv:2008.03151, 2020.
Y. Alanazi, N. Sato, T. Liu, W. Melnitchouk, M. P. Kuchera, E. Pritchard, M. Robertson, R. Strauss, L. Velasco, Yaohang Li, “Simulation of electron-proton scattering events by a Feature-Augmented and Transformed Generative Adversarial Network (FAT-GAN),” arXiv:2001.11103, 2020.

Our FAT-GAN package can be found at https://github.com/yaohangli/FAT-GAN.
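One simple way to picture building a conservation law into a generator's output is sketched below. This is only an illustrative NumPy example and is not the FAT-GAN architecture: the function name `complete_event`, the array shapes, the beam setup, and the idea of deriving the last particle's momentum from the others are all assumptions made for illustration. A generator would propose momenta for all but one final-state particle, and the remaining momentum is then fixed by conservation instead of being learned.

```python
import numpy as np

def complete_event(p_initial, p_generated):
    """Append the last final-state momentum so each event conserves momentum.

    p_initial:   shape (n_events, 3), total initial-state three-momentum
    p_generated: shape (n_events, n_particles - 1, 3), momenta proposed by
                 a (hypothetical) generator network
    """
    # The last particle carries whatever momentum the others leave over.
    p_last = p_initial - p_generated.sum(axis=1)
    return np.concatenate([p_generated, p_last[:, None, :]], axis=1)

rng = np.random.default_rng(0)
p_initial = np.tile([0.0, 0.0, 10.0], (4, 1))   # hypothetical initial state
p_generated = rng.normal(size=(4, 2, 3))        # stand-in for generator output
events = complete_event(p_initial, p_generated)
```

By construction, the momenta in every completed event sum to the initial-state total, so the constraint holds exactly regardless of how well the generator is trained.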

This work was supported by the Jefferson Lab LDRD projects No. LDRD19-13 and No. LDRD20-18, and in part by the U.S. Department of Energy contract DE-AC05-06OR23177, under which Jefferson Science Associates, LLC, manages and operates Jefferson Lab.

2) Nuclear Femtography: The goal of the nuclear femtography problem is to understand the nucleon’s internal three-dimensional quark and gluon structure. We are currently working on a neural network architecture for solving the inverse problem.

This work is supported by the Southeastern Universities Research Association (SURA) Center for Nuclear Femtography Initiative.

Randomized Algorithms for Numerical Linear Algebra

Matrix operations are fundamental components of many data analysis and computational simulation applications. In the era of big data, many traditional numerical methods for matrix operations, designed to minimize floating-point operations, fail to scale or cannot handle the complexity that emerges with large data sets. Thanks to their attractive properties, including fast approximation, pass efficiency, flexible implementation, memory efficiency, and natural parallelism, randomized algorithms can address many of these issues.
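As a concrete illustration of the idea, the sketch below shows a generic textbook-style randomized SVD: project the matrix onto a small random subspace, orthonormalize, and take an exact SVD of the much smaller projected matrix. It is a minimal sketch under stated assumptions, not the specific algorithms from the publications listed in this section; the function name `randomized_svd` and the parameters `oversample` and `n_power_iter` are choices made here for illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-`rank` SVD of A using a random range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, min(m, n))

    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ Omega)

    # Optional power iterations sharpen the captured subspace when the
    # singular values decay slowly; each one costs two more passes over A.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small k-by-n projected matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

The expensive operations are products of `A` with thin matrices, which parallelize naturally and need only a small number of passes over the data.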

Our main work in randomized algorithms for numerical linear algebra includes:

1) Monte Carlo linear solvers

H. Ji, M. Mascagni, Yaohang Li, “Convergence Analysis of Markov Chain Monte Carlo Linear Solvers using Ulam-von Neumann Algorithm,” SIAM Journal on Numerical Analysis, 51(4): 2107-2122, 2013.

2) Low-rank approximation

W. Yu, Y. Gu, Yaohang Li, "Efficient Randomized Algorithms for the Fixed-Precision Low-Rank Matrix Approximation," SIAM Journal on Matrix Analysis and Applications, 39(3): 1339-1359, 2018.
H. Ji, W. Yu, Yaohang Li, “A Rank Revealing Randomized Singular Value Decomposition (R3SVD) Algorithm for Low-rank Matrix Approximations,” arXiv:1605.08134, 2016.

3) Pass-efficient randomized algorithms

W. Yu, Y. Gu, J. Li, S. Liu, Yaohang Li, "Single-Pass PCA of Large High-Dimensional Data," Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, 2017.

4) Matrix completion using randomized SVD

X. Feng, W. Yu, Yaohang Li, "Faster Matrix Completion Using Randomized SVD," Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2018), Volos, 2018.

5) Fast verification of products of large matrices

H. Ji, M. Mascagni, Yaohang Li, “Gaussian Variant of Freivalds' Algorithm for Efficient and Reliable Matrix Product Verification,” arXiv:1705.10449, 2017.
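The idea behind probabilistic product verification can be sketched as follows: instead of recomputing AB and comparing it with C, multiply both sides by a random vector and compare the resulting vectors, which costs only matrix-vector products. The example below uses a Gaussian random vector, as the title of the preprint above suggests. The function name `verify_product`, the tolerance handling, and the trial count are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def verify_product(A, B, C, n_trials=1, rtol=1e-8, seed=0):
    """Probabilistically check whether A @ B equals C.

    Each trial costs three matrix-vector products instead of one
    matrix-matrix product.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x = rng.standard_normal(C.shape[1])
        lhs = A @ (B @ x)      # never forms A @ B explicitly
        rhs = C @ x
        scale = np.linalg.norm(lhs) + np.linalg.norm(rhs)
        # A relative tolerance absorbs floating-point rounding error.
        if np.linalg.norm(lhs - rhs) > rtol * scale:
            return False
    return True

# Usage: a correct product passes, a product with one corrupted entry fails.
rng = np.random.default_rng(2)
A = rng.standard_normal((50, 40))
B = rng.standard_normal((40, 30))
C_good = A @ B
C_bad = C_good.copy()
C_bad[3, 7] += 1.0
ok_good = verify_product(A, B, C_good)
ok_bad = verify_product(A, B, C_bad)
```

With a continuous random vector, an incorrect product is missed only if the vector happens to lie in the null space of AB - C (or so close to it that the residual falls below the tolerance), so a single trial is usually decisive in practice.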

Systems Biology

We are interested in applying our randomized algorithms to practical problems. One of the main application areas is systems biology, because these algorithms can efficiently sample large datasets and extract global patterns. By treating the association matrix between biological entities as an incomplete matrix, randomized matrix completion algorithms, coupled with other machine learning techniques such as regularization, optimization, feature induction, or deep learning, can be used to derive the underlying unknown associations effectively.

M. Yang, Yaohang Li, J. Wang, “Feature and Nuclear Norm Minimization for Matrix Completion,” IEEE Transactions on Knowledge and Data Engineering, in press, 2020.
M. Yang, H. Luo, F. Wu, Yaohang Li, J. Wang, "Overlap matrix completion for predicting drug-associated indications," PLOS Computational Biology, 15(12): e1007541, 2019.
M. Yang, H. Luo, Yaohang Li, J. Wang, "Drug repositioning based on bounded nuclear norm regularization," Bioinformatics, 35(14): i455-i463, 2019.
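To make the matrix completion idea concrete, here is a minimal sketch of a generic nuclear-norm-style completion scheme (iterative singular value soft-thresholding, often called soft-impute). It is not the method of any of the papers listed here: the function name `soft_impute`, the parameters `lam` and `n_iter`, and the synthetic data are assumptions made for illustration. For large association matrices, the exact SVD in each iteration could in principle be replaced by a randomized SVD.

```python
import numpy as np

def soft_impute(M, mask, lam=0.1, n_iter=200):
    """Fill in the unobserved entries of M (mask == False) with a
    low-rank estimate obtained by repeated SVD soft-thresholding."""
    X = np.zeros_like(M, dtype=float)
    for _ in range(n_iter):
        # Keep observed entries, use the current estimate elsewhere.
        filled = np.where(mask, M, X)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Shrinking singular values promotes a low-rank solution.
        s = np.maximum(s - lam, 0.0)
        X = (U * s) @ Vt
    return X

# Usage on a synthetic rank-2 "association" matrix with about 60% observed.
rng = np.random.default_rng(3)
M_true = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
mask = rng.random(M_true.shape) < 0.6
X_hat = soft_impute(np.where(mask, M_true, 0.0), mask)
missing_err = (np.linalg.norm((X_hat - M_true)[~mask])
               / np.linalg.norm(M_true[~mask]))
```

Here `missing_err` measures the relative error on the entries that were hidden from the algorithm, which correspond to the unknown associations one would want to predict.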

This matrix completion approach has proven successful in a variety of bioinformatics problems, including drug repositioning, lncRNA-disease association prediction, and miRNA-target association prediction, both improving accuracy and revealing biological patterns. Here are some of our representative publications.

H. Luo, M. Yang, M. Li, Yaohang Li, F. Wu, J. Wang, “Biomedical data and computational models for drug repositioning: a comprehensive review,” Briefings in Bioinformatics, bbz176, 2020.
H. Jiang, M. Yang, X. Chen, M. Li, Yaohang Li, J. Wang, “miRTMC: A miRNA target prediction method based on matrix completion algorithm,” IEEE Journal of Biomedical and Health Informatics, accepted, 2020.
C. Lu, M. Yang, F. Luo, F. Wu, M. Li, Y. Pan, Yaohang Li, J. Wang, "Prediction of lncRNA-Disease Associations based on Inductive Matrix Completion," Bioinformatics, 34(19): 3357-3364, 2018.
H. Luo, M. Li, S. Wang, Q. Liu, Yaohang Li, J. Wang, "Computational Drug Repositioning using Low-Rank Matrix Approximation and Randomized Algorithms," Bioinformatics, 34(11): 1904-1912, 2018.

Protein Structure Modeling

I have an ongoing interest in understanding and modeling protein structures. Major work from our group includes:

1) The discovery of a new conserved conformational cluster of phosphorylated Tyrosine sidechain structures

M. Abdelrasoul, K. Ponniah, A. Mao, M. S. Warden, W. Elhefnawy, Yaohang Li, S. M. Pascal, "Conformational Clusters of Phosphorylated Tyrosine," Journal of the American Chemical Society, 139: 17632-17638, 2017.

2) A loop modeling method with sub-angstrom accuracy

J. López-Blanco, A. Canosa-Valls, Yaohang Li, P. Chacón, "RCD+: Fast loop modeling server," Nucleic Acids Research, 44(W1): W395-W400, 2016.
Yaohang Li, “Conformational Sampling in Template-Free Protein Loop Structure Modeling: An Overview,” Computational and Structural Biotechnology Journal, 5(6): e201302003, 2013.
Yaohang Li, I. Rata, E. Jakobsson, “Sampling Multiple Scoring Functions Can Improve Protein Loop Structure Prediction Accuracy,” Journal of Chemical Information and Modeling, 51(7): 1656-1666, 2011.

3) A series of servers for predicting protein properties with improved accuracy and reliability

W. Xuan, N. Liu, N. Huang, Yaohang Li, J. Wang, “CLPred: A sequence-based protein crystallization predictor using BLSTM neural network,” 19th European Conference on Computational Biology (ECCB2020), accepted, 2020.
A. Yaseen, Yaohang Li, “Template-based C8-SCORPION: a Protein 8-state Secondary Structure Prediction Method using Structural Information and Context-based Features,” BMC Bioinformatics, 15(S8): S3, 2014.
A. Yaseen, Yaohang Li, “Context-based Features Enhance Protein Secondary Structure Prediction Accuracy,” Journal of Chemical Information and Modeling, 54(3): 992-1002, 2014.
A. Yaseen, Yaohang Li, “Dinosolve: A Protein Disulfide Bonding Prediction Server using Context-based Features to Enhance Prediction Accuracy,” BMC Bioinformatics, 14(S13): S9, 2013.

4) Parallel algorithms for fast evaluation of inter-residue interactions

A. Yaseen, H. Ji, Yaohang Li, "A Load-Balancing Workload Distribution Scheme for Three-Body Interaction Computation on Graphics Processing Units (GPU)," Journal of Parallel and Distributed Computing, 87: 91-101, 2016.
A. Yaseen, Yaohang Li, “Accelerating Knowledge-based Energy Evaluation in Protein Structure Modeling with Graphics Processing Units,” Journal of Parallel and Distributed Computing, 72(2): 297-307, 2012.

5) Protein structure modeling potentials

W. Elhefnawy, L. Chen, Y. Han, Yaohang Li, “ICOSA: A Distance-dependent, Orientation-specific Coarse-grain Contact Potential for Protein Structure Modeling,” Journal of Molecular Biology, 427(15): 2562-2576, 2015.
Yaohang Li, H. Liu, I. Rata, E. Jakobsson, “Building a Knowledge-based Statistical Potential by Capturing High-Order Inter-Residue Interactions and its Applications in Protein Secondary Structure Assessment,” Journal of Chemical Information and Modeling, 53(2): 500-508, 2013.
I. Rata, Yaohang Li, E. Jakobsson, “Backbone Statistical Potential from Local Sequence-Structure Interactions in Protein Loops,” Journal of Physical Chemistry B, 114(5): 1859-1869, 2010.

6) Fragment libraries

W. Elhefnawy, M. Li, J. Wang, Yaohang Li, “DeepFrag-k: a fragment-based deep learning approach for protein fold recognition,” BMC Bioinformatics, in press, 2020.
W. Elhefnawy, M. Li, J. Wang, Yaohang Li, "Decoding the Structural Keywords in Protein Structure Universe," Journal of Computer Science and Technology, 34(1): 3-15, 2019.
Z. Haratipour, H. Aldabagh, Yaohang Li, L. H. Greene, "Network Connectivity, Centrality and Fragmentation in the Greek-Key Protein Topology," The Protein Journal, 38(5): 497-505, 2019.

This work is funded by the National Science Foundation (Computing and Communication Foundations) under CAREER award CCF-0845702, "Novel Sampling Approaches for Protein Modeling Applications."