Efficient Generative Modelling of Protein Structure Fragments using a Deep Markov Model

Research output: Chapter in Book/Report/Conference proceeding, Article in proceedings, Research, peer-review

Fragment libraries are often used in protein structure prediction, simulation and design as a means to significantly reduce the vast conformational search space. Current state-of-the-art methods for fragment library generation do not properly account for aleatory and epistemic uncertainty, respectively due to the dynamic nature of proteins and experimental errors in protein structures. Additionally, they typically rely on information that is not generally or readily available, such as homologous sequences, related protein structures and other complementary information. To address these issues, we developed BIFROST, a novel take on the fragment library problem based on a Deep Markov Model architecture combined with directional statistics for angular degrees of freedom, implemented in the deep probabilistic programming language Pyro. BIFROST is a probabilistic, generative model of the protein backbone dihedral angles conditioned solely on the amino acid sequence. BIFROST generates fragment libraries with a quality on par with current state-of-the-art methods at a fraction of the run-time, while requiring considerably less information and allowing efficient evaluation of probabilities.
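The abstract describes a generative process: a latent state evolves along the amino acid sequence, and each residue's backbone dihedrals are drawn from an angular distribution. The sketch below illustrates that idea with Python's standard library only. It is not BIFROST and does not use Pyro. The learned transition and emission networks of a Deep Markov Model are replaced by hypothetical hand-set stand-ins (`AA_SHIFT`, `TRANSITION_WEIGHT`, `TRANSITION_SD`, `KAPPA`, `emission_means`; all names and values are illustrative assumptions). A scalar Gaussian latent carries context from residue to residue. Each (φ, ψ) pair is sampled from von Mises distributions, a standard choice in directional statistics that respects the periodicity of angles. Because every density is explicit, the log-probability of a sampled fragment can be evaluated cheaply.

```python
import math
import random

# Hypothetical toy parameters standing in for the learned neural networks of a
# Deep Markov Model. They are illustrative assumptions, not values from BIFROST.
AA_SHIFT = {"A": -0.5, "G": 0.8, "P": -0.9, "S": 0.2}  # per-residue push on z_t
TRANSITION_WEIGHT = 0.7  # how strongly z_t depends on z_{t-1}
TRANSITION_SD = 0.3      # std of the Gaussian transition noise
KAPPA = 8.0              # von Mises concentration shared by phi and psi


def log_bessel_i0(kappa, terms=50):
    """log I0(kappa) from the power series sum_k (kappa/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2) ** 2 / (k + 1) ** 2
    return math.log(total)


def von_mises_logpdf(x, mu, kappa):
    """Log-density of the von Mises distribution on the circle (radians)."""
    return kappa * math.cos(x - mu) - math.log(2 * math.pi) - log_bessel_i0(kappa)


def normal_logpdf(x, mean, sd):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def wrap(angle):
    """Wrap an angle into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def emission_means(z):
    """Toy emission: map latent z to mean (phi, psi) in radians.

    Very negative z gives helix-like (-60, -45) degrees; very positive z
    gives strand-like (-120, 130) degrees.
    """
    w = 1.0 / (1.0 + math.exp(-3.0 * z))
    return math.radians(-60.0 - 60.0 * w), math.radians(-45.0 + 175.0 * w)


def sample_fragment(sequence, rng):
    """Ancestral sampling: z_t ~ N(a*z_{t-1} + shift(aa_t), sd), then phi_t, psi_t ~ VonMises."""
    z, latents, angles = 0.0, [], []
    for aa in sequence:
        z = rng.gauss(TRANSITION_WEIGHT * z + AA_SHIFT[aa], TRANSITION_SD)
        mu_phi, mu_psi = emission_means(z)
        phi = wrap(rng.vonmisesvariate(mu_phi, KAPPA))
        psi = wrap(rng.vonmisesvariate(mu_psi, KAPPA))
        latents.append(z)
        angles.append((phi, psi))
    return latents, angles


def log_joint(sequence, latents, angles):
    """log p(z, phi, psi | sequence), summed over transition and emission terms."""
    total, z_prev = 0.0, 0.0
    for aa, z, (phi, psi) in zip(sequence, latents, angles):
        total += normal_logpdf(z, TRANSITION_WEIGHT * z_prev + AA_SHIFT[aa], TRANSITION_SD)
        mu_phi, mu_psi = emission_means(z)
        total += von_mises_logpdf(phi, mu_phi, KAPPA) + von_mises_logpdf(psi, mu_psi, KAPPA)
        z_prev = z
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    window = "AGSPAGSAG"  # hypothetical 9-residue sequence window
    library = [sample_fragment(window, rng) for _ in range(5)]
    # Rank the sampled fragments by their joint log-probability.
    for latents, angles in sorted(library, key=lambda f: log_joint(window, *f), reverse=True):
        print(round(log_joint(window, latents, angles), 2),
              [(round(math.degrees(p)), round(math.degrees(s))) for p, s in angles[:3]])
```

`log_joint` scores the angles together with the sampled latent path, that is, log p(z, x | s). Scoring the angles alone requires marginalizing over the latent path. In a Deep Markov Model this is typically approximated with an inference network, using an ELBO or importance sampling. The toy parameters here would be replaced by neural networks trained on protein structures.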
Original language: English
Title of host publication: International Conference on Machine Learning, 18-24 July 2021, Virtual
Publisher: PMLR
Publication date: 2021
Pages: 10258-10267
Publication status: Published - 2021
Event: 38th International Conference on Machine Learning - Virtual
Duration: 18 Jul 2021 to 24 Jul 2021

Series: Proceedings of Machine Learning Research
Volume: 139
ISSN: 1938-7228

ID: 300919805