Machine Learning in Structural Biology

Workshop at the 36th Conference on Neural Information Processing Systems

Saturday, December 3rd, 2022

About

In only a few years, structural biology, the study of the 3D structure or shape of proteins and other biomolecules, has been transformed by breakthroughs from machine learning algorithms. Machine learning models are now routinely used by experimentalists to predict structures that can help answer real biological questions (e.g., AlphaFold) and to accelerate the experimental process of structure determination (e.g., computer vision algorithms for cryo-electron microscopy), and they have become a new industry standard for bioengineering new protein therapeutics (e.g., large language models for protein design). Despite all this progress, many challenges remain open for the field, such as modeling protein dynamics, predicting higher-order complexes, pushing towards generalization of protein folding physics, and relating the structure of proteins to the in vivo and contextual nature of their underlying function. These challenges are diverse and interdisciplinary, motivating new kinds of machine learning systems and requiring the development and maturation of standard benchmarks and datasets.

In this exciting time for the field, our workshop, “Machine Learning in Structural Biology” (MLSB), seeks to bring together relevant experts, practitioners, and students across a broad community to focus on these challenges and opportunities. We believe that uniting these communities at our workshop, including the geometric and graph learning communities, NLP researchers, and structural biologists with domain expertise, can help spur new ideas, spark collaborations, and advance the impact of machine learning in structural biology. Progress at this intersection promises to unlock new scientific discoveries and the ability to design novel medicines.

MLSB will be an in-person workshop with hybrid elements (livestream + chat for questions) in Rooms 288-289 on Saturday, December 3rd, 2022 at NeurIPS.

Stay updated on changes and workshop news by joining our mailing list.

FAQ

WIFI - Network name: NeurIPS; password: conference

SCHEDULE - MLSB will begin at 8:30am CST and end at 5:00pm CST on Saturday, Dec 3. The physical workshop will take place in Rooms 288-289 in the New Orleans Convention Center. The schedule of talks and poster sessions is available on our website, mlsb.io, and on the NeurIPS website.

PANEL - We will be hosting a panel discussion with our invited speakers on research directions in Machine Learning for Structural Biology from 2:50-3:50pm CST. We are soliciting additional discussion topics and questions; let us know what you would like to hear about by filling out this Google Form: https://forms.gle/fiJ6ZzsmepKL87tMA.

POSTER SESSIONS - We have scheduled two poster sessions, at 11:15am-12:15pm CST and 3:50-4:50pm CST. Posters can be set up before the workshop begins (8:00-8:30am), during the break (9:45-10:05am), or at the beginning of the first poster session. We ask that posters be 24"W x 36"H, as space constraints in the room may limit the display of wider posters. However, if you already have a poster of a different size, it is acceptable to use that.

LUNCH + HAPPY HOUR - MLSB will provide boxed lunches to attendees during the workshop. In addition, there will be a happy hour during the afternoon poster session, where drinks will be served (non-alcoholic options will be available).

ORALS - Each oral will have a 15-minute time slot. We anticipate that talks will last approximately 10 minutes, leaving 5 minutes for questions and the transition between speakers. If you are presenting in person, you may check your slides before the workshop (8:00-8:30am CST) or during the break (9:45-10:05am CST).

Please check the schedule on our website, mlsb.io, for your presentation time. If you cannot make this time, let us know as soon as possible. Additionally, below we have listed the current format for each oral presentation (In-person, Live over Zoom, or Prerecorded). If this information is not correct, please let us know. You can contact us at workshopmlsb@gmail.com with any issues.

VIRTUAL PARTICIPATION - MLSB will have some hybrid elements. There will be a livestream (a link is available on the NeurIPS website), and chat will be available on the NeurIPS workshop site for asking questions. Currently, there is no official venue for presenting posters virtually at MLSB. However, we are considering options to enable greater virtual participation over the next few days. If you have a poster at MLSB but plan to participate only virtually, please let us know by adding your name to this Google Form as soon as possible.

Invited Speakers

David Fleet

Professor of Computer Science at the University of Toronto.

Alexander Rives

Research Scientist at Meta AI.

Kathryn Tunyasuvunakool

Staff Research Scientist at DeepMind.

Max Welling

Distinguished Scientist at Microsoft Research.

Schedule (CST)

08:30 Opening Remarks
08:35 Invited Speaker - David Fleet
09:00 Contributed Talk

Latent Space Diffusion Models of Cryo-EM Structures
Karsten Kreis · Tim Dockhorn · Zihao Li · Ellen Zhong

09:15 Contributed Talk

Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
Namrata Anand · Tudor Achim

09:30 Contributed Talk

Predicting conformational landscapes of known and putative fold-switching proteins using AlphaFold2
Hannah Wayment-Steele · Sergey Ovchinnikov · Lucy Colwell · Dorothee Kern

09:45 Break
10:05 Invited Speaker - Kathryn Tunyasuvunakool
10:30 Contributed Talk

SWAMPNN: End-to-end protein structures alignment
Jeanne Trinquier · Samantha Petti · Shihao Feng · Johannes Soeding · Martin Steinegger · Sergey Ovchinnikov

10:45 Contributed Talk

DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
Gabriele Corso · Hannes Stärk · Bowen Jing · Regina Barzilay · Tommi Jaakkola

11:00 Contributed Talk

Dynamic-backbone protein-ligand structure prediction with multiscale generative diffusion models
Zhuoran Qiao · Weili Nie · Arash Vahdat · Thomas Miller · Anima Anandkumar

11:15 Poster Session
12:15 Lunch
01:00 Invited Speaker - Max Welling
01:25 Contributed Talk

EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
Jae Hyeon Lee · Payman Yadollahpour · Andrew Watkins · Nathan Frey · Andrew Leaver-Fay · Stephen Ra · Vladimir Gligorijevic · Kyunghyun Cho · Aviv Regev · Richard Bonneau

01:40 Contributed Talk

Predicting Ligand – RNA Binding Using E3-Equivariant Network and Pretraining
Zhenfeng Deng · Ruichu Gu · Hangrui Bi · Xinyan Wang · Zhaolei Zhang · Han Wen

01:55 Invited Speaker - Alexander Rives
02:20 Contributed Talk

Seq2MSA: A Language Model for Protein Sequence Diversification
Pascal Sturmfels · Roshan Rao · Robert Verkuil · Zeming Lin · Tom Sercu · Adam Lerer · Alex Rives

02:35 Contributed Talk

Metal3D: Accurate prediction of transition metal ion location via deep learning
Simon Dürr

02:50 Break
03:05 Panel Discussion
03:50 Closing Remarks
03:55 Poster Session / Happy Hour

Accepted Papers

  • 3D alignment of cryogenic electron microscopy density maps by minimizing their Wasserstein distance

    Aryan Tajmir Riahi, Geoffrey Woollard, Frederic Poitevin, Anne Condon, Khanh Dao Duc

    [paper]

  • 3D Reconstruction of Protein Complex Structures Using Synthesized Multi-View AFM Images

    Jaydeep Rade, Soumik Sarkar, Anwesha Sarkar, Adarsh Krishnamurthy

    [paper] [preprint]

  • A Benchmark Framework for Evaluating Structure-to-Sequence Models for Protein Design

    Jeffrey Chan, Seyone Chithrananda, David Brookes, Sam Sinai

  • A Federated Learning benchmark for Drug-Target Interaction

    Filip Svoboda, Gianluca Mittone, Nicholas Lane, Pietro Lió

    [paper]

  • Adversarial Attacks on Protein Language Models

    Ginevra Carbone, Francesca Cuturello, Luca Bortolussi, Alberto Cazzaniga

    [paper] [preprint]

  • Agile Language Transformers for Recombinant Protein Expression Optimization

    Jeliazko Jeliazkov, Maxim Shapovalov, Diego del Alamo, Matt Sternke, Joel Karpiak

    [paper]

  • Allele-conditional attention mechanism for HLA-peptide complex binding affinity prediction

    Rodrigo Hormazabal, Doyeong Hwang, Kiyoung Kim, Sehui Han, Kyunghoon Bae, Honglak Lee

  • Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness

    Sharrol Bachas, Goran Rakocevic, David Spencer, Anand Sastry, Robel Haile, John Sutton, George Kasun, Andrew Stachyra, Jahir Gutierrez, Edriss Yassine, Borka Medjo, Vincent Blay, Christa Kohnert, Jennifer Stanton, Alexander Brown, Nebojsa Tijanic, Cailen McCloskey, Rebecca Viazzo, Rebecca Consbruck, Hayley Carter, Simon Gottreich-Levine, Shaheed Abdulhaqq, Jacob Shaul, Abigail Ventura, Randal Olson, Engin Yapici, Joshua Meier, Sean McClain, Matthew Weinstock, Gregory Hannum, Ariel Schwartz, Miles Gander, Roberto Spreafico

    [preprint]

  • APPRAISE: ranking engineered proteins by target binding propensity using pair-wise, competitive structure modeling and physics-informed analysis

    Xiaozhe Ding, Xinhong Chen, Erin Sullivan, Tim Miles, Viviana Gradinaru

  • ChemSpacE: Interpretable and Interactive Chemical Space Exploration

    Yuanqi Du, Xian Liu, Nilay Shah, Shengchao Liu, Jieyu Zhang, Bolei Zhou

    [preprint]

  • Conditional Invariances for Conformer Invariant Protein Representations

    Balasubramaniam Srinivasan, Vassilis Ioannidis, Soji Adeshina, Mayank Kakodkar, George Karypis, Bruno Ribeiro

    [paper] [preprint]

  • Conditional neural processes for molecules

    Miguel Garcia-Ortegon, Andreas Bender, Sergio Bacallado

    [paper] [preprint]

  • ContactNet: Geometric-Based Deep Learning Model for Predicting Protein-Protein Interactions

    Matan Halfon, Dina Schneidman, Tomer Cohen, Raanan Fattal

    [paper]

  • Contrasting drugs from decoys

    Samuel Sledzieski, Rohit Singh, Lenore J Cowen, Bonnie Berger

    [paper] [preprint]

  • Deep Local Analysis estimates effects of mutations on protein-protein interactions

    Yasser Mohseni Behbahani, Elodie Laine, Alessandra Carbone

    [paper] [preprint]

  • Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

    Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon

    [paper] [preprint]

  • DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking

    Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola

    [paper] [preprint]

  • Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

    Jason Yim, Brian L Trippe, Doug Tischer, David Baker, Tamara Broderick, Regina Barzilay, Tommi Jaakkola

    [paper] [preprint]

  • Does Inter-Protein Contact Prediction Benefit from Multi-Modal Data and Auxiliary Tasks?

    Arghamitra Talukder, Rujie Yin, Yang Shen, Yuning You

    [paper]

  • Dynamic-backbone protein-ligand structure prediction with multiscale generative diffusion models

    Zhuoran Qiao, Weili Nie, Arash Vahdat, Thomas Miller, Anima Anandkumar

    [preprint]

  • End-to-end accurate and high-throughput modeling of antibody-antigen complexes

    Tomer Cohen, Dina Schneidman

    [paper]

  • EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation

    Jae Hyeon Lee, Payman Yadollahpour, Andrew Watkins, Nathan Frey, Andrew Leaver-Fay, Stephen Ra, Vladimir Gligorijevic, Kyunghyun Cho, Aviv Regev, Richard Bonneau

    [paper] [preprint]

  • EvoOpt: an MSA-guided, fully unsupervised sequence optimization pipeline for protein design

    Hideki Yamaguchi, Yutaka Saito

    [paper]

  • Explainable Deep Generative Models, Ancestral Fragments, and Murky Regions of the Protein Structure Universe

    Eli Draizen, Cameron Mura, Philip Bourne

    [paper] [preprint]

  • ExpressUrself: A spatial model for predicting recombinant expression from mRNA sequence

    Michael P Dunne, Javier Caceres-Delpiano

    [paper]

  • Fast and Accurate Antibody Structure Prediction without Sequence Homologs

    Jiaxiang Wu, Fandi Wu, Biaobin Jiang, Wei Liu, Peilin Zhao

    [paper] [preprint]

  • Fast protein structure searching using structure graph embeddings

    Joe Greener, Kiarash Jamali

    [paper] [preprint]

  • Heterogeneous reconstruction of deformable atomic models in Cryo-EM

    Youssef Nashed, Ariana Peck, Julien Martel, Axel Levy, Bongjin Koo, Gordon Wetzstein, Nina Miolane, Daniel Ratner, Frederic Poitevin

    [paper] [preprint]

  • Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space

    Gian Marco Visani, Michael Pun, Armita Nourmohammad

    [paper] [preprint]

  • Identifying endogenous peptide receptors by combining structure and transmembrane topology prediction

    Felix Teufel, Jan Christian Refsgaard, Christian Toft Madsen, Carsten Stahlhut, Mads Grønborg, Dennis Madsen, Ole Winther

    [paper] [preprint]

  • Improving Molecular Pretraining with Complementary Featurizations

    Yanqiao Zhu, Dingshuo Chen, Yuanqi Du, Yingze Wang, Qiang Liu, Shu Wu

    [paper] [preprint]

  • Improving Molecule Properties Through 2-Stage VAE

    Chenghui Zhou, Barnabas Poczos

    [paper]

  • Improving Protein Subcellular Localization Prediction with Structural Prediction & Graph Neural Networks

    Geoffroy Dubourg-Felonneau, Arash Abbasi, Eyal Akiva, Lawrence Lee

    [paper]

  • Investigating graph neural network for RNA structural embedding

    Vaitea Opuu, Helene Bret

    [paper]

  • Investigating the conformational landscape of AlphaFold2-predicted protein kinase structures

    Carmen Al Masri, Francesco Trozzi, Marcel Patek, Anna Cichonska, Balaguru Ravikumar, Rayees Rahman

  • Large-scale self-supervised pre-training on protein three-dimensional structures

    Ilya Senatorov

  • Latent Space Diffusion Models of Cryo-EM Structures

    Karsten Kreis, Tim Dockhorn, Zihao Li, Ellen Zhong

    [preprint]

  • Learning Free Energy Pathways through Reinforcement Learning of Adaptive Steered Molecular Dynamics

    Nicholas Ho, John Kevin Cava, John Vant, Ankita Shukla, Jacob Miratsky, Pavan Turaga, Ross Maciejewski, Abhishek Singharoy

    [paper] [preprint]

  • Learning from physics-based features improves protein property prediction

    Amy Wang, Ava Soleimany, Alex X Lu, Kevin Yang

    [paper]

  • Ligand-aware protein sequence design using protein self contacts

    Jody Mou, Benjamin Fry, Chun-Chen Yao, Nicholas Polizzi

    [paper]

  • Lightweight Equivariant Graph Representation Learning for Protein Engineering

    Bingxin Zhou, Kai Yi, Xinye Xiong, Pan Tan, Liang Hong, Yuguang Wang

    [paper] [preprint]

  • Masked inverse folding with sequence transfer for protein representation learning

    Kevin Yang, Niccoló Zanichelli, Hugh Yeh

    [preprint]

  • Membrane and microtubule rapid instance segmentation with dimensionless instance segmentation by learning graph representations of point clouds

    Robert Kiewisz, Tristan Bepler

    [paper]

  • Metal3D: Accurate prediction of transition metal ion location via deep learning

    Simon Dürr

    [preprint]

  • MLPfold: Identification of transition state ensembles in molecular dynamics simulations using machine learning

    Preetham Venkatesh

    [paper]

  • ModelAngelo: Automated Model Building in Cryo-EM Maps

    Kiarash Jamali, Dari Kimanius, Sjors Scheres

    [paper] [preprint]

  • Online Inference of Structure Factor Amplitudes for Serial X-ray Crystallography

    Kevin Dalton, Doeke Hekstra

    [paper]

  • Peptide-MHC Structure Prediction With Mixed Residue and Atom Graph Neural Network

    Antoine Delaunay, Yunguan Fu, Alberto Bégué, Robert McHardy, Bachir Djermani, Liviu Copoiu, Michael Rooney, Andrey Tovchigrechko, Marcin Skwark, Nicolas Lopez Carranza, Maren Lang, Karim Beguir, Ugur Sahin

    [paper] [preprint]

  • Physics aware inference for the cryo-EM inverse problem: anisotropic network model heterogeneity, global pose and microscope defocus

    Geoffrey Woollard, Shayan Shekarforoush, Frank Wood, Marcus Brubaker, Khanh Dao Duc

    [paper]

  • Physics-aware Graph Neural Network for Accurate RNA 3D Structure Prediction

    Shuo Zhang, Lei Xie, Yang Liu

    [paper] [preprint]

  • Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC

    Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter St. John

    [paper]

  • Predicting conformational landscapes of known and putative fold-switching proteins using AlphaFold2

    Hannah Wayment-Steele, Sergey Ovchinnikov, Lucy Colwell, Dorothee Kern

    [paper] [preprint]

  • Predicting Immune Escape with Pretrained Protein Language Model Embeddings

    Kyle Swanson, Howard Chang, James Zou

    [paper] [preprint]

  • Predicting interaction partners using masked language modeling

    Damiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol

    [paper]

  • Predicting Ligand – RNA Binding Using E3-Equivariant Network and Pretraining

    Zhenfeng Deng, Ruichu Gu, Hangrui Bi, Xinyan Wang, Zhaolei Zhang, Han Wen

    [paper]

  • Pretrained protein language model transfer learning: is the final layer representation what we want?

    Francesca-Zhoufan Li, Ava Soleimany, Kevin Yang, Alex X Lu

    [paper]

  • Protein Sequence Design in a Latent Space via Model-based Reinforcement Learning

    Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyunjoo Ro, Ho Min Kim, Meeyoung Cha

    [paper] [preprint]

  • Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models

    Namrata Anand, Tudor Achim

    [preprint]

  • Protein structure generation via folding diffusion

    Kevin Wu, Kevin Yang, Rianne van den Berg, James Zou, Alex X Lu, Ava Soleimany

    [preprint]

  • Protein-Protein Docking with Iterative Transformer

    Lee-Shin Chu, Jeffrey Ruffolo, Jeffrey Gray

    [paper]

  • Reconstruction of polymer structures from contact maps with Deep Learning

    Atreya Dey

  • Representation Learning on Biomolecular Structures using Equivariant Graph Attention

    Tuan Le, Frank Noe, Djork-Arné Clevert

    [paper] [preprint]

  • Representation of missense variants for predicting modes of action

    Guojie Zhong, Yufeng Shen

    [paper]

  • RL Boltzmann Generators for Conformer Generation in Data-Sparse Environments

    Yash Patel, Ambuj Tewari

    [paper] [preprint]

  • Seq2MSA: A Language Model for Protein Sequence Diversification

    Pascal Sturmfels, Roshan Rao, Robert Verkuil, Zeming Lin, Tom Sercu, Adam Lerer, Alex Rives

    [paper]

  • So ManyFolds, So Little Time: Efficient Protein Structure Prediction With pLMs and MSAs

    Thomas D Barrett, Amelia Villegas-Morcillo, Louis Robinson, Benoit Gaujac, Karim Beguir, Arthur Flajolet

    [paper] [preprint]

  • Structure-based Drug Design with Equivariant Diffusion Models

    Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia

    [paper] [preprint]

  • SWAMPNN: End-to-end protein structures alignment

    Jeanne Trinquier, Samantha Petti, Shihao Feng, Johannes Soeding, Martin Steinegger, Sergey Ovchinnikov

    [paper]

  • T-cell receptor specific protein language model for prediction and interpretation of epitope binding (ProtLM.TCR)

    Ahmed Essaghir

  • The geometry of hidden representations of protein language models

    Lucrezia Valeriani, Francesca Cuturello, Alessio Ansuini, Alberto Cazzaniga

    [paper] [preprint]

  • Towards automated crystallographic structure refinement with a differentiable pipeline

    Minhuan Li, Doeke Hekstra

    [paper]

  • Training self-supervised peptide sequence models on artificially chopped proteins

    Gil Sadeh, Zichen Wang, Jasleen Grewal, Huzefa Rangwala, Layne Price

    [paper] [preprint]

  • Unsupervised language models for disease variant prediction

    Allan Zhou, Nicholas C. Landolfi, Daniel O'Neill

    [paper]

  • Using domain-domain interactions to probe the limitations of MSA pairing strategies

    Alex Hawkins-Hooker, David Jones, Brooks Paige

    [paper]

  • Visualizing DNA reaction trajectories with deep graph embedding approaches

    Chenwei Zhang, Anne Condon, Khanh Dao Duc

    [paper]

  • What is hidden in the darkness? Characterization of AlphaFold structural space

    Janani Durairaj, Joana Maria Soa Pereira, Mehmet Akdel, Torsten Schwede

    [preprint]

  • ZymCTRL: a conditional language model for the controllable generation of artificial enzymes

    Noelia Ferruz

    [paper]

Organizers

Jonas Adler
DeepMind

Namrata Anand
Stanford

John Ingraham
Generate Biomedicines

Sergey Ovchinnikov
Harvard University

Roshan Rao
Meta AI

Ellen Zhong
Princeton

Sponsors

Atomic
TRV
Novo Nordisk
Prescient
Generate Bio