Machine Learning in Structural Biology

Workshop at the 38th Conference on Neural Information Processing Systems

15th December 2024

About

Structural biology, the study of the 3D structures of proteins and other biomolecules, has been transformed by breakthroughs from machine learning algorithms. Machine learning models are now routinely used by experimentalists to predict structures that aid hypothesis generation and experimental design, to accelerate experimental structure determination (e.g., computer vision algorithms for cryo-electron microscopy), and to bioengineer new protein therapeutics, where they have become an industry standard (e.g., large language models for protein design). Despite this progress, many active and open challenges remain, such as modeling protein dynamics, predicting the structures of other classes of biomolecules such as RNA, learning and generalizing the underlying physics driving protein folding, and relating the structures of isolated proteins to the in vivo, contextual nature of their function. These challenges are diverse and interdisciplinary, motivating new kinds of machine learning methods and requiring the development and maturation of standard benchmarks and datasets.

Machine Learning in Structural Biology (MLSB) seeks to bring together field experts, practitioners, and students from academia, industry research groups, and pharmaceutical companies to focus on these new challenges and opportunities. This year, MLSB aims to bridge theory and practice by addressing the outstanding computational and experimental problems at the forefront of our field. The intersection of artificial intelligence and structural biology promises to unlock new scientific discoveries and enable powerful new design tools.

MLSB will be an in-person NeurIPS workshop on 15th December 2024 in MTG Rooms 11 & 12 at the Vancouver Convention Center.

Please contact the organizers at workshopmlsb@gmail.com with any questions.

Stay updated on changes and workshop news by joining our mailing list.

Presenter Information

Congratulations to all accepted presenters! Please find some information on deadlines and expectations leading up to the MLSB Workshop!

Posters

We ask all authors to prepare a poster to present at the workshop. Posters must be 24 inches wide by 36 inches tall (portrait) and will be taped to the wall; poster boards will not be provided. We specifically ask for portrait layout because wall space will be tight.

Additionally, a virtual copy of each poster must be uploaded to the NeurIPS poster upload portal by Thursday, December 12. Posters must be PNG files of at most 5120 x 2880 pixels (width x height) and no more than 10 MB; thumbnail images must be 320 x 256 pixel PNGs of no more than 5 MB. We know these dimensions differ from those of the in-person posters; the upload dimensions are set by NeurIPS.
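As a convenience, the short sketch below (assuming the Pillow imaging library and a hypothetical file name) checks a poster PNG against these upload limits before submission; it is not an official tool.

```python
# Hedged convenience check for the NeurIPS upload limits; not an official tool.
import os
from PIL import Image  # pip install Pillow

MAX_W, MAX_H = 5120, 2880      # maximum poster pixels (width x height)
MAX_BYTES = 10 * 1024 * 1024   # 10 MB

def check_poster(path: str) -> None:
    """Print a warning for each upload limit the poster file exceeds."""
    with Image.open(path) as img:
        if img.format != "PNG":
            print(f"{path}: expected PNG, got {img.format}")
        w, h = img.size
        if w > MAX_W or h > MAX_H:
            print(f"{path}: {w}x{h} exceeds {MAX_W}x{MAX_H}")
    if os.path.getsize(path) > MAX_BYTES:
        print(f"{path}: file is larger than 10 MB")

check_poster("poster.png")  # hypothetical file name
```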

Log in using the neurips.cc account associated with your CMT email address. If you do not already have a neurips.cc account, one should have been created automatically; it can be accessed by resetting the password.

Paper Camera-Ready

De-anonymized, camera-ready versions of workshop papers are due on Microsoft CMT by Monday, Dec 2. Papers must indicate that they are NeurIPS MLSB workshop papers by using the modified NeurIPS style file here, and should be compiled with the `final` option, e.g. \usepackage[final]{neurips_mlsb_2024}
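For reference, a minimal camera-ready preamble (with placeholder title and author) looks like this; the style file itself is the one linked above:

```latex
\documentclass{article}
% the [final] option produces the de-anonymized camera-ready version
\usepackage[final]{neurips_mlsb_2024}

\title{Your Paper Title}
\author{Your Name \\ Your Affiliation}

\begin{document}
\maketitle
% paper body goes here
\end{document}
```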

We plan to make all camera-ready papers available on the workshop website (https://www.mlsb.io/). If you would prefer that your work not be shared, simply do not submit a camera-ready version.

Travel Award

This year we will try to cover as many workshop registrations as possible for student/academic attendees with oral presentations or posters who need financial assistance. If you would like to be considered, please fill out the following form by Friday, Nov 15 (extended from Friday, Nov 8). If you have any questions, please don't hesitate to contact us at workshopmlsb@gmail.com.

Key Dates

Application for Registration Reimbursement: Friday, November 15th, 2024 (extended from November 8th, 2024), at 11:59 PM, Anywhere on Earth.

Camera-Ready PDF due on Microsoft CMT: Monday, December 2nd, 2024.

Poster due: Thursday, December 12th, 2024.

Call For Papers

We welcome submissions of short papers leveraging machine learning to address problems in structural biology, including but not limited to:

  • Prediction of biomolecular structures, complexes, and interactions
  • Design of or generative models for structure and/or sequence
  • Methods for structure determination / biophysics (Cryo-EM/ET, NMR, crystallography, single-molecule methods, etc.)
  • Geometric and symmetry-aware deep learning
  • Conformational change, ensembles, and dynamics
  • Integration of biomolecular physics
  • Function and property prediction
  • Structural bioinformatics and systems biology
  • Therapeutic screening and design
  • Language models and other implicit representations of protein structure
  • Forward-looking position papers

We request anonymized PDF submissions by Friday, September 20, 2024, at 11:59 PM, Anywhere on Earth (AoE), through our submission website on CMT.

Papers should present novel work that has not been previously accepted at an archival venue at the time of submission. Submissions should be a maximum of 5 pages (excluding references and appendices) in PDF format, using the NeurIPS style files, and fully anonymized as per the requirements of NeurIPS. The NeurIPS checklist can be omitted from the submission. Submissions meeting these criteria will go through a light, double-blind review process. Reviewer comments will be returned to the authors as feedback.

Accepted papers will be invited to present a poster at the workshop, with nominations of spotlight talks at the discretion of the organizers.

New this year, we will have two special tracks for models predicting protein-protein and protein-ligand interactions, evaluated on two new large-scale benchmarks, PINDER and PLINDER. The highest-performing open-source methods from these two tracks will receive invitations to a spotlight presentation. See the Challenge 2024 section below for details on how to submit to these tracks.

Like last year, authors who commit to open-sourcing the code, model weights, and datasets used in their work will be given precedence for spotlight talks. This affects consideration for spotlights only: submissions that cannot make this commitment will still be considered for posters and will not be penalized in acceptance decisions.

This workshop is non-archival; however, authors of accepted contributions will have the option to make their work available through the workshop website. Presentation of work that is concurrently in submission elsewhere is welcome. We also welcome papers sharing encouraging work-in-progress results and forward-looking position papers that would benefit from feedback and community discussion at our workshop.

Important Dates

Submission Deadline: Friday, September 20th, 2024, at 11:59PM, Anywhere on Earth.

Notification of Acceptance: Wednesday, October 9th, 2024.

Workshop Date: December 15th, 2024, Vancouver, Canada.

Invited Speakers

Erika Alden DeBenedictis
Group Leader, Francis Crick Institute

Gabe Rocklin
Assistant Professor, Department of Pharmacology, Northwestern University

Jennifer Listgarten
Professor in EECS, UC Berkeley

Milot Mirdita
Postdoctoral Researcher, Seoul National University

Noelia Ferruz
Group Leader, Centre for Genomic Regulation, Barcelona

Josh Abramson, AlphaFold3 Team
Senior ML Researcher, Google DeepMind

Jeremy Wohlwend, Boltz-1 Team
PhD Student, Massachusetts Institute of Technology

Schedule (Pacific Standard Time)

08:30 Opening Remarks
08:35 Invited Speaker - Noelia Ferruz (Title: TBA)
09:00 Contributed Talk - The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling (Yunha Hwang, Andre Cornman, Jacob West-Roberts, Martin Beracochea, Sergey Ovchinnikov, Simon Roux, Antonio Camargo, Milot Mirdita)
09:15 Invited Speaker - Josh Abramson, AlphaFold3 Team: Biomolecular Structure Prediction with AlphaFold3
09:40 Break
09:55 Invited Speaker - Erika Alden DeBenedictis (Title: TBA)
10:20 Contributed Talk - Controllable All-Atom Generation of Protein Sequence and Structure from Sequence-Only Inputs (Amy Lu, Wilson Yan, Kevin Yang, Vladimir Gligorijevic, Kyunghyun Cho, Richard Bonneau, Pieter Abbeel, Nathan Frey)
10:35 Contributed Talk - Protein Language Model Fitness is a Matter of Preference (Cade Gordon, Amy Lu, Pieter Abbeel)
10:50 Invited Speaker - Gabe Rocklin (Title: TBA)
11:15 Word from Sponsors
11:20 Poster Session / Lunch
12:20 Invited Speaker - Jennifer Listgarten (Title: TBA)
12:45 Contributed Talk - Generative modeling of protein ensembles guided by crystallographic electron densities (Sai Advaith Maddipatla, Nadav Sellam, Sanketh Vedula, Ailie Marx, Alex Bronstein)
01:00 Contributed Talk - Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference (Shayan Shekarforoush, David Lindell, Marcus Brubaker, David Fleet)
01:15 Break
01:30 Invited Speaker - Milot Mirdita (Title: TBA)
01:55 Invited Speaker - Jeremy Wohlwend, Boltz-1 Team: Democratizing Biomolecular Structure Prediction with Boltz-1
02:20 Contributed Talk - FlowPacker: protein side-chain packing with torsional flow matching (Jin Sub Lee, Philip Kim)
02:35 Contributed Talk - HelixFlow, SE(3)–equivariant Full-atom Design of Peptides With Flow-matching Models (Xuezhi Xie, Pedro A Valiente, Jisun Kim, Jin Sub Lee, Philip Kim)
02:50 Break
03:00 Word from the PLINDER/PINDER Competition
03:05 PLINDER/PINDER Competition Talks
03:15 Panel Session
03:55 Closing Remarks
04:00 Poster Session / Happy Hour (until 05:00)

Accepted Papers

  • LatentDE: Latent-based Directed Evolution accelerated by Gradient Ascent for Protein Sequence Design

    Thanh Tran, Nhat Khang Ngo, Duy Nguyen, Truong Son Hy

    [paper]

  • Assessing interaction recovery of predicted protein-ligand poses

    Frederic Dreyer, David Errington, Cedric Bouysset, Constantin Schneider

    [paper]

  • Improving Inverse Folding models at Protein Stability Prediction without additional Training or Data

    Oliver Dutton, Sandro Bottaro, Michele Invernizzi, Istvan Redl, Albert Chung, Falk Hoffmann, Louie Henderson, Stefano Ruschetta, Fabio Airoldi, Benjamin M J Owens, Patrik Foerch, Carlo Fisicaro, Kamil Tamiola

    [paper]

  • Improving Antibody Design with Force-Guided Sampling in Diffusion Models

    Paulina Kulyte, Francisco Vargas, Simon Mathis, Yuguang Wang, Jose Miguel Hernandez-Lobato, Pietro Liò

    [paper]

  • Equivariant Blurring Diffusion for Multiscale Generation of Molecular Conformer

    Jiwoong Park, Yang Shen

    [paper]

  • Active Learning for Affinity Prediction of Antibodies

    Alexandra Gessner, Sebastian Ober, Owen Vickery, Dino Oglic, Talip Ucar

    [paper]

  • IgBlend: Unifying 3D Structure and Sequence for Antibody LLMs

    Cedric Malherbe, Talip Ucar

    [paper]

  • Learning the Language of Protein Structures

    Jeremie DONA, Benoit Gaujac, Timothy Atkinson, Liviu Copoiu, Thomas Pierrot, Thomas Barrett

    [paper]

  • moPPIt: De Novo Generation of Motif-Specific Binders with Protein Language Models

    Tong Chen, Yinuo Zhang, Zachary Quinn, Pranam Chatterjee

    [paper]

  • Improving generalisability of 3D binding affinity models in low data regimes

    Julia Buhmann, Ward Haddadin, Alan Bilsland, Lukáš Pravda, Hagen Triendl

    [paper]

  • Active Learning for Energy-Based Antibody Optimization and Enhanced Screening

    Kairi Furui, Masahito Ohue

    [paper]

  • Conditional Enzyme Generation Using Protein Language Models with Adapters

    Jason Yang, Aadyot Bhatnagar, Jeffrey Ruffolo, Ali Madani

    [paper]

  • Improving Structural Plausibility in 3D Molecule Generation via Property-Conditioned Training with Distorted Molecules

    Lucy Vost, Vijil Chenthamarakshan, Payel Das, Charlotte Deane

    [paper]

  • Understanding Protein-DNA Interactions by Paying Attention to Protein and Genomics Foundation Models

    Dhruva Abhijit Rajwade, Erica Wang, Aryan Satpathy, Alexander Brace, Hongyu Guo, Arvind Ramanathan, Shengchao Liu, Animashree Anandkumar

    [paper]

  • SPECTRE: A Spectral Transformer for Molecule Identification

    Wangdong Xu

    [paper]

  • FlowPacker: protein side-chain packing with torsional flow matching

    Jin Sub Lee, Philip Kim

    [paper]

  • Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction

    Karina Zadorozhny, Kangway Chuang, Bharath Sathappan, Ewan Wallace, Vishnu Sresht, Colin Grambow

    [paper]

  • HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction

    Gian Marco Visani, Michael Pun, William Galvin, Eric Daniel, Kevin Borisiak, Utheri Wagura, Armita Nourmohammad

    [paper]

  • Adapting protein language models for structure-conditioned design

    Jeffrey Ruffolo, Aadyot Bhatnagar, Joel Beazer, Stephen Nayfach, Jordan Russ, Emily Hill, Riffat Hussain, Joseph Gallagher, Ali Madani

    [paper]

  • Allo-Allo: Data-efficient prediction of allosteric sites

    Tianze Dong, Christopher Kan, Kapil Devkota, Rohit Singh

    [paper]

  • CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference

    Shayan Shekarforoush, David Lindell, Marcus Brubaker, David Fleet

    [paper]

  • GFlowNet Pretraining with Inexpensive Rewards

    Mohit Pandey, Gopeshh Subbaraj, Emmanuel Bengio

    [paper]

  • Benchmarking text-integrated protein language model embeddings and embedding fusion on diverse downstream tasks

    Young Su Ko

    [paper]

  • RNAgrail: graph neural network and diffusion model for RNA 3D structure prediction

    Marek Justyna

    [paper]

  • The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling

    Yunha Hwang, Andre Cornman, Jacob West-Roberts, Martin Beracochea, Sergey Ovchinnikov, Simon Roux, Antonio Camargo, Milot Mirdita

    [paper]

  • Functional Alignment of Protein Language Models via Reinforcement Learning with Experimental Feedback

    Nathaniel Blalock, Srinath Seshadri, Philip Romero, Agrim Babbar, Sarah Fahlberg

    [paper]

  • Antibody Library Design by Seeding Linear Programming with Inverse Folding and Protein Language Models

    Conor Hayes, Andre Goncalves, Steven Magana-Zook, Ahmet Solak, Daniel Faissol, Mikel Landajuela

    [paper]

  • EpiGraph: Recommender-Style Graph Neural Networks for Highly Accurate Prediction of Conformational B-Cell Epitopes

    Jung-Eun Shin, Yen-Lin Chen, Nathan Rollins, Thomas Hopf, Jordan Anderson, Michael Cianci, Daniela Cipolletta, Jyothsna Visweswaraiah, Yi Xing, Colin Lipper, Kevin Otipoby, Nathan Higginson-Scott, Ryan Peckner

    [paper]

  • MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning

    Peter Eckmann, Dongxia Wu, Germano Heinzelmann, Michael Gilson, Rose Yu

    [paper]

  • Higher-Order Message Passing for Glycan Representation Learning

    Roman Joeres, Daniel Bojar

    [paper]

  • LOCAS: Multi-label mRNA Localization with Supervised Contrastive Learning

    Abrar Abir, Md Toki Tahmid, M. Saifur Rahman

    [paper]

  • Does Structural Information Improve ESM3 for Protein Binding Affinity Prediction?

    Thomas Loux, Dianzhuo Wang

    [paper]

  • Unified Sampling and Ranking for Protein Docking with DFMDock

    Lee-Shin Chu, Sudeep Sarma, Jeffrey Gray

    [paper]

  • Expanding Automated Multiconformer Ligand Modeling to Macrocycles and Fragments

    Jessica Flowers

    [paper]

  • Protein Sequence Domain Annotation using Language Models

    Arpan Sarkar, Kumaresh Krishnan, Sean Eddy

    [paper]

  • ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids

    Hannes Stärk, Bowen Jing, Tomas Geffner, Jason Yim, Tommi Jaakkola, Arash Vahdat, Karsten Kreis

    [paper]

  • Rapid protein structure assessment via a forward model for NMR spectra

    Benjamin Harding, Chad Rienstra, Hannah Wayment-Steele, Ziling Hu, Frank Delaglio, Rajat Garg, Katherine Henzler-Wildman, Timothy Grant

    [paper]

  • PropEn: Optimizing Proteins with Implicit Guidance

    Natasa Tagasovska, Vladimir Gligorijevic, Kyunghyun Cho, Andreas Loukas

    [paper]

  • Bayesian Optimisation for Protein Sequence Design: Gaussian Processes with Zero-Shot Protein Language Model Prior Mean

    Carolin Benjamins, Shikha Surana, Oliver Bent, Marius Lindauer, Paul Duckworth

    [paper]

  • Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences

    Niklas Schmidinger, Lisa Schneckenreiter, Philipp Seidl, Johannes Schimunek, Sohvi Luukkonen, Pieter-Jan Hoedt, Johannes Brandstetter, Andreas Mayr, Sepp Hochreiter, Guenter Klambauer

    [paper]

  • Retrieval Augmented Protein Language Models for Protein Structure Prediction

    Peter Lee, Xavier Cheng, Eric Xingyi, Le Song

    [paper]

  • CompassDock: A Comprehensive Accurate Assessment Approach for Deep Learning-Based Molecular Docking in Inference and Fine-Tuning

    Ahmet Sarigun, Vedran Franke, Bora Uyar, Altuna Akalin

    [paper]

  • BoostMD: Accelerating Molecular Sampling using ML Force Field Feature

    Lars Schaaf, Ilyes Batatia, Christoph Brunken, Thomas Barrett, Jules Tilly

    [paper]

  • DockFormer: Efficient Multi-Modal Receptor-Ligand Interaction Prediction using Pair Transformer

    Ben Shor, Dina Schneidman

    [paper]

  • Cryo-EM images are intrinsically low dimensional

    Luke Evans, Octavian-Vlad Murad, Lars Dingeldein, Pilar Cossio, Roberto Covino, Marina Meila

    [paper]

  • Open-source Tools for CryoET Particle Picking Machine Learning Competitions

    Kyle Harrington, Zhuowen Zhao, Jonathan Schwartz, Saugat Kandel, Utz Ermel, Mohammadreza Paraan, Clinton Potter, Bridget Carragher

    [paper]

  • Protein Language Model Fitness is a Matter of Preference

    Cade Gordon, Amy Lu, Pieter Abbeel

    [paper]

  • AptaBLE: An Enhanced Deep Learning Platform for Aptamer Protein Interaction Prediction and Design

    Sawan Patel, Sherwood Yao, Keith Fraser, Adam Friedman, Zhangzhi Peng, Pranam Chatterjee, Owen Yao

    [paper]

  • Balancing Locality and Reconstruction in Protein Structure Tokenizer

    Jiayou Zhang, Barthélémy Meynard, Jing Gong, Xavier Cheng, Eric Xing, Le Song

    [paper]

  • What has AlphaFold3 learned about antibody and nanobody docking, and what remains unsolved?

    Fatima Hitawala, Jeffrey Gray

    [paper]

  • HelixFlow, SE(3)–equivariant Full-atom Design of Peptides With Flow-matching Models

    Xuezhi Xie, Pedro A Valiente, Jisun Kim, Jin Sub Lee, Philip Kim

    [paper]

  • MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning

    Andrei Manolache, Dragos-Constantin Tantaru, Mathias Niepert

    [paper]

  • Integrating Macromolecular X-ray Diffraction Data with Variational Inference

Luis Aldama, Kevin Dalton, Doeke Hekstra

    [paper]

  • Fine-Tuning Discrete Diffusion Models via Reward Optimization: Applications to DNA and Protein Design

    Chenyu Wang, Masatoshi Uehara, Yichun He, Amy Wang, Tommaso Biancalani, Avantika Lal, Tommi Jaakkola, Sergey Levine, Hanchen Wang, Aviv Regev

    [paper]

  • Low-N OpenFold fine-tuning improves peptide design without additional structures

    Theo Sternlieb, Jakub Otwinowski, Sam Sinai, Jeffrey Chan

    [paper]

  • SPRINT Enables Interpretable and Ultra-Fast Virtual Screening against Thousands of Proteomes

    Andrew McNutt, Abhinav Adduri, Caleb Ellington, Monica Dayao, Eric Xing, Hosein Mohimani, David Koes

    [paper]

  • Ranking protein-peptide binding affinities with protein language models

Charles Chalas, Michael Dunne

    [paper]

  • Generating and scoring stable proteins using joint structure and sequence modeling

    Yehlin Cho, Justas Dauparas, Kotaro Tsuboyama, Gabriel Rocklin, Sergey Ovchinnikov

    [paper]

  • FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Adjusted Rate Masking

    Sophia Vincoff, Shrey Goel, Kseniia Kholina, Rishab Pulugurta, Pranay Vure, Pranam Chatterjee

    [paper]

  • Systems-Structure-Based Drug Design

    Vincent Zaballa, Elliot Hui

    [paper]

  • Learning the language of protein-protein interactions with ESM-Multimer

Varun Ullanat, Bowen Jing, Samuel Sledzieski, Bonnie Berger

    [paper]

  • Guided Multi-objective Generative AI for Structure-based Drug Design

    Amit Kadan, Kevin Ryczko, Erika Lloyd, Adrian Roitberg, Takeshi Yamazaki

    [paper]

  • Tradeoffs of alignment-based and protein language models for predicting viral mutation effects

    Noor Youssef, Sarah Gurev, Navami Jain, Debora Marks

    [paper]

  • IgFlow: Flow Matching for De Novo Antibody Design

    Sanjay Nagaraj, Amir Shanehsazzadeh, Hyun Park, Jonathan King, Simon Levine

    [paper]

  • Generating and evaluating diverse sequences for protein backbones

    Yo Akiyama, Sergey Ovchinnikov

    [paper]

  • SuperMetal: A Generative AI Framework for Rapid and Precise Metal Ion Location Prediction in Proteins

    Xiaobo Lin, Zhaoqian Su, Yunchao Liu, Jingxian Liu, Xiaohan Kuang, Peter Cummings, Jesse Spencer-Smith, Jens Meiler

    [paper]

  • Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design

    Natasa Tagasovska, Ji Won Park, Matthieu Kirchmeyer, Nathan Frey, Andrew Watkins, Aya Ismail, Arian Jamasb, Edith Lee, Tyler Bryson, Stephen Ra, Kyunghyun Cho

    [paper]

  • Controllable All-Atom Generation of Protein Sequence and Structure from Sequence-Only Inputs

    Amy Lu, Wilson Yan, Kevin Yang, Vladimir Gligorijevic, Kyunghyun Cho, Richard Bonneau, Pieter Abbeel, Nathan Frey

    [paper]

  • Loop-Diffusion: an equivariant diffusion model for designing and scoring protein loops

    Kevin Borisiak, Gian Marco Visani, Armita Nourmohammad

    [paper]

  • ProteinZen: combining latent and SE(3) flow matching for all-atom protein generation

    Alex Li, Tanja Kortemme

    [paper]

  • TomoPicker: Annotation-Efficient Particle Picking in Cellular cryo-electron Tomograms

    Mostofa Rafid Uddin, Ajmain Yasar Ahmed, Md Toki Tahmid, Md Zarif Ul Alam, Min Xu

    [paper]

  • Exploring Discrete Flow Matching for 3D De Novo Molecule Generation

    Ian Dunn, David Koes

    [paper]

  • SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints

    Miruna Cretu, Charles Harris, Ilia Igashov, Arne Schneuing, Marwin Segler, Bruno Correia, Julien Roy, Emmanuel Bengio, Pietro Lió

    [paper]

  • RNA-DCGen: Dual Constrained RNA Sequence Generation with LLM-Attack

    Haz Sameen Shahgir, Md. Rownok Zahan Ratul, Md Toki Tahmid, Khondker Salman Sayeed, Atif Rahman

    [paper]

  • RNA-GPT: Multimodal Generative System for RNA Sequence Understanding

Yijia Xiao, Edward Sun, Yiqiao Jin, Wei Wang

    [paper]

  • Capturing Protein Dynamics: Encoding Temporal and Spatial Dynamics from Molecular Dynamics Simulations

    Vignesh Bhethanabotla, Amin Tavakoli, Animashree Anandkumar, William Goddard

    [paper]

  • ProPicker: Promptable Segmentation for Particle Picking in Cryogenic Electron Tomography

    Simon Wiedemann, Zalan Fabian, Mahdi Soltanolkotabi, Reinhard Heckel

    [paper]

  • Estimating protein flexibility via uncertainty quantification of structure prediction models

    Charlotte Sweeney, Nele Quast, Fabian Spoendlin, Yee Whye Teh

    [paper]

  • Generative modeling of protein ensembles guided by crystallographic electron densities

    Sai Advaith Maddipatla, Nadav Bojan Sellam, Sanketh Vedula, Ailie Marx, Alex Bronstein

    [paper]

  • Energy-Based Flow Matching for Molecular Docking

Wenyin Zhou, Christopher Sprague, Hossein Azizpour

    [paper]

  • Controlling multi-state conformational equilibria of dynamic proteins with Frame2seq

    Deniz Akpinaroglu, Dominic Grisingher, Stephanie Crilly, Tanja Kortemme

    [paper]

Challenge 2024

This year we are running a challenge on the PINDER and PLINDER datasets to evaluate how well current community methods perform at protein-protein interaction prediction and protein-ligand complex prediction.

Details of the challenge

To submit your trained model, you will need to build an inference Docker image on Hugging Face Spaces using the provided templates.

Rules for model training

  • Participants MUST use the sequences and SMILES in the provided train and validation sets from PINDER or PLINDER; to ensure no leakage, external data augmentation is not allowed (see the split-loading sketch after this list).
  • If starting structures/conformations need to be generated for the model, this may only be done from the training and validation sequences and SMILES. Note that this applies only to train and validation; no external folding methods or starting structures are allowed for the test set under any circumstances! Only the predicted structures/conformers themselves may be used in this way; the embeddings or models used to generate such predictions may not. For example, it is not valid to "distill" a method that was not trained on PLINDER/PINDER.
  • The PINDER and PLINDER datasets should be used independently; combining the sets is considered augmentation and is not allowed.
  • For inference, only the inputs provided in the evaluation sets may be used: canonical sequences, structures, and MSAs; no alternate templates or sequences are permitted. The inputs that will be used by assessors for each challenge track are as follows:
    • PLINDER: (SMILES, monomer protein structure, monomer FASTA, monomer MSA)
    • PINDER: (monomer protein structure 1, monomer protein structure 2, FASTA 1, FASTA 2, MSA 1, MSA 2)
  • Model selection must be performed exclusively on the validation set designed for this purpose within the PINDER and PLINDER datasets.
  • Methods relying on any model derivatives or embeddings trained on structures outside the PINDER/PLINDER training set are not permitted (e.g., ESM2, MSA: allowed; ESM3/ESMFold/SAProt/UniMol: not allowed).
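As a concrete illustration of the split discipline above, the following minimal sketch loads the PINDER index and restricts training and model selection to the designated splits. It assumes the pinder package and its get_index helper as described in the PINDER documentation; column and split names should be verified against your installed version, and PLINDER exposes analogous split metadata.

```python
# Minimal sketch, assuming `pip install pinder` and the `get_index` helper
# from the PINDER docs; verify column/split names in your installed version.
from pinder.core import get_index

index = get_index()                    # metadata DataFrame of all PINDER systems
train = index[index.split == "train"]  # only these systems may be used for training
val = index[index.split == "val"]      # model selection must use this split only

print(f"{len(train)} training systems, {len(val)} validation systems")
```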

Please refer to the technical documentation for how to use the datasets in the challenge.

Rules for valid inference pipeline

The submission system will use Hugging Face Spaces. To qualify for submission, each team must:

  • Provide an MLSB submission ID or a link to a preprint/paper describing their methodology. This publication does not have to specifically report training or evaluation on the P(L)INDER dataset. Previously published methods, such as DiffDock, only need to link their existing paper. Note that entry into this competition does not equate to an MLSB workshop paper submission.
  • Create a copy of the provided inference template: go to the top right corner of the page, click the drop-down menu (vertical ellipsis) next to "Community", and select "Duplicate this space".
  • Change the files in the newly created space to reflect the specifics of your model (a rough sketch of the predict function follows this list):
    • Edit requirements.txt to capture all dependencies.
    • Include an inference_app.py file. This contains a predict function that should be modified to reflect the specifics of inference with your model.
    • Include a train.py file to ensure that training and model selection use only the PINDER/PLINDER datasets and to clearly show any additional hyperparameters used.
    • Provide a LICENSE file that allows for reuse, derivative works, and distribution of the provided software and weights (e.g., an MIT or Apache 2.0 license).
    • Modify the Dockerfile as appropriate (including selecting the right base image).
  • Submit to the leaderboard via the designated form.
    • On the submission page, add a reference to the newly created space in the format username/space (e.g., mlsb/alphafold3).
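For orientation only, here is a rough sketch of the shape of the predict function in inference_app.py. The real signature and I/O contract are fixed by the official inference template on Hugging Face Spaces, which takes precedence; all names below are hypothetical.

```python
# Illustrative sketch only: adapt the official MLSB inference template, not
# this stub. Paths, argument names, and output format here are hypothetical.
import shutil
from typing import Optional

def predict(protein_path: str, ligand_smiles: Optional[str] = None) -> str:
    """Receive the evaluation inputs and return the path of the predicted
    complex structure."""
    # A real submission would: load weights trained only on PINDER/PLINDER,
    # featurize the provided structure/SMILES/MSA inputs, run inference,
    # and write the predicted complex to disk.
    output_path = "prediction.pdb"
    shutil.copy(protein_path, output_path)  # placeholder "prediction"
    return output_path
```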

Metrics

Primary ranking metric:

  • PLINDER: lDDT-PLI
  • PINDER: DockQ

Other metrics computed by PINDER/PLINDER will be displayed on the leaderboard but will not influence the ranking.

The winners will be invited to present their work at the MLSB workshop.
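For a local sanity check of the PINDER track metric before submitting, a sketch like the following can score a predicted dimer against the native complex. It assumes the open-source DockQ command-line tool (https://github.com/bjornwallner/DockQ, installable via pip) and hypothetical file names; the official leaderboard runs its own evaluation.

```python
# Local-scoring sketch, assuming the DockQ CLI is installed (`pip install DockQ`).
# File names are hypothetical; the official leaderboard computes metrics itself.
import subprocess

result = subprocess.run(
    ["DockQ", "predicted_complex.pdb", "native_complex.pdb"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # the report includes the DockQ score per interface
```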

Evaluation Datasets

Although the exact composition of the evaluation set will be shared at a later date, below is an overview of the dataset and what to expect:

  • Two leaderboards, one for each of PINDER and PLINDER, will be created using a single evaluation set for each.
  • Evaluation sets will be subsets of 150-200 structures from the current PINDER and PLINDER test splits (subsets to enable reasonable evaluation runtime).
  • Each evaluation sample will contain a predefined input/output to ensure performance assessment is model-dependent, not input-dependent.
  • The focus will be exclusively on flexible docking/co-folding, with a single canonical structure per protein, sampled from apo and predicted structures.
  • Monomer input structures will be sampled from paired structures available in PINDER/PLINDER, balanced between apo and predicted structures and stratified by "flexibility" level according to specified conformational difference thresholds.

Key Dates

Training Workshop: September 24th, 2024, virtual (register here).

Leaderboard Opens: October 9th, 2024 (following acceptance notifications for MLSB).

Leaderboard Closes: November 18th, 2024 (extended from November 9th, 2024).

Winner Notification: Wednesday, November 27th, 2024.

Questions?

If you run into trouble, we invite you to join the PINDER/PLINDER Discord server.


Organizers

Gabriele Corso
MIT

Gina El Nesr
Stanford University

Vignesh Ram Somnath
ETH Zurich

Zeming Lin
EvolutionaryScale

Simon Duerr
EPFL

Hannah Wayment-Steele
University of Wisconsin–Madison

Sergey Ovchinnikov
MIT