Structural biology, the study of the 3D structures of proteins and other biomolecules, has been transformed by breakthroughs in machine learning. Machine learning models are now routinely used by experimentalists to predict structures for hypothesis generation and experimental design, to accelerate the experimental process of structure determination (e.g. computer vision algorithms for cryo-electron microscopy), and to bioengineer new protein therapeutics, where they have become a new industry standard (e.g. large language models for protein design). Despite this progress, many active and open challenges remain for the field, such as modeling protein dynamics, predicting the structures of other classes of biomolecules such as RNA, learning and generalizing the underlying physics driving protein folding, and relating the structure of isolated proteins to the in vivo, contextual nature of their function. These challenges are diverse and interdisciplinary, motivating new kinds of machine learning methods and requiring the development and maturation of standard benchmarks and datasets.
Machine Learning in Structural Biology (MLSB) seeks to bring together field experts, practitioners, and students from across academia, industry research groups, and pharmaceutical companies to focus on these new challenges and opportunities. This year, MLSB aims to bridge the theoretical and the practical by addressing the outstanding computational and experimental problems at the forefront of our field. The intersection of artificial intelligence and structural biology promises to unlock new scientific discoveries and powerful design tools.
MLSB will be an in-person NeurIPS workshop on December 15th, 2024, in MTG Rooms 11 & 12 at the Vancouver Convention Center.
Please contact the organizers at workshopmlsb@gmail.com with any questions.
Stay updated on changes and workshop news by joining our mailing list.
Congratulations to all accepted presenters! Please find some information on deadlines and expectations leading up to the MLSB Workshop!
We ask all authors to prepare a poster that can be presented as part of our workshop. Posters must be 24W x 36H inches and will be taped to the wall. Poster boards will not be provided at the workshop. We specifically ask for portrait layout because we will be tight on wall space.
Additionally, a virtual copy of each poster must be uploaded to the NeurIPS poster upload portal by Thursday, December 12. Posters must be PNG files of at most 5120 × 2880 pixels (width × height) and no more than 10 MB. Thumbnail images should be 320 × 256 pixel PNGs of no more than 5 MB. We know these dimensions differ from what we are asking for in-person posters; the upload dimensions are set by NeurIPS.
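If it is helpful, the size limits above can be sanity-checked before uploading. The following is a minimal sketch using the Pillow imaging library; the file names (`poster.png`, `thumbnail.png`) and helper names are our own placeholders, not anything provided by NeurIPS.

```python
import os
from PIL import Image  # pip install Pillow

# Limits from the NeurIPS poster upload portal (see above).
MAX_POSTER_PX = (5120, 2880)   # width x height
MAX_POSTER_MB = 10
MAX_THUMB_PX = (320, 256)
MAX_THUMB_MB = 5

def check_png(path: str, max_px: tuple[int, int], max_mb: float) -> None:
    """Raise if the image is not a PNG within the given pixel/file-size limits."""
    img = Image.open(path)
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if img.format != "PNG":
        raise ValueError(f"{path}: expected PNG, got {img.format}")
    if img.width > max_px[0] or img.height > max_px[1]:
        raise ValueError(f"{path}: {img.width}x{img.height} exceeds {max_px}")
    if size_mb > max_mb:
        raise ValueError(f"{path}: {size_mb:.1f} MB exceeds {max_mb} MB")

def make_thumbnail(poster: str, out: str = "thumbnail.png") -> None:
    """Downscale the poster into the thumbnail box, preserving aspect ratio."""
    img = Image.open(poster)
    img.thumbnail(MAX_THUMB_PX)  # in-place resize
    img.save(out, "PNG")

if __name__ == "__main__":
    check_png("poster.png", MAX_POSTER_PX, MAX_POSTER_MB)  # placeholder name
    make_thumbnail("poster.png")
    check_png("thumbnail.png", MAX_THUMB_PX, MAX_THUMB_MB)
```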
Log in using the neurips.cc account associated with your CMT email address. If you did not already have a neurips.cc account, one should have been created automatically and can be accessed by resetting its password.
De-anonymized, camera-ready versions of workshop papers are due on Microsoft CMT by Monday, Dec 2. Papers must indicate that they are NeurIPS MLSB workshop papers by using the modified NeurIPS style file here. Papers should be compiled with the `final` option, e.g. `\usepackage[final]{neurips_mlsb_2024}`.
We plan to make all camera-ready submitted papers available on the workshop website (https://www.mlsb.io/). If you would prefer that your work not be shared, there is no need to submit a camera-ready version.
This year we will try to cover as many workshop registrations as possible for student/academic attendees with oral presentations or posters who need financial assistance.
If you would like to be considered, please fill out the following form by Friday, Nov 15 (extended from Friday, Nov 8).
If you have any questions, please don't hesitate to contact us at workshopmlsb@gmail.com.
Application for Registration Reimbursement: Friday, November 15th, 2024 (extended from November 8th, 2024), at 11:59PM, Anywhere on Earth.
Camera-Ready PDF due on Microsoft CMT: Monday, December 2nd, 2024.
Poster due: Thursday, December 12th, 2024.
We welcome submissions of short papers leveraging machine learning to address problems in structural biology, including but not limited to:
We request anonymized PDF submissions by Friday, September 20, 2024, at 11:59PM, AoE (anywhere on earth) through our submission website on CMT.
Papers should present novel work that has not been previously accepted at an archival venue at the time of submission. Submissions should be a maximum of 5 pages (excluding references and appendices) in PDF format, using the NeurIPS style files, and fully anonymized as per the requirements of NeurIPS. The NeurIPS checklist can be omitted from the submission. Submissions meeting these criteria will go through a light, double-blind review process. Reviewer comments will be returned to the authors as feedback.
Accepted papers will be invited to present a poster at the workshop, with nominations of spotlight talks at the discretion of the organizers.
New this year, we will have two special tracks for models for predicting protein-protein and protein-ligand interactions, evaluated on two new large-scale benchmarks, PINDER and PLINDER. The highest-performing open-source methods from these two tracks will receive invitations to a spotlight presentation. Stay tuned for more information on how to submit to these tracks.
As last year, authors who commit to open-sourcing code, model weights, and datasets used in the work will be given precedence for spotlight talks. This affects consideration for spotlights only: submissions that cannot make this commitment will still be considered for posters and will not be penalized for acceptance.
This workshop is considered non-archival; however, authors of accepted contributions will have the option to make their work available through the workshop website. Presentation of work that is concurrently in submission is welcome, and we welcome papers sharing encouraging work-in-progress results or forward-looking position papers that would benefit from feedback and community discussion at our workshop.
Submission Deadline: Friday, September 20th, 2024, at 11:59PM, Anywhere on Earth.
Notification of Acceptance: Wednesday, October 9th, 2024.
Workshop Date: December 15th 2024, Vancouver, Canada.
Invited Speaker: Gabe Rocklin, Assistant Professor, Department of Pharmacology, Northwestern University.
| Time | Session |
|---|---|
| 08:30 | Opening Remarks |
| 08:35 | Invited Speaker - Noelia Ferruz. Title: TBA |
| 09:00 | Contributed Talk - The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling |
| 09:15 | Invited Speaker - Josh Abramson, AlphaFold3 Team. Title: Biomolecular Structure Prediction with AlphaFold3 |
| 09:40 | Break |
| 09:55 | Invited Speaker - Erika Alden DeBenedictis. Title: TBA |
| 10:20 | Contributed Talk - Controllable All-Atom Generation of Protein Sequence and Structure from Sequence-Only Inputs |
| 10:35 | Contributed Talk - Protein Language Model Fitness is a Matter of Preference |
| 10:50 | Invited Speaker - Gabe Rocklin. Title: TBA |
| 11:15 | Word from Sponsors |
| 11:20 | Poster Session / Lunch |
| 12:20 | Invited Speaker - Jennifer Listgarten. Title: TBA |
| 12:45 | Contributed Talk - Generative modeling of protein ensembles guided by crystallographic electron densities |
| 01:00 | Contributed Talk - Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference |
| 01:15 | Break |
| 01:30 | Invited Speaker - Milot Mirdita. Title: TBA |
| 01:55 | Invited Speaker - Jeremy Wohlwend, Boltz-1. Title: Democratizing Biomolecular Structure Prediction with Boltz-1 |
| 02:20 | Contributed Talk - FlowPacker: protein side-chain packing with torsional flow matching |
| 02:35 | Contributed Talk - HelixFlow, SE(3)-equivariant Full-atom Design of Peptides With Flow-matching Models |
| 02:50 | Break |
| 03:00 | Word from PLINDER / PINDER competition |
| 03:05 | PLINDER / PINDER competition talks |
| 03:15 | Panel Session |
| 03:55 | Closing Remarks |
| 04:00 - 05:00 | Poster Session / Happy Hour |
LatentDE: Latent-based Directed Evolution accelerated by Gradient Ascent for Protein Sequence Design
Assessing interaction recovery of predicted protein-ligand poses
Improving Inverse Folding models at Protein Stability Prediction without additional Training or Data
Improving Antibody Design with Force-Guided Sampling in Diffusion Models
Equivariant Blurring Diffusion for Multiscale Generation of Molecular Conformer
Active Learning for Affinity Prediction of Antibodies
IgBlend: Unifying 3D Structure and Sequence for Antibody LLMs
Learning the Language of Protein Structures
moPPIt: De Novo Generation of Motif-Specific Binders with Protein Language Models
Improving generalisability of 3D binding affinity models in low data regimes
Active Learning for Energy-Based Antibody Optimization and Enhanced Screening
Conditional Enzyme Generation Using Protein Language Models with Adapters
Improving Structural Plausibility in 3D Molecule Generation via Property-Conditioned Training with Distorted Molecules
Understanding Protein-DNA Interactions by Paying Attention to Protein and Genomics Foundation Models
SPECTRE: A Spectral Transformer for Molecule Identification
FlowPacker: protein side-chain packing with torsional flow matching
Similarity-Quantized Relative Difference Learning for Improved Molecular Activity Prediction
HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction
Adapting protein language models for structure-conditioned design
Allo-Allo: Data-efficient prediction of allosteric sites
CryoSPIN: Improving Ab-Initio Cryo-EM Reconstruction with Semi-Amortized Pose Inference
GFlowNet Pretraining with Inexpensive Rewards
Benchmarking text-integrated protein language model embeddings and embedding fusion on diverse downstream tasks
RNAgrail: graph neural network and diffusion model for RNA 3D structure prediction
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
Functional Alignment of Protein Language Models via Reinforcement Learning with Experimental Feedback
Antibody Library Design by Seeding Linear Programming with Inverse Folding and Protein Language Models
EpiGraph: Recommender-Style Graph Neural Networks for Highly Accurate Prediction of Conformational B-Cell Epitopes
MF-LAL: Drug Compound Generation Using Multi-Fidelity Latent Space Active Learning
Higher-Order Message Passing for Glycan Representation Learning
LOCAS: Multi-label mRNA Localization with Supervised Contrastive Learning
Does Structural Information Improve ESM3 for Protein Binding Affinity Prediction?
Unified Sampling and Ranking for Protein Docking with DFMDock
Expanding Automated Multiconformer Ligand Modeling to Macrocycles and Fragments
Protein Sequence Domain Annotation using Language Models
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
Rapid protein structure assessment via a forward model for NMR spectra
PropEn: Optimizing Proteins with Implicit Guidance
Bayesian Optimisation for Protein Sequence Design: Gaussian Processes with Zero-Shot Protein Language Model Prior Mean
Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences
Retrieval Augmented Protein Language Models for Protein Structure Prediction
CompassDock: A Comprehensive Accurate Assessment Approach for Deep Learning-Based Molecular Docking in Inference and Fine-Tuning
BoostMD: Accelerating Molecular Sampling using ML Force Field Feature
DockFormer: Efficient Multi-Modal Receptor-Ligand Interaction Prediction using Pair Transformer
Cryo-EM images are intrinsically low dimensional
Open-source Tools for CryoET Particle Picking Machine Learning Competitions
Protein Language Model Fitness is a Matter of Preference
AptaBLE: An Enhanced Deep Learning Platform for Aptamer Protein Interaction Prediction and Design
Balancing Locality and Reconstruction in Protein Structure Tokenizer
What has AlphaFold3 learned about antibody and nanobody docking, and what remains unsolved?
HelixFlow, SE(3)–equivariant Full-atom Design of Peptides With Flow-matching Models
MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning
Integrating Macromolecular X-ray Diffraction Data with Variational Inference
Fine-Tuning Discrete Diffusion Models via Reward Optimization: Applications to DNA and Protein Design
Low-N OpenFold fine-tuning improves peptide design without additional structures
SPRINT Enables Interpretable and Ultra-Fast Virtual Screening against Thousands of Proteomes
Ranking protein-peptide binding affinities with protein language models
Generating and scoring stable proteins using joint structure and sequence modeling
FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Adjusted Rate Masking
Systems-Structure-Based Drug Design
Learning the language of protein-protein interactions with ESM-Multimer
Guided Multi-objective Generative AI for Structure-based Drug Design
Tradeoffs of alignment-based and protein language models for predicting viral mutation effects
IgFlow: Flow Matching for De Novo Antibody Design
Generating and evaluating diverse sequences for protein backbones
SuperMetal: A Generative AI Framework for Rapid and Precise Metal Ion Location Prediction in Proteins
Antibody DomainBed: Out-of-Distribution Generalization in Therapeutic Protein Design
Controllable All-Atom Generation of Protein Sequence and Structure from Sequence-Only Inputs
Loop-Diffusion: an equivariant diffusion model for designing and scoring protein loops
ProteinZen: combining latent and SE(3) flow matching for all-atom protein generation
TomoPicker: Annotation-Efficient Particle Picking in Cellular cryo-electron Tomograms
Exploring Discrete Flow Matching for 3D De Novo Molecule Generation
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
RNA-DCGen: Dual Constrained RNA Sequence Generation with LLM-Attack
RNA-GPT: Multimodal Generative System for RNA Sequence Understanding
Capturing Protein Dynamics: Encoding Temporal and Spatial Dynamics from Molecular Dynamics Simulations
ProPicker: Promptable Segmentation for Particle Picking in Cryogenic Electron Tomography
Estimating protein flexibility via uncertainty quantification of structure prediction models
Generative modeling of protein ensembles guided by crystallographic electron densities
Energy-Based Flow Matching for Molecular Docking
Controlling multi-state conformational equilibria of dynamic proteins with Frame2seq
This year we are running a challenge on the PINDER and PLINDER datasets to evaluate how well the community is currently doing at protein-protein interaction prediction and protein-ligand complex prediction.
To submit your trained model, you will need to build an inference Docker image on Hugging Face Spaces using the following templates:
- PLINDER track (protein-ligand): (SMILES, monomer protein structure, monomer FASTA, monomer MSA)
- PINDER track (protein-protein): (monomer protein structure 1, monomer protein structure 2, FASTA 1, FASTA 2, MSA 1, MSA 2)
Please find the technical documentation for how to use the datasets for the challenge:
The submission system will use Hugging Face Spaces. To qualify for submission, each team must:

- provide a `requirements.txt` to capture all dependencies;
- modify the `inference_app.py` file, which contains a `predict` function that should be adapted to reflect the specifics of inference using their model (a sketch appears below);
- provide a `train.py` file to ensure that training and model selection use only the PINDER/PLINDER datasets and to clearly show any additional hyperparameters used.

Other metrics computed by PINDER/PLINDER will be displayed on the leaderboard but will not influence the ranking.
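For orientation, the sketch below shows roughly what a `predict` function might look like for the protein-ligand (PLINDER) track. It is illustrative only: the argument names, return convention, and commented-out helpers are our assumptions, and the official Hugging Face Spaces template defines the actual signature.

```python
# inference_app.py -- illustrative sketch only; the official Hugging Face
# Spaces template defines the real signature and I/O conventions.

def predict(
    smiles: str,                     # ligand SMILES string
    protein_structure: str,          # path to the monomer structure file
    protein_fasta: str,              # path to the monomer FASTA file
    protein_msa: str | None = None,  # optional path to the monomer MSA
) -> str:
    """Run inference with your trained model and return the path to the
    predicted protein-ligand complex. Everything below is a placeholder
    to be replaced with calls into your own model."""
    # 1. Load your trained weights (hypothetical helper):
    # model = MyModel.load("weights.ckpt")

    # 2. Featurize the inputs and run the forward pass:
    # complex_structure = model.dock(smiles, protein_structure, protein_msa)

    # 3. Write the prediction to disk and return its path:
    output_path = "predicted_complex.cif"
    # complex_structure.save(output_path)
    return output_path
```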
The winners will be invited to present their work at the MLSB workshop.
Although the exact composition of the evaluation set will be shared at a future date, below we provide an overview of the dataset and what to expect.
Training Workshop: September 24th, 2024, virtual (Register here).
Leaderboard Opens: October 9th, 2024 (following acceptance notifications for MLSB).
Leaderboard Closes: November 18th, 2024 (extended from November 9th, 2024).
Winner Notification: Wednesday, November 27th, 2024
If you have trouble, we invite you to join the PINDER/PLINDER Discord server.