Structural biology, the study of the 3D structure or shape of proteins and other biomolecules, has been transformed by breakthroughs from machine learning algorithms. Experimentalists now routinely use machine learning models to predict structures that aid hypothesis generation and experimental design and to accelerate the experimental process of structure determination (e.g. computer vision algorithms for cryo-electron microscopy). These models have also become a new industry standard for bioengineering new protein therapeutics (e.g. large language models for protein design). Despite all of this progress, there are still many active and open challenges for the field, such as modeling protein dynamics, predicting the structure of other classes of biomolecules such as RNA, learning and generalizing the underlying physics driving protein folding, and relating the structure of isolated proteins to the in vivo and contextual nature of their underlying function. These challenges are diverse and interdisciplinary, motivating new kinds of machine learning methods and requiring the development and maturation of standard benchmarks and datasets.
Machine Learning in Structural Biology (MLSB) seeks to bring together field experts, practitioners, and students from across academia, industry research groups, and pharmaceutical companies to focus on these new challenges and opportunities. This year, MLSB aims to bridge the theoretical and practical by addressing the outstanding computational and experimental problems at the forefront of our field. The intersection of artificial intelligence and structural biology promises to unlock new scientific discoveries and develop powerful design tools.
MLSB will be an in-person workshop on December 15th at NeurIPS.
Please contact the organizers at workshopmlsb@gmail.com with any questions.
Stay updated on changes and workshop news by joining our mailing list.
Congratulations to all accepted presenters! Please find some information on deadlines and expectations leading up to the MLSB Workshop!
We expect all authors to prepare a poster that can be presented as part of our workshop. Posters must be 24W x 36H inches and will be taped to the wall. Poster boards will not be provided at the workshop. Posters should be on lightweight paper, and not laminated.
Additionally, a virtual copy of each poster must be uploaded to the NeurIPS poster upload portal by Thursday, December 14. Posters must be PNG with no more than 5120 width x 2880 height (no more than 10 MB). Thumbnail images should be 320 width x 256 height PNG and no more than 5 MB. Users should log in using the neurips.cc account associated with their CMT email address. If they did not already have a neurips.cc account, then it should have automatically been created and can be accessed by resetting the password.
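As a convenience, the size limits above can be checked locally before uploading. The following is a minimal sketch, not an official tool: the function names are hypothetical, it uses only the Python standard library, and it assumes that 1 MB means 1024 x 1024 bytes (the upload portal may count differently).

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: chunk type,
    # 16-23: width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    return struct.unpack(">II", header[16:24])


def check_image(path, width, height, max_mb, exact=False):
    """Return a list of problems; an empty list means the file fits the limits.

    With exact=False, width and height are upper bounds (poster).
    With exact=True, the image must match them exactly (thumbnail).
    """
    problems = []
    w, h = png_dimensions(path)
    if exact:
        if (w, h) != (width, height):
            problems.append(f"size is {w}x{h}, expected exactly {width}x{height}")
    elif w > width or h > height:
        problems.append(f"size is {w}x{h}, limit is {width}x{height}")
    # Assumption: 1 MB = 1024 * 1024 bytes.
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        problems.append(f"file is {size_mb:.1f} MB, limit is {max_mb} MB")
    return problems
```

For example, `check_image("poster.png", 5120, 2880, 10)` would check a poster and `check_image("thumbnail.png", 320, 256, 5, exact=True)` a thumbnail; both file names are placeholders.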
De-anonymized, camera-ready versions of the workshop paper will be due on Microsoft CMT by Monday, Dec 4. Papers must indicate that they are NeurIPS MLSB workshop papers by using the modified NeurIPS style file here. Papers should be compiled with the 'final' argument, e.g. \usepackage[final]{neurips_mlsb_2023}
We plan to make all submitted papers available on the workshop website (https://www.mlsb.io/). If you would prefer that your work not be shared, please email the organizers by responding to this email as soon as possible. Additionally, please let us know if there is an arXiv/bioRxiv link for the paper that should be linked as well.
This year we will try to cover as many workshop registrations as possible for student/academic attendees with oral presentations or posters who need financial assistance. If you would like to be considered, please fill out the following form by Friday, Nov 17th. If you have any questions, please don't hesitate to contact us at workshopmlsb@gmail.com.
Application for Registration Reimbursement: Friday, November 17th, 2023, at 11:59PM, Anywhere on Earth.
Camera-Ready PDF due on Microsoft CMT: Monday, December 4th, 2023.
Poster due: Thursday, December 14th, 2023.
Founding Technical Director of the Chan-Zuckerberg Imaging Institute.
Associate Professor at NYU.
Senior Director of Frontier Research at Prescient Design.
HHMI Investigator, Associate Professor of Biochemistry at Stanford University.
Associate Professor of Genetics and Bioengineering at Stanford University.
Professor of Bioengineering at University of California, San Francisco.
Co-Founder and CTO of Generate Biomedicines.
Associate Professor at Dartmouth College.
Time | Session | Title
---|---|---
08:30 | Opening Remarks |
08:35 | Invited Speaker - Kyunghyun Cho | Health system scale language models for clinical and operational decision making
09:00 | Contributed Talk | Validation of de novo designed water-soluble and membrane proteins by in silico folding and melting
09:15 | Invited Speaker - Tanja Kortemme | Accurate and tunable de novo protein shapes for new functions
09:40 | Break |
10:00 | Invited Speaker - Bridget Carragher | A CryoET Data Portal to Foster a Collaboration between the Machine Learning and CryoET Communities
10:25 | Contributed Talk | AlphaFold Meets Flow Matching for Generating Protein Ensembles
10:40 | Contributed Talk | DSMBind: an unsupervised generative modeling framework for binding energy prediction
10:55 | Invited Speaker - Polly Fordyce | Leveraging microfluidics for high-throughput and quantitative biochemistry and biophysics
11:20 | Poster Session/Lunch |
12:40 | Invited Speaker - Gevorg Grigoryan | Illuminating protein space with a programmable generative model
01:05 | Contributed Talk | Protein generation with evolutionary diffusion: sequence is all you need
01:20 | Invited Speaker - Jason Yim / Brian Trippe | De novo design of protein structure and function with RFdiffusion
01:45 | Break |
02:00 | Contributed Talk | DiffDock-Pocket: Diffusion for Pocket-Level Docking with Sidechain Flexibility
02:15 | Contributed Talk | PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses
02:30 | Invited Speaker - Rhiju Das | World-wide competitions and the RNA folding problem
02:55 | Break |
03:00 | Panel Discussion |
03:45 | Poster Session / Happy Hour |
05:00 | Closing Remarks |

ESMFold Hallucinates Native-Like Protein Sequences
Conditioned Protein Structure Prediction
Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design
Guiding diffusion models for antibody sequence and structure co-design with developability properties
AlphaFold Distillation for Protein Design
Binding Oracle: Fine-Tuning From Stability to Binding Free Energy
Scalable Multimer Structure Prediction using Diffusion Models
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
Molecular Diffusion Models with Virtual Receptors
CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM
Learning Scalar Fields for Molecular Docking with Fast Fourier Transforms
VN-EGNN: Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification
Enhancing Ligand Pose Sampling for Machine Learning–Based Docking
Improved encoding of ensembles in PDBx/mmCIF
AlphaFold Meets Flow Matching for Generating Protein Ensembles
The Discovery of Binding Modes Requires Rethinking Docking Generalization
Conformational sampling and interpolation using language-based protein folding neural networks
FLIGHTED: Inferring Fitness Landscapes from Noisy High-Throughput Experimental Data
Contrasting Sequence with Structure: Pre-training Graph Representations with PLMs
Target-Aware Variational Auto-Encoders for Ligand Generation with Multi-Modal Protein Modeling
SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design
Fast non-autoregressive inverse folding with discrete diffusion
TopoDiff: Improving Protein Backbone Generation with Topology-aware Latent Encoding
Harmonic Prior Self-conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design
CrysFormer: Protein Crystallography Prediction via 3d Patterson Maps and Partial Structure Attention
PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses
Sampling Protein Language Models for Functional Protein Design
A framework for conditional diffusion modelling with applications in protein design
DiffRNAFold: Generating RNA Tertiary Structures with Latent Space Diffusion
Pair-EGRET: Enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models
FlexiDock: Compositional diffusion models for flexible molecular docking
In vitro validated antibody design against multiple therapeutic antigens using generative inverse folding
Evaluating Zero-Shot Scoring for In Vitro Antibody Binding Prediction with Experimental Validation
PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Optimizing protein language models with Sentence Transformers
DiffDock-Pocket: Diffusion for Pocket-Level Docking with Sidechain Flexibility
Transition Path Sampling with Boltzmann Generator-based MCMC Moves
Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning
Inpainting Protein Sequence and Structure with ProtFill
Investigating Protein-DNA Binding Energetic of Mismatched DNA
AntiFold: Improved antibody structure design using inverse folding
Improved B-cell epitope prediction using AlphaFold2 modeling and inverse folding latent representations
Combining Structure and Sequence for Superior Fitness Prediction
Epitope-specific antibody design using diffusion models on the latent space of ESM embeddings
Protein language models learn evolutionary statistics of interacting sequence motifs
Using artificial sequence coevolution to predict disulfide-rich peptide structures with experimental connectivity in AlphaFold
Preferential Bayesian Optimisation for Protein Design with Ranking-Based Fitness Predictors
FAFormer: Frame Averaging Transformer for Predicting Nucleic Acid-Protein Interactions
LightMHC: A Light Model for pMHC Structure Prediction with Graph Neural Networks
FrameDiPT: SE(3) Diffusion Model for Protein Structure Inpainting
An Active Learning Framework for ML-Assisted Labeling of Cryo-EM Micrographs
Validation of de novo designed water-soluble and membrane proteins by in silico folding and melting
Structure, Surface and Interface Informed Protein Language Model
De Novo Short Linear Motif (SLiM) Discovery With AlphaFold-Multimer
AF2BIND: Predicting ligand-binding sites using the pair representation of AlphaFold2
Protein generation with evolutionary diffusion: sequence is all you need
LatentDock: Protein-Protein Docking with Latent Diffusion
HiFi-NN annotates the microbial dark matter with Enzyme Commission numbers
Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete Diffusion
SO(3)-Equivariant Representation Learning in 2D Images
HelixDiff: Conditional Full-atom Design of Peptides With Diffusion Models
DiffMaSIF: Surface-based Protein-Protein Docking with Diffusion Models
FLAb: Benchmarking deep learning methods for antibody fitness prediction
Parameter-Efficient Fine-Tuning of Protein Language Models Improves Prediction of Protein-Protein Interactions
TriFold: A New Architecture for Predicting Protein Sequences from Structural Data
End-to-End Sidechain Modeling in AlphaFold2: Attention May or May Not Be All That You Need
Coarse-graining via reparametrization avoids force-matching and back-mapping
SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structures
Cramming Protein Language Model Training in 24 GPU Hours
Preparation Of Labeled Cryo-ET Datasets For Training And Evaluation Of Machine Learning Models
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence
Fast protein backbone generation with SE(3) flow matching
Frame2seq: structure-conditioned masked language modeling for protein sequence design
Structure-Conditioned Generative Models for De Novo Ligand Generation: A Pharmacophore Assessment
Jointly Embedding Protein Structures and Sequences through Residue Level Alignment
Evaluating Representation Learning on the Protein Structure Universe
Enhancing Antibody Language Models with Structural Information
Amortized Pose Estimation for X-Ray Single Particle Imaging
Rethinking Performance Measures of RNA Secondary Structure Problems
Structure-based and leakage-free data splits for rigorous protein function evaluation
Uncovering sequence diversity from a known protein structure
Exploiting language models for protein discovery with latent walk-jump sampling