AI Tools for Protein Structures
trRosetta
The authors inverted the trRosetta network to generate new protein sequences from scratch, aiming to design proteins with structures and functions not found in nature. By running Monte Carlo sampling in sequence space and optimizing the predicted structural features, they produced a variety of new protein sequences.
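A minimal sketch of Monte Carlo sampling in sequence space, assuming a Metropolis acceptance rule and single-residue mutations. `toy_score` is a hypothetical stand-in for the predicted structural features; the real method scores sequences with the network's predictions, which are not reproduced here. The temperature and step count are also illustrative assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq):
    """Hypothetical stand-in for a structure-based score (higher is better).

    A real pipeline would run the structure-prediction network here; this toy
    simply rewards alternating hydrophobic/polar neighbours.
    """
    hydrophobic = set("AVILMFWC")
    return sum(
        1.0
        for a, b in zip(seq, seq[1:])
        if (a in hydrophobic) != (b in hydrophobic)
    )


def mc_design(length=30, steps=2000, temperature=0.5, seed=0):
    """Metropolis Monte Carlo over sequences: mutate one residue per step."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_score(seq)
    start_score = score
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = toy_score(seq)
        delta = new_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            score = new_score
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(seq), start_score, score


designed, start_score, final_score = mc_design()
```

Swapping `toy_score` for a function that calls a structure-prediction network would turn this loop into the kind of sequence optimization described above; everything else stays the same.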
RFdiffusion
Watson et al.[1] released RFdiffusion on GitHub in 2023. It fine-tunes RoseTTAFold[2] for design tasks such as protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design. The paper reports strong performance across these tasks. RFdiffusion is based on denoising diffusion probabilistic models (DDPMs), a class of machine learning models that has been shown to generate photorealistic images from text prompts[3].
The authors then use the ProteinMPNN[4] network to design sequences that encode these structures. RFdiffusion can not only generate a protein from scratch but can also predict multiple types of interactions.
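To make the DDPM idea concrete, here is a minimal sketch of the forward (noising) process on plain 3D points, using the closed form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. This is a generic DDPM illustration and not RFdiffusion's actual noising scheme; the linear beta schedule, the 1000 steps, and the toy coordinates are all assumptions made for the example.

```python
import math
import random


def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coords(coords, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in xyz)
        for xyz in coords
    ]


rng = random.Random(0)
# Toy "backbone": five points spaced 3.8 units apart along the x axis.
x0 = [(3.8 * i, 0.0, 0.0) for i in range(5)]
alpha_bars = alpha_bar_schedule()
slightly_noised = noise_coords(x0, alpha_bars[0], rng)   # almost unchanged
fully_noised = noise_coords(x0, alpha_bars[-1], rng)     # close to pure noise
```

A generative model of this kind is trained to reverse the process, so that starting from noise and denoising step by step yields a new structure.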
Compared with AF2
- AlphaFold2 is like a very smart detective that can figure out the 3D shape of a protein just by looking at its amino acid sequence. On the other hand, RFdiffusion is more like an architect that designs entirely new proteins with specific properties. Instead of just figuring out shapes, it creates new proteins that can do things like bind to specific molecules or perform certain reactions. This makes it incredibly useful for designing new therapies and industrial enzymes.
ImmuneBuilder
ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins
Method of ABodyBuilder2
- First, the heavy and light chain sequences are fed into four separate deep-learning models to predict an ensemble of structures. The closest structure to the average is then selected and refined using OpenMM[5] to remove clashes and other stereo-chemical errors. The same pipeline is used for NanoBodyBuilder2 and TCRBuilder2.
- Training data set: 7084 structures from SAbDab. Filtering: No missing residues and resolution ≤ 3.5 Å
- Architecture: the deep learning model behind ABodyBuilder2 is an antibody-specific version of the structure module in AlphaFold-Multimer, with several modifications.
- Loss: Frame Aligned Point Error (FAPE), as in AlphaFold-Multimer (AFM).
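The ensemble step in the method above (predict several structures, then keep the one closest to the average) can be sketched as follows. This is a simplified illustration, not the ImmuneBuilder code: it assumes the models are already superposed in a common frame, represents each structure as a list of `(x, y, z)` tuples, and uses hypothetical helper names.

```python
import math


def mean_structure(ensemble):
    """Average each atom's coordinates across all models in the ensemble."""
    n_models = len(ensemble)
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(len(ensemble[0]))
    ]


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


def closest_to_mean(ensemble):
    """Return the index of the model with the lowest RMSD to the ensemble mean."""
    mean = mean_structure(ensemble)
    return min(range(len(ensemble)), key=lambda i: rmsd(ensemble[i], mean))


# Four toy "predictions" of a three-atom structure; the last one is an outlier.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0), (2.1, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0), (2.2, 0.0, 0.0)],
    [(0.9, 0.0, 0.0), (1.9, 0.0, 0.0), (2.9, 0.0, 0.0)],
]
best = closest_to_mean(ensemble)  # index of the model that would go on to refinement
```

Picking an actual ensemble member, rather than the averaged coordinates themselves, keeps the output a physically plausible structure before refinement.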
ImmuneBuilder is a set of deep learning models trained to accurately predict the structure of antibodies (ABodyBuilder2), nanobodies (NanoBodyBuilder2) and T-cell receptors (TCRBuilder2). It generates structures with state-of-the-art accuracy while being much faster than AlphaFold2.
Experience it online: Google Colab
GitHub: oxpig/ImmuneBuilder
The authors have built three models:
- ABodyBuilder2, an antibody-specific model
- NanoBodyBuilder2, a nanobody-specific model
- TCRBuilder2, a TCR-specific model.
The authors compared its performance with similar tools:
- homology modelling method: ABodyBuilder[6] (ABB)
- general protein structure prediction method: AlphaFold-Multimer[7]
- antibody-specific methods: ABlooper[8] (ABL), IgFold[9] (IgF) and EquiFold[10] (EqF)
How: the methods were compared on 34 recently added antibody structures. The table reports the RMSD of each region in Å (lower is better).
Method | CDR-H1 | CDR-H2 | CDR-H3 | Fw-H | CDR-L1 | CDR-L2 | CDR-L3 | Fw-L |
---|---|---|---|---|---|---|---|---|
ABodyBuilder (ABB) | 1.53 | 1.09 | 3.46 | 0.65 | 0.71 | 0.55 | 1.18 | 0.59 |
ABlooper (ABL) | 1.18 | 0.96 | 3.34 | 0.63 | 0.78 | 0.63 | 1.08 | 0.61 |
IgFold (IgF) | 0.86 | 0.77 | 3.28 | 0.58 | 0.55 | 0.43 | 1.12 | 0.60 |
EquiFold (EqF) | 0.86 | 0.80 | 3.29 | 0.56 | 0.47 | 0.41 | 0.93 | 0.54 |
AlphaFold-M (AFM) | 0.86 | 0.68 | 2.90 | 0.55 | 0.47 | 0.40 | 0.83 | 0.54 |
ABodyBuilder2 (AB2) | 0.85 | 0.78 | 2.81 | 0.54 | 0.46 | 0.44 | 0.87 | 0.57 |
- What is an acceptable RMSD[11]?
The experimental error in protein structures generated via X-ray crystallography has been estimated to be around 0.6 Å for regions with organised secondary structures (such as the antibody frameworks) and around 1 Å for protein loops.
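As a quick worked check, the sketch below compares the ABodyBuilder2 row of the table with these experimental error estimates. Treating every CDR as a "loop" (1 Å) and every framework region as "organised secondary structure" (0.6 Å) is an assumption made for illustration, and the helper names are hypothetical.

```python
# RMSD values (Å) for ABodyBuilder2, copied from the comparison table.
AB2_RMSD = {
    "CDR-H1": 0.85, "CDR-H2": 0.78, "CDR-H3": 2.81, "Fw-H": 0.54,
    "CDR-L1": 0.46, "CDR-L2": 0.44, "CDR-L3": 0.87, "Fw-L": 0.57,
}

FRAMEWORK_ERROR = 0.6  # Å, regions with organised secondary structure
LOOP_ERROR = 1.0       # Å, protein loops


def experimental_error(region):
    """Framework regions start with 'Fw'; everything else is treated as a loop."""
    return FRAMEWORK_ERROR if region.startswith("Fw") else LOOP_ERROR


def regions_above_error(rmsd_by_region):
    """List regions whose RMSD exceeds the estimated experimental error."""
    return [r for r, v in rmsd_by_region.items() if v > experimental_error(r)]


above = regions_above_error(AB2_RMSD)
print(above)
```

Under these assumptions only CDR-H3 exceeds the estimated experimental error for ABodyBuilder2; every other region is already within it.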
Side Chain Prediction
ABlooper and IgFold only predict backbone positions, leaving the side chains to OpenMM[5] and Rosetta[12], whereas EquiFold, AlphaFold-Multimer and ABodyBuilder2 all output all-atom structures.
EquiFold
Designing proteins to achieve specific functions often requires in silico modeling of their properties at high throughput scale and can significantly benefit from fast and accurate protein structure prediction. We introduce EquiFold, a new end-to-end differentiable, SE(3)-equivariant, all-atom protein structure prediction model. EquiFold uses a novel coarse-grained representation of protein structures that does not require multiple sequence alignments or protein language model embeddings, inputs that are commonly used in other state-of-the-art structure prediction models. Our method relies on geometrical structure representation and is substantially smaller than prior state-of-the-art models. In preliminary studies, EquiFold achieved comparable accuracy to AlphaFold but was orders of magnitude faster. The combination of high speed and accuracy make EquiFold suitable for a number of downstream tasks, including protein property prediction and design.
https://github.com/Genentech/equifold
IgFold
Official repository for IgFold: Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies.
The code and pre-trained models from this work are made available for non-commercial use (including at commercial entities) under the terms of the JHU Academic Software License Agreement. For commercial inquiries, please contact Johns Hopkins Tech Ventures at awichma2@jhu.edu.
Try antibody structure prediction in Google Colab.
https://github.com/Graylab/IgFold
!!! Personal experience
In my experience, IgFold performs poorly in the CDR-H3 region: it incorrectly predicted the CDR-H3 loop in an upright conformation. It is less accurate than ABodyBuilder2 and also slower. It only accepts clean Fab sequences; any longer sequences end up as a mess.
[1] Watson J L, Juergens D, Bennett N R, et al. De novo design of protein structure and function with RFdiffusion. Nature, 2023, 620(7976): 1089-1100.
[2] Baek M, et al. Accurate prediction of protein structures and interactions using a 3-track network. Science. July 2021.
[3] Ramesh, A. et al. Zero-shot text-to-image generation. in Proc. 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 8821–8831 (PMLR, 2021).
[4] Dauparas J, Anishchenko I, Bennett N, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science, 2022, 378(6615): 49-56.
[5] Eastman, P. et al. OpenMM 7: rapid development of high-performance algorithms for molecular dynamics. PLoS Comput. Biol. 13, e1005659 (2017).
[6] Leem, J., Dunbar, J., Georges, G., Shi, J. & Deane, C. M. ABodyBuilder: automated antibody structure prediction with data-driven accuracy estimation. MAbs 8, 1259–1268 (2016).
[7] Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv (2021).
[8] Abanades, B., Georges, G., Bujotzek, A. & Deane, C. M. ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics 38, 1877–1880 (2022).
[9] Ruffolo, J. A., Chu, L.-S., Mahajan, S. P. & Gray, J. J. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat. Commun. 14, 2389 (2023).
[10] Lee, J. H. et al. Equifold: Protein structure prediction with a novel coarse-grained structure representation. bioRxiv (2022).
[11] Eyal, E., Gerzon, S., Potapov, V., Edelman, M. & Sobolev, V. The limit of accuracy of protein modeling: influence of crystal packing on protein structure. J. Mol. Biol. 351, 431–442 (2005).
[12] Alford, R. F. et al. The Rosetta all-atom energy function for macromolecular modeling and design. J. Chem. Theory Comput. 13, 3031–3048 (2017).