Posted 2024-10-15 | Updated 2025-04-22 | Biology / Bioinformatics / Protein Structure | 17 minutes read (About 2496 words)

Protein Dock Overview

Physics-Based Docking

1982: DOCK; Kuntz, Irwin D., et al.^[1] (rigid-body, shape-based)


© Kuntz, Irwin D., et al. 1982^[1:1]

In this paper, Kuntz et al. present a way of predicting docking by searching for steric overlap, based on the known surface structures of the 2 proteins. DOCK was originally developed by Irwin “Tack” Kuntz and colleagues at the University of California, San Francisco (UCSF), and was initially used for small-molecule docking. However, it laid the foundation for more advanced docking algorithms and software that can handle macromolecular docking.

The first generation of DOCK focuses on 2 rigid bodies: each of the 2 proteins is treated as a single rigid object. The goal of the program is to fix the 6 degrees of freedom (3 translations and 3 rotations) that determine the best relative position. To achieve this goal, three rules are followed:

No overlap between the 2 proteins.
All hydrogens are paired with an N or O within 3.5 Å.
All ligand atoms are within the receptor binding site.
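
The three rules above can be sketched as simple geometric checks on a rigid-body pose. This is a minimal illustration, not DOCK's actual implementation: the function names, the 3.0 Å clash distance, the spherical binding site, and the toy coordinates are all assumptions made for the example. Only the 3.5 Å hydrogen cutoff comes from the text.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Rotation about x, then y, then z (angles in radians): R = Rz * Ry * Rx."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def place(coords, pose):
    """Apply a 6-DOF pose (tx, ty, tz, rx, ry, rz) to a list of (x, y, z) atoms."""
    tx, ty, tz, rx, ry, rz = pose
    r = rotation_matrix(rx, ry, rz)
    return [
        (
            r[0][0] * x + r[0][1] * y + r[0][2] * z + tx,
            r[1][0] * x + r[1][1] * y + r[1][2] * z + ty,
            r[2][0] * x + r[2][1] * y + r[2][2] * z + tz,
        )
        for x, y, z in coords
    ]

def has_clash(receptor, ligand, min_dist=3.0):
    """Rule 1: a clash is any receptor-ligand pair closer than min_dist (assumed value)."""
    return any(math.dist(a, b) < min_dist for a in receptor for b in ligand)

def hydrogens_paired(hydrogens, acceptors, cutoff=3.5):
    """Rule 2: every hydrogen has an N or O atom within cutoff (Å)."""
    return all(any(math.dist(h, a) <= cutoff for a in acceptors) for h in hydrogens)

def inside_site(ligand, site_center, site_radius):
    """Rule 3: every ligand atom lies inside a spherical binding site (assumed shape)."""
    return all(math.dist(a, site_center) <= site_radius for a in ligand)

# Toy coordinates in Å, made up for illustration.
receptor = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0)]
pose = (2.0, 3.5, 0.0, 0.0, 0.0, math.pi / 2)  # 3 translations + 3 rotations
moved = place(ligand, pose)
print(has_clash(receptor, moved), inside_site(moved, (2.0, 3.0, 0.0), 2.0))
```

A docking search would scan many such poses and keep those that pass all three checks.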

Dock families:

1994: DOCK was first extended to DNA targets. By screening the Cambridge Crystallographic Database, the authors found that the compound CC-1065 scored highly.^[2]
1999: DREAM++^[3]: An extension package for DOCK. It uses DOCK to predict binding, evaluates the interaction, predicts the product, and finally searches for inhibitors.
2001: DOCK 4.0^[4]: Added incremental construction (to sample the internal degrees of freedom of the ligand) and random search. In DOCK 4, the ligand is no longer rigid: multiple conformations of ligands with rotatable bonds are generated by a separate model.
2006: DOCK 5.0^[5]:
- anchoring: new scoring functions, sampling methods and analysis tools; energy minimization is also mentioned.
- scoring: energy scoring function based on AMBER; the function includes only the intermolecular van der Waals (VDW) and electrostatic components.
- main limitation: ligands with many rotatable bonds consume a lot of computing resources. In the test set, ligands with > 7 rotatable bonds were removed.
- test data correction: “Compute” and “Biopolymer” from Sybyl^[6] were used to calculate the Gasteiger–Hückel partial electrostatic charges and to add hydrogens to residues.
2009: DOCK 6^[7]: This version extends DOCK to RNA-ligand docking. The ligands are still limited to 7~13 rotatable bonds, and accuracy decreases as the RNA increases in size.
- updated solvation energy scoring:
  - Hawkins–Cramer–Truhlar (HCT) generalized Born with solvent-accessible surface area (GB/SA) solvation scoring with optional salt screening
  - Poisson–Boltzmann with solvent-accessible surface area (PB/SA) solvation scoring
  - AMBER molecular mechanics with GB/SA solvation scoring and optional receptor flexibility
- other scoring:
  - VDW: grid-based form of the Lennard-Jones potential
  - electrostatic: Zap Tool Kit from OpenEye
2013: DOCK3.7^[8]:

DOCK4	DOCK5	DOCK6
	incremental: anchor-and-grow	The number of rotatable bonds has a huge effect on the success rate

anchor-and-grow

The “anchor-and-grow” conformational search algorithm. The algorithm performs the following steps: (1) DOCK perceives the molecule’s rotatable bonds, which it uses to identify an anchor segment and overlapping rigid layer segments. (2) Rigid docking is used to generate multiple poses of the anchor within the receptor. (3) The first layer atoms are added to each anchor pose, and multiple conformations of the layer 1 atoms are generated. An energy score within the context of the receptor is computed for each conformation. (4) The partially grown conformations are ranked by their score and are spatially clustered. The least energetically favorable and spatially diverse conformations are discarded. (5) The next rigid layer is added to each remaining conformation, generating a new set of conformations. (6) Once all layers have been added, the set of completely grown conformations and orientations is returned
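
The six steps above amount to a layer-by-layer search with pruning. The sketch below is a toy version under stated assumptions: anchors are plain numbers, each layer is a single torsion angle, `toy_score` is a made-up energy, and the spatial clustering of step 4 is left out (only ranking by score is kept). None of these names come from DOCK.

```python
def anchor_and_grow(anchor_poses, n_layers, torsions, score, keep=3):
    """Grow conformations one layer at a time, pruning after every layer.

    anchor_poses: anchor placements from rigid docking (step 2)
    n_layers:     number of flexible layers to add
    torsions:     candidate torsion angles (degrees) tried for each new layer
    score:        callable(anchor, angles) -> energy, lower is better
    keep:         number of partial conformations kept after each layer
    """
    # Each partial conformation is (anchor, tuple_of_torsions_so_far).
    partial = [(anchor, ()) for anchor in anchor_poses]
    for _ in range(n_layers):
        # Steps 3 and 5: add the next layer in every candidate torsion.
        grown = [(a, angles + (t,)) for a, angles in partial for t in torsions]
        # Step 4 (simplified): rank by energy and discard the worst.
        grown.sort(key=lambda c: score(*c))
        partial = grown[:keep]
    # Step 6: fully grown conformations, best first.
    return partial

def toy_score(anchor, angles):
    """Hypothetical energy: anchors near 0 and torsions near 60 degrees score best."""
    return abs(anchor) + sum((a - 60) ** 2 for a in angles) / 100.0

best = anchor_and_grow(
    anchor_poses=[-1.0, 0.0, 2.0],
    n_layers=3,
    torsions=[0, 60, 120, 180],
    score=toy_score,
    keep=3,
)
print(best[0])
```

Because only `keep` conformations survive each layer, the cost grows roughly linearly with the number of layers instead of exponentially, which is the point of incremental construction.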

Method	Ligand sampling method^a	Receptor sampling method^a	Scoring function^b	Solvation scoring^c,d
DOCK 4/5	IC	SE	MM	DDD, GB, PB
FlexX/FlexE	IC	SE	ED	NA
Glide	CE + MC	TS	MM + ED	DS
GOLD	GA	GA	MM + ED	NA

^aSampling methods are defined as Genetic Algorithm (GA), Conformational Expansion (CE), Monte Carlo (MC), incremental construction (IC), merged target structure ensemble (SE), torsional search (TS)
^bScoring functions are defined as either empirically derived (ED) or based on molecular mechanics (MM)
^cIf the package does not accommodate this option, the symbol NA (Not Available) is used
^dAdditional accuracy can be added to the scoring function using implicit solvent models. The most commonly used options are distance dependent dielectric (DDD), a parameterized desolvation term (DS), generalized Born (GB) and linearized Poisson Boltzmann (PB)

2003: ZDock

Version iteration:

ZDOCK 2.3/2.3.2 Scoring Function: Chen R, Li L, Weng Z. (2003) ZDOCK^[9]
ZDOCK 3.0/3.0.2 Scoring Function: Mintseris J, Pierce B, Wiehe K, Anderson R, Chen R, Weng Z. (2007)^[10]
M-ZDOCK: Pierce B, Tong W, Weng Z. (2005) M-ZDOCK^[11]
ZDOCK 3.0.2/2.3.2: Pierce BG, Hourai Y, Weng Z. (2011)^[12]
Online Server: Pierce BG, Wiehe K, Hwang H, Kim BH, Vreven T, Weng Z. (2014) ZDOCK Server^[13]

Abstract

ZDock was developed for unbound docking. It is based on pairwise shape complementarity (docking) with desolvation and electrostatics (scoring). In the authors' tests, it shows a high success rate on the antibody-antigen docking test cases. It is especially helpful for a “large concave binding pocket”.

Before ZDock, there were:

FTDock: grid-based shape complementarity (GSC) and electrostatics using a Fast Fourier Transform (FFT).
DOT: FFT-based; computes Poisson-Boltzmann electrostatics.
HEX: evaluates overlapping surface skins and electrostatic complementarity with Fourier correlation.
GRAMM: low-resolution docking with scoring similar to FTDock.
PPD: matches critical points by using geometric hashing.
BiGGER: maximal surface mapping and favorable amino acid contacts by bit-mapping.
DARWIN: molecular mechanics energy defined according to CHARMM.
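
Grid-based shape complementarity, as used by several of the programs above, scores a translation by multiplying two grids cell by cell and summing the result. The sketch below evaluates that sum with direct loops on a tiny 2D grid; FFT-based programs compute the same correlation for all translations at once, which is what makes them fast. The grid values (1 for the layer around the receptor, -15 for the receptor core, 1 for ligand cells) and the shapes are illustrative assumptions only.

```python
def correlate(receptor, ligand, shift):
    """Score one translation: sum of receptor[i][j] * ligand[i - di][j - dj]."""
    di, dj = shift
    n = len(receptor)
    total = 0
    for i in range(n):
        for j in range(n):
            li, lj = i - di, j - dj  # ligand cell that lands on receptor cell (i, j)
            if 0 <= li < n and 0 <= lj < n:
                total += receptor[i][j] * ligand[li][lj]
    return total

# 0 = empty, 1 = layer surrounding the receptor, -15 = receptor core (clash penalty).
receptor = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [-15, -15, 1, 0, 0],
    [-15, -15, 1, 0, 0],
]
# 1 = ligand atom; an L-shaped ligand that fits around the receptor corner.
ligand = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

shifts = [(di, dj) for di in range(-4, 5) for dj in range(-4, 5)]
best = max(shifts, key=lambda s: correlate(receptor, ligand, s))
print(best, correlate(receptor, ligand, best))
```

Translations that push the ligand into the core pick up the negative weights, so the best-scoring shift is the one where the ligand hugs the surface layer without overlapping the core.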

For ZDock:

Optimizes grid-based shape complementarity (GSC), the key scoring function, together with desolvation.
- GSC = number of grid points surrounding the receptor that correspond to ligand atoms, minus a clash penalty
- FFT for electrostatics
Novel pairwise shape complementarity function (PSC): the number of receptor-ligand atom pairs within a distance cut-off, minus a clash penalty.
- Favorable: number of pairs within the cutoff
- Penalty: The clash penalty for core-core, surface-core, and surface-surface (9⁹, 9³, 9)
DE: desolvation, estimated by atomic contact energy (ACE), which is the free energy change of breaking two protein atom-water contacts and forming a protein atom-protein atom contact and a water-water contact. DE is the sum of the ACE values.
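
The PSC idea described above (count favorable atom pairs, subtract a clash penalty) can be sketched directly on atom coordinates. ZDOCK itself evaluates this on grids with separate core and surface penalties, so the version below is only a simplified illustration: the 6.0 Å cutoff, the 2.0 Å clash distance, and the single penalty value of 9 are assumptions chosen for the example.

```python
import math

def psc_score(receptor, ligand, cutoff=6.0, clash_dist=2.0, clash_penalty=9.0):
    """Favorable pairs within `cutoff` count +1; pairs closer than `clash_dist`
    cost `clash_penalty` each. All three numeric values are assumed."""
    favorable = 0
    clashes = 0
    for a in receptor:
        for b in ligand:
            d = math.dist(a, b)
            if d < clash_dist:
                clashes += 1
            elif d <= cutoff:
                favorable += 1
    return favorable - clash_penalty * clashes

# Toy coordinates in Å, made up for illustration.
receptor_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
good_pose = [(5.0, 0.0, 0.0)]                    # within the cutoff of both atoms
bad_pose = [(5.0, 0.0, 0.0), (0.0, 1.0, 0.0)]    # second atom clashes with the first receptor atom
print(psc_score(receptor_atoms, good_pose), psc_score(receptor_atoms, bad_pose))
```

A single clash outweighs several favorable contacts, which is how the score rewards tight packing while rejecting overlapping poses.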

Scoring functions by version:

ZDOCK1.3^[14]: GSC+DE+ELEC
ZDOCK2.1^[15]: PSC
ZDOCK2.2^[9:1]: PSC+DE
ZDOCK2.3^[9:2]: PSC+DE+ELEC

2004: ClusPro

ClusPro: a fully automated algorithm for protein–protein docking

2010: Hex

Ultra-fast FFT protein docking on graphics processors

Home page, Documentation

Hex is extremely fast but lacks accuracy. I tried sampling over 100,1000, but no result was even close to the native structure.
On the other hand, I didn’t find a way to mark the surface residues so that the search could focus on a specific area. ChatGPT said it could do constrained docking, but it seems we can only constrain the range of angles of the receptor and the ligand.


© SAMSON

2014: rDock

rDock: a fast, versatile and open source program for docking ligands to proteins and nucleic acids

2018: InterEvDock

Protein-Protein Docking Using Evolutionary Information

Machine Learning Based Docking

2021: DeepRank

Model Graph Abstract	Model Name
© Chen, M., & Zhou, X	DeepRank
© Réau, M.	DeepRank-GNN
© Crocioni, G.	Deeprank2

DeepRank^[16] is an open-source framework designed to analyze 3D protein-protein interfaces by using deep learning to capture spatial and biochemical features. The paper presents DeepRank’s approach to transforming 3D structural data into 3D grids that a neural network can process. This setup allows DeepRank to identify interaction patterns, rank docking models, and predict binding affinities with high accuracy. It’s especially useful for discovering patterns in protein interfaces that might be overlooked with traditional scoring functions.

In this model, the PDB files are converted into SQL for efficient processing. The cut-off for interface residues is 5.5 Å. Once all interface atoms are found, they are mapped onto a 3D grid using a Gaussian mapping. The target value is also very flexible: any kind of value, such as iRMSD, FNAT, or the DockQ score, can be used as the target (predicted) value. The data are stored in HDF5 format, which keeps processing efficient and storage small.
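
To make the Gaussian mapping step concrete, here is a minimal sketch of spreading per-atom feature values onto a cubic grid. It is not DeepRank's code: the function names, the 3x3x3 grid, the 1 Å spacing, and the fixed `sigma` are assumptions for illustration.

```python
import math

def make_grid(center, n=3, spacing=1.0):
    """Cubic grid of n**3 points centred on `center` (assumed layout)."""
    offsets = [(i - (n - 1) / 2.0) * spacing for i in range(n)]
    cx, cy, cz = center
    return [(cx + dx, cy + dy, cz + dz)
            for dx in offsets for dy in offsets for dz in offsets]

def gaussian_map(atoms, grid_points, sigma=1.0):
    """Spread each atom's feature value onto the grid with a Gaussian kernel.

    atoms:       list of ((x, y, z), value), e.g. value = partial charge
    grid_points: list of (x, y, z) grid coordinates
    sigma:       kernel width in Å (assumed constant here)
    """
    values = []
    for g in grid_points:
        total = 0.0
        for pos, value in atoms:
            d2 = sum((a - b) ** 2 for a, b in zip(pos, g))
            total += value * math.exp(-d2 / (2.0 * sigma ** 2))
        values.append(total)
    return values

grid = make_grid((0.0, 0.0, 0.0), n=3, spacing=1.0)
atoms = [((0.0, 0.0, 0.0), 1.0)]  # one atom at the grid centre with feature value 1
values = gaussian_map(atoms, grid)
print(len(values), values[13])  # index 13 is the centre point of a 3x3x3 grid
```

Each feature (charge, density, and so on) would get its own grid channel, giving the CNN a multi-channel 3D image of the interface.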

DeepRank family:

DeepRank^[16:1]: 2021, Chen, M., et al.; It maps the protein interface onto a 3D grid and uses a CNN to train the regression model. It established the architectural foundation of DeepRank.
- DeepRank uses information at both the atom level and the residue level. At the atom level, it calculates atom density, charges, electrostatic energy, and VDW contacts. At the residue level, it includes the number of residue-residue contacts, buried surface area, and the position-specific scoring matrix (PSSM).
DeepRank-GNN^[17]: 2023, Réau, M. et al.; The same team replaced the 3D-grid-based CNN with a GNN, which avoids the rotation problem of the 3D grid.
- The input information is very similar to DeepRank's. Instead of a 3D grid, it relies on the adjacency matrix to build the network. Here the cut-off is 8.5 Å.
- It has richer features, such as distance, residue half sphere exposure, and residue depth (from Biopython, MSMS).
Deeprank_GNN_ESM^[18]: 2024, Xu, X., et al.; Calculating the PSSM requires sequence alignment, which takes a lot of time. To generate the graph efficiently, they replaced the PSSM with ESM embedding vectors.
DeepRank2^[19]: 2024, Crocioni, G., et al.; DeepRank2 supports both 3D grids and graph networks as inputs. It also integrates DeepRank-Mut for in silico mutation screening.
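
The adjacency matrix that DeepRank-GNN relies on can be illustrated with a distance cutoff over residue positions. This is a sketch under assumptions, not the package's code: each residue is reduced to one representative coordinate, and `build_adjacency` is a hypothetical helper name. Only the 8.5 Å cutoff comes from the text.

```python
import math

def build_adjacency(residue_coords, cutoff=8.5):
    """Connect two residues when their representative points are within `cutoff` Å."""
    n = len(residue_coords)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(residue_coords[i], residue_coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1  # undirected edge
    return adj

# Toy coordinates in Å, made up for illustration.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
adjacency = build_adjacency(coords)
print(adjacency)
```

Because the graph depends only on pairwise distances, rotating the complex leaves the adjacency matrix unchanged, which is why the GNN sidesteps the rotation issue of the 3D grid.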

DockQ

DockQ is a knowledge-based docking evaluation tool. It divides docking results into 4 categories: Incorrect, Acceptable, Medium, or High quality. The scores it uses are F_nat, LRMS, and iRMS, as proposed and standardized by CAPRI. The training set is extremely imbalanced: it has over 56,000 incorrect models, 760 acceptable, 850 medium, and 74 high quality.
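
As a worked example of how the three measures combine, the sketch below follows the formula given in the DockQ publication, to the best of my knowledge: the average of F_nat and two scaled RMS terms, with scaling constants of 1.5 Å for iRMS and 8.5 Å for LRMS, and class boundaries at 0.23, 0.49, and 0.80. The function names are my own, and the input values are made up.

```python
def scaled_rms(rms, d):
    """Map an RMSD in Å onto (0, 1]; equals 0.5 when rms == d."""
    return 1.0 / (1.0 + (rms / d) ** 2)

def dockq(fnat, irms, lrms, d_irms=1.5, d_lrms=8.5):
    """Average of F_nat and the scaled iRMS and LRMS terms."""
    return (fnat + scaled_rms(irms, d_irms) + scaled_rms(lrms, d_lrms)) / 3.0

def classify(score):
    """Quality class from a DockQ score."""
    if score >= 0.80:
        return "High"
    if score >= 0.49:
        return "Medium"
    if score >= 0.23:
        return "Acceptable"
    return "Incorrect"

score = dockq(fnat=0.5, irms=1.5, lrms=8.5)
print(score, classify(score))
```

A perfect model (F_nat of 1 and both RMS values at 0) scores 1.0, and the score falls toward 0 as contacts are lost and the RMS values grow.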

Online Tools

ClusPro AbEMap: The ClusPro AbEMap web server for the prediction of antibody epitopes
ClusterPro?
CCharPPI
AbAgIntPre: No structure, binary output results only
CSM-AB: graph-based antibody–antigen binding affinity prediction and docking scoring function

Other tools

[ZRANK2]

Other Info

Antibody-Antigen Structures and affinities

Protein Dock Overview

https://karobben.github.io/2024/10/15/AI/proteindock/

Author

Karobben
