PPI4DOCK FAQ

1. How to read the PPI4DOCK_list.txt?

This text file contains information on all 1444 docking targets. The header line gives the names of the 33 items:
(1) Target (interface) name, in the format "xxxx_MN", where xxxx is the PDB code and M and N are the receptor and ligand chain IDs, respectively;
(2) X-ray resolution of the reference complex;
(3) Score "Biological" evaluated by noxClass program for the reference interface;
(4) Score "Obligate" evaluated by noxClass program for the reference interface;
(5) Reference interface area;
(6) Reference interface area after stripping the corresponding residues (in each subunit model, ab initio modeled tails and separate sub-regions making few contacts were stripped; the corresponding residues were also removed from the reference structure so that the sequences are identical between the reference structure and the subunit models);
(7) Number (rounded up) of residues in contact per chain;
(8) TMalign score (structural similarity) between the 2 chains of reference complex;
(9) Interolog tag: 0 means the target is the only member of its interolog group and interfaces with the same non-zero tag belong to the same interolog group. See also items (28) and (29);
(10) Target 1 (receptor) name: xxxx_M;
(11) Number of residues in the receptor;
(12) Template used for building the homology model for the receptor;
(13) Sequence identity between the receptor (10) and its template (12);
(14) TMscore for the receptor model (taking complexed subunit as reference);
(15) GDT_TS score for the receptor model (taking complexed subunit as reference);
(16) RMSD for the receptor model (taking complexed subunit as reference);
(17-23) the same terms as (10-16) for the ligand instead of the receptor;
(24) Number of clashing residues per chain (rounded up);
(25) I-rms (interface RMSD) of the "superimposed decoy" (two subunit models superimposed onto the reference complex);
(26) CAPRI rank of the "superimposed decoy";
(27) Difficulty category;
(28) Interolog group status (redundancy at the superfamily level for the two chains of a complex compared with homologous chains in an interolog complex, using thresholds HHsearch probability 90% and Matras probability 80%): "Unique", "Repres" (representative) or "Redund" (redundant). "Unique" means a target with no interolog; for an interolog group containing multiple members, the representative is the one with the easiest difficulty level (27) (and the highest resolution (2) given the same difficulty category) and other members are labeled as "Redund";
(29) Interolog group representative: see explanation in (28);
(30) Interface group status (redundancy at the interface level, using full-linkage clustering within each Interolog group with iAlign p-value 10E-3): "Unique", "Repres" (representative) or "Redund" (redundant). "Unique" means a target from an interface group containing only one member; for an interface group containing multiple members, the representative is the one with the easiest difficulty level (27) (and the highest resolution (2) given the same difficulty category) and other members are labeled as "Redund";
(31) Interface group representative: see explanation in (30);
(32) Antibody-antigen information: False means not an antibody-antigen (AA) complex; otherwise two letters ('H' for heavy chain, 'L' for light chain, 'A' for antigen chain) label the two chains, target 1 (see 10) and target 2 (see 17), respectively;
(33) Number of sequences in the coupled MSA (the first sequence in the FASTA files in PPI4DOCK_MSA/ is the residue sequence from the PDB structure, so it is not counted).
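
As an illustration of how the item numbers above can be used, here is a minimal Python sketch that selects the targets that are not redundant at the interolog level (item 28). It assumes that the table is whitespace-delimited with one target per line and that no field contains spaces; this is not stated above, so check it against the actual file. The function names are hypothetical and not part of PPI4DOCK.

```python
def read_target_table(text):
    """Split the table text into a header row and data rows.

    Assumption: fields are separated by whitespace and contain no spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = [ln.split() for ln in lines[1:]]
    return header, rows


def item(row, number):
    """Return item `number` of a row, numbered from 1 as in the list above."""
    return row[number - 1]


def non_redundant_targets(rows, status_item=28):
    """Names (item 1) of targets whose status is "Unique" or "Repres".

    Use status_item=28 for the interolog level, 30 for the interface level.
    """
    return [item(r, 1) for r in rows if item(r, status_item) != "Redund"]
```

Typical use would be to read PPI4DOCK_list.txt into a string, call read_target_table on it, and pass the rows to non_redundant_targets.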

2. How to run a docking experiment for a target?

In the folder PPI4DOCK_docking_set, each target sub-folder (xxxx_MN) contains 4 files: M_model_st.pdb (input receptor PDB file), N_model_st.pdb (input ligand PDB file), xxxx_MN_st.pdb (reference PDB file) and stripped_res.txt (information about the residues that were stripped). The suffix "_st" means stripped version. Use the two input PDB files as input for the docking and compare the results with the reference PDB file. The chain IDs, residue sequences and residue numbers are exactly the same in the input PDB files and the reference PDB file.
Evolutionary information can also be taken into account during the docking experiment. The PPI4DOCK_MSA folder contains 1444 sub-folders, each containing 4 FASTA files. The pair xxxx_M_coMSA.fasta and xxxx_N_coMSA.fasta are coupled multiple sequence alignments for studying co-evolutionary information, while the pair ending in "_separate_MSA.fasta" are MSAs obtained independently of each other, for users who are interested in conservation information only and do not need the co-evolutionary aspect. Each FASTA file contains at least one sequence: the first one, which is identical to the sequence of the protein in the PDB coordinates. This first sequence is not counted in the number of sequences of the MSA (e.g. nb_coMSA = 10 in PPI4DOCK_list.txt means the corresponding file contains 11 sequences).
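
The counting convention for the MSA files can be checked with a short Python sketch. It assumes standard FASTA formatting, where every record starts with a line beginning with ">"; the function names are hypothetical.

```python
def count_fasta_records(text):
    """Count FASTA records, i.e. lines starting with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def nb_msa_sequences(text):
    """Number of MSA sequences, not counting the first (PDB) sequence."""
    n = count_fasta_records(text)
    if n < 1:
        raise ValueError("expected at least the PDB sequence in the file")
    return n - 1
```

For a file whose nb_coMSA value is 10, count_fasta_records should return 11 and nb_msa_sequences should return 10.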

3. How to run a scoring experiment on pre-calculated decoys?

The decoys were pre-calculated with Zdock3.0.2 and are located in PPI4DOCK_zd302_decoy_set/ (xxxx_MN_zdock.out), together with their evaluation by CAPRI standards (xxxx_MN_CAPRI.txt). To extract decoys (PDB files) from xxxx_MN_zdock.out, one first needs to download Zdock3.0.2 (http://zdock.umassmed.edu/software/). The Python script (python2.x) create_decoys.py can generate the decoys automatically. Before using this script, edit it to set the correct paths to the Zdock3.0.2 and PPI4DOCK directories. The command is "create_decoys.py target_name out_directory". Decoys can also be generated manually with the following steps: (1) make an empty directory and copy into it xxxx_MN_zdock.out, M_model_st.pdb and N_model_st.pdb from PPI4DOCK, together with all the content of the Zdock3.0.2 directory; (2) run the commands "./mark_sur M_model_st.pdb M_model_st_m.pdb" and "./mark_sur N_model_st.pdb N_model_st_m.pdb"; (3) run the command "./create.pl xxxx_MN_zdock.out".
Either way, the result is 54000 decoy PDB files named "complex.i.pdb", with i ranging from 1 to 54000. A scoring method can then assess this ensemble of decoys, and its results can be compared with the evaluation by CAPRI standards in xxxx_MN_CAPRI.txt. When we evaluated the decoys, those with F-nat (fraction of native contacts) < 10% were first filtered out, because they were certainly "Incorrect". For the remaining decoys, more features were calculated and stored in xxxx_MN_CAPRI.txt: (1) decoy name; (2) RMSD of receptor; (3) RMSD of ligand; (4) RMSD of the entire decoy; (5) L-RMSD (RMSD of ligand if receptor chain is superimposed); (6) "Receptor-RMSD" (RMSD of receptor if ligand is superimposed); (7) F-nat (Fraction of NATive contacts); (8) F-nonnat (Fraction of NON-NATive contacts); (9) F-IR (Fraction of correct Interface Residues for both receptor and ligand chains); (10) I-RMSD (Interface RMSD); (11) CAPRI rank ("Incorrect", "Acceptable", "Medium" or "High"). The CAPRI rank is derived from L-RMSD, F-nat and I-RMSD, following CAPRI standards (Méndez et al., 2005).
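
The manual steps (2) and (3) can be scripted. The following Python 3 sketch (unlike create_decoys.py, which targets python2.x) only assembles the three commands quoted above from a target name and optionally runs them in the prepared directory. It assumes that both chain IDs are single characters and that step (1) has already been done; the function names are hypothetical and are not part of PPI4DOCK or Zdock.

```python
import subprocess


def decoy_commands(target):
    """Build the mark_sur and create.pl commands for a target "xxxx_MN".

    Assumption: M and N are single-character chain IDs.
    """
    pdb_code, chains = target.rsplit("_", 1)
    if len(chains) != 2:
        raise ValueError("expected two single-character chain IDs: %r" % target)
    commands = []
    for chain in chains:
        model = "%s_model_st.pdb" % chain
        marked = "%s_model_st_m.pdb" % chain
        commands.append(["./mark_sur", model, marked])
    commands.append(["./create.pl", "%s_zdock.out" % target])
    return commands


def run_commands(workdir, commands):
    """Run each command inside workdir, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, run_commands("my_decoys", decoy_commands("xxxx_MN")) would execute the three commands in the directory "my_decoys", where "my_decoys" stands for the directory prepared in step (1).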
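
To make the relation between F-nat, L-RMSD, I-RMSD and the CAPRI rank concrete, here is a minimal Python sketch. The cutoffs are the commonly cited CAPRI criteria as we recall them; whether the ranks in xxxx_MN_CAPRI.txt were produced with exactly these values is an assumption, so check them against Méndez et al. before relying on them. F-nat is taken as a fraction between 0 and 1 and RMSD values in Å. The second function treats any decoy absent from xxxx_MN_CAPRI.txt as "Incorrect", since such decoys were filtered out for F-nat < 10%. All function names are hypothetical.

```python
def capri_rank(fnat, l_rmsd, i_rmsd):
    """Classify a decoy using the commonly cited CAPRI cutoffs (assumed)."""
    if fnat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0):
        return "High"
    if fnat >= 0.3 and (l_rmsd <= 5.0 or i_rmsd <= 2.0):
        return "Medium"
    if fnat >= 0.1 and (l_rmsd <= 10.0 or i_rmsd <= 4.0):
        return "Acceptable"
    return "Incorrect"


def top_n_success(ranked_decoys, rank_by_decoy, n):
    """True if any of the n best-scored decoys is Acceptable or better.

    ranked_decoys: decoy names ordered from best to worst score.
    rank_by_decoy: CAPRI rank per decoy name; missing names are Incorrect.
    """
    return any(
        rank_by_decoy.get(name, "Incorrect") != "Incorrect"
        for name in ranked_decoys[:n]
    )
```

A scoring method is then consistent with the CAPRI evaluation to the extent that top_n_success is true for small n across many targets.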
As above, (co-)evolutionary information can be taken into account during the scoring phase by using the provided MSAs.

How to cite us:

Jinchao Yu and Raphael Guerois: PPI4DOCK: Large scale assessment of the use of homology models in free docking over more than 1000 realistic targets (Bioinformatics, 2016, e-pub)

Comment and request: Coordinator
