What is PPI4DOCK?
PPI4DOCK is a large benchmark set for studying protein docking and scoring methods. It contains 1417 non-redundant docking targets, each involving two subunits. Each subunit is a protein model built by homology modeling from an unbound template.
Download list:
Standard dataset, from which ab initio modeled tails and less interacting sub-regions were removed (see Usage below):
PPI4DOCK.zip
- PPI4DOCK_list.txt
- PPI4DOCK_docking_set/
- PPI4DOCK_zd302_decoy_set/
- PPI4DOCK_MSA/
- create_decoys.py
- README.txt
Unstripped docking set (ab initio modeled tails and less interacting sub-regions were kept):
PPI4DOCK_unstripped_docking_set.zip
Supplementary data (Protein Data Bank XML header files and PDB files of the biological assemblies defined in those XML files):
SUPPLEMENTARY.zip
- library_biological_assembly_pdb/
- library_xml/
Usage:
The PPI4DOCK docking dataset aims to support the development of protein docking protocols. Each of its 1417 non-redundant targets contains models of the two binding partners in their unbound state, for docking simulations, and the experimental reference structure of the complex, for assessing the docking results. In particular, the three easiest categories (very easy, easy and hard) are well suited to studying rigid-body docking methods. The file PPI4DOCK_list.txt provides useful information for docking (see FAQ question 1), and pdb_annotation_list.txt lists the Protein Data Bank annotations of each complex and of each of its chains. FAQ question 2 explains how to run a docking experiment.
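A common first step is to restrict a benchmark run to the three easiest categories. The sketch below shows one way to do this in Python. It assumes, hypothetically, that the target list is a tab-separated table with a header row containing columns named `target_id` and `category`; these names are placeholders, not the documented layout of PPI4DOCK_list.txt, so check FAQ question 1 and adjust the constants before use.

```python
import csv
import io
from typing import Iterable, List, TextIO

# HYPOTHETICAL layout: tab-separated, with a header row holding these column
# names. Replace them with the real ones described in FAQ question 1.
ID_COLUMN = "target_id"
CATEGORY_COLUMN = "category"

# The three easiest categories, recommended above for rigid-body docking.
RIGID_BODY_CATEGORIES = {"very easy", "easy", "hard"}


def select_targets(handle: TextIO,
                   keep: Iterable[str] = RIGID_BODY_CATEGORIES) -> List[str]:
    """Return the IDs of targets whose category is in `keep` (case-insensitive)."""
    wanted = {c.strip().lower() for c in keep}
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row[ID_COLUMN]
        for row in reader
        if row[CATEGORY_COLUMN].strip().lower() in wanted
    ]


if __name__ == "__main__":
    # Toy rows with made-up IDs; "other" stands for any category
    # outside the three easiest ones.
    demo = io.StringIO(
        "target_id\tcategory\n"
        "T0001\tvery easy\n"
        "T0002\thard\n"
        "T0003\tother\n"
    )
    print(select_targets(demo))  # ['T0001', 'T0002']
```

With the real file, open it with `open(path, newline="")` and pass the handle to `select_targets`.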
For scientists interested only in scoring methods rather than in the docking process, the PPI4DOCK decoy set contains docking results pre-calculated with Zdock 3.0.2. After installing the Zdock 3.0.2 program (http://zdock.umassmed.edu/software/), users can quickly generate up to 54000 decoys and apply their own scoring functions or programs to them. All decoys were pre-evaluated with the standard CAPRI criteria; decoys reproducing less than 10% of the native contacts were not stored in the file, because they are necessarily incorrect under these criteria. Users can compare their scoring results with these data; see FAQ question 3 for details.
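To compare a scoring function against the pre-evaluated decoys, it helps to see how the CAPRI classes follow from the three usual measures: fnat (fraction of native contacts reproduced), L-rms (ligand RMSD) and I-rms (interface RMSD). The Python sketch below uses the commonly published CAPRI thresholds and a simple "hit in top N" success test. It is an illustration only, not the evaluation code used to build PPI4DOCK; the `Decoy` record and the lower-is-better score convention are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decoy:
    score: float  # value from YOUR scoring function (assumed: lower is better)
    fnat: float   # fraction of native contacts reproduced, 0..1
    lrms: float   # ligand RMSD, in Angstrom
    irms: float   # interface RMSD, in Angstrom


def capri_class(fnat: float, lrms: float, irms: float) -> str:
    """Classify one decoy with the commonly used CAPRI thresholds."""
    if fnat >= 0.5 and (lrms <= 1.0 or irms <= 1.0):
        return "high"
    if fnat >= 0.3 and (lrms <= 5.0 or irms <= 2.0):
        return "medium"
    if fnat >= 0.1 and (lrms <= 10.0 or irms <= 4.0):
        return "acceptable"
    # fnat < 0.1 always lands here, which is why such decoys need not be stored.
    return "incorrect"


def hit_in_top_n(decoys: Sequence[Decoy], n: int,
                 lower_is_better: bool = True) -> bool:
    """True if at least one acceptable-or-better decoy ranks in the top n."""
    ranked = sorted(decoys, key=lambda d: d.score, reverse=not lower_is_better)
    return any(capri_class(d.fnat, d.lrms, d.irms) != "incorrect"
               for d in ranked[:n])


if __name__ == "__main__":
    # Made-up decoys for illustration only.
    decoys = [
        Decoy(score=-10.0, fnat=0.0, lrms=20.0, irms=10.0),  # incorrect
        Decoy(score=-5.0, fnat=0.4, lrms=4.0, irms=3.0),     # medium
        Decoy(score=-1.0, fnat=0.0, lrms=30.0, irms=15.0),   # incorrect
    ]
    print(hit_in_top_n(decoys, 1))  # False
    print(hit_in_top_n(decoys, 2))  # True
```

Counting, over all targets, how often `hit_in_top_n` is true for a given N gives a success rate that can be compared across scoring functions.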
Note that in the standard dataset, residues modeled ab initio in protein tails were removed, and if a chain was split into several spatially separate sub-regions, only the sub-region with the highest number of residues in contact with the other chain was kept. Users who want to take the ab initio modeled tails and the separate sub-regions into account can use a version of the models that retains them (the unstripped docking set, which must be downloaded separately).
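The sub-region rule above can be sketched in a few lines of Python. The sketch assumes the sub-regions are already known (a dictionary mapping a hypothetical sub-region name to its residues, each residue given as a list of atom coordinates) and that a residue is "in contact" when any of its atoms lies within 5 Å of any partner atom. That cutoff is an assumed, commonly used value, not one stated for PPI4DOCK.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Residue = List[Coord]  # atom coordinates of one residue

CONTACT_CUTOFF = 5.0  # Angstrom; ASSUMED value, not taken from PPI4DOCK


def residue_in_contact(residue: Residue, partner_atoms: Sequence[Coord],
                       cutoff: float = CONTACT_CUTOFF) -> bool:
    """True if any atom of the residue is within `cutoff` of any partner atom."""
    return any(math.dist(a, b) <= cutoff for a in residue for b in partner_atoms)


def count_interface_residues(region: List[Residue], partner_atoms: Sequence[Coord],
                             cutoff: float = CONTACT_CUTOFF) -> int:
    """Number of residues of a sub-region in contact with the partner chain."""
    return sum(residue_in_contact(r, partner_atoms, cutoff) for r in region)


def pick_subregion(subregions: Dict[str, List[Residue]],
                   partner_atoms: Sequence[Coord],
                   cutoff: float = CONTACT_CUTOFF) -> str:
    """Name of the sub-region with the most residues in contact with the partner.

    Ties go to the first sub-region in dictionary order.
    """
    return max(subregions,
               key=lambda name: count_interface_residues(subregions[name],
                                                         partner_atoms, cutoff))


if __name__ == "__main__":
    partner = [(0.0, 0.0, 0.0)]  # a one-atom toy partner chain
    regions = {
        "A": [[(10.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]],                    # 1 contact
        "B": [[(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(20.0, 0.0, 0.0)]],  # 2 contacts
    }
    print(pick_subregion(regions, partner))  # B
```

The all-pairs distance loop keeps the sketch dependency-free; for full-size chains a spatial index (for example a k-d tree) would be much faster.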
The Protein Data Bank XML header files and the PDB files representing the biological assemblies are also available, for users who want to study all subunits of an assembly or bound small ligands, or to extract additional information from the XML files.
How to cite us:
Jinchao Yu and Raphael Guerois: PPI4DOCK: Large scale assessment of the use of homology models in free docking over more than 1000 realistic targets (Bioinformatics, 2016, e-pub)
Tools used for the development of PPI4DOCK:
InterEvol database (Faure, G., Andreani, J., & Guerois, R. (2012). InterEvol database: exploring the structure and evolution of protein complex interfaces. Nucleic Acids Research, 40(Database issue), D847–56. doi:10.1093/nar/gkr845)
RosettaCM (Song, Y., DiMaio, F., Wang, R. Y., Kim, D., Miles, C., Brunette, T. J., … Baker, D. (2013). High-Resolution Comparative Modeling with RosettaCM. Structure, 21(10), 1735–42. doi:10.1016/j.str.2013.08.005)
HHsearch (Söding, J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics (Oxford, England), 21(7), 951–60. doi:10.1093/bioinformatics/bti125)
Zdock 3.0.2 (Pierce, B. G., Wiehe, K., Hwang, H., Kim, B., Vreven, T., & Weng, Z. (2014). ZDOCK server: interactive docking prediction of protein-protein complexes and symmetric multimers. Bioinformatics (Oxford, England), 30(12), 1771–3. doi:10.1093/bioinformatics/btu097)
iAlign (Gao, M., & Skolnick, J. (2010). iAlign: a method for the structural comparison of protein-protein interfaces. Bioinformatics (Oxford, England), 26(18), 2259–65. doi:10.1093/bioinformatics/btq404)
TM-align (Zhang, Y., & Skolnick, J. (2005). TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Research, 33(7), 2302–9. doi:10.1093/nar/gki524)
TM-score (Zhang, Y., & Skolnick, J. (2004). Scoring function for automated assessment of protein structure template quality. Proteins, 57(4), 702–10. doi:10.1002/prot.20264)
NOXclass (Zhu, H., Domingues, F. S., Sommer, I., & Lengauer, T. (2006). NOXclass: prediction of protein-protein interaction types. BMC Bioinformatics, 7, 27. doi:10.1186/1471-2105-7-27)