This website is free and open to all users and there is no login requirement.
 

Welcome to InterEvData


Content

We present a novel way of combining interface scores with evolutionary information at the atomic level to improve the discrimination of near-native decoys during scoring. We use explicitly modelled homologous decoys to enrich the scoring of each query decoy, averaging the scores over the query and its homologs. We illustrate this concept on 752 protein complexes from the PPI4DOCK benchmark (detailed list) and provide the data and template scripts needed to reproduce the pipeline.
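The score-averaging idea can be illustrated with a minimal Python sketch. It assumes that the query decoy and each of its homologous decoys have already been scored with the same scoring function, and that the scores are held in dictionaries keyed by a decoy identifier. This layout and the function names are hypothetical and do not describe the files in the archive. Lower scores are treated as better, as for energy-like scores.

```python
from statistics import mean
from typing import Dict, List, Mapping, Sequence


def homology_averaged_scores(query_scores: Mapping[str, float],
                             homolog_scores: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Average each decoy's score over the query decoy and its homologous decoys.

    A decoy with no homolog scores simply keeps its query score.
    """
    averaged = {}
    for decoy_id, query_score in query_scores.items():
        values = [query_score, *homolog_scores.get(decoy_id, [])]
        averaged[decoy_id] = mean(values)
    return averaged


def rank_decoys(scores: Mapping[str, float], lower_is_better: bool = True) -> List[str]:
    """Return decoy identifiers ordered from best to worst score."""
    return sorted(scores, key=scores.get, reverse=not lower_is_better)


# Toy example with made-up scores (not data from PPI4DOCK).
query = {"decoy_1": -10.0, "decoy_2": -12.0}
homologs = {"decoy_1": [-14.0, -12.0], "decoy_2": [-6.0, -9.0]}
print(rank_decoys(query))                                      # ['decoy_2', 'decoy_1']
print(rank_decoys(homology_averaged_scores(query, homologs)))  # ['decoy_1', 'decoy_2']
```

In this toy example, decoy_2 scores best on its own, but its homologous decoys score poorly, so averaging ranks decoy_1 first. A plain mean weights the query and each homolog equally, matching the averaging described above.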

Download list:

Dataset containing the input query and homolog monomers and the coMSAs they were extracted from, the rotation and translation matrices output by the FRODOCK docking software, and the scoring and consensus outputs of InterEvScore, SOAP-PP and Rosetta's Interface Score (see description below):
-> PPI4DOCK_data.tar.gz (2.3 GB)
Description:

The provided data illustrate our novel concept of extrapolating the evolutionary information contained in coMSAs to an atomic level of detail, making it directly compatible with multiple scoring functions in the context of molecular docking. Our example uses the 752 cases of the PPI4DOCK docking dataset, for which we provide all the data and scripts needed to apply the method.

Briefly, our method consists of modelling the unbound homologs identified in the coMSAs of both query protein partners using the rapid threading protocol in RosettaCM. Once the query proteins are docked with FRODOCK2.1, homologous equivalents of each query decoy can easily be generated in a rigid-body fashion from the rotation and translation docking coordinates output by FRODOCK. This provides atomic models of homologous decoys directly associated with each query decoy, without a computationally expensive clustering step. Homologous decoys can then be scored, just like query decoys, by any scoring function of your choice; we average the query and homolog scores of each decoy to obtain a final homology-based variant of the scoring function. In our example, we focus on three scores in addition to FRODOCK's mainly physics-based score: InterEvScore, which already implicitly takes co-evolutionary information into account in combination with a multi-body statistical potential; SOAP-PP, which uses a sophisticated atomic statistical potential; and Rosetta's Interface Score, which combines empirical and physical interaction terms.
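The rigid-body generation of homologous decoys can be sketched in Python as follows. The sketch assumes that each FRODOCK solution is expressed as a 3x3 rotation matrix R and a translation vector t applied as x' = R x + t, and that each homolog model has already been superimposed onto its query monomer, so that the query's transform carries it into the same docked pose. Both are assumptions: FRODOCK's actual output convention (for example, a rotation about a reference centre) should be checked in its documentation before reuse, and the function names are hypothetical, not part of the provided scripts.

```python
"""Minimal sketch: place a homolog decoy by reusing a query decoy's rigid-body transform.

Assumes x' = R x + t (hypothetical convention; check FRODOCK's own output format)
and that the homolog model is pre-superimposed onto its query monomer.
"""
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_rigid_body(coords: Sequence[Vec3],
                     rotation: Sequence[Sequence[float]],
                     translation: Sequence[float]) -> List[Vec3]:
    """Apply x' = R x + t to each (x, y, z) point."""
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return moved


def transform_pdb_lines(lines: Sequence[str],
                        rotation: Sequence[Sequence[float]],
                        translation: Sequence[float]) -> List[str]:
    """Rewrite the coordinates of ATOM/HETATM records; other lines are kept as-is.

    Coordinates sit in the fixed-width PDB columns 31-38, 39-46 and 47-54.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
            nx, ny, nz = apply_rigid_body([(x, y, z)], rotation, translation)[0]
            line = f"{line[:30]}{nx:8.3f}{ny:8.3f}{nz:8.3f}{line[54:]}"
        out.append(line)
    return out
```

In a hypothetical pipeline, assuming (as is usual in rigid-body docking) that the receptor stays fixed and only the ligand is moved, the R and t read for one query decoy would be applied to every homolog model of the ligand, yielding one homologous decoy per homolog pair without any re-docking.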

A detailed list of the contents of the archive can be found in FAQ question 1, and a description of how to read the PDB list PPI4DOCK_list_752_at_least_one_FRODOCK_Acc_top10000_10plus_co_seqs_noAntibody.txt in FAQ question 2.

We provide:


How to cite us:

Chloé Quignot, Pierre Granger, Pablo Chacón, Raphaël Guerois and Jessica Andreani: Atomic-level evolutionary information improves protein-protein interface scoring (in preparation)



Tools used:

FRODOCK2.1 (Ramírez-Aportela, E., et al. (2016). "FRODOCK 2.0: Fast Protein-Protein docking server." Bioinformatics: btw141. doi:10.1093/bioinformatics/btw141)

InterEvScore (Andreani, J., et al. (2013). "InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution." Bioinformatics 29(14): 1742-1749. doi:10.1093/bioinformatics/btt260)

SOAP-PP (Dong, G. Q., et al. (2013). "Optimized atomic statistical potentials: assessment of protein interfaces and loops." Bioinformatics 29(24): 3158-3166. doi:10.1093/bioinformatics/btt560)

RosettaCM (Song, Y., et al. (2013). "High-resolution comparative modeling with RosettaCM." Structure 21(10): 1735-1742. doi:10.1016/j.str.2013.08.005)

Rosetta Interface Score (Lyskov, S. and J. J. Gray (2008). "The RosettaDock server for local protein-protein docking." Nucleic Acids Res 36(Web Server issue): W233-238. doi:10.1093/nar/gkn216; Chaudhury, S., et al. (2011). "Benchmarking and analysis of protein docking performance in Rosetta v3.2." PLoS One 6(8): e22477. doi:10.1371/journal.pone.0022477)

HHfilter (Steinegger, M., et al. (2019). "HH-suite3 for fast remote homology detection and deep protein annotation." BMC Bioinformatics 20(1): 473. doi:10.1186/s12859-019-3019-7)

MAFFT (Katoh, K. and D. M. Standley (2013). "MAFFT multiple sequence alignment software version 7: improvements in performance and usability." Mol Biol Evol 30(4): 772-780. doi:10.1093/molbev/mst010)

ClustalW (Larkin, M. A., et al. (2007). "Clustal W and Clustal X version 2.0." Bioinformatics 23: 2947-2948. doi:10.1093/bioinformatics/btm404)