We share a novel way of combining scores with
evolutionary information at the atomic level to improve near-native
discrimination rates in scoring. We use explicitly modelled
homologous decoys to enrich the scoring of each query decoy by averaging
the scores over the query and its homologs. We illustrate this concept on
752 protein complexes from the PPI4DOCK benchmark (see the
detailed list). We also provide the necessary data and scripts within a
Singularity image so that the pipeline can be reproduced.
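The averaging step can be sketched as follows. This is a minimal Python illustration, not the scripts shipped in the Singularity image. The function names, decoy identifiers and score values are hypothetical. The sketch assumes that lower scores are better, which must be checked for each scoring function, so the ranking direction is left as a parameter.

```python
from statistics import mean
from typing import Dict, List, Sequence


def homology_averaged_scores(query_scores: Dict[str, float],
                             homolog_scores: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average each query decoy's score with the scores of its homologous decoys."""
    averaged = {}
    for decoy_id, query_score in query_scores.items():
        # Decoys without scored homologs fall back to the query score alone.
        values = [query_score, *homolog_scores.get(decoy_id, [])]
        averaged[decoy_id] = mean(values)
    return averaged


def rank_decoys(scores: Dict[str, float], lower_is_better: bool = True) -> List[str]:
    """Return decoy identifiers sorted from best to worst."""
    return sorted(scores, key=scores.__getitem__, reverse=not lower_is_better)


# Hypothetical scores for two query decoys, each with two homologous decoys.
query = {"decoy_1": -10.0, "decoy_2": -12.0}
homologs = {"decoy_1": [-14.0, -15.0], "decoy_2": [-6.0, -9.0]}

averaged = homology_averaged_scores(query, homologs)
print(averaged)                 # {'decoy_1': -13.0, 'decoy_2': -9.0}
print(rank_decoys(query))       # ['decoy_2', 'decoy_1']
print(rank_decoys(averaged))    # ['decoy_1', 'decoy_2']
```

In this toy example, the query score alone ranks decoy_2 first. After averaging with the homolog scores, decoy_1 comes first. This shows how the homologous decoys can change the final ranking.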
Download list:
Dataset containing the input query and homolog monomers as
well as the coMSAs they were extracted from, the rotation and translation
matrices output by the FRODOCK docking software, and the scoring and
consensus outputs for InterEvScore, SOAP-PP and Rosetta's Interface
Score (see description below):
-> PPI4DOCK_data.tar.gz (2.1 GB)
- PPI4DOCK_list_752_at_least_one_FRODOCK_Acc_top10000_10plus_co_seqs_noAntibody.txt
- PPI4DOCK_pdbs/
- PPI4DOCK_MSA/
- PPI4DOCK_docking/
- PPI4DOCK_scores/
- PPI4DOCK_consensus/
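After downloading, you can check that the archive contains the entries listed above with Python's standard tarfile module, as in the sketch below. The helper is hypothetical and is not part of the provided scripts. It assumes the archive is gzip-compressed, as its .tar.gz extension suggests. It matches names against any path component, so it also works if the entries sit inside a wrapping top-level folder. Listing the members streams through the whole 2.1 GB archive, so it may take a while.

```python
import tarfile
from pathlib import PurePosixPath
from typing import Iterable, List

# Entries listed on this page for PPI4DOCK_data.tar.gz.
EXPECTED_ENTRIES = [
    "PPI4DOCK_list_752_at_least_one_FRODOCK_Acc_top10000_10plus_co_seqs_noAntibody.txt",
    "PPI4DOCK_pdbs",
    "PPI4DOCK_MSA",
    "PPI4DOCK_docking",
    "PPI4DOCK_scores",
    "PPI4DOCK_consensus",
]


def missing_entries(archive_path: str,
                    expected: Iterable[str] = EXPECTED_ENTRIES) -> List[str]:
    """Return the expected names that appear nowhere in the archive's member paths."""
    seen = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for name in tar.getnames():
            # Match on any path component, so a wrapping top-level folder is tolerated.
            seen.update(PurePosixPath(name).parts)
    return [entry for entry in expected if entry not in seen]
```

For example, `missing_entries("PPI4DOCK_data.tar.gz")` should return an empty list for a complete download.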
A Singularity image containing all tools and scripts necessary to execute the whole docking pipeline (see description below):
-> interevdata_tools.sif (768 MB)
Description:
The provided data illustrates our novel
concept of extrapolating the evolutionary information contained in coMSAs to
an atomic level of detail. This makes the information directly compatible with multiple
scoring functions in the context of molecular docking. Our example uses
the 752 cases of the PPI4DOCK docking dataset, for which we provide all the data
and scripts needed to apply the method.
Briefly, our method consists of modelling the
unbound homologs identified in the coMSAs of both query protein partners
with the rapid threading protocol in RosettaCM. Once the query
proteins are docked with FRODOCK2.1, homologous equivalents of each query
decoy can easily be generated in a rigid-body fashion using the rotation
and translation docking coordinates output by FRODOCK. This provides
atomic models of homologous decoys that are directly associated with each query
decoy, without the need for a computationally expensive clustering step. From
then on, homologous decoys can be scored just like query decoys, with any
scoring function of your choice. We then average the query
and homolog scores of each decoy to obtain the final homology-based
variant of the scoring function. In our example, we focus on three
scores in addition to FRODOCK's mainly physics-based score:
- InterEvScore, which already implicitly takes co-evolutionary information
into account in combination with a multi-body statistical potential;
- SOAP-PP, which uses a sophisticated atomic statistical potential;
- Rosetta's Interface Score, which combines empirical and physical
interaction terms.
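The rigid-body step can be sketched in a few lines of Python. This is a minimal sketch under three assumptions. First, each FRODOCK pose is a 3×3 rotation matrix R and a translation vector t, applied to the ligand as x' = R x + t. Second, the modelled homologs share the query's coordinate frame. Third, the function names and coordinates are hypothetical. FRODOCK's actual output format and transform convention, for example rotation about a centre of mass, may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_rigid_body(coords: Sequence[Vec3],
                     rotation: Sequence[Sequence[float]],
                     translation: Vec3) -> List[Vec3]:
    """Move atoms rigidly with x' = R x + t (convention assumed, see lead-in)."""
    moved = []
    for atom in coords:
        moved.append(tuple(
            sum(rotation[i][j] * atom[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return moved


def rotation_about_z(angle_deg: float) -> List[List[float]]:
    """Build a rotation matrix about the z axis, used only for this toy example."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Toy ligand coordinates: the query ligand and one modelled homolog ligand,
# assumed to already share the query's reference frame (e.g. after threading).
query_ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
homolog_ligand = [(1.1, 0.1, 0.0), (2.1, 0.0, 0.1)]

# One hypothetical docking pose (R, t) for a single query decoy.
R, t = rotation_about_z(90.0), (0.0, 0.0, 5.0)

query_decoy = apply_rigid_body(query_ligand, R, t)
homolog_decoy = apply_rigid_body(homolog_ligand, R, t)  # same pose, no re-docking
```

The same (R, t) is reused for the homolog, so each homologous decoy inherits the pose of its query decoy, and no extra docking run is needed. As expected for a rigid-body move, intramolecular distances are preserved.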
A detailed list of what is provided in the attached
folder can be found in FAQ question 1. A detailed description of how to read
the PDB list
PPI4DOCK_list_752_at_least_one_FRODOCK_Acc_top10000_10plus_co_seqs_noAntibody.txt
is given in FAQ question 2.
We provide:
- All unbound PDB files (query structures and modelled homologs)
- The rotation and translation matrices output by FRODOCK2.1
- The coMSAs
- A Singularity image containing all tools and scripts necessary to
execute the whole docking pipeline, or to generate or score decoy structures (InterEvData Code).
How to cite us:
Chloé Quignot, Pierre Granger, Pablo Chacón, Raphaël Guerois and Jessica Andreani:
Atomic-level evolutionary information improves protein-protein interface scoring (bioRxiv, doi:10.1101/2020.10.26.355073)
Tools used:
FRODOCK2.1 (Ramírez-Aportela, E., et al.
(2016). "FRODOCK 2.0: Fast Protein-Protein docking server."
Bioinformatics: btw141. doi:10.1093/bioinformatics/btw141)
InterEvScore (Andreani, J., et al. (2013).
"InterEvScore: a novel coarse-grained interface scoring function using a
multi-body statistical potential coupled to evolution." Bioinformatics
29(14): 1742-1749. doi:10.1093/bioinformatics/btt260)
SOAP-PP (Dong, G. Q., et al. (2013).
"Optimized atomic statistical potentials: assessment of protein
interfaces and loops." Bioinformatics 29(24): 3158-3166. doi:
10.1093/bioinformatics/btt560)
RosettaCM (Song, Y., DiMaio, F., Wang, R. Y.,
Kim, D., Miles, C., Brunette, T. J., … Baker, D. (2013).
"High-Resolution Comparative Modeling with RosettaCM." Structure
21(10): 1735-1742. doi:10.1016/j.str.2013.08.005)
Rosetta Interface Score (Lyskov, S. and J. J. Gray
(2008). "The RosettaDock server for local protein-protein docking."
Nucleic Acids Res 36(Web Server issue): W233-238.
doi:10.1093/nar/gkn216; Chaudhury, S., et al. (2011). "Benchmarking and
analysis of protein docking performance in Rosetta v3.2." PLoS One 6(8):
e22477. doi:10.1371/journal.pone.0022477)
HHfilter (Steinegger, M., et al. (2019).
"HH-suite3 for fast remote homology detection and deep protein
annotation." BMC Bioinformatics 20(1): 473.
doi:10.1186/s12859-019-3019-7)
MAFFT (Katoh, K. and D. M. Standley (2013).
"MAFFT multiple sequence alignment software version 7: improvements in
performance and usability." Mol Biol Evol 30(4): 772-780.
doi:10.1093/molbev/mst010)
ClustalW (Larkin, M. A., et al. (2007).
"Clustal W and Clustal X version 2.0." Bioinformatics 23: 2947-2948.
doi:10.1093/bioinformatics/btm404)
Singularity (Kurtzer, G.M., Sochat, V., Bauer, M.W. (2017).
"Singularity: Scientific containers for mobility of compute." PLoS ONE 12(5): e0177459.
doi:10.1371/journal.pone.0177459)