This website is free and open to all users and there is no login requirement.
 
InterEvScore help


1. What is shipped in the InterEvScore package?

The InterEvScore package contains the following:


2. Running InterEvScore

2.1. Running options (command line)

InterEvScore should be run from the command line using:

[Python_executable] {InterEvScore_directory}/InterEvScore.py [running_options]
Example:

python InterEvScore.py -p my_PDB_id \
                       -f my_complex_file.pdb \
                       -a1 alig_for_first_chain.fasta \
                       -a2 alig_for_second_chain.fasta \
                       -o my_output_file.txt \
                       -ch AB \
                       -c config_IvS.ini

If no running options are provided, a help message will be displayed.

The following arguments should be provided:

2.2. Settings (config_IvS file)

2.2.1. [MATRIX] Propensity matrices

The propensity matrices for two- and three-body statistical potentials are provided in the InterEvScore package. Update file locations according to where you unpacked InterEvScore.


2.2.2. [PROGRAMS] Paths to the necessary programs

Update the locations of the necessary programs: Kalign, Naccess and the AlphaShape executable.


2.2.3. [MODULES] Paths to the necessary modules

Update the location of the networkx module. If this line is omitted, InterEvScore assumes that the networkx module is already on your Python path; if the module cannot be found, InterEvScore exits without scoring.


2.2.4. [ALPHASHAPE_PARAMETERS] Parameters for alpha-shape calculations

The parameter file param.dat is provided with InterEvScore; its location should be updated depending on where you unpacked InterEvScore.
The rest of the parameters in this section can be changed but this will affect the calculation of alpha-shape-based contacts and might lead to unexpected results.
If the proteins of interest have special properties regarding the apolarity of the surface, the patch_expansion value may be changed to yield more appropriate apolar patch calculation. If the protein surfaces are very apolar, the value can be increased for a more stringent patch detection (typically to 1.25). If the protein surface is very polar, the value can be decreased to detect more patches (typically to 0.75). It is however recommended to keep the patch_expansion value between 0.5 and 1.5.
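As an illustration only, the following Python sketch reads patch_expansion from a config file and checks it against the recommended range of 0.5 to 1.5. It assumes that the config file uses standard INI "key = value" syntax under the [ALPHASHAPE_PARAMETERS] section. The helper names read_patch_expansion and in_recommended_range are hypothetical and are not part of InterEvScore. The sketch is written for a recent Python 3, independently of the interpreter used to run InterEvScore.

```python
import configparser

# Range recommended in the help text above.
RECOMMENDED_MIN = 0.5
RECOMMENDED_MAX = 1.5


def read_patch_expansion(config_text):
    """Return patch_expansion from the [ALPHASHAPE_PARAMETERS] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getfloat("ALPHASHAPE_PARAMETERS", "patch_expansion")


def in_recommended_range(value):
    """True if the value lies within the recommended 0.5 to 1.5 range."""
    return RECOMMENDED_MIN <= value <= RECOMMENDED_MAX


# Assumed INI layout (illustrative fragment, not the shipped config file).
example_config = """
[ALPHASHAPE_PARAMETERS]
patch_expansion = 1.25
"""

value = read_patch_expansion(example_config)
print(value, in_recommended_range(value))
```

A check of this kind can be run before launching a large batch of scoring jobs, so that an out-of-range value is noticed early.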


2.2.5. [MODES] Modes to run InterEvScore

Two modes are defined in this section.

2.2.6. [STORAGE] Directory where to store patch files

A directory should be provided to store the patch file, which is calculated only once per InterEvScore run and only once for a given PDB id (the PDB id is given by the -p argument). If no directory is provided, or if the provided directory does not exist, the default output directory will be ~/PDBid (where ~ refers to the home directory of the current user and PDBid to the PDB id provided as -p argument).
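The fallback rule described above can be sketched in Python as follows. This is a minimal illustration of the rule as stated in this help file, not InterEvScore's actual code; the function name resolve_patch_dir is hypothetical.

```python
import os


def resolve_patch_dir(pdb_id, storage_dir=None):
    """Return the directory in which the patch file would be stored."""
    # Use the configured directory only if it was given and actually exists.
    if storage_dir and os.path.isdir(storage_dir):
        return storage_dir
    # Otherwise fall back to ~/PDBid, as described above.
    return os.path.join(os.path.expanduser("~"), pdb_id)


# No directory configured: falls back to ~/my_PDB_id.
print(resolve_patch_dir("my_PDB_id"))
# Existing directory configured: it is used as is.
print(resolve_patch_dir("my_PDB_id", os.getcwd()))
```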



2.3. Input

2.3.1. PDB file(s)

InterEvScore can be run using a single PDB file (-f argument), a list of PDB files (-l argument), or a directory containing PDB files (-d argument). If a list of PDB files is used, the list file should contain the full paths of all PDB files (one file per line). If a directory is provided, all files with the .pdb extension in this directory will be scored.
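To make the three input modes concrete, here is a minimal Python sketch of how the set of PDB files could be gathered. It is not taken from InterEvScore: the function name collect_pdb_files, the order of precedence between the three inputs, and the alphabetical sorting are all assumptions made for this example.

```python
import glob
import os
import tempfile


def collect_pdb_files(single_file=None, list_file=None, directory=None):
    """Return the PDB paths implied by a -f, -l or -d style input."""
    if single_file:
        return [single_file]
    if list_file:
        # One full path per line; blank lines are skipped.
        with open(list_file) as handle:
            return [line.strip() for line in handle if line.strip()]
    if directory:
        # Only files with the .pdb extension are kept.
        return sorted(glob.glob(os.path.join(directory, "*.pdb")))
    return []


# Small demonstration on a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ("decoy_1.pdb", "decoy_2.pdb", "notes.txt"):
    open(os.path.join(demo_dir, name), "w").close()

found = [os.path.basename(path) for path in collect_pdb_files(directory=demo_dir)]
print(found)
```

In the demonstration, notes.txt is left out because only the .pdb extension is matched.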

All PDB files for a given InterEvScore run should be coordinate files containing at least the two chains provided as argument (-ch argument). They should be properly formatted and include chain identifiers. InterEvScore can normally handle files containing non-ATOM lines and positions corresponding to non-amino acids (DNA, RNA) or non-canonical amino acids; however, this information is ignored during scoring.

The two chains which will be considered by InterEvScore should have the same composition and internal set of coordinates (i.e. they should be exactly superimposable) in all PDB files. This means that the list provided as -l argument can be a list of decoys (candidate interfaces) generated for the same pair of proteins, but it cannot be a list of PDB files corresponding to interfaces between different proteins. The same applies to the .pdb files in the directory provided as -d argument.


2.3.2. Alignment files

Two alignment files should be provided as input to InterEvScore. The first alignment file (-a1 argument) should correspond to the first considered chain (first character of the -ch argument) and the second alignment file (-a2 argument) to the second considered chain (second character of the -ch argument).

In each alignment file, the first sequence should correspond to the sequence closest to the PDB sequence. A few differences between the first aligned sequence and the PDB sequence can be tolerated but it is best to have as little discrepancy as possible.

The two alignment files should contain the same number of sequences. These sequences should be homologs of the corresponding PDB sequence, and they should be provided for the same species in the same order in both files. If these conditions are not met, the evolution-based scores are likely to be meaningless.
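A quick sanity check on the first condition (same number of sequences in both files) can be sketched in Python as below. The sketch assumes FASTA-formatted alignments in which each sequence starts with a ">" header line; the function names are hypothetical. It does not verify that the species match or appear in the same order, since that depends on how the headers are written.

```python
def count_fasta_sequences(fasta_text):
    """Count sequences by counting FASTA header lines (starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def same_depth(fasta_text_1, fasta_text_2):
    """True if both alignments contain the same number of sequences."""
    return count_fasta_sequences(fasta_text_1) == count_fasta_sequences(fasta_text_2)


# Toy alignments (invented sequences, for illustration only).
alignment_1 = ">query_chain_A\nMKT-LV\n>species_1\nMKTALV\n>species_2\nMRT-LV\n"
alignment_2 = ">query_chain_B\nGSHW\n>species_1\nGSHW\n>species_2\nGTHW\n"

print(count_fasta_sequences(alignment_1), same_depth(alignment_1, alignment_2))
```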


The best way to ensure compatibility of the provided alignments with InterEvScore is to generate these alignments using InterEvolAlign. Two input sequences should be provided in FASTA format (typically, the two sequences extracted from the PDB input file or the PDB sequences of the bound structure if the PDB input file is generated from the unbound conformations).

InterEvolAlign can be run with various options; the recommended settings include restriction to the relevant superkingdom and activation of the reciprocal blast procedure (with medium stringency); other options can be left as default (or modified if you feel other values are more appropriate).

If the result of the InterEvolAlign run gives too many sequences or alignments that are visibly unreliable, you may try to restrict the run to only one iteration on the OMA database and/or to activate the reciprocal blast procedure with high stringency.

If the result of the InterEvolAlign run gives too few sequences (or none at all), you may try to add a third iteration on the REF database.



2.4. Output

InterEvScore values will be written as one result line per input PDB file. Each line contains:




2.5. Execution speed and parallelization

If you need to score a large number of decoys, each decoy can be scored independently with InterEvScore, which makes the scoring easy to parallelize. Scoring one decoy for a protein of standard size (200-300 residues per chain) with alpha-shape-based contact calculation (in standard or patches mode) takes approximately one second (indicative real running time on a standard processor). The distance-based mode takes much longer, around 4 to 5 seconds.
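Because each decoy is scored independently, one simple way to parallelize is to launch one InterEvScore process per decoy from a small driver script. The Python sketch below is only an illustration under stated assumptions: the helper names build_command and score_all are hypothetical, the choice of one output file per decoy (named after the decoy) is made here to avoid concurrent writes to a single file, and the behavior of simultaneous runs that share the same PDB id and patch file has not been checked. The command-line flags are those shown in section 2.1.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(decoy, pdb_id, aln1, aln2, chains, config,
                  python_exe="python", script="InterEvScore.py"):
    """Build the argument list for scoring a single decoy."""
    return [python_exe, script,
            "-p", pdb_id,
            "-f", decoy,
            "-a1", aln1,
            "-a2", aln2,
            "-o", decoy + ".score.txt",  # one output file per decoy (assumption)
            "-ch", chains,
            "-c", config]


def score_all(decoys, n_workers=4, **options):
    """Run one InterEvScore process per decoy, n_workers at a time."""
    commands = [build_command(decoy, **options) for decoy in decoys]
    # Threads are sufficient here: each one only waits for a child process.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(subprocess.call, commands))


# Show the command that would be run for one decoy (nothing is executed here).
command = build_command("decoy_1.pdb", pdb_id="my_PDB_id",
                        aln1="alig_for_first_chain.fasta",
                        aln2="alig_for_second_chain.fasta",
                        chains="AB", config="config_IvS.ini")
print(" ".join(command))
```

Calling score_all(list_of_decoys, n_workers=8, pdb_id=..., aln1=..., aln2=..., chains=..., config=...) would then return one exit code per decoy; the per-decoy result files can be concatenated afterwards.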



3. Analysis of results

3.1. Important information

There are several important elements that should be kept in mind when analyzing the results.



3.2. Clustering analysis

A clustering script is provided which selects the N top-ranked decoys (using the 2or3body_best_evol score by default, or any other score value from InterEvScore), calculates the cross lRMSDs for each pair of decoys among these N top-ranked decoys, and clusters the N decoys with a default cutoff of 7.5 Å.
The default value for N is 1,000, chosen as a compromise between including as many hits as possible and the computing time of cross lRMSDs, which grows quadratically with N (there are N(N-1)/2 pairs of decoys). The default clustering algorithm is the GROMOS algorithm from Daura et al, Angew Chem Int Ed 1999. Default values are based on the assumption that 54,000 decoys were previously generated with ZDOCK.

If the decoys were generated using a program that leaves one of the two chains fixed (such as ZDOCK), the alignment step is not necessary (set align to False in the config_clustering.ini file). In that case, the ligand RMSD is calculated using the second chain given in the -ch command line option as the ligand, so this chain should be the moving chain.


3.2.1. Clustering algorithms

Two clustering algorithms are proposed. The default clustering algorithm is the GROMOS greedy algorithm from Daura et al, Angew Chem Int Ed 1999. It basically works as follows:

The alternative clustering algorithm is the full linkage algorithm: any two decoys with a mutual lRMSD value below the cutoff are clustered together. The clusters therefore correspond to the connected components of the graph in which decoys are nodes and an edge joins any two decoys whose mutual lRMSD is below the cutoff (this behavior is what is commonly called single-linkage clustering). This algorithm leads to much larger clusters, so you might want to decrease the cutoff. Moreover, two decoys in the same cluster can have a large mutual lRMSD value: belonging to the same cluster simply means that they are connected through a (potentially long) chain of decoys in which each consecutive pair has a mutual lRMSD below the cutoff.
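The difference between the two approaches can be illustrated with a short Python sketch working on a precomputed lRMSD matrix. This is not the code of cluster_InterEvScore_results.py: the greedy function follows the general idea of the Daura et al algorithm (repeatedly take the decoy with the most neighbors as cluster center and remove it with its neighbors), the tie-breaking rule is an arbitrary choice made here, and all function names are hypothetical.

```python
def neighbor_sets(dist, cutoff):
    """Map each decoy index to the set of decoys closer than the cutoff."""
    n = len(dist)
    return {i: {j for j in range(n) if j != i and dist[i][j] < cutoff}
            for i in range(n)}


def greedy_clusters(dist, cutoff):
    """Daura-style greedy clustering: center first, then its neighbors."""
    neighbors = neighbor_sets(dist, cutoff)
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        # Center = remaining decoy with the most remaining neighbors
        # (ties broken by lowest index, an arbitrary choice made here).
        center = max(sorted(remaining),
                     key=lambda i: len(neighbors[i] & remaining))
        members = [center] + sorted(neighbors[center] & remaining)
        clusters.append(members)
        remaining -= set(members)
    return clusters


def linkage_clusters(dist, cutoff):
    """Connected components of the 'closer than cutoff' graph."""
    neighbors = neighbor_sets(dist, cutoff)
    seen = set()
    clusters = []
    for start in range(len(dist)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return sorted(clusters, key=len, reverse=True)


# Toy example: five "decoys" placed on a line, distances used as lRMSDs.
positions = [0.0, 5.0, 10.0, 15.0, 30.0]
dist = [[abs(a - b) for b in positions] for a in positions]

greedy = greedy_clusters(dist, 7.5)
linked = linkage_clusters(dist, 7.5)
print(greedy)
print(linked)
```

In this toy example the greedy algorithm returns three clusters, whereas the connected-component approach merges decoys 0 to 3 into a single cluster even though decoys 0 and 3 are 15 units apart, twice the cutoff.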


3.2.2. Running options (command line)

The clustering script should be run from the command line using:
[Python_executable] {InterEvScore_directory}/cluster_InterEvScore_results.py
                                            [running_options]

Example:

python cluster_InterEvScore_results.py -s score_file.txt \
                                       -c config_clustering.ini \
                                       -ch AB \
                                       -d PDB_file_directory/ \
                                       -o clustering_output.txt


If no running options are provided, a help message will be displayed.

The following arguments should be provided:

3.2.3. Settings (config_clustering file)



3.2.4. Input

The score file provided as input to the clustering script should be the output file from InterEvScore.


3.2.5. Output

The clustering output comes in the form of a file where each line corresponds to a different cluster.
The clusters are ordered by decreasing size.

On each line, cluster elements are separated by single spaces and ordered by decreasing score value. In the case of the Daura et al algorithm, the first element on each line is instead the cluster center, i.e. the decoy with the highest number of neighbors, from which the cluster is built; the remaining elements are ordered by decreasing score value. Each element is written in the form [structurename]_[scorevalue]. The structure name corresponds to the structure identifier (first column of the score file). The score value corresponds to the score chosen in the config_clustering file.
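Given the format described above, a line of the clustering output can be parsed with a few lines of Python. The sketch below is illustrative: the function name parse_cluster_line and the example line are invented, and the split is done on the last underscore on the assumption that structure names may themselves contain underscores.

```python
def parse_cluster_line(line):
    """Split one cluster line into (structure_name, score) pairs."""
    elements = []
    for token in line.split():
        # Split on the last underscore only: the name may contain underscores.
        name, score = token.rsplit("_", 1)
        elements.append((name, float(score)))
    return elements


# Invented example line: center first, then members by decreasing score.
example_line = "complex_12_8.31 complex_3_9.75 complex_40_7.02"
cluster = parse_cluster_line(example_line)
print(cluster[0][0], len(cluster))
```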


3.2.6. Execution speed and parallelization

An option for MPI parallelization is provided in the config file (see above). The cross-calculation of lRMSD values for 1,000 decoys is the most time-consuming part of the clustering process, so parallelizing it is highly recommended. The clustering takes approximately 2 ½ hours for 1,000 decoys on a medium-sized complex; this duration is reduced to 20-30 min when parallelized on 8 processors.
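As a rough back-of-the-envelope check of these figures, the short Python calculation below counts the decoy pairs and derives an approximate cost per lRMSD calculation. It assumes that each unordered pair of decoys is compared once and that essentially all of the 2 ½ hours is spent on lRMSD calculations; both are simplifications made for this estimate.

```python
def n_pairs(n_decoys):
    """Number of unordered decoy pairs, N(N-1)/2."""
    return n_decoys * (n_decoys - 1) // 2


pairs = n_pairs(1000)                      # pairs among 1,000 decoys
serial_seconds = 2.5 * 3600                # about 2.5 hours, from the text
seconds_per_pair = serial_seconds / pairs  # rough cost of one lRMSD
ideal_minutes_on_8 = serial_seconds / 8 / 60

print(pairs)
print(round(seconds_per_pair * 1000, 1), "ms per pair")
print(ideal_minutes_on_8, "min with ideal scaling on 8 processors")
```

With these assumptions, 1,000 decoys give 499,500 pairs and an ideal 8-processor run would take just under 19 minutes, which is in line with the 20-30 min reported above once parallelization overhead is taken into account. Doubling N to 2,000 roughly quadruples the number of pairs.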

If you plan to test both clustering algorithms and/or several cutoff values, it is recommended to set "store" to True in the config_clustering.ini file. This stores the lRMSD calculations in files located in the same directory as the PDB files (i.e. the directory provided with the -d command line option). The next time the clustering script is run, the lRMSD values will not need to be recalculated. When the lRMSD files are present, reclustering takes only a few seconds.



4. FAQ

This FAQ will be updated as the questions come along!



5. Reference and contact

If you find this program useful, please refer to:

J. Andreani, G. Faure, R. Guerois - InterEvScore: A novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution (submitted)

Corresponding author: guerois@cea.fr



6. Installation: Pre-required programs, modules and libraries

This section of the help file is intended as a guide to make the installation of the various modules and libraries easier. It is provided "as is" with no warranty.
The required modules and libraries should be obtained directly from the distribution sites. This section is intended for Linux users and has been written based on installation on a system using Linux version 2.6.18 with 64-bit RedHat 4.1.2 (CentOS). Installation was also tested for systems using Debian and Ubuntu server 64-bit distributions.

Some modules and programs are required to run the main program InterEvScore.py. Others are required only to run the clustering analysis script cluster_InterEvScore_results.py.


6.1. Python (Open Source, GPL compatible)

InterEvScore is written as a Python script. In order to run InterEvScore, Python must be installed together with some modules and packages.


6.1.1. Python version

InterEvScore can be executed with Python 2.4, Python 2.6 and Python 2.7.
Compatibility with Python 3.x is not guaranteed.


6.1.2. Required Python built-in modules

The following (quite standard) Python libraries should be installed and functional in order to run InterEvScore: sys, os, string, math, tempfile, copy, re, ConfigParser, shutil, time, glob.
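A quick way to confirm that these modules are available is to try importing each of them with the interpreter that will run InterEvScore. The sketch below is a hypothetical helper, not part of the package; it relies on importlib.import_module, which is available from Python 2.7 onwards (on older interpreters, plain import statements can be used instead).

```python
import importlib

# Module names as listed above.
REQUIRED_MODULES = ["sys", "os", "string", "math", "tempfile", "copy", "re",
                    "ConfigParser", "shutil", "time", "glob"]


def missing_modules(names):
    """Return the names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


print(missing_modules(REQUIRED_MODULES))
```

Under Python 3.x the list will contain ConfigParser, since that module was renamed configparser in Python 3; this is one reason why Python 3.x compatibility is not guaranteed.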


6.1.3. networkx (Open source BSD license)

InterEvScore has been developed to run with networkx, a Python software package which allows simple manipulation of graphs. InterEvScore should run smoothly with networkx 1.0, 1.2 or any more recent version.

All necessary information about how to download and install networkx is available from the corresponding website:
http://networkx.lanl.gov/
In particular, networkx can be installed using the easy_install Python module.


6.1.4. mpi4py (for the clustering analysis only) (BSD license)

The clustering analysis can be run in a parallelized manner by setting the proper options in the config_clustering file. This requires the mpi4py module in order to run MPI jobs.

For Debian/Ubuntu users, the mpi4py module can be installed using apt-get. Alternatively, mpi4py can be installed using the easy_install Python module.



6.2. AlphaShape calculation (using CGAL)

In InterEvScore, the contacts as well as the apolar patches are calculated using alpha-shapes. In order to calculate alpha-shapes, the script requires the program AlphaShape to be compiled. This C++ program is built using the CGAL library.

The script was inspired by the examples presented on the CGAL website (3D Alpha Shapes CGAL Documentation by Tran Kai Frank Da, Sébastien Loriot and Mariette Yvinec).

If the AlphaShape program is not present, the InterEvScore script can run in degraded mode using contact calculations based on distances. However, it is highly recommended to compile the AlphaShape program as the degraded version is slower, less efficient and the "patches" mode is not available in this version.


6.2.1. Pre-required library: Installing CGAL (Open Source, GPL license)

This help section was written for CGAL 3.7 installation in CentOS/RedHat. However, more recent versions of CGAL can be used provided the versions of the pre-required modules and programs are adapted.

Note: for Ubuntu/Debian users, CGAL can be installed (together with the corresponding dependencies) using apt-get (e.g. apt-get install libcgal-dev).

CGAL should be downloaded and unpacked from the distribution website.

CGAL is available under an Open Source license. Parts of CGAL are under the LGPL license and other parts are under the GPL license. The Triangulation and AlphaShape packages which we use in the AlphaShape.cpp script are under the GPL license.


Installation of CGAL 3.7 and compilation of AlphaShape with CGAL 3.7 require:

The following environment variables should be defined/modified:

Then CGAL should be installed in the directory where it has been unpacked {CGAL_install_dir} using:
cmake -DCMAKE_INSTALL_PREFIX={CGAL_install_dir}/build ./
make
make install



6.2.2. Compiling AlphaShape

When CGAL is installed, {CGAL_install_dir}/build/lib should be added to the LD_LIBRARY_PATH environment variable and {CGAL_install_dir} to the PATH variable.

Then AlphaShape should be compiled in a directory containing the provided file AlphaShape.cpp.

The CMakeLists.txt should be generated using the script provided by CGAL:
{CGAL_install_dir}/scripts/cgal_create_cmake_script

Finally AlphaShape should be compiled by executing:
cmake ./ 
make

The path for the AlphaShape executable obtained should be set in the config file for InterEvScore (see section 2.2).



6.3. Other external programs

6.3.1. Kalign (GPL license)

Kalign is absolutely necessary to run InterEvScore, as it ensures proper use of the input alignments depending on the sequence of the input PDB files.

The program can be downloaded from the corresponding website:
http://msa.sbc.su.se/cgi-bin/msa.cgi


6.3.2. Naccess (free for academic users, confidentiality agreement)

Note: Naccess is necessary only for the patches mode (not for the standard mode).

Naccess is necessary to run InterEvScore in patches mode, i.e. to include evolutionary information in a specific manner for apolar patches (see section 2.1). Naccess is used to identify surface atoms in order to define surface apolar patches.

Downloading information and details about the program are available directly from the corresponding website:

http://www.bioinf.manchester.ac.uk/naccess/

Naccess is available for free for researchers at academic and non-profit-making institutions provided the confidentiality agreement is signed. Naccess requires csh and fort77.


6.3.3. ProFit (for the clustering analysis only) (registration required)

Note: ProFit is not necessary if the decoys were generated using a program that leaves one of the two chains fixed (e.g. ZDOCK).

ProFit is used to run the clustering analysis script cluster_InterEvScore_results.py when the decoys need to be aligned before ligand RMSD calculation. ProFit calculates the ligand RMSD values for each pair of decoys selected to be clustered.

If the decoys were generated using a program that leaves one of the two chains fixed, the alignment step is not necessary (set align to False in the config_clustering.ini file). In that case, the ligand RMSD is calculated using the second chain given in the -ch command line option as the ligand, so this chain should be the moving chain.

Downloading information and details about the ProFit program are available directly from the corresponding website:
http://www.bioinf.org.uk/software/profit/index.html

Registration information is required upon download.

Warning: ProFit requires 32-bit libraries to run properly on 64-bit systems.



7. License

The InterEvScore program and the clustering script associated with InterEvScore are distributed under GNU General Public License (GPL), version 3.0.

This means that InterEvScore is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.


This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Copyright 2013 CEA iBiTec-S/SB2SM/LBSR Jessica Andreani, Guilhem Faure and Raphael Guerois.

For further information please contact Raphael Guerois.


In particular sections 15 and 16 of the GNU GPL apply:

15. Disclaimer of Warranty.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.


16. Limitation of Liability.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.