Recursive Automatic Search of MOTif in 3D structures of PROteins
RASMOT-3D PRO: Help
1. Reference motif specification
The motif you search for can be any 3D arrangement of 3 to 30 residues.
However, common structural motifs such as alpha-helices, beta-sheets or turns
are widely represented in proteins. They are not specific, and searching for such
patterns may lead to extremely long computing times. It is therefore recommended to
search for patterns that are independent of secondary structure, such as binding or
catalytic sites, or for larger motifs such as combinations of residues from
super-secondary structures.
As the search is based on a description of the CA and CB atoms, the file has to
contain the coordinates of the CA and CB atoms of every residue, except for
glycines, for which all heavy atoms have to be described.
The uploaded file has to be in PDB format, and atom coordinates have to be given
in ATOM records. For more details on the PDB format, please see the FAQ.
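A motif file can be checked locally against these requirements before uploading. The following is a minimal sketch, not part of RASMOT-3D PRO: the helper names (atom_line, parse_atoms, missing_atoms) are hypothetical, the sample coordinates are made up, and it assumes that "all heavy atoms" of a glycine means its N, CA, C and O atoms.

```python
# Minimal sketch (hypothetical helpers): check that a motif file has the atoms
# required by the search. Coordinates below are made up for illustration.

GLY_ATOMS = {"N", "CA", "C", "O"}   # assumption: glycine heavy atoms
OTHER_ATOMS = {"CA", "CB"}          # required for every other residue


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build one ATOM record using the fixed-column PDB layout."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, " " + name, res_name, chain, res_seq, x, y, z)


def parse_atoms(pdb_text):
    """Map (chain, residue number, residue name) to the set of atom names seen."""
    residues = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM  "):
            continue  # only ATOM records are read
        key = (line[21], int(line[22:26]), line[17:20].strip())
        residues.setdefault(key, set()).add(line[12:16].strip())
    return residues


def missing_atoms(residues):
    """Return {residue: sorted missing atom names} for incomplete residues."""
    report = {}
    for key, atoms in residues.items():
        required = GLY_ATOMS if key[2] == "GLY" else OTHER_ATOMS
        missing = required - atoms
        if missing:
            report[key] = sorted(missing)
    return report


sample = "\n".join([
    atom_line(1, "CA", "CYS", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CB", "CYS", "A", 1, 1.5, 0.0, 0.0),
    atom_line(3, "N", "GLY", "A", 2, 3.0, 1.0, 0.0),
    atom_line(4, "CA", "GLY", "A", 2, 4.0, 2.0, 0.0),
    atom_line(5, "C", "GLY", "A", 2, 5.0, 3.0, 0.0),   # O is left out on purpose
    atom_line(6, "CA", "HIS", "A", 3, 6.0, 4.0, 0.0),  # CB is left out on purpose
])
report = missing_atoms(parse_atoms(sample))
print(report)
```

In this made-up sample, the glycine lacks its O atom and the histidine lacks its CB atom, so both residues are reported.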
Download a motif PDB file example: a zinc finger motif (extracted from 1G2F)
2. Search files
a) Uploaded files
Files must be in PDB format, with coordinates given in ATOM records. Each chain is considered an independent scaffold. NMR structures with multiple models are allowed; only the model with the lowest RMSD is kept as a solution.
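The handling of chains and models described above can be illustrated with a short sketch. It is not the RASMOT-3D PRO implementation: the function names are hypothetical, the RMSD values are made up, and it assumes that a file without MODEL records contains a single model.

```python
# Minimal sketch (hypothetical helpers): treat each chain as its own scaffold
# and keep, per chain, the model with the lowest RMSD.

def split_models_and_chains(pdb_text):
    """Group ATOM lines by (model number, chain id)."""
    groups = {}
    model = 1  # assumption: a file without MODEL records holds a single model
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            model = int(line.split()[1])
        elif line.startswith("ATOM  "):
            groups.setdefault((model, line[21]), []).append(line)
    return groups


def best_model_per_chain(rmsd_by_model_chain):
    """Return {chain id: (model number, rmsd)} with the lowest RMSD per chain."""
    best = {}
    for (model, chain), value in rmsd_by_model_chain.items():
        if chain not in best or value < best[chain][1]:
            best[chain] = (model, value)
    return best


def ca_line(serial, chain, res_seq):
    """Made-up CA record in the fixed-column PDB layout (illustration only)."""
    return "ATOM  %5d  CA  ALA %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, chain, res_seq, 0.0, 0.0, 0.0)


sample = "\n".join([
    "MODEL        1", ca_line(1, "A", 1), ca_line(2, "B", 1), "ENDMDL",
    "MODEL        2", ca_line(1, "A", 1), ca_line(2, "B", 1), "ENDMDL",
])
groups = split_models_and_chains(sample)
print(sorted(groups))

# Made-up RMSD values for each (model, chain) pair.
rmsds = {(1, "A"): 0.9, (2, "A"): 0.6, (1, "B"): 0.4, (2, "B"): 0.8}
best = best_model_per_chain(rmsds)
print(best)
```

With these made-up values, model 2 is kept for chain A and model 1 for chain B.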
Download example PDB files:
b) Non-redundant PDB chain set
The non-redundant PDB chain set lists sets of sequence-dissimilar PDB polypeptide chains. The sets are built by clustering chains into groups according to their amino acid sequence similarity and selecting a representative from each group. Four sets with different levels of non-redundancy are available: BLAST p-value of 10e-7, 10e-40, 10e-80, and 100% sequence identity. For a complete description, see the NCBI non-redundant PDB chain set home page.
3. Protein selection
a) Equivalence
The principle of RASMOT-3D is to search for residues exhibiting a topology similar to that of the uploaded reference motif. This search can be restricted to residues with physical properties similar to those of their equivalents in the reference motif.
Three options are available:
- all residues, i.e. no restriction: for each reference motif residue, all residues of the compared structure are considered potential equivalents. With this option, only topology is taken into account.
- identical residues: only the same residue type as in the reference motif is accepted as an equivalent.
- same-property residues: for each reference motif residue, residues belonging to the same group are considered potential equivalents in the compared structure.
group | residues |
acidic | Asp, Glu |
basic | Lys, Arg, His |
polar | Asn, Gln, Ser, Thr, Cys, His, Tyr |
non polar | Val, Met, Ile, Leu, Trp, Phe, Ala, Gly, Pro |
aromatic | Tyr, His, Phe, Trp |
AsGlX | Asp, Glu, Asn, Gln |
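The three equivalence options and the groups above can be written down as a small lookup. This is a minimal sketch with hypothetical names (GROUPS, is_equivalent and the mode strings). It assumes that two residues have the "same properties" when they share at least one group, which is one possible reading of the rule, since some residues (for example His) belong to several groups.

```python
# Minimal sketch (hypothetical names): the residue groups from the table above.
GROUPS = {
    "acidic": {"Asp", "Glu"},
    "basic": {"Lys", "Arg", "His"},
    "polar": {"Asn", "Gln", "Ser", "Thr", "Cys", "His", "Tyr"},
    "non polar": {"Val", "Met", "Ile", "Leu", "Trp", "Phe", "Ala", "Gly", "Pro"},
    "aromatic": {"Tyr", "His", "Phe", "Trp"},
    "AsGlX": {"Asp", "Glu", "Asn", "Gln"},
}


def is_equivalent(ref_residue, candidate, mode):
    """Decide whether candidate may stand in for ref_residue under a given option."""
    if mode == "all":
        return True  # topology only, no restriction on residue type
    if mode == "identical":
        return ref_residue == candidate
    if mode == "same properties":
        # Assumption: sharing at least one group is enough.
        return any(ref_residue in members and candidate in members
                   for members in GROUPS.values())
    raise ValueError("unknown mode: %r" % mode)


print(is_equivalent("Asp", "Asn", "same properties"))  # share the AsGlX group
print(is_equivalent("Asp", "Lys", "same properties"))  # share no group
```

Under this reading, Asp and Asn are equivalent through the AsGlX group, while Asp and Lys are not.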
b) delta-dist
The delta-dist value is the maximum allowed deviation between the inter-atomic distances (CA and CB atoms) of the examined set of residues and the corresponding distances in the reference motif.
If even one of these inter-atomic distances in the examined set of residues differs by more than delta-dist from the corresponding one in the reference motif, the set of residues is rejected.
This value has a great influence on computing time. It can be seen as a pre-filter for the following steps. Indeed, in a protein of size N, the number of sets of residues to compare to a reference motif of size n is A = N! / (N - n)!. For example, for a protein of 50 residues and a reference motif of 4 residues, the number of sets is 5,527,200. If it is not too large, delta-dist allows the fast elimination of a great number of the examined sets. In practice, it should be set between 0.5 and 1.5 angstroms, depending on the motif you are searching for.
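The delta-dist criterion and the count of candidate sets can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not the server's code: the function name passes_delta_dist is hypothetical, the coordinates are made up, and the atoms of the two sets are assumed to be listed in matching order.

```python
import math
from itertools import combinations


def passes_delta_dist(ref_atoms, cand_atoms, delta_dist):
    """Compare every inter-atomic distance of the candidate set with the
    corresponding distance in the reference motif (coordinates in angstroms).

    Both lists hold (x, y, z) tuples in matching order. The set is rejected
    as soon as one distance deviates by more than delta_dist.
    """
    for i, j in combinations(range(len(ref_atoms)), 2):
        d_ref = math.dist(ref_atoms[i], ref_atoms[j])
        d_cand = math.dist(cand_atoms[i], cand_atoms[j])
        if abs(d_ref - d_cand) > delta_dist:
            return False
    return True


# Made-up coordinates for illustration.
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
close = [(1.0, 1.0, 1.0), (4.8, 1.0, 1.0), (4.9, 4.7, 1.0)]  # small deviations
far = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 6.0, 0.0)]    # one distance off by 2.2

print(passes_delta_dist(ref, close, 1.0))
print(passes_delta_dist(ref, far, 1.0))

# Number of ordered sets of n = 4 residues in a protein of N = 50 residues:
# N! / (N - n)! = 50 * 49 * 48 * 47
n_sets = math.perm(50, 4)
print(n_sets)
```

Because the check only needs distances, it requires no superposition, which is what makes it usable as a cheap pre-filter.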
c) RMSD
Root Mean Square Deviation (RMSD) is a standard measure of structural distance between coordinate sets. The CA and CB atoms of the examined set of residues are superimposed on the CA and CB atoms of the reference motif. The RMSD is then calculated as the root mean square of the distances between the CA and CB atoms of the identified motif and their equivalents in the reference motif. If the RMSD is larger than the threshold, the examined set of residues is rejected.
In RASMOT-3D PRO, the RMSD threshold has to be small enough to reproduce the reference motif topology as closely as possible, but large enough to allow for minor deviations. In practice, the RMSD value should range from 0.5 to 1.5 angstroms, depending on the motif you are searching for.
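For reference, the RMSD between two coordinate sets can be computed as in the sketch below. It uses the standard definition of RMSD and assumes that the two sets are already superimposed and listed in matching order; the superposition step itself is not shown. The function name and the coordinates are illustrative only.

```python
import math


def rmsd(ref_atoms, cand_atoms):
    """Root mean square deviation between two lists of (x, y, z) tuples.

    Assumes both lists are already superimposed and in matching order.
    """
    if not ref_atoms or len(ref_atoms) != len(cand_atoms):
        raise ValueError("coordinate sets must be non-empty and of equal size")
    squared = sum((a - b) ** 2
                  for p, q in zip(ref_atoms, cand_atoms)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(ref_atoms))


# Made-up coordinates: every atom is displaced by 1 angstrom along z.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cand = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

value = rmsd(ref, cand)
threshold = 0.7  # example threshold in angstroms
print(value, "accepted" if value <= threshold else "rejected")
```

Here every atom is 1 angstrom away from its equivalent, so the RMSD is 1.0 and the set would be rejected with a threshold of 0.7.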
4. Steric filter
If you want to filter out solutions that make significant steric clashes with a target
(for binding motifs), you have to provide a PDB file of the target structure, correctly
positioned relative to the reference motif.
Once the examined set of residues is superimposed on the reference motif, RASMOT-3D PRO counts the atoms of the entire scaffold that inter-penetrate target atoms. Two atoms are considered to inter-penetrate when the distance between an atom of the scaffold and an atom of the target is below the sum of the corresponding atomic radii. If the score associated with that number is too large, the structure is eliminated. The calculated score takes into account the extent of the inter-penetration but also the distance to the backbone of the clashing atoms (e.g. close to the backbone or at the end of a flexible side chain). This score tolerates inter-penetrations that are easily relaxed but rejects scaffolds that make significant steric clashes.
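The inter-penetration test itself (distance below the sum of the atomic radii) can be sketched as follows. This example only lists the clashing scaffold atoms. It does not reproduce the RASMOT-3D PRO score, whose formula is not given here. The radii are approximate van der Waals values chosen for illustration, and the function name and coordinates are hypothetical.

```python
import math

# Illustrative radii in angstroms (approximate van der Waals values).
# Assumption: the radii actually used by the server are not stated.
RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def clashing_atoms(scaffold, target, radii=RADII):
    """Return the scaffold atoms that inter-penetrate at least one target atom.

    Each atom is an (element, (x, y, z)) pair. Two atoms inter-penetrate when
    their distance is below the sum of their radii.
    """
    clashes = []
    for elem_s, xyz_s in scaffold:
        for elem_t, xyz_t in target:
            if math.dist(xyz_s, xyz_t) < radii[elem_s] + radii[elem_t]:
                clashes.append((elem_s, xyz_s))
                break  # one clash is enough to flag this scaffold atom
    return clashes


# Made-up coordinates for illustration.
scaffold = [("C", (0.0, 0.0, 0.0)), ("O", (10.0, 0.0, 0.0))]
target = [("N", (2.0, 0.0, 0.0))]

found = clashing_atoms(scaffold, target)
print(found)
```

The carbon atom lies 2.0 angstroms from the target nitrogen, below the 3.25 angstrom sum of radii, so it is flagged; the oxygen atom, 8.0 angstroms away, is not.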
Download example target PDB files:
5. Results visualization
a) Online visualization
Online visualization is limited to the 250 solutions with the lowest RMSD.
For each solution, the following are given: the output PDB filename, the protein name, the chain id, the chain size, the best model id, the RMSD and the identity of the residues in the identified set.
For known protein filenames, the protein name is a link to the corresponding PDBsum webpage. Finally, clicking on the output PDB filename opens a separate window with the Jmol online molecular viewer. The reference motif is colored in cyan, the identified residues and scaffold in yellow and the target in grey.
Two example results pages of a zinc finger motif search are available:
- In uploaded PDB files (with and without zinc finger motif), with delta-dist = 1.0, RMSD = 0.7, target = zinc atom
- In the non-redundant PDB chain set (p-value 10e-7), with delta-dist = 1.0, RMSD = 0.7, target = DNA
b) Local visualization
At the bottom of the results page, a link is provided to an archive containing the results table, the superimposed structures and PyMol visualization files.
The results summary file (results.txt) contains one result per line, with fields separated by tabs. It can therefore easily be imported and manipulated in spreadsheet programs.
For each solution, two files are available: a PDB file of the superimposed identified scaffold and a PML file, which is a visualization script for PyMol (a free version of the software can be downloaded at http://delsci.com/rel/099/).
The reference motif is shown in green, the identified scaffold in cyan and the target, if provided, in grey. The identified protein and the target are displayed as 'cartoons'; the reference motif and the identified set of residues are displayed as 'sticks'.
NB: the solution structure PDB files can be viewed with any PDB visualization software, but the provided visualization scripts only work with PyMol.
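Besides spreadsheet programs, the tab-separated results.txt file can be read with a short script. The sketch below is illustrative only: the column names and their order are an assumption based on the fields listed for the online results table, the sample lines are made up, and the function name read_results is hypothetical. Check the layout of your own results.txt before relying on it.

```python
import csv
import io

# Assumption: column order follows the fields shown in the online results table.
COLUMNS = ["pdb_file", "protein", "chain", "chain_size", "model", "rmsd", "residues"]


def read_results(handle):
    """Read tab-separated result lines and return them sorted by RMSD."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or unexpected lines
        row = dict(zip(COLUMNS, fields))
        try:
            row["rmsd"] = float(row["rmsd"])
        except ValueError:
            continue  # skip a possible header line
        rows.append(row)
    return sorted(rows, key=lambda r: r["rmsd"])


# Made-up content standing in for results.txt.
sample = io.StringIO(
    "solution_1.pdb\tprotX\tA\t120\t1\t0.52\tCYS10 CYS13 HIS26 HIS30\n"
    "solution_2.pdb\tprotY\tB\t95\t3\t0.41\tCYS4 CYS7 HIS20 HIS24\n"
)
results = read_results(sample)
for row in results:
    print(row["pdb_file"], row["chain"], row["rmsd"])
```

To read a real file, the StringIO object would be replaced by an open file handle, for example open("results.txt", newline="").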