This website is free and open to all users and there is no login requirement.
The sequence search can be restricted to one of the three kingdoms: Bacteria, Archaea or Eukaryotes. Combined selections are also possible.
The choice of sequence database helps optimize the selection of species included in the multiple sequence alignments of the interologs.
Four databases can be queried with Psi-Blast to retrieve homologs:
- Entire Genomes (OMA): database of sequences built from the 1109 fully sequenced species used in the OMA database and mapped onto the NCBI NR database using Usearch and uclust.
- NR, REF, SWISSPROT: sequence databases retrieved from the NCBI.
It is recommended to use 'Entire Genomes (OMA)' for the first iterations and to use the other databases only for the last iteration, if required to increase the number of aligned sequences.
Orthologs rather than paralogs are more likely to be retrieved with 'Entire Genomes (OMA)', because the first best match from the Psi-Blast query is selected.
Do not iterate if the first iteration already provides a sufficient number of interologs.
Depending on the number of homologs retrieved, you can use the SWISSPROT, REF or NR database in the last iteration to increase the profile size.
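The database strategy above can be sketched with the BLAST+ `psiblast` command line. This is a minimal sketch, not the server's pipeline: the database name `oma_genomes`, the output file names and the function `psiblast_plan` are hypothetical. It iterates on the OMA-derived database and saves the profile with `-out_pssm`; only when a larger database is requested does it run one last round, using `-in_pssm` to search that database with the saved profile.

```python
from typing import List, Optional

# Hypothetical BLAST database name for the OMA-derived sequence set.
OMA_DB = "oma_genomes"


def psiblast_plan(query_fasta: str, oma_iterations: int,
                  inclusion_evalue: float,
                  last_db: Optional[str] = None) -> List[List[str]]:
    """Return psiblast command lines: iterate on the OMA-derived database,
    then optionally run one last round on a larger database."""
    commands = [[
        "psiblast", "-query", query_fasta, "-db", OMA_DB,
        "-num_iterations", str(oma_iterations),
        "-inclusion_ethresh", str(inclusion_evalue),
        "-out_pssm", "profile.pssm",  # save the profile built on OMA
        "-out", "oma_round.txt",
    ]]
    if last_db is not None:
        # -in_pssm replaces -query: the saved profile is searched against last_db.
        commands.append([
            "psiblast", "-in_pssm", "profile.pssm", "-db", last_db,
            "-num_iterations", "1", "-out", "last_round.txt",
        ])
    return commands


for cmd in psiblast_plan("query.fasta", 3, 0.001, last_db="swissprot"):
    print(" ".join(cmd))
```

Leaving `last_db` unset keeps the whole search on the OMA-derived database.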
In some cases, it may be important to remove spurious homologous sequences that are not related to the rest of the alignment (typically distantly related paralogs captured for some species).
When this option is activated, a BLAST search is run for every sequence of the final alignment.
Among the retrieved homologs, we test whether the sequence retrieves the other sequences of the alignment as best BLAST hits.
If not, the sequence is discarded. Since this procedure involves many BLAST searches, it is much slower
and should not be used for large alignments.
Alignments can also be cleaned up manually by removing, from both interolog alignments, the species for which a spurious sequence was detected.
Two levels of stringency can be applied to remove undesired paralogs:
- The 'High Stringency' protocol uses an absolute threshold: every sequence must have a ratio of best BLAST hits among the sequences of the alignment greater than 60%, and must also meet the 'Medium Stringency' conditions.
- The 'Medium Stringency' protocol uses a relative threshold: sequences whose ratio of best BLAST hits among the sequences of the alignment is an outlier (lower than mean_ratio - 2*std_dev) are removed.
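The two stringency levels can be illustrated as follows. This is a sketch under assumptions: the exact definition of the best-hit ratio is not given here, so the hypothetical `best_hit_ratio` counts how many of the other alignment members appear among a sequence's top BLAST hits, and `std_dev` is taken as the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Set


def best_hit_ratio(seq_id: str, ranked_hits: Sequence[str],
                   alignment_ids: Set[str]) -> float:
    """Assumed definition: fraction of the other alignment members found among
    the top BLAST hits of seq_id (looking at as many hits as there are members)."""
    others = alignment_ids - {seq_id}
    if not others:
        return 1.0
    top = [h for h in ranked_hits if h != seq_id][:len(others)]
    return sum(h in others for h in top) / len(others)


def stringency_filter(ratios: Dict[str, float], level: str = "medium") -> List[str]:
    """Return the identifiers kept under 'medium' or 'high' stringency."""
    values = list(ratios.values())
    # Medium: drop outliers whose ratio is below mean - 2 * standard deviation.
    cutoff = mean(values) - 2 * pstdev(values)
    kept = [s for s, r in ratios.items() if r >= cutoff]
    if level == "high":
        # High: additionally require an absolute ratio above 60%.
        kept = [s for s in kept if ratios[s] > 0.60]
    return kept
```

With nine sequences at ratio 1.0 and one at 0.0, the cutoff is about 0.3, so only the last sequence is removed at medium stringency.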
Specify the e-value below which sequences are included in the profile used for the next Psi-Blast iteration.
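Conceptually, the inclusion threshold separates hits that feed the next Psi-Blast profile from hits that are only reported; in the BLAST+ `psiblast` program the corresponding option is `-inclusion_ethresh`. A minimal sketch with a hypothetical `split_by_inclusion` helper operating on (identifier, e-value) pairs:

```python
from typing import List, Tuple


def split_by_inclusion(hits: List[Tuple[str, float]],
                       inclusion_evalue: float) -> Tuple[List[str], List[str]]:
    """Split (hit_id, e-value) pairs into hits used to build the next profile
    and hits that are only reported."""
    included = [h for h, e in hits if e < inclusion_evalue]
    reported_only = [h for h, e in hits if e >= inclusion_evalue]
    return included, reported_only


print(split_by_inclusion([("a", 1e-30), ("b", 0.01), ("c", 0.0005)], 0.001))
```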
Sequences sharing more than this percentage of identity are considered redundant and are not kept in the final alignment.
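A redundancy filter of this kind can be sketched as a greedy pass over the aligned sequences. Assumptions: identity is computed over columns where neither sequence has a gap, and sequences are visited in input order (query first), so the first member of each redundant group is the one kept. The server's actual tie-breaking is not documented here, and the function names are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither row has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(alignment: Dict[str, str], max_identity: float) -> List[str]:
    """Greedily keep sequences in input order; drop any sequence sharing more
    than max_identity percent identity with an already kept one."""
    kept: List[str] = []
    for name, row in alignment.items():
        if all(percent_identity(row, alignment[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept
```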
This threshold helps control the sequence divergence inside the alignment.
A sequence is kept in the alignment provided that at least one other sequence shares more than the specified percentage of sequence identity with it.
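The minimal-identity rule can be sketched in the same spirit: a sequence is kept only if some other sequence exceeds the threshold with it. The identity definition (gap-free columns) and the helper names are assumptions.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither row has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def keep_connected(alignment: Dict[str, str], min_identity: float) -> List[str]:
    """Keep sequences that share more than min_identity percent identity
    with at least one other sequence of the alignment."""
    names = list(alignment)
    return [
        n for n in names
        if any(percent_identity(alignment[n], alignment[m]) > min_identity
               for m in names if m != n)
    ]
```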
This threshold defines the minimal coverage the target sequence should have with the query.
This coverage filter is applied directly after the Psi-Blast search.
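As a sketch, assuming coverage is the percentage of query residues spanned by the Psi-Blast HSPs of a target (overlapping HSPs counted once), the filter could look like this. `query_coverage` and `passes_coverage` are hypothetical names, and coordinates are 1-based and inclusive.

```python
from typing import Iterable, Tuple


def query_coverage(hsps: Iterable[Tuple[int, int]], query_length: int) -> float:
    """Percentage of query positions covered by the (start, end) HSP ranges."""
    covered = set()
    for start, end in hsps:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / query_length


def passes_coverage(hsps: Iterable[Tuple[int, int]], query_length: int,
                    min_coverage: float) -> bool:
    """True if the target reaches the minimal coverage of the query."""
    return query_coverage(hsps, query_length) >= min_coverage
```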
This threshold defines the minimal coverage the target sequence should have with the query.
This coverage filter is applied after the Muscle alignment step, once the sequence limits have been extended.
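After realignment, coverage can instead be measured column by column. In this sketch, which is an assumption rather than the server's documented formula, it is the percentage of query residues aligned to a residue (not a gap) in the target row; `aligned_coverage` is a hypothetical name.

```python
def aligned_coverage(query_row: str, target_row: str, gap: str = "-") -> float:
    """Percentage of query residues facing a residue in the target row."""
    target_at_query = [t for q, t in zip(query_row, target_row) if q != gap]
    if not target_at_query:
        return 0.0
    covered = sum(t != gap for t in target_at_query)
    return 100.0 * covered / len(target_at_query)


print(aligned_coverage("ACDEFGHI", "----FGHI"))
```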
This option can be used when two sequences are queried.
In some cases, too many species are represented in an alignment; use this threshold to limit the number of species kept.
This option can be used when only one sequence is queried.
It limits the number of species retrieved in the alignment.
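Both species limits can be sketched with one hypothetical helper. Assumptions: species are listed in order of preference (for example, Psi-Blast rank), and with two queries the two alignments are paired by species, so only species present in both are retained before the cap is applied.

```python
from typing import List, Optional


def limit_species(species_a: List[str], max_species: int,
                  species_b: Optional[List[str]] = None) -> List[str]:
    """Keep at most max_species species, in the order given."""
    selected = list(dict.fromkeys(species_a))  # drop duplicates, keep order
    if species_b is not None:
        # Two queries: keep only species present in both alignments (assumed pairing).
        shared = set(species_b)
        selected = [s for s in selected if s in shared]
    return selected[:max_species]
```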
Homologous sequences retrieved with Psi-Blast are realigned using Muscle.
Above the indicated number of sequences, the fast alignment protocol without refinement is triggered to speed up the alignment process.
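A possible way to switch protocols is sketched below, assuming the MUSCLE v3 command-line syntax, in which `-maxiters 2` stops after the progressive stages and skips iterative refinement. The function name and the use of `-maxiters 2` as the fast protocol are assumptions, not the server's documented settings.

```python
from typing import List


def muscle_command(in_fasta: str, out_fasta: str, n_sequences: int,
                   fast_threshold: int) -> List[str]:
    """Build a MUSCLE v3 command, skipping refinement for large inputs."""
    cmd = ["muscle", "-in", in_fasta, "-out", out_fasta]
    if n_sequences > fast_threshold:
        # Progressive alignment only (iterations 1 and 2), no refinement.
        cmd += ["-maxiters", "2"]
    return cmd


print(" ".join(muscle_command("homologs.fasta", "homologs.afa", 800, 500)))
```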
Extend the limits of every match retrieved by Psi-Blast by a given number of amino acids.
This can improve sequence coverage in the alignment.
The default '1500' means that full-length sequences are retrieved. Decrease this value to focus on specific domains.
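The extension amounts to widening the hit range and clipping it to the sequence boundaries. A sketch with a hypothetical `extend_limits` helper on 1-based inclusive coordinates:

```python
from typing import Tuple


def extend_limits(start: int, end: int, seq_length: int,
                  extension: int) -> Tuple[int, int]:
    """Extend a hit range by `extension` residues on each side,
    clipped to the full sequence."""
    return max(1, start - extension), min(seq_length, end + extension)


print(extend_limits(100, 200, 350, 1500))
print(extend_limits(100, 200, 350, 20))
```

With an extension of 1500, any hit on a protein of at most 1500 residues expands to the whole sequence, which is why the default yields full-length sequences.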
HHsearch probability threshold used to select homologous chains in the InterEvol database.
Coverage threshold that the target alignment should have with respect to the query alignment after HHsearch profile-profile alignment against the InterEvol database.
Coverage threshold that the query alignment should have with respect to the target alignment after HHsearch profile-profile alignment against the InterEvol database.
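The three HHsearch thresholds can be combined into one selection step. This sketch assumes a hypothetical `HHHit` record holding the values reported for each hit (probability in percent, aligned ranges and profile lengths). It reads "target coverage" as the fraction of the query alignment spanned by the match and "query coverage" as the fraction of the target alignment spanned, and it does not parse real HHsearch output files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HHHit:
    template: str
    probability: float   # HHsearch probability, in percent
    query_start: int     # aligned range on the query profile (1-based, inclusive)
    query_end: int
    query_length: int
    template_start: int  # aligned range on the template profile
    template_end: int
    template_length: int


def select_templates(hits: List[HHHit], min_probability: float,
                     min_target_coverage: float,
                     min_query_coverage: float) -> List[str]:
    selected = []
    for h in hits:
        # Share of the query alignment covered by the target alignment.
        target_cov = 100.0 * (h.query_end - h.query_start + 1) / h.query_length
        # Share of the target alignment covered by the query alignment.
        query_cov = 100.0 * (h.template_end - h.template_start + 1) / h.template_length
        if (h.probability >= min_probability and target_cov >= min_target_coverage
                and query_cov >= min_query_coverage):
            selected.append(h.template)
    return selected
```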