This website is free and open to all users and there is no login requirement.


Kingdom restrictions  
The sequence search can be restricted to one or more of the three kingdoms: Bacteria, Archaea or Eukaryotes. Several kingdoms can be selected together.
PSI-Blast protocol and database selection  
The choice of sequence database helps optimize the selection of species included in the multiple sequence alignments of the interologs.

Four databases can be queried with Psi-Blast to retrieve homologs:
- Entire Genomes (OMA): database of sequences built from the 1109 fully sequenced species used in the OMA database
    and mapped onto the NCBI NR database using Usearch and uclust.
- NR, REF, SWISSPROT: sequence databases retrieved from the NCBI.

It is recommended to use 'Entire Genomes (OMA)' for the first iterations
    and to use the other databases only for the last iteration, if required to increase the number of sequences aligned.
Orthologs rather than paralogs are more likely to be retrieved using 'Entire Genomes (OMA)'
    because the first best match from the Psi-Blast query is selected.
Do not iterate if the first iteration already provides a sufficient number of interologs.
Depending on the number of homologs retrieved, you can use either SWISSPROT, REF or the NR database in the last iteration to increase the profile size.
Activate the reciprocal blast procedure (slower)  
In some cases, it may be important to clean the alignment of spurious homologous sequences
that are not related to the rest of the alignment (typically distantly related paralogs captured for some species).
When this option is activated, a blast search is run for every sequence of the final alignment.
From the retrieved homologs, we test whether the sequence retrieved as best blast hit is one of the other sequences of the alignment.
If not, the sequence is discarded. Since this procedure involves many blast searches, it is much slower
and should not be used for large alignments.
The alignments can also be cleaned up manually by removing, from both interolog alignments, the species
for which a spurious sequence was detected.
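
The reciprocal check described above can be sketched in a few lines. This is only an illustration, not the server's implementation: `reciprocal_filter` and `best_hit_of` are hypothetical names, and the blast search is replaced by a toy lookup table.

```python
from typing import Callable, List


def reciprocal_filter(alignment_ids: List[str],
                      best_hit_of: Callable[[str], str]) -> List[str]:
    """Keep a sequence only if its best hit is another member of the alignment.

    best_hit_of is a stand-in for a real blast search (assumption): it takes a
    sequence identifier and returns the identifier of its best hit.
    """
    members = set(alignment_ids)
    kept = []
    for seq_id in alignment_ids:
        hit = best_hit_of(seq_id)
        # The best hit must belong to the alignment and must not be the sequence itself.
        if hit != seq_id and hit in members:
            kept.append(seq_id)
    return kept


# Toy data: B_yeast points to a sequence outside the alignment and is discarded.
toy_hits = {"A_human": "A_mouse", "A_mouse": "A_human", "B_yeast": "B_fly"}
kept = reciprocal_filter(list(toy_hits), toy_hits.__getitem__)
print(kept)
```

One search is needed per sequence of the alignment, which is why the cost grows with alignment size.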

Two levels of stringency can be applied to remove undesired paralogs :
E-value threshold in Psi-Blast  
Specify the e-value below which a sequence can be included in the profile used to iterate with Psi-Blast.
Redundancy filter: max. seq. identity (as %) within the alignment  
Sequences sharing more than this percentage of identity will be considered as redundant and won't be kept in the final alignment.
Min. seq. identity to any seq. (as %) to select a seq. hit  
This threshold helps control the sequence divergence inside the alignment.
A sequence is kept in the alignment provided that it shares more than the specified sequence identity with at least one other sequence.
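
The two identity thresholds (maximum identity for redundancy, minimum identity for selection) could be applied roughly as follows. This is a minimal sketch under stated assumptions: identity is computed over columns where neither sequence has a gap, the redundancy filter is greedy in input order, and the function names are hypothetical. The server may define identity and ordering differently.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (name, aligned sequence with '-' for gaps)


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both rows have a residue (assumption)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


def remove_redundant(rows: List[Row], max_identity: float) -> List[Row]:
    """Greedy redundancy filter: drop a row that is too similar to a row already kept."""
    kept: List[Row] = []
    for name, seq in rows:
        if all(percent_identity(seq, k) <= max_identity for _, k in kept):
            kept.append((name, seq))
    return kept


def has_close_relative(index: int, rows: List[Row], min_identity: float) -> bool:
    """True if the row shares more than min_identity with at least one other row."""
    seq = rows[index][1]
    return any(percent_identity(seq, other) > min_identity
               for j, (_, other) in enumerate(rows) if j != index)


rows = [("query", "ACDEFGHIKL"), ("s1", "ACDEFGHIKV"),
        ("s2", "ACDWWGHWWL"), ("far", "WWWWWWWWWW")]
non_redundant = [n for n, _ in remove_redundant(rows, max_identity=80.0)]
selected = [rows[i][0] for i in range(len(rows))
            if has_close_relative(i, rows, min_identity=50.0)]
print(non_redundant, selected)
```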
After Blast step : Min. coverage to query (as %) threshold  
This threshold defines the minimal coverage the target sequence should have with the query.
This coverage filter is applied directly after the Psi-Blast search.
After Muscle step: Min. coverage to query (as %) threshold  
This threshold defines the minimal coverage the target sequence should have with the query.
This coverage filter is applied after the Muscle alignment step, once the limits of the sequences have been extended.
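
To make the two coverage thresholds concrete, here is a hedged sketch of how coverage to the query might be computed at each step. The exact definition used by the server is not stated here. This example assumes coverage is the percentage of query residues spanned by the hit (after Psi-Blast) or aligned to a target residue (after Muscle). Both function names are hypothetical.

```python
def blast_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Percentage of the query spanned by a hit, using 1-based inclusive coordinates."""
    return 100.0 * (q_end - q_start + 1) / query_length


def alignment_coverage(query_row: str, target_row: str) -> float:
    """Percentage of query residues aligned to a target residue in two aligned rows."""
    query_cols = [i for i, c in enumerate(query_row) if c != "-"]
    if not query_cols:
        return 0.0
    covered = sum(1 for i in query_cols if target_row[i] != "-")
    return 100.0 * covered / len(query_cols)


after_blast = blast_coverage(11, 60, 100)                      # hit spans residues 11..60
after_muscle = alignment_coverage("ACDEFGHI--", "--DEFGHIKL")  # 6 of 8 query residues covered
print(after_blast, after_muscle)
```

A hit would then be kept only if the computed value reaches the chosen threshold at the corresponding step.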
Max. number of interolog couples in the alignments  
The option can be used when 2 sequences are queried.
In some cases, there are too many species represented in an alignment. Use this threshold to limit the number of species to be kept.
Max. number of species in a single alignment  
The option can be used when only 1 sequence is queried.
It can be used to limit the number of species retrieved in the alignment.
Max. number of sequence above which Muscle runs in fast option  
Homologous sequences retrieved with Psi-Blast are realigned using Muscle.
Above the indicated number of sequences, the fast alignment protocol without refinement is triggered to speed up the alignment process.
Apply sequence extensions after Psi-Blast by N amino acids  
Extend the sequence limits of every match retrieved after Psi-Blast by a given number of amino acids.
This can improve the sequence coverage in the alignment.
The default '1500' means that full-length sequences are retrieved. Decrease this value to focus on specific domains.
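
The effect of the extension value can be illustrated with a small sketch. It assumes 1-based inclusive coordinates and clips the extended limits to the ends of the matched sequence. `extend_hit` is a hypothetical helper, not part of the server.

```python
from typing import Tuple


def extend_hit(start: int, end: int, seq_length: int, extension: int) -> Tuple[int, int]:
    """Extend a hit by `extension` residues on each side, clipped to the sequence ends."""
    new_start = max(1, start - extension)
    new_end = min(seq_length, end + extension)
    return new_start, new_end


# A hit on residues 120..260 of a 400-residue protein.
full_length = extend_hit(120, 260, 400, 1500)  # large extension reaches both ends
domain_only = extend_hit(120, 260, 400, 50)    # small extension stays near the match
print(full_length, domain_only)
```

With an extension larger than the protein, the whole sequence is recovered, while a small value keeps the alignment centred on the matched region.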
Min. HHsearch proba. for hit selection  
HHsearch probability threshold used to select homologous chains in the InterEvol database.
Coverage ratio of the Target to the Query after HHsearch  
Coverage threshold that the target alignment should have with respect to
   the query alignment after HHsearch profile-profile alignment against the InterEvol database.
Coverage ratio of the Query to the Target after HHsearch  
Coverage threshold that the query alignment should have with respect to
    the target alignment after HHsearch profile-profile alignment against the InterEvol database.
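
The three HHsearch thresholds can be combined as in the sketch below. It rests on assumptions that are not confirmed by this page: probability is expressed on a 0-100 scale, coverage of the target to the query is taken as aligned columns divided by query length, and coverage of the query to the target as aligned columns divided by target length. The `HHHit` class and `passes` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HHHit:
    probability: float   # HHsearch probability, assumed on a 0-100 scale
    aligned_cols: int    # number of aligned columns between query and target
    query_length: int
    target_length: int


def passes(hit: HHHit, min_prob: float,
           min_target_to_query: float, min_query_to_target: float) -> bool:
    """Apply the probability threshold and both coverage ratios (assumed definitions)."""
    target_to_query = hit.aligned_cols / hit.query_length
    query_to_target = hit.aligned_cols / hit.target_length
    return (hit.probability >= min_prob
            and target_to_query >= min_target_to_query
            and query_to_target >= min_query_to_target)


hit = HHHit(probability=98.5, aligned_cols=180, query_length=200, target_length=400)
loose = passes(hit, 90.0, 0.8, 0.4)   # 0.90 and 0.45 both reach their thresholds
strict = passes(hit, 90.0, 0.8, 0.5)  # query-to-target ratio 0.45 is below 0.5
print(loose, strict)
```

Using both ratios rejects hits where a short query matches only a small part of a much longer target, or the reverse.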