Designer of small interfering RNA
DSIR Help Pages
1. Sequence definition
This part asks for a session name so that you can keep track of your DSIR designs.
Enter a name that identifies your design session precisely (e.g. my_targetname_design_1). This field is mandatory; a message will inform you if it is left empty.
Paste your target sequence in FASTA format, or click the Browse button to upload a sequence file from your computer (the file must also be in FASTA format).
FASTA format only: your sequence must respect the FASTA format. Find a definition of FASTA format here.
If your sequence is not correctly formatted, DSIR will display an error message and ask you to check your sequence.
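If you want to check a sequence locally before submitting it, a minimal sketch of a FASTA check is given below. It is not DSIR's own validator: the function name `parse_fasta` and the accepted alphabet (A, C, G, T, U, N) are assumptions made for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs; raise ValueError if malformed."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before a '>' header line")
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no '>' header line found")
    records.append((header, "".join(chunks)))

    # Assumed alphabet: DSIR's actual accepted characters are not documented here.
    allowed = set("ACGTUN")
    for name, seq in records:
        if not seq:
            raise ValueError(f"record {name!r} has no sequence")
        bad = set(seq) - allowed
        if bad:
            raise ValueError(f"record {name!r} contains invalid characters: {sorted(bad)}")
    return records
```

Calling `parse_fasta(">my_target\nACGT\nacgu\n")` returns one record whose sequence lines have been joined and upper-cased.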
2. Efficacy prediction settings
The first step of the design performed by DSIR is to predict the efficacy of all possible siRNA sequences targeting your input sequence. DSIR computes this efficacy from the weights associated with features of the sequence (a model that combines specific nucleotides at given positions with nucleotide motifs). For a detailed description of the model, please read: An accurate and interpretable model for siRNA efficacy prediction (Vert, J-P., Foveau, N., Lajaunie, C., Vandenbrouck, Y., BMC Bioinformatics, 2006 7:520).
Depending on your needs, three models are currently available:
- 19nt means a design based on the linear model for 19 nt without the 2 nt overhangs, which can be added manually for synthesis (e.g. symmetric 3'TT overhangs)
- 21nt means a design based on the linear model for 21 nt with 2 nt overhangs belonging to the target sequence
- shRNA means a design based on the linear model for 19 nt, with the cloning site (forward and reverse) and internal loop sequences included when the final results are exported
During efficacy prediction, siRNA sequences are ranked by their predicted efficacy (a percentage). You can adjust the number of displayed siRNA sequences by modifying the "Efficacy threshold" field. Only siRNAs with a predicted efficacy greater than the threshold value are kept and displayed. By default this value is set to 90.
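The enumerate, rank and filter steps described above can be sketched as follows. This is only an illustration: the function names, the dictionary layout and the 1-based positions are assumptions, and the scores must come from elsewhere, since the DSIR model weights are not reproduced here.

```python
def candidate_windows(target, length=19):
    """Yield (position, window) for every window of `length` nt in `target`.

    Positions are 1-based here, which is an assumption about the display.
    """
    for start in range(len(target) - length + 1):
        yield start + 1, target[start:start + length]


def rank_and_filter(scored, threshold=90.0):
    """Keep candidates scoring strictly above `threshold`, best first.

    `scored` is a list of dicts with a "score" key (hypothetical layout).
    """
    kept = [c for c in scored if c["score"] > threshold]
    return sorted(kept, key=lambda c: c["score"], reverse=True)
```

A 21 nt input yields three 19 nt windows, and a candidate scoring exactly 90 is dropped at the default threshold because the comparison is strict.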
Avoid 4 or more nucleotides runs
siRNA sequences containing polynucleotide tracts (e.g. 4 or more consecutive occurrences of "A") have been shown to lower target gene silencing. This option detects and filters out all siRNA sequences that contain polyN tracts (with N >= 4). By default this option is set to "Yes".
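A run of four or more identical nucleotides is easy to detect with a back-referencing regular expression. The sketch below shows one possible way to do it; the function name `has_nucleotide_run` is hypothetical and this is not DSIR's own code.

```python
import re


def has_nucleotide_run(seq, min_run=4):
    """Return True if `seq` contains `min_run` or more identical consecutive nucleotides."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # One captured nucleotide followed by at least (min_run - 1) copies of itself.
    pattern = re.compile(r"([ACGTU])\1{%d,}" % (min_run - 1))
    return pattern.search(seq.upper()) is not None
```

With the default setting, a sequence containing "AAAA" is flagged while one containing only "AAA" is not.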
Avoid immunostimulatory motifs
Toll-like receptors (TLRs) expressed in endosomes have been reported to recognize single- and double-stranded siRNA, eliciting an interferon response. These TLRs serve as pattern-recognition sensors of specific immunostimulatory motifs (such as 5'-UGUGU-3' or 5'-GUCCUUCAA-3' in the siRNA guide strand) that should be avoided in siRNA design [Hornung et al., Nat. Med., 2005; Judge et al., Nat. Biotech., 2005]. This option detects and filters out all siRNA sequences that contain these motifs. By default this option is set to "Yes".
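A simple substring test is enough to screen a guide strand for the two motifs quoted above. The sketch below assumes that only these two motifs are checked; the complete list used by DSIR is not given on this page, and the function name is hypothetical.

```python
# Only the two motifs cited in the text; DSIR may screen for others (assumption).
IMMUNOSTIMULATORY_MOTIFS = ("UGUGU", "GUCCUUCAA")


def find_immunostimulatory_motifs(guide):
    """Return the motifs found in the guide strand (read 5' to 3')."""
    rna = guide.upper().replace("T", "U")  # accept DNA-style input as well
    return [motif for motif in IMMUNOSTIMULATORY_MOTIFS if motif in rna]
```

An empty list means that neither motif was found, so the guide strand would pass this filter.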
siRNA design and efficacy prediction results are reported in this part. Each column can be sorted by clicking on its header. Each siRNA can be selected either for further analysis (i.e. potential off-target search, see section 4) or for export to a formatted file (see section 5).
Columns for efficacy prediction:
- "siRNA_id": identifier assigned to each siRNA, numbered from the highest DSIR score to the lowest.
- "Pos.": position in the target sequence where the siRNA guide strand begins (from 5' to 3').
- "SS sequence": sense strand (passenger strand) siRNA sequence from 5' to 3'.
- "AS sequence": antisense strand (guide strand) siRNA sequence from 5' to 3'.
- "Score": predicted efficacy according to the selected model (19 or 21nt).
- "Corrected Score": the previous efficacy score reduced by penalties for some intrinsic target features that have been shown to influence siRNA efficacy (Filhol et al., PLoS One, 2012 7:10).
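The "SS sequence" and "AS sequence" columns are related by reverse complementarity over the paired region of the duplex. The sketch below illustrates this for a 19 nt core without overhangs; the function name is hypothetical, and overhang handling for the 21nt model is deliberately left out.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def guide_from_sense(sense):
    """Return the antisense (guide) strand, 5' to 3', for a sense strand given 5' to 3'.

    Covers the base-paired core only; 3' overhangs are not handled.
    """
    rna = sense.upper().replace("T", "U")  # accept DNA-style input as well
    return rna.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sense strand (as RNA), which is a quick sanity check on a pair of sequences.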
Columns for siRNA similarity search:
- "OT": number of potential off-targets (namely the number of hits with the chosen mismatch tolerance) found in the selected databanks.
- "SCF Hits": number of SCF (Seed Complement Frequencies) or hits in a 3' UTR transcript databank. The higher this value, the more off-target effects are suspected.
- "#Seqs": number of 3'UTR sequences matched.
- "1 Hit": number of seeds matching a 3'UTR sequence only one time.
- "2 Hits": number of seeds matching the same 3'UTR sequence two times.
- ">=3 Hits": number of seeds matching the same 3'UTR sequence three times or more. The higher this value, the more probable an off-target effect through the miRNA pathway.
4. Similarity search
To check how specific your siRNA guide strand sequence is, an exact similarity search can be performed on the set of siRNAs that satisfy the efficacy threshold. This can be done either by screening an mRNA sequence databank for potential off-targets according to a given mismatch tolerance, or by computing the seed complement frequencies (SCF) in order to check the complementarity between the seed region of the siRNA guide strand and the 3'UTR of off-targeted genes.
From the siRNA sequence set, you can perform the similarity search either on all siRNAs satisfying the efficacy threshold or on a subset that you want to analyze. For a subset, select the row of each sequence of interest. No selection means that all siRNA sequences will be submitted. To run the similarity search, click on the "Search bank" button at the end of the page.
The similarity search algorithm used by DSIR is based on exact pattern matching and allows mismatches. You can set the mismatch tolerance, which is the number of mismatches allowed between your siRNA sequence and sequences from the databank (set it to 0 for a strictly exact search). By default this value is set to 1. Although the search time is linear in the size of the sequence databank, keep in mind that the computing time increases with this value.
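To make the mismatch tolerance concrete, the sketch below scans every window of a small databank and keeps those within the allowed number of mismatches. It is a naive window-by-window comparison written for clarity, not the algorithm DSIR actually uses; the function names and the dictionary layout of the databank are assumptions, and the query is expected in the same orientation as the databank sequences.

```python
def within_mismatches(a, b, max_mismatches):
    """Return True if equal-length strings a and b differ at no more than max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False  # stop early once the tolerance is exceeded
    return True


def find_hits(query, bank, max_mismatches=1):
    """Return (name, 1-based position, matched window) for each hit.

    `bank` maps sequence names to sequences (hypothetical layout).
    """
    hits = []
    n = len(query)
    for name, seq in bank.items():
        for i in range(len(seq) - n + 1):
            window = seq[i:i + n]
            if within_mismatches(query, window, max_mismatches):
                hits.append((name, i + 1, window))
    return hits
```

With a tolerance of 0 only perfect matches are reported; raising it to 1 also reports windows that differ at a single position, which is why the number of hits can only grow with this value.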
You can choose the databank against which you want to check specificity. In the current release of DSIR, six databanks are available. They are built from the NCBI RefSeq section and cover three organisms (Homo sapiens, Mus musculus and Rattus norvegicus), plus a concatenation of these species for cross-species detection. These databanks are updated daily.
Compute SCF (Seed Complement Frequencies)
It has been reported that complementarity between the 3'UTR of the off-targeted gene and the seed region of the siRNA guide strand is critical for off-targeting, and that SCF can be used as a predictor of siRNA specificity [Birmingham et al., Nat. Met., 2006]. Select this option if you want to compute SCF. The seed region of the siRNA guide strand can comprise nucleotides 2-7 or 2-8. By default the length of the seed region is set to 7 (positions 2-8).
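The seed extraction and the counting behind the SCF columns can be sketched as follows. This is an illustration under stated assumptions rather than DSIR's implementation: the function names, the dictionary layout of the 3'UTR databank, the counting of overlapping matches, and the reading of the "1 Hit", "2 Hits" and ">=3 Hits" columns as numbers of 3'UTR sequences are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_complement(guide, seed_length=7):
    """Return the 3'UTR site (5' to 3') complementary to the guide seed.

    seed_length 7 uses guide positions 2-8, seed_length 6 uses positions 2-7.
    """
    if seed_length not in (6, 7):
        raise ValueError("seed_length must be 6 or 7")
    rna = guide.upper().replace("T", "U")
    seed = rna[1:1 + seed_length]  # skip position 1 of the guide
    return seed.translate(_COMPLEMENT)[::-1]


def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, overlapping ones included."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def scf_summary(guide, utrs, seed_length=7):
    """Summarise seed-complement matches over `utrs` (name -> 3'UTR sequence)."""
    site = seed_complement(guide, seed_length)
    per_utr = {}
    for name, seq in utrs.items():
        n = count_occurrences(seq.upper().replace("T", "U"), site)
        if n:
            per_utr[name] = n
    counts = list(per_utr.values())
    return {
        "site": site,
        "scf_hits": sum(counts),                          # total matches
        "n_seqs": len(counts),                            # 3'UTRs matched at least once
        "one_hit": sum(1 for n in counts if n == 1),
        "two_hits": sum(1 for n in counts if n == 2),
        "three_plus_hits": sum(1 for n in counts if n >= 3),
    }
```

Switching `seed_length` from 7 to 6 shortens the searched site by one nucleotide, so the hit counts can only stay the same or increase.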
To run the similarity search algorithm, click on the "Search Bank" button.
Filter by off-target
This option filters the siRNA sequences according to their number of potential off-target sequences. A value of 1 means that every siRNA sequence matching more than one sequence in the search space is removed from the results page (the single permitted match is normally the target gene for which the siRNA was designed). This avoids siRNA sequences that hit unintended sequences.
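Expressed as code, this filter keeps a row only when its off-target count does not exceed the chosen value. The sketch assumes that each result row is a dictionary with an "OT" key, which is a hypothetical layout borrowed from the column name.

```python
def filter_by_off_target(rows, max_hits=1):
    """Keep rows whose off-target count ("OT") is at most `max_hits`."""
    return [row for row in rows if row["OT"] <= max_hits]
```

With `max_hits=1`, an siRNA that matches only its intended target is kept and one that matches three sequences is removed.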
5. Export results
The results of your DSIR session can be exported in the file format of your choice, for example to send your orders to manufacturers. The supported file formats are: csv (for Excel sheets), text and xml.
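If you post-process results yourself, a table with the efficacy columns can be written to csv as sketched below. The exact layout of DSIR's own export files is not described on this page, so the column order, the function name and the row layout are assumptions.

```python
import csv
import io

# Column names taken from the results table; their order in a real export is an assumption.
COLUMNS = ["siRNA_id", "Pos.", "SS sequence", "AS sequence", "Score", "Corrected Score"]


def export_csv(rows):
    """Return the rows (dicts keyed by COLUMNS) as csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned text can be saved with a `.csv` extension and opened in Excel.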