Input Format

Nucleotide sequences must follow the IUPAC nucleotide code. The following letters are allowed:

A T C G U N, as well as the further ambiguity codes R Y M K S W B D H V.

Protein sequences must use the IUPAC one-letter amino acid code:

A B C D E F G H I K L M N P Q R S T V W X Y Z.
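
As an illustration, the two alphabets above can be checked before submitting. This is only a minimal sketch: the names NUCLEOTIDE_CODES, PROTEIN_CODES and invalid_symbols are hypothetical, and converting the input to upper case is an assumption (this page does not say whether lower-case letters are accepted).

```python
# Letter sets copied from the lists above.
NUCLEOTIDE_CODES = set("ATCGUNRYMKSWBDHV")
PROTEIN_CODES = set("ABCDEFGHIKLMNPQRSTVWXYZ")


def invalid_symbols(sequence, alphabet):
    """Return a sorted list of the characters in `sequence` that are not in `alphabet`.

    Assumption: the sequence is upper-cased first, so 'acgt' is treated like 'ACGT'.
    """
    return sorted(set(sequence.upper()) - alphabet)
```

An empty list means every letter belongs to the chosen alphabet. For example, invalid_symbols("MKVLJ", PROTEIN_CODES) reports J, which is not in the protein list above.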

Sequences must be submitted in FASTA format. Each sequence starts with a line made of a > sign immediately followed (no space) by a name containing only letters, digits, or underscores.

The sequence itself is written below the > line and may span several lines. To add another sequence, repeat the pattern with a new > line.

>seq1
ATCTGACGTACGTCGAGGTTTGGC
ATCGACAGAGATAGANAGCT
>seq2
ACGCGCNCUAAC
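
The format rules above can be sketched as a small parser, which may help when checking a file before submitting it. This is not the code used by the tool: parse_fasta and NAME_PATTERN are hypothetical names, and skipping blank lines is an assumption.

```python
import re

# A name may only contain letters, digits, or underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def parse_fasta(text):
    """Return a list of (name, sequence) pairs read from FASTA-formatted text."""
    records = []
    name = None
    chunks = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new > line closes the previous record, if there is one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:]
            if not NAME_PATTERN.fullmatch(name):
                raise ValueError(f"line {line_number}: invalid name {name!r}")
            chunks = []
        else:
            if name is None:
                raise ValueError(f"line {line_number}: sequence found before any > line")
            chunks.append(line)  # sequence lines are joined into one string
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

Called on the example above, parse_fasta returns two records named seq1 and seq2, with the two lines of seq1 joined into a single sequence. A > line such as "> seq1" (with a space) raises ValueError.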

Note that a single input cannot mix nucleotide (DNA/RNA) sequences with amino acid sequences. You can, however, search nucleotide databases AND protein databases at the same time.