The AAT email server at Michigan Tech identifies genes in a DNA sequence
by comparing the query sequence against cDNA and protein sequence databases:
(1) Human_Gene_Index, a database of human cDNA sequences at TIGR,
(2) dbEST, a database of EST sequences at NCBI,
(3) SwissProt, a database of protein sequences at the University of Geneva,
(4) nr, a database of non-redundant protein sequences at NCBI.
Author: Xiaoqiu Huang (Email: huang at cs.mtu.edu)
Dept. of Computer Science, Michigan Technological Univ., Houghton, MI 49931
The Analysis and Annotation Tool (AAT) includes two sets of programs:
one compares the query sequence with the protein databases, and the
other compares it with the cDNA databases. Each set contains a fast
database search program and a rigorous alignment program. The search
program quickly identifies regions of the query sequence that are
similar to a database sequence. The alignment program then constructs
an optimal alignment between each such region and the database
sequence, and reports the coordinates of the exons in the query sequence.
Both alignment programs allow for introns in the query sequence, and
the DNA-protein alignment program also corrects frameshifts.
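One common way to let an alignment span introns is to allow the genomic sequence to skip a run of bases of any length for a single flat penalty. The score-only sketch below (no traceback) applies that idea to a DNA-cDNA alignment. It is an illustrative simplification, not AAT's scoring scheme: the match, mismatch, gap and intron values are hypothetical, gaps are linear, and frameshift correction for DNA-protein alignment is not modeled.

```python
def spliced_align_score(genomic: str, cdna: str, match: int = 2,
                        mismatch: int = -1, gap: int = -2,
                        intron: int = -10) -> int:
    """Best score for aligning all of `cdna` to part of `genomic`.

    Leading and trailing genomic bases are free. Besides matches, mismatches
    and single-base gaps, the genomic sequence may skip a run of bases of any
    length (an intron) for the flat penalty `intron`.
    """
    m = len(cdna)
    prev = [gap * j for j in range(m + 1)]  # row 0: no genomic bases used
    best_above = prev[:]  # best_above[j] = max S[k][j] over rows k already done
    best = prev[m]
    for i in range(1, len(genomic) + 1):
        cur = [0] * (m + 1)  # S[i][0] = 0: a free genomic prefix
        for j in range(1, m + 1):
            s = match if genomic[i - 1] == cdna[j - 1] else mismatch
            cur[j] = max(
                prev[j - 1] + s,         # pair genomic[i-1] with cdna[j-1]
                prev[j] + gap,           # genomic base left unpaired
                cur[j - 1] + gap,        # cDNA base left unpaired
                best_above[j] + intron,  # intron ending at genomic[i-1]
            )
        for j in range(m + 1):
            best_above[j] = max(best_above[j], cur[j])
        best = max(best, cur[m])  # a free genomic suffix
        prev = cur
    return best


exon1, exon2 = "ATGGCGGAAGCC", "CACCAGGCCGAG"
genomic = "TTTT" + exon1 + "GT" + "T" * 10 + "AG" + exon2 + "TTTT"
print(spliced_align_score(genomic, exon1 + exon2))  # 38 = 24 matches * 2 - 10
```

Keeping a running maximum per column (`best_above`) makes the intron move cost O(1) per cell, so the whole table is filled in O(nm) time rather than O(n^2 m).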
AAT reduces the labor-intensive work of locating the exons in the
query sequence and, by drawing on the wealth of available protein and
cDNA data, improves the definition of intron/exon boundaries.
Obtaining Help
To receive information on using the AAT email server,
send a mail message to:
aat at cs.mtu.edu
Put the word 'HELP' on a single line in the body of the mail message.
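The HELP request can also be composed programmatically. This sketch uses Python's standard `email.message.EmailMessage`; it builds the message without sending it. The sender address is a placeholder, the recipient is written out from the obfuscated address above, and the helper name `aat_help_request` is hypothetical.

```python
from email.message import EmailMessage


def aat_help_request(sender: str) -> EmailMessage:
    """Build (but do not send) a HELP request for the AAT email server."""
    msg = EmailMessage()
    msg["From"] = sender
    # Written out from the address given above as "aat at cs.mtu.edu"
    msg["To"] = "aat@cs.mtu.edu"
    # The instructions only ask for the word HELP on a single line of the body
    msg.set_content("HELP\n")
    return msg


print(aat_help_request("you@example.org"))
```

The resulting message could then be handed to `smtplib.SMTP(...).send_message(msg)` through an outgoing mail server of your choice.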
Examples of AAT Results
A portion of a DNA-protein alignment:
The top sequence is the query and the bottom one is a database sequence.
Accession: SP|P32198|CPT1_RAT MITOCHONDRIAL CARNITINE PALMITOYLTRANSFERASE I
Score: 2003 Identity: 482/773 (62%) Strand: plus
Script M A E A
62479 GCGCCCGCGCACCCATCTGCCCCCGTCCTAGGTGCCGACCAACCCCCAGGATGGCGGAAG
--------------------------------------------------::::::::::
1 M A E A
Script H Q A V A F Q F T V T P D G V D F R L S
62539 CTCACCAGGCCGTGGCCTTCCAGTTCACGGTGACCCCAGACGGGGTCGACTTCCGGCTCA
:::::::::::::::::::::::::::::::::::::::::::: ..:::.. :::::::
5 H Q A V A F Q F T V T P D G I D L R L S
Script R E A L K H V Y L S G I N S W K K R L I
62599 GTCGGGAGGCCCTGAAACACGTCTACCTGTCTGGGATCAACTCCTGGAAGAAACGCCTGA
::. ::::::::::::.. ... .::::::::: .. ..:::::::::::: . :
25 H E A L K Q I C L S G L H S W K K K F I
EXON 1 62529 62669 CONFIDENCE: 100 66
Script R I K
62659 TCCGCATCAAGGTGCGCACAGGTGCTTCTCCCAGAGCGTAGGCAGAGGCCGGCTGTCAGC
::::: ..:::-------------------------------------------------
45 R F K
Script
62719 TGTTAAGCGCTTTGTTAGGGTCCCTCACTGCCTCCTTGGCTGGCACTTCTGCCCGGTACA
------------------------------------------------------------
48
Script
62779 GGTTGTGGAAGTACAGACACCAGAGGGGTGCACAGGATGTGGTCGGACACAGGGAGCTGT
------------------------------------------------------------
48
Script
62839 GGGTGTGGCGGAGGAAGGAGCACAGCAGGGCATCAGGAGAGAAAGCCTTCCAGGCCAAGA
------------------------------------------------------------
48
Script
62899 CCAGGAGCCAGTTCCCAAGACTTCACAGGCAGGCTAACCTCCCGCCTTCCGGCTCCATAA
------------------------------------------------------------
48
Script N G I L R G V Y P G S P T
62959 GGGCGCCTGTTTCTGCCCACAGAATGGCATCCTCAGGGGCGTGTACCCTGGCAGCCCCAC
----------------------::::::::: ... .::::::. .:::. .. .:::.
48 N G I I T G V F P A N P S
Script S W L V V I M A T V G S S F C N V D I S
63019 CAGCTGGCTGGTCGTCATCATGGCAACAGTGGGTTCCTCCTTCTGCAACGTGGACATCTC
.::::::::: ..::: .. ... . . . ..::: . ... :::::: .::
61 S W L I V V V G V I S S M H A K V D P S
EXON 2 62981 63120 CONFIDENCE: 60 40
Script L G L V S C I Q R C L P Q G
63079 CTTGGGGCTGGTCAGTTGCATCCAGAGATGCCTCCCTCAGGGGTAAGGAGTGAAACTGGA
::::::: .. .. . ::: .::: .::: . . ------------------
81 L G M I A K I S R T L D T T
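The Script line above the query appears to be the translation of the query codons inside each exon. This can be checked with the standard genetic code. The sketch assumes the standard code and copies exon 1 (query positions 62529 to 62669, from the EXON 1 line) out of the sequence lines above; `translate` and `CODON_TABLE` are hypothetical names.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the complete codons of `dna` in frame 1 (a partial codon at the end is ignored)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Query positions 62529..62669 (EXON 1), copied from the sequence lines above
exon1 = ("ATGGCGGAAG"
         "CTCACCAGGCCGTGGCCTTCCAGTTCACGGTGACCCCAGACGGGGTCGACTTCCGGCTCA"
         "GTCGGGAGGCCCTGAAACACGTCTACCTGTCTGGGATCAACTCCTGGAAGAAACGCCTGA"
         "TCCGCATCAAG")
print(len(exon1), translate(exon1))
# 141 MAEAHQAVAFQFTVTPDGVDFRLSREALKHVYLSGINSWKKRLIRIK
```

The 141 nucleotides give 47 codons, and the translation matches the Script residues shown for exon 1 (M A E A ... R I K).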
A portion of a DNA-cDNA alignment:
The top sequence is the query and the bottom one is a database sequence.
EXON 1 12304 12896 CONFIDENCE: 100 100
12853 TTCACAGACTTCTACGTGCCTGTGTCTCTGTGCACACCCTCTAGGTAAAGAGGGGGCCGC
      ||||||||||||||||||||||||||||||||||||||||||||----------------
  549 TTCACAGACTTCTACGTGCCTGTGTCTCTGTGCACACCCTCTAG
12913 GCCTCTTCCCCGCCCCGACCCTCCATCCCTTTCCTCCCAATGGATTGCAGGGGGGCGGGA
      ------------------------------------------------------------
  593
12973 AAAACGTCTGTCTCTCTCTCTAGGGAAGGCCACATTTCTGTCTGTCTCAGGGACTCTGTG
      ------------------------------------------------------------
  593
13033 ACTTGTCCCGCAGGGCCGCCCTCCTGACCGGCCGGCTCCCGGTTCGGATGGGCATGTACC
      -------------|||||||||||||||||||||||||||||||||||||||||||||||
  593              GGCCGCCCTCCTGACCGGCCGGCTCCCGGTTCGGATGGGCATGTACC
13093 CTGGCGTCCTGGTGCCCAGCTCCCGGGGGGGCCTGCCCCTGGAGGAGGTGACCGTGGCCG
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
  640 CTGGCGTCCTGGTGCCCAGCTCCCGGGGGGGCCTGCCCCTGGAGGAGGTGACCGTGGCCG
13153 AAGTCCTGGCTGCCCGAGGCTACCTCACAGGAATGGCCGGCAAGTGGCACCTTGGGGTGG
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
  700 AAGTCCTGGCTGCCCGAGGCTACCTCACAGGAATGGCCGGCAAGTGGCACCTTGGGGTGG
13213 GGCCTGAGGGGGCCTTCCTGCCCCCCCATCAGGGCTTCCATCGATTTCTAGGCATCCCGT
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
  760 GGCCTGAGGGGGCCTTCCTGCCCCCCCATCAGGGCTTCCATCGATTTCTAGGCATCCCGT
EXON 2 13046 13286 CONFIDENCE: 100 96
13273 ACTCCCACGACCAGGTAGGAACCACCCGGGCCCTCAGCCACCCTCCCACCTCCCAAAGTC
      ||||||| ||||||----------------------------------------------
  820 ACTCCCAyGACCAG
13333 CCCCAGCCCTTGATGCTCCCGCAGCCCCACCTGCCAGCCCAGCCCTCACGGCAGCTGCCC
      ------------------------------------------------------------
  834
13393 GCCTCAGGGCCCCTGCCAGAACCTGACCTGCTTCCCGCCGGCCACTCCTTGCGACGGTGG
      -------||||||||||||||||||||||||||||-||||||||||||||||||||||||
  834        GGCCCCTGCcAGAACCTGACCTGCTTCC gCCGGCCACTCCTTGCGACGGTGG
13453 CTGTGACCAGGGCCTGGTCCCCATCCCACTGTTGGCCAACCTGTCCGTGGAGGCGCAGCC
      |||||||||||||||||||||-||||||||||||||||||||||||||||||||||||||
  886 CTGTGACCAGGGCCTGGTCCC aTCCCACTGTTGGCCAACCTGTCCGTGGAGGCGCAGCC
13513 CCCCTGGCTGCCCGGACTAGAGGCCCGCTACATGGCTTTCGCCCATGACCTCATGGCCGA
      |---|||||-|||||||||||||||||||||||||||||||||-||||||||||||||||
  945 C   tGGCT cCCGGACTAGAGGCCCGCTACATGGCTTTCGCC aTGACCTCATGGCCGA
EXON 3 13400 13618 CONFIDENCE: 96 96
13573 CGCCCAGCGCCAGGATCGCCCCTTCTTCCTGTACTATGCCTCTCACGTAAGTGATCTTGG
      ||||-|||||||||||||||||||||||||||||||||||||| ||--------------
 1000 CGCC aGCGCCAGGATCGCCCCTTCTTCCTGTACTATGCCTCTmAC
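Each EXON line gives the exon number, its start and end in the query, and two CONFIDENCE values whose meaning this page does not spell out. The parser sketch below derives exon and intron lengths from those lines. It assumes the coordinates are inclusive, which agrees with the sequence lines (exon 1 of the DNA-cDNA alignment ends 44 bases into the line starting at 12853, i.e. at 12896), and that the exons are on the plus strand. `parse_exons` and `gene_structure` are hypothetical helpers, not part of AAT.

```python
import re

# EXON <number> <start> <end> CONFIDENCE: <value> <value>
EXON_LINE = re.compile(
    r"^EXON\s+(\d+)\s+(\d+)\s+(\d+)\s+CONFIDENCE:\s+(\d+)\s+(\d+)", re.MULTILINE)


def parse_exons(report: str) -> list[tuple[int, ...]]:
    """Return (number, start, end, confidence1, confidence2) for each EXON line."""
    return [tuple(map(int, m.groups())) for m in EXON_LINE.finditer(report)]


def gene_structure(exons):
    """Exon lengths and the lengths of the introns between consecutive exons,
    assuming inclusive plus-strand coordinates."""
    lengths = [end - start + 1 for _, start, end, _, _ in exons]
    introns = [nxt[1] - cur[2] - 1 for cur, nxt in zip(exons, exons[1:])]
    return lengths, introns


report = """\
EXON 1 12304 12896 CONFIDENCE: 100 100
EXON 2 13046 13286 CONFIDENCE: 100 96
EXON 3 13400 13618 CONFIDENCE: 96 96
"""
print(gene_structure(parse_exons(report)))  # ([593, 241, 219], [149, 113])
```

For the DNA-cDNA example this gives exons of 593, 241 and 219 bases separated by introns of 149 and 113 bases.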