IUBio

How can I get FASTA hit coordinates as nucleotide positions?

Marcus Claesson m.claesson at student.ucc.ie
Mon Sep 6 05:38:14 EST 2004


Hello!

I have installed FASTA version 3.4t24 locally on my Red Hat Linux
machine, but I hope it's still OK to ask a FASTA question here.

I'm doing translated alignments of prokaryotic genomic DNA against
protein libraries, and I need FASTA as a complement to earlier
BLASTX searches. The problem is that I don't know exactly where in the
query sequence a hit lies: I get the hit coordinates in amino acids
(aa), not in nucleotides (nt), which is what I need. I used the '-n'
flag to force the query to be treated as nucleotides, but I couldn't
spot any difference except that the label 'aa' had become 'nt'.
Here's an example:

# ./fastx34_t -w 100 -W 100 -H -b 100 -d 100 -Q seq_1-102000.fasta nr
FASTX compares a DNA sequence to a protein sequence data bank
 version 3.4t23 March 5, 2003
Please cite:
 Pearson et al, Genomics (1997) 46:24-36

Query library seq_1-102000.fasta vs nr library
searching nr library

  1>>>seq_1-102000 - 19990 aa
 vs  nr library

.....

And further down in the same file:

>>gi|23335944|ref|ZP_00121175.1| COG0188: Type IIA topoisomerase (DNA gyrase/topo II, topoisomer  (883 aa)
 initn: 5541 init1: 5472 opt: 5517  Z-score: 5619.8  bits: 1055.5 E():    0
Smith-Waterman score: 5540;  95.647% identity (97.055% ungapped) in 896 aa overlap (7263-9950:1-883)
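If you want to script this, the query and library ranges can be pulled out of the overlap line. The following is a minimal sketch that assumes the line always has the form "in N aa overlap (qstart-qend:lstart-lend)" as in the output above, with 'aa' or 'nt' depending on '-n'. The function name parse_overlap is hypothetical and not part of FASTA. Because the regex uses \s+ between tokens, it also copes with a line that email has wrapped, as long as the pieces are still separated by whitespace.

```python
import re

# Hypothetical pattern for lines like "... in 896 aa overlap (7263-9950:1-883)".
# The unit may be "aa" or "nt"; \s+ also tolerates newlines from email wrapping.
OVERLAP_RE = re.compile(
    r"\bin\s+(\d+)\s+(aa|nt)\s+overlap\s+\((\d+)-(\d+):(\d+)-(\d+)\)"
)


def parse_overlap(text):
    """Return (overlap_len, unit, (q_start, q_end), (l_start, l_end)) or None."""
    m = OVERLAP_RE.search(text)
    if m is None:
        return None
    n, unit, qs, qe, ls, le = m.groups()
    return int(n), unit, (int(qs), int(qe)), (int(ls), int(le))


line = ("Smith-Waterman score: 5540;  95.647% identity (97.055% ungapped) "
        "in 896 aa overlap (7263-9950:1-883)")
print(parse_overlap(line))  # (896, 'aa', (7263, 9950), (1, 883))
```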

             7290      7320      7350      7380      7410      7440      7470      7500      7530      7560
seq_1- VADETNNTGDEQFTPDGSMEPLSPQEADTTDYGLMESGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
       .::: ::::::::::::::::::::::::::::::..:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gi|233 MADE-NNTGDEQFTPDGSMEPLSPQEADTTDYGLMDTGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
                10        20        30        40        50        60        70        80        90
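The ruler above the query line steps by 30 for every 10 residues. This minimal sketch shows one way that could arise. It assumes a forward-strand hit starting at nucleotide 7263 (the first number in the overlap range), '-' as the only gap character, and no frameshifts (FASTX can introduce frameshifts, which this sketch ignores). residue_nt_starts is a hypothetical helper. Under those assumptions, the codon start of every 10th query residue gives exactly the ruler labels shown, which would suggest both the ruler and the overlap range are counting nucleotides.

```python
def residue_nt_starts(aligned_query, nt_start):
    """Map each aligned query residue to the nucleotide where its codon starts.

    Assumes a forward-strand hit, no frameshifts, and '-' as the only gap
    character. Gap columns map to None because they consume no query bases.
    """
    starts = []
    pos = nt_start
    for ch in aligned_query:
        if ch == "-":
            starts.append(None)
        else:
            starts.append(pos)
            pos += 3  # one codon per translated residue
    return starts


# The 100-residue query line from the alignment above.
QUERY = ("VADETNNTGDEQFTPDGSMEPLSPQEADTTDYGLMESGERIQRKDLQQEMRESYLAYALS"
         "VIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS")
starts = residue_nt_starts(QUERY, 7263)
ticks = [starts[i] for i in range(9, len(QUERY), 10)]  # residues 10, 20, ..., 100
print(ticks)
# [7290, 7320, 7350, 7380, 7410, 7440, 7470, 7500, 7530, 7560]
```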

And here is the same search, run with '-n':

# ./fastx34_t -w 100 -W 100 -H -b 100 -d 100 -n -Q seq_1-102000.fasta nr
FASTX compares a DNA sequence to a protein sequence data bank
 version 3.4t23 March 5, 2003
Please cite:
 Pearson et al, Genomics (1997) 46:24-36

Query library seq_1-102000.fasta vs nr library
searching nr library

  1>>>seq_1-102000 - 19990 nt
 vs  nr library

.....

And further down in the same file:

>>gi|23335944|ref|ZP_00121175.1| COG0188: Type IIA topoisomerase (DNA gyrase/topo II, topoisomer  (883 nt)
 initn: 5541 init1: 5472 opt: 5517  Z-score: 5615.2  bits: 1054.7 E():    0
Smith-Waterman score: 5540;  95.647% identity (97.055% ungapped) in 896 nt overlap (7263-9950:1-883)

             7290      7320      7350      7380      7410      7440      7470      7500      7530      7560
seq_1- VADETNNTGDEQFTPDGSMEPLSPQEADTTDYGLMESGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
       .::: ::::::::::::::::::::::::::::::..:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gi|233 MADE-NNTGDEQFTPDGSMEPLSPQEADTTDYGLMDTGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
                10        20        30        40        50        60        70        80        90


So, as you can see, there is no real difference in the hit coordinates
for the query sequence, just 'nt' instead of 'aa'. Is there a way to
get the exact nt coordinates, do you think?

And if there isn't, can I calculate the exact nt coordinates myself?
In this example it's hardly as simple as multiplying by 3.
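The numbers in the first example can at least be checked against each other. The query range 7263-9950 covers 9950 - 7263 + 1 = 2688 positions, and 2688 / 3 = 896, which is exactly the reported overlap length. That is what you would expect if the query range were already in nucleotides, with no gaps or frameshifts on the query side. This is only an inference from these figures, not documented FASTX behavior. codon_count is a hypothetical helper.

```python
def codon_count(q_start, q_end):
    """Split an inclusive nucleotide range into (whole codons, leftover bases).

    abs() lets the range be given in either order.
    """
    length = abs(q_end - q_start) + 1
    return divmod(length, 3)


codons, leftover = codon_count(7263, 9950)
print(codons, leftover)  # 896 0
```

A non-zero leftover, or a codon count that differs from the overlap length, would point to gaps or frameshifts in the query, so the simple "divide by 3" reading would no longer hold.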

Any feedback on this problem is greatly appreciated!

Regards,
Marcus
---




More information about the Bio-soft mailing list
