Hello!
I have installed FASTA version 3.4t24 locally on my Red Hat Linux
machine, but I hope it's still OK to ask a FASTA question.
I'm doing translated alignments of prokaryotic genomic DNA against
protein libraries, and I need FASTA to complement earlier blastx
searches. The problem is that I don't know exactly where in the
query sequence the hit is: the hit coordinates appear to be reported
in aa, not in nt, which is what I need. I used the '-n' flag to force
the query to be treated as nucleotides, but the only difference I
could spot was that the labels 'aa' became 'nt'. Here's an example:
# ./fastx34_t -w 100 -W 100 -H -b 100 -d 100 -Q seq_1-102000.fasta nr
FASTX compares a DNA sequence to a protein sequence data bank
version 3.4t23 March 5, 2003
Please cite:
Pearson et al, Genomics (1997) 46:24-36
Query library seq_1-102000.fasta vs nr library
searching nr library
1>>>seq_1-102000 - 19990 aa
vs nr library
.....
And further down in the same file
>>gi|23335944|ref|ZP_00121175.1| COG0188: Type IIA topoisomerase (DNA gyrase/topo II, topoisomer (883 aa)
initn: 5541 init1: 5472 opt: 5517 Z-score: 5619.8 bits: 1055.5 E(): 0
Smith-Waterman score: 5540; 95.647% identity (97.055% ungapped) in 896 aa overlap (7263-9950:1-883)
             7290      7320      7350      7380      7410      7440      7470      7500      7530      7560
seq_1- VADETNNTGDEQFTPDGSMEPLSPQEADTTDYGLMESGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
       .::: ::::::::::::::::::::::::::::::..:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gi|233 MADE-NNTGDEQFTPDGSMEPLSPQEADTTDYGLMDTGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
                10        20        30        40        50        60        70        80        90
And in the other example where I used '-n':
# ./fastx34_t -w 100 -W 100 -H -b 100 -d 100 -n -Q seq_1-102000.fasta nr
FASTX compares a DNA sequence to a protein sequence data bank
version 3.4t23 March 5, 2003
Please cite:
Pearson et al, Genomics (1997) 46:24-36
Query library seq_1-102000.fasta vs nr library
searching nr library
1>>>seq_1-102000 - 19990 nt
vs nr library
.....
And further down in the same file
>>gi|23335944|ref|ZP_00121175.1| COG0188: Type IIA topoisomerase (DNA gyrase/topo II, topoisomer (883 nt)
initn: 5541 init1: 5472 opt: 5517 Z-score: 5615.2 bits: 1054.7 E(): 0
Smith-Waterman score: 5540; 95.647% identity (97.055% ungapped) in 896 nt overlap (7263-9950:1-883)
             7290      7320      7350      7380      7410      7440      7470      7500      7530      7560
seq_1- VADETNNTGDEQFTPDGSMEPLSPQEADTTDYGLMESGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
       .::: ::::::::::::::::::::::::::::::..:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
gi|233 MADE-NNTGDEQFTPDGSMEPLSPQEADTTDYGLMDTGERIQRKDLQQEMRESYLAYALSVIVERALPDVRDGMKPVHRRVIYAMYDGGYRPDRGYNKCS
                10        20        30        40        50        60        70        80        90
So, as you can see, there is no real difference in the hit coordinates
for the query sequence, just 'nt' instead of 'aa'. Do you think there
is a way of getting the exact nt coordinates?
And if there isn't, can I calculate the nt coordinates exactly? In
this example it hardly seems as simple as multiplying by 3.
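To make that second question concrete, here is a rough Python sketch of
the arithmetic I have in mind. It assumes the reported query range is
1-based and inclusive, and that the alignment has no gaps or frameshifts
on the query side. Both are my own assumptions, not something I have
confirmed in the FASTA documentation, and the function names are only
illustrative. It compares the length of the reported query range
(7263-9950) with the overlap length (896), and maps an alignment column
back to a codon in the query.

```python
def positions_per_column(q_start, q_end, overlap_len):
    """Query positions covered per alignment column.

    Assumes q_start and q_end are 1-based and inclusive (my assumption).
    """
    span = abs(q_end - q_start) + 1
    return span / overlap_len


def column_to_codon(column, q_start):
    """Map a 1-based alignment column to (first_nt, last_nt) in the query.

    Only valid for a forward-frame hit with no gaps or frameshifts in the
    query before this column (my assumption).
    """
    first = q_start + 3 * (column - 1)
    return first, first + 2


# Numbers taken from the hit shown above.
print(positions_per_column(7263, 9950, 896))
print(column_to_codon(1, 7263))
print(column_to_codon(896, 7263))
```

A ratio of exactly 3 would suggest that the query range is already
counted in nucleotides. Any other value would mean the conversion needs
more than a simple multiplication.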
Any feedback on this problem is greatly appreciated!
Regards,
Marcus
---