IUBio

FASTA glitch.

pmiguel at bilbo.bio.purdue.edu
Tue Apr 26 12:13:19 EST 1994


In article <Cotz82.19H at mozo.cc.purdue.edu>, pmiguel at bilbo.bio.purdue.edu writes:
>  (This is about the version of FASTA in GCG.)
>  When a DNA FASTA search displays an alignment in which the query is /REV 
>(i.e., reverse complement, i.e., bottom strand) it numbers it incorrectly.  
>That is, the last base of the sequence becomes "1", the next to the last 
>"2" and so on.  Why hasn't this been fixed over the years?  BESTFIT, for 
>example, doesn't do this.  Why doesn't FASTA display reverse alignments 
>like BESTFIT?  What would it take, an extra 10 lines of code?  It drives me 
>nuts every time I have to do it by hand!
>

  I've been asked why this numbering should be considered incorrect.  If I
have 1000 bases of sequence in a file, then it must be in one orientation or
the other.  Sometimes the orientation will be arbitrary, sometimes not -- but
that is not for a program to decide.  If I get a hit on the reverse strand
I want to know where that hit is according to the numbering scheme (the
orientation) I'm using.  Instead, FASTA will show a /REV alignment from, say,
700 to 900.  On my map that is not 700 to 900; it is 301 to 101.  I can't
imagine anyone preferring the numbering used by FASTA.  All it would take
to change it is to subtract each /REV number from the sequence length plus
one and print the result on the alignment.  Right?  Why make me do this
calculation myself?
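
The conversion asked for here can be sketched in a few lines of Python.
This is only an illustration, assuming the behaviour described in the quoted
post: on a /REV display the last base of the query is numbered 1, so a
reverse position and its forward position always sum to the sequence length
plus one.  The function name rev_to_forward is hypothetical and is not part
of FASTA or GCG.

```python
def rev_to_forward(rev_pos: int, seq_length: int) -> int:
    """Map a 1-based /REV display position back to forward-strand numbering.

    Assumes the last base of the query is shown as position 1 on the
    reverse-complement display (the behaviour described in the post).
    """
    if not 1 <= rev_pos <= seq_length:
        raise ValueError("position lies outside the sequence")
    # Reverse position 1 is forward position seq_length, and so on down.
    return seq_length + 1 - rev_pos


# A /REV alignment displayed at 700..900 on a 1000-base query.
print(rev_to_forward(700, 1000), rev_to_forward(900, 1000))
```

For the 1000-base example, a /REV alignment displayed at 700 to 900 comes
out as 301 to 101 in the query's own numbering.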
  The situation is even more confusing if I'm restricting the database
search to a sub-section of my sequence -- say 200 to 600.  If I get a /REV
hit (a hit on the reverse complement strand) from 300 to 350 according to
FASTA, the real region of homology will be from 500 to 450!  It took me
hours of banging my head against the wall to figure this one out.  If you
think about it, there is no legitimate reason why FASTA should display an
alignment like this.  It's just an error.  But in the 5 years I've been
using the program it has not been corrected.
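
The sub-section case can be sketched the same way.  The assumption here,
inferred only from the 200 to 600 example above, is that the reversed
numbering runs over the restricted range, so the base at the end of the
range is displayed with the number of its beginning.  Under that assumption
a displayed position and the true position sum to begin + end.  The name
rev_range_to_forward is hypothetical.

```python
def rev_range_to_forward(rev_from: int, rev_to: int,
                         begin: int, end: int) -> tuple[int, int]:
    """Convert a /REV alignment range shown for a restricted query range.

    Assumes (from the example in the post, not from FASTA documentation)
    that within begin..end the displayed and true positions sum to
    begin + end.
    """
    for pos in (rev_from, rev_to):
        if not begin <= pos <= end:
            raise ValueError("position lies outside the searched range")
    total = begin + end
    return total - rev_from, total - rev_to


# Search restricted to 200..600, /REV hit displayed at 300..350.
print(rev_range_to_forward(300, 350, 200, 600))
```

With begin = 1 and end equal to the sequence length, this reduces to the
whole-sequence case.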
  
Phillip  
  



More information about the Info-gcg mailing list
