FASTA/TFASTA on long sequences

Paul Roy proy at rsvs.ulaval.ca
Tue Aug 28 12:01:43 EST 2001

Dear GCGers:

     I have been analyzing files derived from complete and partial
genomes.  GenBank files chopped into 10-15 kb pieces usually pose no
problems, but I have noticed that the number of significant, distinct
alignments found by FASTA and TFASTA diminishes as contigs are assembled,
when GenBank leaves complete genomes as long (ca. 100 kb) files, or when
long contigs are chopped into 110 kb pieces by BREAKUP.  "Second best"
alignments seem to be lost with FASTA if they lie on the same strand of
the same contig as the best one, and with TFASTA if they lie in the same
reading frame (even if many kb away).  I believe this is because the
algorithm finds only ONE best diagonal.  Has anyone else noticed this,
and is there any solution that won't miss significant "second best"
alignments in long sequences?
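
     Since short (10-15 kb) pieces usually pose no problems, one possible
workaround is to cut each long sequence into overlapping windows before
searching, so that each window can report its own best alignment.  The
Python sketch below only illustrates that idea.  It is not part of GCG or
of the FASTA package, and the function name, window size, and overlap are
assumptions chosen for the example.  The overlap should be at least as
long as the alignments you expect, or hits spanning a cut may be split.

```python
def overlapping_windows(seq, window=15000, overlap=1000):
    """Yield (start, piece) pairs that cover seq with overlapping windows.

    start is the 0-based offset of the piece in the original sequence,
    so hit coordinates can be mapped back afterwards.
    window and overlap are illustrative defaults, not GCG settings.
    """
    if window <= overlap:
        raise ValueError("window must be larger than overlap")
    if not seq:
        return
    step = window - overlap
    start = 0
    while True:
        yield start, seq[start:start + window]
        # Stop once the current window reaches the end of the sequence.
        if start + window >= len(seq):
            break
        start += step


def to_fasta(name, seq, window=15000, overlap=1000):
    """Format the windows as FASTA records named <name>_<start>_<end>."""
    records = []
    for start, piece in overlapping_windows(seq, window, overlap):
        # Header coordinates are 1-based and inclusive.
        header = ">%s_%d_%d" % (name, start + 1, start + len(piece))
        records.append(header + "\n" + piece)
    return "\n".join(records) + ("\n" if records else "")
```

Each record name carries the position of the piece in the original
sequence, so a hit found in a piece can be placed back on the long contig
by adding the start coordinate.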


 Paul H. Roy                             Phone:  +1 418 654 2705
 Departement de biochimie,FSG            FAX:    +1 418 654 2715
 Universite Laval                        E-mail: proy at rsvs.ulaval.ca
 Quebec, QC  G1K 7P4



