FASTA moans and questions

Bill Pearson wrp at cyclops.micr.Virginia.EDU
Fri May 15 11:43:08 EST 1992

In article <9205151608.AA07714 at gserv1> JAB5 at UK.AC.YORK.VAXA writes:
>Dear colleagues,
>  I frequently use FASTA in my work. The latest version has some useful
>improvements, e.g. the whole sequence entry is not listed in alignments
>when only a short match is found. However, I'd like to have a bit of a
>moan and to ask some questions...
>1. You used to be able to search sub-sets of the database, e.g. B=
>   Bacterial sequences. I found this most useful.
	This has not changed; you can still search subsets of the database
	if they are available in separate files.  The program must be installed
	correctly (the FASTLIBS file must be set up properly) for this to work.

>2. What is the meaning of the message "ignoring..(list of entry codes)"?
	There are some very short sequences (1 amino acid, 3
	nucleotides) in the databases.  They are ignored.
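
	As an illustration only (this is not code from the FASTA package),
	the behaviour described above can be sketched as a length filter.
	The name split_short_entries, the min_length cutoff, and the toy
	entry codes are all hypothetical; the cutoff FASTA actually uses
	is not stated here.

```python
# Hypothetical sketch: separate entries that are too short to search.
# The min_length value is an assumption, not FASTA's real threshold.

def split_short_entries(records, min_length):
    """Return (kept, ignored_codes) for a list of (code, sequence) pairs."""
    kept, ignored = [], []
    for code, seq in records:
        if len(seq) < min_length:
            ignored.append(code)       # reported, but not searched
        else:
            kept.append((code, seq))
    return kept, ignored


# Made-up entries: a 1-residue protein, a normal one, a 3-base fragment.
records = [("TOY1", "M"), ("TOY2", "MKTAYIAKQR"), ("TOY3", "ACG")]
kept, ignored = split_short_entries(records, min_length=4)
print("ignoring..", " ".join(ignored))
```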

>3. The best matches are listed as entry codes only. You used to get a
>   short descriptor. "M21579" does not tell you much!

	This is another instance of the program not being installed
	correctly.  It sounds like you have a "type 5" (PIR format) file but
	you are searching it as a "type 0" (FASTA format) file.  When you do
	this, you end up treating the descriptive line as amino acid sequence.
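
	A minimal sketch of why the format type matters, assuming the
	usual layouts: a FASTA-format entry has a single ">" header line,
	while a PIR-format entry has a ">P1;CODE" line, then a
	description line, then sequence ending in "*".  The two parser
	functions and the toy entry below are made up for illustration
	and are not taken from the FASTA source.

```python
# Hypothetical PIR-style entry (invented code, description and sequence).
PIR_ENTRY = """>P1;TOY001
hypothetical example protein
MKTAYIAKQR*
"""


def parse_as_fasta(text):
    """Type-0 style: one '>' header line; every other line is sequence."""
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            records.append({"header": stripped[1:], "description": "", "sequence": ""})
        elif records and stripped:
            records[-1]["sequence"] += stripped.replace(" ", "")
    return records


def parse_as_pir(text):
    """Type-5 style: '>P1;CODE', then a description line, then sequence."""
    records = []
    expect_description = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(">"):
            records.append({"header": stripped[1:], "description": "", "sequence": ""})
            expect_description = True
        elif records and expect_description:
            records[-1]["description"] = stripped
            expect_description = False
        elif records and stripped:
            records[-1]["sequence"] += stripped.replace(" ", "").rstrip("*")
    return records


wrong = parse_as_fasta(PIR_ENTRY)[0]
right = parse_as_pir(PIR_ENTRY)[0]
print(wrong)   # only the entry code is available as a title
print(right)   # description and sequence are kept apart
```

	Read as FASTA format, the entry has nothing but its code in the
	header, and the description text is counted as residues.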

>4. Is there a problem with the numbering of long sequences in
>   alignments? E.g. MIPACGA (EMBL) is over 100kb. The numbering goes
>   from 99990 to 10000! Whilst pretty obvious at present, this might
>   prove to be an important fault as longer sequences or melded entries
>   accumulate.

	The latest version of FASTA is set up to handle sequences up
	to 10,000,000 residues without the numbering problem.
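
	One way a symptom like "99990 to 10000" could arise is a
	coordinate label cut to a fixed number of characters.  The sketch
	below is a guess at that kind of bug, not a description of the
	FASTA code; both function names and the field widths are
	assumptions.

```python
# Hypothetical illustration of a fixed-width coordinate label.

def truncated_label(position, width=5):
    """Keep only the first `width` characters of the number (lossy)."""
    return str(position)[:width].rjust(width)


def wide_label(position, width=8):
    """Use a field wide enough for 8-digit positions such as 10,000,000."""
    return str(position).rjust(width)


for pos in (99990, 100000):
    print(repr(truncated_label(pos)), repr(wide_label(pos)))
```

	With the 5-character field, position 100000 is shown as 10000;
	with the wider field the full number survives.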

>5. In local comparisons (LFASTA) why should the order of comparison
>   sometimes give different answers? I.e. it is not intuitive why
>   a compared to b should not be the same as b to a...

	LFASTA uses a heuristic algorithm that is fast, but not as
	predictable as one might like.  That is why I recommend that LALIGN
	(also included with FASTA) be used when practical.

>Sorry if this sounds a bit negative; in fact I find it a very useful
>and powerful tool but I would like to see the points above addressed.
>Best wishes,
>Jim Brannigan
>Chemistry Dept.
>University of York, UK.

	Bill Pearson
