FASTA format - proposed max line limit

mathog at seqaxp.bio.caltech.edu
Fri Dec 4 20:41:42 EST 1998

In article <73mjou$tpm$1 at news.fas.harvard.edu>, "tendo" <tendo at fas.harvard.edu> writes:
>There is a very good standard that only one comment line which starts with
>'>' character is allowed for each sequence.
>If lengths are really between 80-1000 chars, all you need to do is prepare a
>1002-byte buffer, which should be enough for reading, or 2000 bytes for safety.
>It's not a problem at all for any kind of recent computers, is it?

Yes, it is, but not in the sense you meant.  The fundamental problem
with lines longer than 80 characters is that there is no consistency in how
they will be displayed.  They might wrap, they might be truncated, or they
might scroll off the right-hand side of the screen (which an end user might
not notice when scanning quickly through a 100-entry FASTA file with a tool
like "nedit" or "notepad").  There are even a few tools around that will do
nasty things when they encounter overly long "text" records; for instance,
EDT on VMS will truncate them to 255 characters.
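
Since a viewer may hide the problem, one way to catch overly long lines is
to scan for them directly.  A minimal Python sketch, assuming an
80-character limit (the figure under discussion here); the function name
find_long_lines is made up for illustration and is not part of any
existing tool:

```python
def find_long_lines(lines, max_len=80):
    """Return (line_number, length) for every line longer than max_len.

    Line numbers start at 1.  Trailing newline characters are not counted.
    max_len=80 is an assumption taken from this discussion, not a standard.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # ignore the line terminator itself
        if len(text) > max_len:
            hits.append((number, len(text)))
    return hits
```

It accepts any iterable of lines, so an open file handle can be passed in
directly.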

>By my understanding, David's proposal is mainly focused on easy handling of
>FASTA format data in programs and compatibility with available programs.

and consistency of display with a variety of tools.

FASTA is a TEXT format, so FASTA files should look very much the same in
the widest possible range of existing text tools.  Long lines are not
compatible with that goal.
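
For files that already contain long sequence lines, rewrapping them is
straightforward.  The following is only a sketch in Python, under the
assumption that header lines (those starting with '>') are left alone and
only sequence data is rewrapped; the name wrap_fasta and the default width
of 80 are illustrative choices, not part of any agreed proposal:

```python
def wrap_fasta(lines, width=80):
    """Yield FASTA lines with sequence data rewrapped to at most `width` chars.

    Header lines (starting with '>') are not rewrapped; only surrounding
    whitespace is stripped.  Blank lines are dropped.
    """
    chunks = []  # sequence pieces collected for the current record

    def flush():
        # Join the collected pieces and emit them in fixed-width slices.
        seq = "".join(chunks)
        chunks.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for line in lines:
        text = line.strip()
        if text.startswith(">"):
            yield from flush()  # finish the previous record first
            yield text
        elif text:
            chunks.append(text)
    yield from flush()  # emit the final record's sequence
```

Note that a long header line would still pass through at full length; how
to handle those is a separate question.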


David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 
