FASTA format - proposed max line limit

mathog at seqaxp.bio.caltech.edu
Tue Nov 24 11:58:46 EST 1998

In article <731l4k$8lp$1 at news.fas.harvard.edu>, "tendo" <tendo at fas.harvard.edu> writes:

>Mathog's solution actually provides human readability in a sense,

In the sense that it is readable with any text editor or word processor
on any system, and will look approximately the same on all of them.
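A minimal Python sketch of checking a file against a fixed maximum line length, assuming the limit is 80 characters (the figure used in this thread) and that it applies to every line, header or sequence alike. The function name `overlong_lines` is hypothetical and not part of any existing tool.

```python
from typing import Iterable, List, Tuple

MAX_LINE = 80  # assumed limit: the 80-character figure discussed in this thread


def overlong_lines(lines: Iterable[str], limit: int = MAX_LINE) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit`.

    Line numbers are 1-based; trailing newline characters are not counted.
    """
    report = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            report.append((number, length))
    return report


if __name__ == "__main__":
    example = [
        ">seq1 short description\n",
        "ACGT" * 15 + "\n",           # 60 residues: within the limit
        ">seq2 " + "x" * 100 + "\n",  # 106 characters: too long
        "ACGT\n",
    ]
    print(overlong_lines(example))  # [(3, 106)]
```

A file that passes this check displays identically in any editor with an 80-column window, which is the portability argument made above.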

>but dealing with two separate files makes it more complex to code for those who
>need reference information.

Agree and disagree.  It depends upon the amount of reference information
involved.  If there is very little information, then 80 characters
is enough; if there is a lot of information, then it should be in ASN.1,
GenBank, or some other standard (and machine-parseable) format.  There may
be some application around that routinely accesses GenBank in its raw
distribution format, but I've never used one.  Instead, these sorts of
databases are always (?) processed into some local database format and
accessed from there.  It's only in the grey area, roughly 80-1000 characters
of reference information, where there is a lot of disagreement and a lack of
standardization.  It's also in this grey area that the transition from
using the data raw (as a FASTA file) to preprocessing it (as GenBank) occurs.
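The three regimes described above can be sketched in Python as a simple classifier on header length. This is an illustration only: the cutoffs are the rough 80 and 1000 character figures from the paragraph, and counting only the text after the leading ">" is an assumption. The function name and category labels are hypothetical.

```python
def classify_reference_info(header: str) -> str:
    """Bucket a FASTA header by how much reference text it carries.

    Assumed, approximate cutoffs:
      <= 80 characters      -> fits on one readable line
      81 to 1000 characters -> the "grey area"
      > 1000 characters     -> better kept in a structured format
                               (ASN.1, GenBank, ...)
    The leading '>' and any trailing newline are not counted.
    """
    text = header[1:] if header.startswith(">") else header
    length = len(text.rstrip("\r\n"))
    if length <= 80:
        return "fits"
    if length <= 1000:
        return "grey area"
    return "structured format"


if __name__ == "__main__":
    for h in (">seq1 short description", ">" + "x" * 500, ">" + "x" * 2000):
        print(len(h) - 1, classify_reference_info(h))
```

Because the boundaries in the post are only approximate, a real tool would likely make both cutoffs parameters rather than constants.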

>Anyway, it seems that Mathog and the other guys lost interest in this stuff, and
>that a change of format provides no benefit for other people, so we should
>just leave it there.

Sitting back and taking notes is not losing interest - there just wasn't 
any need to reply to every reply.

David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech 

More information about the Bio-soft mailing list
