IUBio

FASTA format - proposed max line limit

tendo tendo at fas.harvard.edu
Mon Nov 16 16:03:54 EST 1998


> I don't think that is the point. Keeping comments out of the
> sequence will dramatically increase the speed of both
> sequence similarity and keyword searches. It is not so
> hard to use two files for a sequence, one for the sequence
> itself and one for the comments.


Have you ever measured the speed difference?
I have done so with FASTA on the whole of GenBank release 109, and the
difference was less than 5% (FASTA 3.1t13 on an i686 266 MHz Linux machine).

At least for me, it is HARD to use two files when you have to check
sequences one by one, unless somebody provides a nice viewer program that
shows each sequence and its comments with a single click.  Well, I actually
developed such a program, but it is still much easier to handle comments and
sequence together, so I chose to keep the comments and the sequence in the
same place.
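A minimal Python sketch of the single-file approach, assuming the classic FASTA layout in which a '>' line starts each entry and lines beginning with ';' carry free-text comments. Each entry's header and comments stay attached to its sequence, while a search program that only wants residues can read `sequence` and pay just one `startswith` test per comment line. The names `read_fasta` and `Record` are hypothetical, and this is not how FASTA 3.1 itself parses its input.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class Record(NamedTuple):
    header: str          # text after '>' on the definition line
    comments: List[str]  # ';' comment lines, kept next to their sequence
    sequence: str        # residues with line breaks removed


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per '>' entry (hypothetical helper, assumed layout)."""
    header: Optional[str] = None
    comments: List[str] = []
    chunks: List[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new entry begins: emit the previous one, if any.
            if header is not None:
                yield Record(header, comments, "".join(chunks))
            header, comments, chunks = line[1:].strip(), [], []
        elif line.startswith(";"):
            # Assumption: ';' marks a comment belonging to the current entry.
            comments.append(line[1:].strip())
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield Record(header, comments, "".join(chunks))


# Usage: comments travel with the sequence; a search reads only .sequence.
example = ">seq1 example entry\n;note kept beside the sequence\nACGTACGT\nACGT\n>seq2\nMKV\n"
for rec in read_fasta(io.StringIO(example)):
    print(rec.header, rec.comments, rec.sequence)
```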

Anyway, a sequence database is not a "database" in the computer-science
sense.  Rather, it is descriptive data, or even a structured document.
Remember that every programming language allows comments, and the same holds
for descriptive data: HTML and SGML, for example, allow comments in their
data files simply because they are needed.  I believe sequence data is
exactly this kind of descriptive data.


te






More information about the Bio-soft mailing list
