FASTA format - proposed max line limit

tendo tendo at fas.harvard.edu
Fri Nov 13 17:05:20 EST 1998


What I wanted to say by pointing out the bug in your program is that *a
well-designed data structure will minimize program errors*.
An extra blank line after each record is a solution for this.  I understand
that the code you showed was just a part of the program, but if you had
thought a little further ahead, you would have noticed this problem.
Post-treatment of the last record is easily forgotten, and it bothers
programmers.
Providing a terminator is also beneficial because a small perl script can
then find the entries containing given key words.
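
To make the point concrete, here is a rough sketch in C (not the script from
the previous article) of a key word search over blank-line-terminated
records.  The function name grep_records and the sizes LINELEN and RECLEN are
my own assumptions for illustration.  Because every record, including the
last one, ends with a blank line, nothing has to be flushed after the loop.

```c
#include <stdio.h>
#include <string.h>

#define LINELEN 1024   /* assumed maximum line length, newline included */
#define RECLEN  65536  /* assumed maximum size of one record */

/* Copy every blank-line-terminated record that contains `keyword`
 * from `in` to `out`.  Returns the number of records copied.
 * A trailing record with no terminating blank line is ignored. */
int grep_records(FILE *in, FILE *out, const char *keyword)
{
    char line[LINELEN];
    char record[RECLEN];
    size_t used = 0;
    int matches = 0;

    record[0] = '\0';
    while (fgets(line, sizeof line, in)) {
        if (line[0] == '\n' || line[0] == '\r') {
            /* blank line: the record is complete, so test it now */
            if (used > 0 && strstr(record, keyword)) {
                fputs(record, out);
                fputc('\n', out);   /* re-emit the terminator */
                matches++;
            }
            used = 0;
            record[0] = '\0';
        } else {
            size_t n = strlen(line);
            if (used + n < sizeof record) {
                memcpy(record + used, line, n + 1);  /* copies the '\0' too */
                used += n;
            }
            /* lines that would overflow RECLEN are dropped */
        }
    }
    /* no post-treatment of the last record is needed here */
    return matches;
}
```

Called as grep_records(stdin, stdout, "globin"), it would pass through only
the entries whose identifier, comment, or sequence lines contain 'globin'.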

As for comments, they are a REQUIREMENT in my opinion.  Researchers who
actually work on sequence data will want to have a comment right with the
sequence because it's much easier than handling two separate data files without any
Doing so is also beneficial in combination with the key word search above.
Say you want to make a multiple alignment for globin genes.  Just collect
the sequences by searching for the key word 'globin' using GREP (the
five-line program that I showed in the previous article), then remove the
comment lines with

    grep -v '^;'  <commented_version >no_comment_version

Then you are ready to make a multiple alignment using CLUSTAL W, PHYLIP, etc.
If the comments are in a separate file, you'll have to retrieve the sequences
one by one manually after you obtain the entry list from the reference
sequence.  I would write a small program in that case, but it's an extra task
that comes with the introduction of a new standard anyway.

As I mentioned and you pointed out, it is true that introducing another
comment character is a problem for the many programs that use FASTA format.
However, it is easily solved by the filtering shown above.
In a new program, handling comments is very easy.  Even in an old
program, you can easily modify the source code (if it's open) by adding
a comment check.  The code would look like this:

    while (fgets(buffer, LINELEN, stdin)) { /* in original code */

        if (buffer[0] == ';') {
            /* comment line: do nothing */
        } else if (buffer[0] == '>') {
            /* code for identifier line */
        } else {
            /* original code for sequence lines goes here */
        }
    }

This modification is not only harmless to the original code but also
beneficial, because it allows extra comments.
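
For anyone who wants to try the comment check on its own, below is a small
self-contained sketch built around the same test.  The struct fasta_stats and
the function scan_fasta are hypothetical names made up for this example, and
LINELEN is an assumed value; it only counts lines, so it stands in for
whatever the real program does with each line.

```c
#include <stdio.h>
#include <string.h>

#define LINELEN 1024  /* assumed; every input line must fit in LINELEN - 1 chars */

/* Hypothetical summary of one FASTA stream. */
struct fasta_stats {
    long records;   /* lines starting with '>' */
    long comments;  /* lines starting with ';' */
    long residues;  /* characters on all other lines, line endings excluded */
};

/* Scan `in` line by line, skipping ';' comment lines.
 * If a line is longer than LINELEN - 1, fgets splits it and a
 * continuation chunk could be misclassified, hence the assumption above. */
struct fasta_stats scan_fasta(FILE *in)
{
    char buffer[LINELEN];
    struct fasta_stats s = {0, 0, 0};

    while (fgets(buffer, LINELEN, in)) {
        if (buffer[0] == ';') {
            s.comments++;                 /* comment: otherwise ignored */
        } else if (buffer[0] == '>') {
            s.records++;                  /* identifier line */
        } else {
            /* sequence line (or blank terminator, which adds 0) */
            s.residues += (long)strcspn(buffer, "\r\n");
        }
    }
    return s;
}
```

Feeding it a commented file and the same file after the grep -v filter should
give the same record and residue counts, which is a quick way to check that
the comment lines really are harmless.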

