Sequence formats

James Bonfield jkb at mrc-lmb.cam.ac.uk
Wed Mar 26 03:58:18 EST 1997

In article <33380285.7D6A at nibsc.ac.uk> ajenkins at nibsc.ac.uk (Adrian Jenkins) writes:
>This is probably a controversial topic but...
>On my 'wish-list' for features regarding molecular biology programs, the
>main feature would be a universal format.
>I primarily use the Staden Package for sequence assembly but GCG for
>additional sequence analysis.

The Staden Package can output the 'staden' format (which we no longer
use ourselves; it exists only for the likes of "fromstaden" style
programs), fasta or Experiment File format. Fasta is pretty generic,
but it leaves out half of the useful information. Consequently it has
been extended in several, probably incompatible, ways. We simply output
a name and sequence.
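
As a rough illustration of what "a name and sequence" amounts to, here
is a minimal Python sketch of a plain fasta writer. It is not the Staden
Package's own code; the function name write_fasta and the 60-column line
width are assumptions made for the example.

```python
import io
from typing import TextIO


def write_fasta(out: TextIO, name: str, sequence: str, width: int = 60) -> None:
    """Write one record: a '>' header holding only the name, then the sequence.

    The width of 60 characters per line is an assumption, not something
    the fasta convention or the Staden Package mandates.
    """
    out.write(f">{name}\n")
    # Wrap the sequence into fixed-width lines.
    for start in range(0, len(sequence), width):
        out.write(sequence[start:start + width] + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_fasta(buf, "example1", "ACGT" * 20)
    print(buf.getvalue(), end="")
```

Anything beyond the name (primers, tags and so on) has no agreed place
in a file like this, which is why the various extensions diverge.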

Our preferred output format is Experiment File. This is more or less
the same as EMBL, except with our own additional line types (e.g. PR
for primer type, TG for tags, etc). For the basic EMBL format the ID,
EN and SQ lines are the same. I'd expect this to be a fairly portable
format (after all, it is one of the primary sequence database formats)
provided programs reading it ignore unknown line types.
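
To make the "ignore unknown line types" point concrete, here is a
minimal Python sketch of a tolerant reader. It assumes an EMBL-style
layout: a two-letter line code followed by data, an SQ line followed by
sequence lines, and a "//" terminator. The function name read_record,
the KNOWN_TYPES set and the sample values are invented for the example;
this is not the Staden Package's parser.

```python
from typing import Dict, Iterable

# Assumption: the only header line types this sketch keeps.
KNOWN_TYPES = {"ID", "EN"}


def read_record(lines: Iterable[str]) -> Dict[str, str]:
    """Read one EMBL-style record, silently skipping unknown line types."""
    record: Dict[str, str] = {}
    sequence_parts = []
    in_sequence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # Assumed end-of-record marker.
            break
        if in_sequence:
            # Keep letters only; drops spacing and any position counts.
            sequence_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, value = line[:2], line[2:].strip()
        if code == "SQ":
            in_sequence = True
        elif code in KNOWN_TYPES:
            record[code] = value
        # Any other line type (PR, TG, ...) is deliberately ignored.
    record["sequence"] = "".join(sequence_parts)
    return record


if __name__ == "__main__":
    sample = [
        "ID   example1\n",
        "EN   example1\n",
        "PR   1\n",                   # made-up value for illustration
        "TG   made-up tag text\n",    # made-up value for illustration
        "SQ\n",
        "     ACGTACGTAC GTAC\n",
        "//\n",
    ]
    print(read_record(sample))
```

A reader written this way copes with both a plain EMBL entry and an
Experiment File, because the extra line types simply fall through.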


PS. We don't currently support GCG format as it appears to be too much
of a moving target!
James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/
