Software to extract annotation fields from EMBL/GenBank entries.

Brian Robertson b.robertson at ic.ac.uk
Mon Jun 3 08:09:53 EST 1996

The amount of bacterial genome data available as sequenced cosmids of
30-40 kb is increasing rapidly. Our problem is that we need to keep track
of newly discovered genes as they appear, so they can be incorporated into
our research program as appropriate. For this we need to create lists of
probable genes identified in the annotations for each cosmid. These lists can
then be circulated to laboratory workers.

An example of this kind of annotation is shown below. We would like to
extract the "/note" field, which contains the probable function of the
gene, and create a list of these for each cosmid.

FT   CDS_pept        complement(3043..4155)
FT                   /note="MTCY190.03c, probable anthranilate
FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
FT                   initiation codon uncertain, gtg at 4086 favoured by
FT                   homology but this has no clear ribosome binding site"

Does anyone know of a way of extracting this information from database
entries and creating a list? Is there any software available that has
this as one of its options, or would a shell script be needed?

If a shell script is required, can anyone help with writing one? I'm
afraid it's beyond my capabilities.
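
To make the request concrete, here is a minimal sketch of the kind of
extraction meant, written in Python rather than as a shell script. It is an
illustration only: the names extract_notes and SAMPLE are made up for this
example, and it assumes that feature lines begin with "FT", that the feature
key starts three spaces after "FT", and that a /note value ends on the first
line that finishes with a double quote. Locations wrapped over several lines
and doubled quotes inside a note are not handled.

```python
import re

# Assumed layout: "FT" + 3 spaces + feature key + spaces + location.
FEATURE_LINE = re.compile(r"^FT {3}(\S+)\s+(\S+)")
# Assumed layout: "FT" + 4 or more spaces + qualifier or continuation text.
QUALIFIER_LINE = re.compile(r"^FT {4,}(\S.*)$")


def extract_notes(lines):
    """Return (location, note) pairs, one for each /note qualifier found."""
    notes = []
    location = None  # location of the feature currently being read
    parts = None     # pieces of a /note value that is still open, else None
    for line in lines:
        line = line.rstrip()
        feature = FEATURE_LINE.match(line)
        if feature:
            location = feature.group(2)
            parts = None
            continue
        qualifier = QUALIFIER_LINE.match(line)
        if not qualifier:
            continue
        text = qualifier.group(1)
        if parts is None:
            # Not inside a note: skip everything except the start of one.
            if not text.startswith('/note="'):
                continue
            parts = []
            text = text[len('/note="'):]
        if text.endswith('"'):
            # Closing quote reached: join the wrapped lines into one string.
            parts.append(text[:-1])
            notes.append((location, " ".join(parts)))
            parts = None
        else:
            parts.append(text)
    return notes


# The annotation from the example above, used as test input.
SAMPLE = """\
FT   CDS_pept        complement(3043..4155)
FT                   /note="MTCY190.03c, probable anthranilate
FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
FT                   initiation codon uncertain, gtg at 4086 favoured by
FT                   homology but this has no clear ribosome binding site"
"""

for location, note in extract_notes(SAMPLE.splitlines()):
    print(location, note, sep="\t")
```

Run over a whole cosmid entry (for instance by passing an open file object
instead of SAMPLE.splitlines()), this would print one tab-separated line per
annotated feature, which is roughly the list format wanted for circulation.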

Thanks for your help.

Brian Robertson

Dr. Brian D. Robertson
Dept. Medical Microbiology
Imperial College School of Medicine at St Mary's
Norfolk Place
London W2 1PG, U.K.

b.robertson at ic.ac.uk
