Brian Robertson wrote:
> The amount of bacterial genome data available as sequenced cosmids of
> 30-40 kb is increasing rapidly. Our problem is that we need to keep track
> of newly discovered genes as they appear, so they can be incorporated into
> our research program as appropriate. For this we need to create lists of
> probable genes identified in the annotations for each cosmid. This can
> then be circulated to laboratory workers.
>
> An example of this kind of annotation is shown below. We would like to
> extract the "/note" field, which contains the probable function of the
> gene, and create a list of these for each cosmid.
>
> FT CDS_pept complement(3043..4155)
> FT /note="MTCY190.03c, probable anthranilate
> FT phosphoribosyltransferase, trpD, len: 370, similar to eg
> FT SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
> FT initiation codon uncertain, gtg at 4086 favoured by
> FT homology but this has no clear ribosome binding site"
>
> Does anyone know of a way of extracting this information from database
> entries and creating a list? Is there any software available that has
> this as one of its options, or would a shell script be needed?
If you have the entries on your own computer, I think the best
solution is to write your own program to do it. I have written a
similar one in FORTRAN77, and it is not difficult.
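
To give an idea of how little is involved, here is a minimal sketch in
Python (not the FORTRAN77 program mentioned above). It assumes the
entries are plain text files in the layout shown in the question: every
feature line starts with "FT", a note begins with /note=" and runs over
continuation lines until one ends with a closing double quote. The
function name extract_notes is only an illustration, and notes that
contain embedded double quotes are not handled.

```python
def extract_notes(lines):
    """Return the text of every /note qualifier found in FT lines."""
    notes = []
    current = None  # pieces of the note being collected, or None

    for line in lines:
        if not line.startswith("FT"):
            # Left the feature table: drop any unterminated note.
            current = None
            continue

        text = line[2:].strip()  # remove the "FT" prefix and padding

        if current is None:
            if not text.startswith('/note="'):
                continue  # feature key line or some other qualifier
            text = text[len('/note="'):]
            current = []

        if text.endswith('"'):
            # Closing quote reached: the note is complete.
            current.append(text[:-1])
            notes.append(" ".join(current))
            current = None
        else:
            current.append(text)

    return notes
```

Calling extract_notes(open(filename)) on one cosmid entry returns a list
with one string per gene, which can be printed one per line to make the
list for circulation.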