Major FlyBase Update

Peter Rice pmr at sanger.ac.uk
Wed Aug 16 08:50:34 EST 1995


In article <9508151547.AA27695 at morgan.harvard.edu> flybase-help at MORGAN.HARVARD.EDU (FlyBase Project Members) writes:
>       I am surprised at your report of finding fewer than 7500 genes when
>   you did the SRS indexing.  Something strange must be going on at your
>   end or ours, suggest you send email to flybase-help at morgan.harvard.edu
>   with any additional details.  (I expect the readers of bionet.drosophila
>   do not wish to see all the technical details, though we might wish
>   to inform them of the end result of pinning down whatever the problem is.)
>   All I can tell you is that in the genes.txt file on IUBio there are 9012
>   lines that begin with "*a" and that in the genes.rpt file there are 9012
>   lines that begin with "Gene symbol".  Appreciate your help if there is
>   any problem in what we have provided.

(note for bionet.software.srs readers - this is part of a thread on
bionet.drosophila about SRS indexing of the flybase genes.txt file)

Success at last. The "missing" genes turned out to be those with a "\"
character in the gene name (Dvir\sev for example).

SRS appeared unable to recognize the "\" character, and truncated each
such name at that point. This left many genes (about 1500) with just the
species code as a name, and since SRS only counts the number of different
names found, those genes were counted once per species code.
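
To illustrate the effect, here is a minimal Python sketch, assuming the
parser simply stops at the first "\" in a name. This is a model of the
symptom, not of SRS itself. Apart from Dvir\sev, the gene names below are
made up for the example, and truncate_at_backslash is a hypothetical
helper.

```python
# Hypothetical gene names; only Dvir\sev appears in the original post.
names = [r"Dvir\sev", r"Dvir\hb", r"Dsim\hb", "sev", "hb"]


def truncate_at_backslash(name):
    """Mimic a parser that stops at the first backslash (an assumption)."""
    return name.split("\\", 1)[0]


truncated = [truncate_at_backslash(n) for n in names]

# Counting distinct names before and after truncation shows the loss:
# the two Dvir genes collapse into the single name "Dvir".
distinct_before = len(set(names))
distinct_after = len(set(truncated))
print(distinct_before, distinct_after)
```

With five distinct input names, only four distinct names survive, which
is the same kind of shortfall seen in the index.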

The solution, which took a bit of head scratching, was:

In writing the flybase.sdl file, the "\" character must be escaped
twice (apparently escaping gets checked twice in parsing),
so the definition of an id for FLYGENE has the sequence "\\\\"
to represent a single "\" character.
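
Why four backslashes are needed can be sketched in Python, assuming each
of the two parsing passes strips one level of backslash escaping. How SRS
actually processes the .sdl file is not known to me in detail; the
function unescape_once and the sample strings are hypothetical and only
model the behaviour described above.

```python
import re


def unescape_once(text):
    """One pass of unescaping: drop a backslash, keep the character after it."""
    return re.sub(r"\\(.)", r"\1", text)


# Raw strings, so what you see is exactly what the parser would read.
escaped_once = r"a\\b"     # two backslashes between a and b
escaped_twice = r"a\\\\b"  # four backslashes between a and b

# Two passes over the singly escaped form lose the backslash entirely.
lost = unescape_once(unescape_once(escaped_once))

# Two passes over the doubly escaped form leave exactly one backslash.
kept = unescape_once(unescape_once(escaped_twice))

print(lost, kept)
```

Under this model the singly escaped form ends up with no backslash at
all, while the doubly escaped form survives both passes as a single "\".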

Having done that (and included a few extra characters that I was missing
before as valid in gene names) I now indeed have 9012 genes indexed.

Many, many thanks to the flybase community, who gathered round to ask
what was happening rather like, well like fruit flies around a ripe
banana I suppose :-)

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr at sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England



