IUBio

New sequence version numbers

Peter Rice pmr at sanger.ac.uk
Mon Mar 1 09:11:56 EST 1999


timc at chiark.greenend.org.uk (Tim Cutts) writes:

> Has anyone got suitable diffs for the EMBL and GenBank parsers to make
> them cope with the new sequence version numbering as performed by EMBL
> and GenBank?

Tricky. I have what I think is a fix, but I have had problems
with SRS recently (see my reply to the "srsbuild -t" bugfix in
this newsgroup).

The new version numbers appear in the feature table locations.
To get them to parse, it appears the following change is needed in
$SRSDB/ftseq.is (the diff runs from the modified file to the original,
so the "<" line is the new pattern):

46c46
<                   (/([A-Z]+[0-9]+)[ \n.0-9]*:/ 
---
>                   (/([A-Z]+[0-9]+)[ \n]*:/ 

This allows, and discards, a version suffix such as ".1" after an
accession number. There is little point in keeping the version number,
as you can only get to the latest version of the entry anyway. Luckily
the parser was already skipping whitespace at that point, so only the
character class needed extending.
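
For anyone who wants to check the two patterns outside SRS, here is a
rough sketch in Python. It is not Icarus and says nothing about how
ftseq.is is really evaluated; it only assumes the two expressions behave
like ordinary regular expressions anchored at the start of a location.
The accession AB012345 and the helper accession_prefix are made up for
illustration.

```python
import re

# The two patterns from the diff, rewritten as Python regexes.
# Assumption: Icarus treats them much like ordinary regular expressions.
OLD_PATTERN = re.compile(r"([A-Z]+[0-9]+)[ \n]*:")
NEW_PATTERN = re.compile(r"([A-Z]+[0-9]+)[ \n.0-9]*:")


def accession_prefix(pattern, location):
    """Return the accession captured at the start of a location, or None."""
    match = pattern.match(location)
    return match.group(1) if match else None


# Hypothetical remote-entry locations; AB012345 is a made-up accession.
unversioned = "AB012345:100..200"
versioned = "AB012345.1:100..200"

for location in (unversioned, versioned):
    print(
        location,
        accession_prefix(OLD_PATTERN, location),
        accession_prefix(NEW_PATTERN, location),
    )
```

Under those assumptions the old pattern fails on the versioned location,
while the new one captures the bare accession and drops the ".1". Note
that the widened character class is permissive: it would also accept odd
input such as "AB012345..:", which seems acceptable for a stopgap.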

*** Use this with caution until someone from Lion
*** posts the real solutions

An entry to test with is in the latest EMBLNEW updates: DMC171D11,
where one feature overlaps with entry DMC65F1.

I have a temporary database EMBLTEST at Sanger for trying out parser
changes.

Anybody know when other databases will start using SV instead of NID
in their EMBL references?

-- 
----------------------------------------------------------------------
Peter Rice                | Informatics Division, The Sanger Centre,
E-mail: pmr at sanger.ac.uk  | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967     | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919     | URL: http://www.sanger.ac.uk/Users/pmr/




More information about the Bio-srs mailing list
