Don Gilbert gilbertd at bio.indiana.edu
Mon Feb 3 10:04:42 EST 2003

Andrew Dalke <adalke at mindspring.com> found these problems at the
IUBio SRS service recently:
>  I choose IUBIO (19808101), at
>  Then "TOP PAGE" so I could do a query.
>  Select GENBANK and GENPEPT.  (Both, to be on the safe side)
>  Enter "X61499" in the "Quick Search" box, and press the "Quick Search"
>  button.
>  Here are the hits:
>     GENBANK:AE015854    <-- because of note "similar to GB:X61498 ..."
>     GENBANK:HSCD85703   <-- don't know why there was a match
>     GENBANK:HSPA18H7    <-- don't know why there was a match
>     GENPEPT:AE015854_2  <-- because of note "similar to GB:X61498,
>                                                 GB:X61499 ..."
>     GENPEPT:X61499_1    <-- contains ACCESSION X61499, so this make sense

These errors were the result of my mistakes in handling some of
the strain that GenBank's growth is putting on our tools.  SRS is
well able to deal with these large and growing sequence
databanks, and Lion Bioscience devotes good effort to ensuring
this.  However, there are Unix-system-dependent needs that my
server could not meet (having more than 1024 files open in one
process).  I tried a work-around that did not work, but the
failure was not obvious until Andrew pointed it out (thank you
very much).
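
The open-files ceiling mentioned above is a per-process resource
limit (RLIMIT_NOFILE) on Unix systems; in a shell, `ulimit -n`
shows the current soft limit.  As a minimal sketch, assuming
Python's standard `resource` module on a Unix host, a long-running
server process could check that limit and raise its soft value up
to the hard ceiling before opening many databank files.  This is
not the work-around described above or anything SRS itself does,
and the target of 4096 descriptors is an arbitrary, hypothetical
value.

```python
import resource


def raise_nofile_limit(wanted):
    """Try to raise this process's soft open-file limit to `wanted`.

    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unlimited soft limit, or one already high enough, needs no change.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft, hard
    # An unprivileged process may raise its soft limit only up to the hard limit.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    except (ValueError, OSError):
        pass  # e.g. the OS caps the per-process maximum below `wanted`
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    # 4096 is a hypothetical target, not a value taken from SRS.
    print("after: ", raise_nofile_limit(4096))
```

Raising the hard limit itself normally requires administrator
privileges, so a process can still end up capped below what a
large databank index needs.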

The current release of GenBank is now available at IUBio through
SRS, and the above query now returns these matches:
  GENBANK:AE015854     <-- because of note "similar to GB:X61498 ..."
  GENBANK:HSBA18I14    <-- because of note "match ... Em:X61499 ..."
  GENBANK:HSNFKBSU     <-- this is accession X61499
  GENPEPT:AE015854_2   <-- because of note "similar to ..."
  GENPEPT:X61499_1     <-- contains ACCESSION X61499

My thanks to Martin Hilbers of Lion Bioscience for help with

If you want to provide useful search and retrieval for many of
the large biosequence databanks, I recommend SRS, which handles
these well with ready-made, community-updated parsers that are
kept current as databank formats change and new databanks come
into existence (see http://www.lionbio.co.uk/parser/).

I use SRS not only to provide public access at IUBio, but also
for data mining in the FlyBase and euGenes genome projects and in
many small projects.  We also use SRS as the core
search/retrieval engine in the FlyBase project, as it continues
to provide rapid results to simple and complex user queries of
genome data, with a minimum of fuss in processing new and complex
data structures (including XML databases).  I would love to see
Lion Bioscience provide an 'SRS-lite' system that could be
bundled with public bioinformatics data access systems such as
FlyBase and that would have fewer licensing restrictions for
academic use than the current release of SRS (which does have a
no-cost academic license).  Such a tool could encourage wider use
of SRS and help establish its full, commercial release as a
widely desired tool.

Somewhere in the future of bioinformatics data access, our
community needs to develop better ways to find and retrieve data
objects, as the growth and dispersion of bio-data is making it
harder for those who want to use it sensibly to find the best,
current data.  SRS is one of the core bio-data access tools that
could help form a basis for such a system.  In my view,
distributed directories of bio-data could be created using
common, standard directory protocols such as LDAP and Web
Services, built on efficient backend systems including SRS,
RDBMS, perhaps Entrez, and others.  This could provide a common
standard for searching these distributed directories and
retrieving data, from hundreds to millions of records, as
efficiently and simply as copying and installing databanks
locally does now, if not more so.  The current web interfaces
are useful for individual, click-and-browse human use, but are
limited when it comes to computational access to volumes of data.

- Don Gilbert

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu

