New Drosophila data search and retrieval services are now available
over the Internet from the computer ftp.bio.indiana.edu.
These services can be reached using Internet Gopher or Wide Area Information
Servers (WAIS).
The Gopher services for Drosophila at ftp.bio.indiana.edu
look like this:
Root gopher server: ftp.bio.indiana.edu
1. About-IUBio-Gopher [21Jun92, 3kb].
2. About-New-Features [ 1Nov92, 3kb].
--> 3. Drosophila/
4. Genbank-Sequences/
5. IUBio-Software+Data/
(more...)
Drosophila
--> 1. About Drosophila Gopher [23Feb92, 1kb].
2. Clone database search <?>
3. Cytological features search <?>
4. Drosophila Archive/
5. Drosophila Information Newsletter <?>
6. Drosophila Stocks at Bloomington, USA <?>
7. Drosophila Stocks at Bowling Green, USA <?>
8. Drosophila Stocks at Umea, Sweden <?>
9. Fly worker & GSA address search <?>
10. Flybase/
11. Flybase search <?>
12. Redbook/
13. Redbook search <?>
All of the <?> services are WAIS/Gopher searches of fly data
files that reside in the Drosophila Archive:
Clone database search == search clonelist.txt
Cytological features search == search Amero.txt
Drosophila Information Newsletter == search newsletter issues
Drosophila Stocks ... == search stock lists
Fly worker address search == search Haynie & GSA address files
Flybase search == search Ashburner flybase files
Redbook search == search complete Lindsley & Zimm Genome book
These search services are also available via WAIS client software.
The relevant WAIS source for the IUBio archive is:
(:source
:version 3
:ip-address "129.79.224.25"
:ip-name "ftp.bio.indiana.edu"
:tcp-port 210
:database-name "INFO"
:cost 0.00
:cost-unit :free
:maintainer "archive at bio.indiana.edu"
:description "
This WAIS service includes several indexed Biology information sources,
including Genbank nucleic acid gene sequence databank, Drosophila genetics,
BioSci/Bionet network news, and others.
")
The fly WAIS databases are named:
:database-name "fly-address"
:database-name "fly-amero"
:database-name "fly-clones"
:database-name "fly-din"
:database-name "flybase"
:database-name "flystock-bg"
:database-name "flystock-bl"
:database-name "flystock-um"
:database-name "redbook"
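For WAIS clients that keep one source description per database, the
descriptions for the fly databases can be generated from the INFO source
shown above. The Python sketch below is only an illustration. It assumes
that every fly database is served from the same host and port as INFO;
the names FLY_DATABASES, SOURCE_TEMPLATE and make_source are hypothetical
and not part of any WAIS software.

```python
# Sketch: build a WAIS source description for each fly database.
# Assumption: every database shares the host and port of the INFO source.

FLY_DATABASES = [
    "fly-address", "fly-amero", "fly-clones", "fly-din", "flybase",
    "flystock-bg", "flystock-bl", "flystock-um", "redbook",
]

# Fields copied from the INFO source; only the database name varies.
SOURCE_TEMPLATE = """(:source
 :version 3
 :ip-address "129.79.224.25"
 :ip-name "ftp.bio.indiana.edu"
 :tcp-port 210
 :database-name "{name}"
 :cost 0.00
 :cost-unit :free
 :maintainer "archive at bio.indiana.edu"
)
"""


def make_source(name):
    """Return the text of a source description for one database."""
    return SOURCE_TEMPLATE.format(name=name)


if __name__ == "__main__":
    # Print every description; redirect or split the output as needed.
    for db in FLY_DATABASES:
        print(make_source(db))
```

How a client expects these descriptions to be stored differs between
clients, so check your client's documentation before saving them.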
As a reminder, client software for Macintosh, MS-DOS, Unix, VMS and
other computer systems is available for Internet Gopher via
anonymous ftp to boombox.micro.umn.edu, in /pub/gopher, and client
software for WAIS is available via ftp to ftp.think.com. Some of
these clients are also available via ftp to ftp.bio.indiana.edu,
in the /util/gopher and /util/wais directories.
I've modified the WAIS indexing and searching software in several
ways to make it more suitable for biology and genetic data searching.
These modifications include
a) use of symbols, so that queries like "In(4;5)red39" should
work
b) boolean 'and' and 'not' operators to limit a query's results
c) partial word searches, such as "hum*", which matches human and hummingbird
d) literal phrase searches, such as finding "red rooster[45]" exactly
e) output of data file headers (Gopher only so far).
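The effect of items (b) and (c) can be sketched in a few lines of Python.
This is a toy model, not the modified WAIS code: the function names, the
required/excluded argument style and the sample documents are all
assumptions made for illustration, and real WAIS also ranks its results.

```python
import re


def words(text):
    """Split text into lowercase words on anything but letters and digits."""
    return [w for w in re.split(r"[^a-z0-9]+", text.lower()) if w]


def term_matches(term, doc_words):
    """True if the term matches a word; a trailing '*' is a prefix match."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in doc_words)
    return term in doc_words


def search(docs, required=(), excluded=()):
    """Keep docs holding every 'required' term and no 'excluded' term."""
    hits = []
    for doc in docs:
        doc_words = words(doc)
        has_all = all(term_matches(t, doc_words) for t in required)
        has_bad = any(term_matches(t, doc_words) for t in excluded)
        if has_all and not has_bad:
            hits.append(doc)
    return hits


# Hypothetical sample documents.
docs = ["human gene map", "hummingbird wing", "red rooster", "human red cell"]

# Partial word search: "hum*" matches human and hummingbird.
print(search(docs, required=["hum*"]))
# 'not' operator: drop anything that also mentions "red".
print(search(docs, required=["hum*"], excluded=["red"]))
```

The first query returns the three documents containing a word that begins
with "hum"; the second removes "human red cell" from that list.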
The use of symbols is still somewhat problematic. Because WAIS is based
on free text indexing, rather than on indexing of delimited fields
in databases, it needs to use some characters and symbols to delimit
words. I've tried to find a distinction between symbols needed for genetic
"words" and symbols needed for distinguishing words (other than spaces),
but there is some overlap. If you use the literal phrase search,
by enclosing a phrase with symbols in single quote (') or double quote (")
marks, you may get better results.
For instance, in some of the fly data files, esp. redbook, "(" and ")"
are used both as genetic symbols and as word delimiters. Thus
searching for
Df(3)something
will generally parse into searching for the three words "Df" "3"
and "something", producing lots of matches. Using a literal
search such as
'Df(3)something'
should limit the results to just that phrase.
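The difference between the two kinds of query can be sketched in Python.
This is a simplified model under stated assumptions, not the actual WAIS
parser: it treats only spaces and parentheses as word delimiters, it
counts a word query as a match when any one word is found, and the
function names are hypothetical.

```python
import re

# Assumption: only whitespace and parentheses delimit words.
DELIMITERS = r"[\s()]+"


def split_words(text):
    """Split text into words at the delimiter characters."""
    return [w for w in re.split(DELIMITERS, text) if w]


def parse_query(query):
    """Return ('phrase', text) for a quoted query, else ('words', list)."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] and q[0] in "'\"":
        return ("phrase", q[1:-1])
    return ("words", split_words(q))


def matches(query, text):
    """Phrase queries need an exact substring; word queries need any word."""
    kind, value = parse_query(query)
    if kind == "phrase":
        return value in text
    text_words = split_words(text)
    return any(w in text_words for w in value)


# The unquoted query falls apart into three separate words.
print(parse_query("Df(3)something"))
# The quoted query is kept whole.
print(parse_query("'Df(3)something'"))
```

With this model the unquoted query also matches a line containing only
"Df(3)other", because "Df" and "3" are found as words, while the quoted
query does not.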
There are other ways to better index genetic symbols, but they involve
more effort. I'd like to get some feedback first on the usefulness
of this, from the general community of Internet-enabled fly
researchers.
The header file output adds a useful touch. Here is one result
returned from a search of flybase for "ashburner":
This section is from the document '//Drosophila/Drosophila Archive/flybase/ABREFS.TEXT'.
gene-symbol first-author reference
------------ ---------------- -----------------------
Df(2L)ScoR+4 Ashburner, Genetics 126:679
McGill, Genetics 119:647
And a search for "red" produces this:
This section is from the document '//Drosophila/Drosophila Archive/flybase/LOCI.TEXT'.
gene-name-abbrev; full-gene-name
gene-map-position cyto-map-position
function
nucleic-acid-databank-accession-number
ditto-for-species-other-than-melanogaster
protein-database-accession-number
ditto-for-other-species
----------------------------------------------------------
red; red
3-53.6 88B1-88B2
--
Don Gilbert gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405