request for info

Reinhard Dölz Reinhard.Doelz at genedata.com
Thu Sep 16 02:11:31 EST 1999

As I don't know where you are working, you might be interested in the
EMBnet resource (www.embnet.org), which offers bioinformatics and
other fine services in Europe and other countries. For academics, many
opportunities exist to use software (frequently based on GCG, even if
that is not directly visible) via WWW interfaces over the Internet.
In clinical, contract/consulting, for-profit or commercial environments
you will want to access a 'secure' host that (a) gives you a
secure/encrypted connection and (b) makes sure that your data are not
compromised by attacks or accidental disclosure. www.genedata.com
is a possible provider for this purpose. Prices vary and depend
on the amount, quality and add-ons available with the subscription.

Furthermore, regardless of your academic/industry background, you will
want to look into getting some training before you start or make a
final decision on purchases. This does not mean only training that
explains software tools like BLAST (which is admittedly also very much
needed); a solid course will also give you advanced presentations on
issues such as (but not limited to) multiple sequence alignments,
statistical tools for motif sampling, and background on the scope and
limitations of bioinformatics in the genomics age. I have taught, and
still teach, training courses in bioinformatics and can clearly see
that demand is moving toward more sophisticated work with
bioinformatics tools. Even "only" beginners can gain a significant
advantage from sophisticated applications if the input is prepared
carefully and the output is explained. This holds especially true for
applications in Gene Expression and Comparative Genomics (see below).

In order to use bioinformatics in programming mode you will likely need
telnet access to a UNIX host, and you will do best working on 'wrappers'
around existing tools rather than writing your own software, as long as
you are not developing novel algorithms. To use GCG software with your
programs, the site you subscribe to must also have the source code
licensed. I am not actually sure what your real requirement for 'access'
is: whether you mean 'use GCG' or 'connect to databases'. The 'databases'
you are referring to may either be the so-called 'flat file databases',
which are plain flat files indexed with proprietary or publicly
available programs, or databases held in powerful database management
systems for sequence analysis (GCG, and other companies, have products
of this kind available commercially). Although the difference is
transparent to the end user, programmers like you may find it more
attractive to work with larger relational systems rather than flat
files, despite the overhead that you will eventually encounter.
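
To illustrate what a 'wrapper' around an existing tool can look like,
here is a minimal sketch in Python (the language is my choice for the
example, nothing the tools require). It assumes the wrapped program
reads its input from standard input and writes its result to standard
output. The function name run_tool is hypothetical, and the demo uses
the Python interpreter itself as a stand-in for a real analysis
program, because the actual command names and options depend on what
is installed at your site.

```python
import subprocess
import sys
from typing import Sequence


def run_tool(argv: Sequence[str], input_text: str = "",
             timeout: float = 600.0) -> str:
    """Run an external command-line tool and return its standard output.

    argv is the command plus its arguments; input_text is fed to the
    tool on standard input. Raises RuntimeError on a non-zero exit code.
    """
    result = subprocess.run(
        list(argv),
        input=input_text,
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # work with str rather than bytes
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} failed with status {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a real program (an assumption for this demo):
    # a one-line script that upper-cases whatever arrives on stdin.
    stand_in = [
        sys.executable, "-c",
        "import sys; sys.stdout.write(sys.stdin.read().upper())",
    ]
    print(run_tool(stand_in, "acgt\n"), end="")
```

The point of keeping the wrapper this thin is that the existing tool
does the science, while your own code only handles input, error
checking and what happens to the output afterwards.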

There is also the aspect of processing the result data. It is no longer
sufficient to write a program that launches all available programs and
makes a nice web page with links to the original output. What you
really want is intelligent processing of the output of bioinformatics
programs that presents the outcome of potentially huge computations in
an intuitive and efficient manner. Genome computing, these days, tends
to produce data rather than keep up with understanding them.
Comparative genomics, as needed for microbial/anti-infective research,
is one example where it helps very little to run BLAST, FASTA, MOTIFS,
etc. against all proteins and stop at presenting the output. What you
need there is a database for result data and novel types of
computations, which you will not necessarily find in the
straightforward packages of today.
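
As a rough sketch of what a 'database for result data' could mean in
practice, the following Python example stores hits from several
programs in one table and then asks questions across them. Everything
here is an assumption for illustration: the table layout (program,
query, subject, score), the function names, and the sample rows, which
are made up and not real results. It uses SQLite from the Python
standard library only because that needs no server; a production
system would more likely sit on a larger relational database.

```python
import sqlite3
from typing import Iterable, List, Tuple


def open_results_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a result database with one hypothetical 'hits' table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS hits (
               program TEXT NOT NULL,  -- which tool produced the hit
               query   TEXT NOT NULL,  -- identifier of the query sequence
               subject TEXT NOT NULL,  -- identifier of the matched sequence
               score   REAL NOT NULL   -- the tool's own score for the hit
           )"""
    )
    return conn


def store_hits(conn: sqlite3.Connection, program: str,
               rows: Iterable[Tuple[str, str, float]]) -> None:
    """Insert (query, subject, score) rows already parsed from tool output."""
    conn.executemany(
        "INSERT INTO hits (program, query, subject, score) VALUES (?, ?, ?, ?)",
        [(program, q, s, float(score)) for q, s, score in rows],
    )
    conn.commit()


def best_hit_per_query(conn: sqlite3.Connection,
                       program: str) -> List[Tuple[str, str, float]]:
    """Return the highest-scoring hit(s) of one program for every query."""
    cur = conn.execute(
        """SELECT h.query, h.subject, h.score
             FROM hits AS h
            WHERE h.program = ?
              AND h.score = (SELECT MAX(score) FROM hits
                              WHERE program = h.program AND query = h.query)
            ORDER BY h.query, h.subject""",
        (program,),
    )
    return cur.fetchall()


def programs_per_query(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count how many different programs reported at least one hit per query."""
    cur = conn.execute(
        """SELECT query, COUNT(DISTINCT program)
             FROM hits GROUP BY query ORDER BY query"""
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = open_results_db()
    # Made-up rows standing in for parsed output of two different tools.
    store_hits(db, "toolA", [("q1", "s1", 50), ("q1", "s2", 80), ("q2", "s3", 30)])
    store_hits(db, "toolB", [("q1", "s2", 12)])
    print(best_hit_per_query(db, "toolA"))
    print(programs_per_query(db))
```

Once the results are rows rather than pages of text, questions such as
"which queries are supported by more than one method" become a single
query instead of another round of parsing.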

The system configuration you will need depends very much on what you
actually want to do. For a local installation of GCG you will want a
large-memory UNIX system with significant (250G+) disk space to cope
with the databases to come. There is also a cost to plan for if you
need an extra Internet link: 128k is the _minimum_ for database
transfers, and even then they will take very long. For use of remote
resources, a telnet and WWW client on any system will do (I use a
palm-sized Psion for this type of work when I am on the road).
Certainly, you or your organization will have to weigh carefully
whether to take on the cost of ownership for local databases or to
find a provider that gives you the WWW access you seem to be
referring to.

There is also a novel kind of 'application' that has come into play in
the past few years: JAVA code, which either lives its life as a program
that gets its data on startup or works with database connections to
interrogate databases. The first type of program can be developed
locally if you have a reasonable JAVA development environment on your
local PC/Mac/UNIX system, using example files before it is moved into
the Webserver environment for realtime use via the Internet. The
second type ("JDBC" is the buzzword) has some tricky aspects with
respect to security and database programming, and you might want to
look into it at a later point in time.

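
To put the remark about the 128k link in perspective, here is a small
back-of-the-envelope calculation in Python. It assumes that '128k'
means 128 kbit/s, which is not spelled out above, and that the link is
used at full speed with no protocol overhead unless an efficiency
factor is given. The function name and the 1 GB example size are
hypothetical.

```python
def transfer_time_hours(size_bytes: float, link_bits_per_second: float,
                        efficiency: float = 1.0) -> float:
    """Estimate how many hours it takes to move size_bytes over a link.

    efficiency is the usable fraction of the nominal link speed (0 < e <= 1).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    one_gigabyte = 1e9          # hypothetical database update size, in bytes
    link = 128_000              # assumed meaning of '128k': 128 kbit/s
    print(f"{transfer_time_hours(one_gigabyte, link):.1f} hours")  # 17.4 hours
```

Under these assumptions a single gigabyte already occupies the link for
most of a day, which is worth setting against the cost of a remote
provider.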
I hope this helps; sorry if it was rather long.

Dr. Ing. Reinhard Doelz
GeneData AG 

More information about the Info-gcg mailing list
