
Software for comparing sequence homology from GenBank server?

Brian Foley brianf at med.uvm.edu
Wed Nov 2 16:46:26 EST 1994


Jason L. Buberel (jbuberel at uiuc.edu) wrote:
: Could someone please let me know if there is a simple (straightforward) way
: to ask GenBank (or a similar service) to compare sequence homology for me
: if I give it the names of several related genes (or same gene between
: different species).  We are trying to find conserved areas among the
: cytochrome enzyme genes.  We have played with the GenBank web site, but
: have not figured out how to do this.  Others have assured me that it is
: possible.

	1) Retrieve one of the cytochrome gene sequences from GenBank
	   by sending the ACCESSION number to "retrieve at ncbi.nlm.nih.gov"

	   If you are not familiar with the RETRIEVE server, send a
	   message with just "help" in the text first.  The  server
	   will then send you detailed instructions.

	   If you do not know an ACCESSION number for a cytochrome 
	   gene, use MEDLINE or another resource to look for a 
           publication of a cytochrome sequence.

	2) Send that cytochrome sequence to "blast at ncbi.nlm.nih.gov"
           in the proper BLAST format.  If you are not familiar with
	   the BLAST server, send a "help" message first to get
	   detailed instructions.

	3) The BLAST output will list the entries in the database that
	   are the most similar to your query sequence, along with
	   pairwise alignments.  It does not produce a multiple sequence
	   alignment.

	4) RETRIEVE the sequences you are interested in.

	5) Use a multiple sequence alignment program to do the
           multiple sequence alignment.  There are so many programs
           available for multiple sequence alignment that I can't
	   describe them all here.
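
Once step 5 has produced an alignment, finding the conserved areas
amounts to scanning its columns.  Here is a minimal sketch in Python,
assuming the aligned sequences are already in memory as equal-length
strings with "-" for gaps.  The function name conserved_runs, the
min_len parameter, and the toy sequences are all made up for
illustration; they are not cytochrome data and not part of any
alignment package.

```python
def conserved_runs(aligned, min_len=3):
    """Return half-open (start, end) column ranges where all sequences agree.

    `aligned` is a list of equal-length strings taken from a multiple
    sequence alignment; "-" marks a gap.  A column counts as conserved
    only if every sequence has the same non-gap character there.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")

    runs = []
    start = None  # first column of the conserved run currently open, if any
    for i in range(length):
        column = {seq[i] for seq in aligned}
        same = len(column) == 1 and "-" not in column
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and length - start >= min_len:
        runs.append((start, length))
    return runs


# Hypothetical toy alignment, invented for this example.
toy = [
    "ATGGCC-TTAGGC",
    "ATGGCCATTAGGC",
    "ATGGACATTAGGC",
]

for start, end in conserved_runs(toy):
    print(start, end, toy[1][start:end])
```

Requiring strict identity in every column is the simplest possible
rule.  For distantly related genes you would probably want to relax
it, for example by scoring each column against a substitution matrix.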

----------Clipped from an earlier post by Keith Robison-------------------
There is a short description of some of the issues in sequence searching
available via the WWW at

                http://twod.med.harvard.edu/seqanal/

You should certainly read the extremely good review by Altschul et al in
Nature Genetics (6:119-129).  If you can find the recent book
"Biocomputing" (D.Smith, ed) it has an excellent article on the
subject by Steven Henikoff (the WWW reference above contains the
full citations to these and many other good readings).

As some other posters have noted, you should search multiple ways.
Both DNA and protein databases have their drawbacks.  Searching for
matches at the protein level is more sensitive than at the DNA level,
but many genes are not in the protein databases (see Nature Genetics 
7:205-215 or the Rudd et al article in the current issue of TIBS),
so you cannot rely solely on searches against protein databases (BLASTP,
BLASTX, etc).
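
One reason protein-level matches are more sensitive is that synonymous
codon changes alter the DNA but not the encoded protein.  The sketch
below illustrates this with two invented 12-base sequences and a
deliberately partial codon table (only four amino acids); the names
translate and identity are hypothetical helpers, not part of BLAST.

```python
# Partial standard genetic code: only the codons needed for this example.
CODONS = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # proline
}


def translate(dna):
    """Translate an in-frame DNA string using the partial table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def identity(a, b):
    """Fraction of positions at which two equal-length strings match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Invented sequences that differ only at third codon positions.
dna_a = "GCTCTTGGTCCT"
dna_b = "GCACTGGGCCCA"

print("DNA identity:    ", identity(dna_a, dna_b))
print("protein identity:", identity(translate(dna_a), translate(dna_b)))
```

Here the two DNA strings mismatch at every third base, yet they
translate to the same peptide, so a protein comparison sees them as
identical while a DNA comparison does not.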

Also, be careful: both the protein and DNA databases contain various
sorts of contaminants (vector sequence, foreign sequence, repetitive
elements, etc) and other anomalies (rearrangements, mis-translations,
annotation errors, etc).


Keith Robison
Harvard University
Department of Cellular and Developmental Biology
Department of Genetics / HHMI

robison at mito.harvard.edu 
------------------end clip----------------------------------------

 --
********************************************************************
*  Brian Foley               *     If we knew what we were doing   *
*  Molecular Genetics Dept.  *     it wouldn't be called research  *
*  University of Vermont     *                                     *
********************************************************************



