making a "true" est consensus

Dr. Rob Miller rmiller at house.med.und.ac.za
Tue Dec 24 17:38:09 EST 1996

Many thanks to those who replied to suggest that we would do 
well to check the other databases of consensus ESTs -- we're 
hoping to improve on them!

We use UniGene and BodyMap to benchmark our clusters.  Our aim is 
to create a database of EST consensus sequences that can be searched by 
sequence rather than starting with tissue or clone information, but we'd 
very much prefer that someone be able to find the correct EST
when they search with the complete sequence (and eventually get a nice 
alignment with a big gap between the 5' and 3' fragments in the
On the other hand, we need to be able to find the right 3' region for a 
hit on an associated 5' consensus, but the database submission format 
doesn't appear to handle specific linkages for multiple alignments 
together with non-specific linkages for clone-related fragments.  We 
believe the best approach will be to submit `artificially linked' 
3' and 5' consensus sequences where appropriate, but we are concerned 
about what the best format of the linker region should be with respect 
to the variety of alignment software/algorithms out there. 
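
To make the lookup problem concrete, here is a minimal Python sketch of finding the partner region for a hit in an artificially linked record. It assumes the linker is a run of at least 10 N characters; that threshold, the N choice, and the function names split_linked and partner_region are hypothetical illustrations, not a settled submission format.

```python
import re


def split_linked(seq, linker_char="N", min_linker=10):
    """Locate the first linker run and return two 0-based half-open spans:
    (start, end) of the 5' part and (start, end) of the 3' part."""
    # Assumption: the linker is the first run of >= min_linker linker characters.
    pattern = re.escape(linker_char) + "{%d,}" % min_linker
    match = re.search(pattern, seq)
    if match is None:
        raise ValueError("no linker run found")
    return (0, match.start()), (match.end(), len(seq))


def partner_region(seq, hit_pos, linker_char="N", min_linker=10):
    """Given a hit position in one fragment, return the other fragment."""
    five, three = split_linked(seq, linker_char, min_linker)
    if five[0] <= hit_pos < five[1]:
        return seq[three[0]:three[1]]
    if three[0] <= hit_pos < three[1]:
        return seq[five[0]:five[1]]
    raise ValueError("hit position falls inside the linker")


if __name__ == "__main__":
    linked = "ACGTACGT" + "N" * 20 + "TTGGCCAA"
    print(split_linked(linked))
    print(partner_region(linked, 3))
```

A real consensus can contain its own long runs of N, so a length threshold alone could misplace the split; a fixed, documented linker length would make this less ambiguous.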

Still looking forward to any hints (or preferences from those of you
whose software may have to deal with our decisions in coming years ! :-)

	 			Merriest of Christmases to you all,


P.S. Sorry, the info at sanbi address below won't work over Christmas. 
Happy thoughts to you if you can e-mail a copy of your reply to 
rmiller at house.med.und.ac.za; I don't always trust news servers to get 
postings all around the world :-)


Dr. Rob Miller wrote:

 Hi there,
 Got some nucleotide sequence alignment/search/database questions for
you :
 How do we link 3' EST to 5' EST fragments from the same clone in order
to make the linked consensus useful for subsequent searching, alignment
and/or translation?
 We're developing a set of EST consensus sequences to submit to a public
database, and naturally we'd like these to be of the greatest utility
possible.  We are thinking about the most useful format for the
 What is the best way to link data for ESTs which come from the same
clone -- a way that will preferably result in gaps inserted in the
region when someone comes along and searches the database with the
sequence of the full clone?
 Specifically, we'll be creating artificial consensus sequences from two
EST consensuses, e.g. a 5' EST AAAAAAAAAAAAAA and a 3' EST ZZZZZZZZZ.
 So our questions are:
    * What are the ramifications of
       using NNN's (unassigned) :
       or using ----'s (gap) :
             AAAAAAAAAAAAA-----------------ZZZZZZZZZZZZZZZZ  ???
       between the two sequences ?
    * How many characters would be ideal ?
    * What else could be used ?
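
The two linker options can be sketched in a few lines of Python. This is only an illustration: the function names link_consensus and to_fasta, the default linker length of 20, and the FASTA-style output are assumptions made here, not part of any database's submission format.

```python
def link_consensus(five_prime, three_prime, linker_char="N", linker_len=20):
    """Join a 5' and a 3' consensus with a run of a single linker character."""
    if len(linker_char) != 1:
        raise ValueError("linker_char must be a single character")
    if linker_len < 1:
        raise ValueError("linker_len must be at least 1")
    return five_prime + linker_char * linker_len + three_prime


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA-style record wrapped at `width` columns."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    five = "AAAAAAAAAAAAAA"   # placeholder 5' consensus from the example above
    three = "ZZZZZZZZZ"       # placeholder 3' consensus from the example above
    # Option 1: unassigned bases; Option 2: gap characters.
    print(to_fasta("linked_N", link_consensus(five, three, "N", 17)))
    print(to_fasta("linked_gap", link_consensus(five, three, "-", 17)))
```

One practical difference worth checking against the target software: N is a valid IUPAC nucleotide code, whereas many tools treat "-" as alignment punctuation and may strip or reject it in a raw sequence.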
 We invite any helpful comments, and feel free to e-mail a copy of
your reply to info at sanbi.ac.za 

XXXXX - make that rmiller at house.med.und.ac.za  thanks ! - XXXXXX

to make certain we see it.
                                  thanks in advance,
Robert T. Miller, Ph.D.                         
rmiller at house.med.und.ac.za

Manager - Durban Satellite - South African National Bioinformatics

Faculty of Medicine / Dept of Virology / University of Natal 
Private Bag 7 / Congella 4013 / Durban / South Africa 
phone +27 (031) 3603743                     fax +27 (031) 3603744 or
