
nucleotide consensus search

Michael A Lonetto Michael_A_Lonetto at sbphrd.com
Wed Jun 25 09:03:38 EST 1997


Hi,

If you can express the consensus as a pattern or set of patterns, then the 
GCG "findpatterns" program works pretty well, though it is much slower than 
BLAST.  

Findpatterns works best if you can define a sequence pattern that is specific, 
but includes all of the "real" binding sites.  It does not allow positional 
weighting, so the best thing to give it is fully conserved residues/IUPAC 
specifications and allowed spacings between conserved groups.  In addition, if 
there are correlated dinucleotide variations they are best searched as separate 
patterns instead of via a single generalization of the pattern.

E.g.:

CNGGATNA{5,7}TNATCCNG
CNGGTANA{5,7}TNTACCNG

is much better than:

CNGGNNNA{5,7}TNNNCCNG 
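
For anyone without GCG at hand, the same idea can be sketched in Python by 
translating each IUPAC pattern into a regular expression.  This is only an 
illustration, not how findpatterns itself works.  It assumes that {m,n} 
repeats the preceding symbol, as in regular expressions (check "genhelp 
findpatterns" for the real syntax), it scans one strand only, and the function 
names and the test sequence are made up for the example.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern to a regex string.

    Assumption: a {m,n} repeat count is copied through unchanged, so it
    applies to the preceding symbol exactly as it would in a regex.
    """
    out = []
    in_braces = False
    for ch in pattern.upper():
        if ch == "{":
            in_braces = True
            out.append(ch)
        elif ch == "}":
            in_braces = False
            out.append(ch)
        elif in_braces:
            out.append(ch)          # digits and comma of the repeat count
        else:
            out.append(IUPAC[ch])   # KeyError on an unknown symbol
    return "".join(out)


def find_sites(patterns, sequence):
    """Return sorted (start, pattern, matched_text) tuples for every hit."""
    seq = sequence.upper()
    hits = []
    for pat in patterns:
        # A lookahead with a capture group reports overlapping hits too.
        rx = re.compile("(?=(" + iupac_to_regex(pat) + "))")
        for m in rx.finditer(seq):
            hits.append((m.start(), pat, m.group(1)))
    return sorted(hits)


# Hypothetical test sequence containing one site of the "AT" variant.
demo = "TTT" + "CAGGATCAAAAAATGATCCAG" + "GGG"
for hit in find_sites(["CNGGATNA{5,7}TNATCCNG", "CNGGNNNA{5,7}TNNNCCNG"], demo):
    print(hit)
```

Searching each specific variant as its own pattern keeps the report of which 
variant matched, which the single generalized pattern cannot give you.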

Findpatterns also lets you allow mismatches, but in general for short, 
degenerate patterns without positional weighting this will just decrease your 
signal-to-noise ratio.  It is better to search additional patterns explicitly.
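
A rough back-of-the-envelope calculation shows why mismatches hurt.  The 
sketch below assumes equal base frequencies and independent positions (real 
genomes satisfy neither exactly), handles only patterns without repeat counts, 
and uses about 4.6 million bases as an approximate E. coli genome length.  The 
function names are made up for the example.

```python
# Number of bases (out of 4) that each IUPAC code allows.
ALLOWED = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def hit_probability(pattern, max_mismatches=0):
    """Chance that a random site matches with at most max_mismatches.

    Assumption: uniform, independent bases.  dist[k] holds the probability
    of exactly k mismatches over the positions processed so far.
    """
    dist = [1.0]
    for code in pattern.upper():
        p_match = ALLOWED[code] / 4.0
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * p_match
            new[k + 1] += prob * (1.0 - p_match)
        dist = new
    return sum(dist[: max_mismatches + 1])


def expected_hits(pattern, genome_length, max_mismatches=0):
    """Expected number of chance hits on one strand of a random sequence."""
    return genome_length * hit_probability(pattern, max_mismatches)


GENOME = 4_600_000  # approximate E. coli genome size, an assumption
half_site = "CNGGATNA"  # left half of the first example pattern
for mm in (0, 1, 2):
    print(mm, round(expected_hits(half_site, GENOME, mm)))
```

Under these assumptions, allowing a single mismatch in a half-site with six 
specified bases multiplies the chance-hit rate by 19, while adding at most a 
few real sites.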

Findpatterns patterns are specified in the same data file format as restriction 
enzymes for "MAP", etc.  Do "fetch pattern.dat" to get an example file you can 
modify with your own patterns.   Do "genhelp findpatterns" to see a description 
of the program.  The pattern syntax is described under "defining patterns", and 
making and using your own pattern files is described under "pattern file" and 
"local data files".  Good luck,

Mike Lonetto
Microbiology Dept.
SB Pharm. R&D
Collegeville, PA 19426

To: info-gcg @ net.bio.net @ INET
From: Nate_Weyand @ HLTHSCI.MED.UTAH.EDU (Nathan Weyand) @ INET 
Date: 24-Jun-97 04:03:04 PM
Subject: nucleotide consensus search

Hi all,
  I have a question about searching for a short nucleotide consensus 
sequence.  The sequence is only 10 nucleotides long and represents a 
protein binding site within a promoter.  Specifically, I want to search 
the E. coli genome for this consensus sequence.  I have tried a fasta 
and blastn search in GCG without success.  Is it possible to search for 
such a short consensus in GCG databases such as - [nr n Non-redundant 
GenBank+EMBL+DDBJ+PDB sequences]?  Does anybody have any suggestions on 
what settings I need to adjust for a successful search or whether web 
resources exist that can be used for my problem?

Any suggestions would be welcome! Thanks in advance for any help you can 
offer.  I will post a summary of advice I get that leads me in the 
right direction.

sincerely,

Nate Weyand
email:  Nate_Weyand at hlthsci.med.utah.edu







