GCG compatible consensus weight matrices avail.

Steve Thompson: VADMS genetics THOMPSON at WSUVMS1.CSC.WSU.EDU
Fri Oct 16 10:44:22 EST 1992

Greetings NetLanders:

In September 1992 I sent the enclosed message out on the bulletin boards.
I never got any replies, so I decided to tackle the job myself.  The matrices
I reformatted appear to work without problems in the GCG FitConsensus
program.  I have deposited the reformatted weight matrices on our VAX cluster
in the ANONYMOUS ftp account for public use.  The enclosed message follows:

start of enclosed text

    Has anyone heard of or done for themselves the conversion of Dr. Bucher's
    weight matrix descriptions of eukaryotic promoter elements to GCG
    Consensus.Csn format?  These matrices are described in _J._Mol._Biol._
    (1990) 212: 563-578.  I am teaching a course this semester on computer
    techniques in molecular biology and would very much like to use this data
    to illustrate the power of weight matrix approaches versus simple
    one-dimensional pattern matching.  I realize that the conversion probably
    is not that difficult, however, if it has already been done, it sure would
    save me some time.

    While on this topic, I've searched LIMB to see if there is a database of
    weight matrix consensus descriptions and found nothing.  Have any of you
    heard of a collection of this type of data?  I do have access to Dr.
    Bucher's EPD database but am especially interested in weight matrix

end of enclosed text

GCG provides preassembled consensus weight matrices for the donor and acceptor
sites at exon-intron splice junctions, for use with FitConsensus, in its public
data files.  However, it does not provide any others; therefore, I have
reformatted the four weight matrix descriptions of eukaryotic RNA polymerase II
promoter elements reported by Bucher (1990) into a form appropriate for GCG's
programs.  Additionally, McLauchlan et al. (1985) assembled a eukaryotic
terminator weight matrix, which I have also reformatted for GCG use.  The files
are named TATA.Csn, Cap.Csn, CCAAT.Csn, GC.Csn and Terminator.Csn.
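
For anyone new to the idea, the difference between a weight matrix search and
simple one-dimensional pattern matching can be sketched roughly as follows.
This is only an illustration in Python: the four-position matrix, its weights,
the threshold and the function names are all made up for the example.  It does
not reproduce Bucher's values, the Consensus.Csn file format, or the scoring
that FitConsensus actually performs.

```python
import re

# Hypothetical 4-position weight matrix (made-up numbers, NOT Bucher's data).
# One dict per position, giving a weight for each base at that position.
MATRIX = [
    {"A": 0,  "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 2,  "C": 0, "G": 0, "T": 8},
    {"A": 9,  "C": 0, "G": 0, "T": 1},
]


def score_window(window):
    """Sum the per-position weights for one window of the sequence."""
    return sum(col.get(base, 0) for col, base in zip(MATRIX, window))


def matrix_hits(seq, threshold):
    """Slide the matrix along seq; report windows scoring >= threshold."""
    seq = seq.upper()
    width = len(MATRIX)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        s = score_window(window)
        if s >= threshold:
            hits.append((i, window, s))
    return hits


def pattern_hits(seq, pattern="TATA"):
    """Exact pattern matching: start positions of every occurrence."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


if __name__ == "__main__":
    seq = "GGTAAAGGTATACC"
    print(pattern_hits(seq))        # exact matches only
    print(matrix_hits(seq, 30))     # also finds near-matches, with scores
```

With an exact pattern a site either matches or it does not; with the matrix,
a near-match such as TAAA can still be reported, with a lower score than
TATA, and the threshold decides how permissive the search is.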



Bucher, P. (1990). Weight Matrix Descriptions of Four Eukaryotic RNA Polymerase
II Promoter Elements Derived from 502 Unrelated Promoter Sequences. Journal of
Molecular Biology 212, 563-578.

McLauchlan, J., Gaffney, D., Whitton, J. and Clements, J. (1985). The Consensus
Sequence YGTGTTYY Located Downstream from the AATAAA Signal is Required for
Efficient Formation of mRNA 3' Termini. Nucleic Acids Research 13, 1347-1368.


The directory structure and logon information for our anonymous account
follow.  In addition to this Consensus subdirectory, the MolBio directory also
contains the subdirectory Profiles, which in turn contains several profile
subdirectories and their associated profile matrices, which I deposited last
summer.  Again, thanks for all of the support; I hope this data can be of some
use to you.


Internet address:       bobcat.csc.wsu.edu
         alias:         wsuvms1.csc.wsu.edu

logon as:               USER ANONYMOUS
password:               your Internet address


path:                   root/molbio/consensus
                        (however, this is a VMS site, not Unix!)

files:                  the five .Csn files named above
                        and README.TXT (this file)


                              Steven M. Thompson
            Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
           Washington State University, Pullman, WA 99164-1224, USA
          AT&Tnet:  (509) 335-0533 or 335-3179  FAX:  (509) 335-0540
                  BITnet:  THOMPSON at WSUVMS1 or STEVET at WSUVM1
                   INTERnet:  THOMPSON at wsuvms1.csc.wsu.edu

