Transcription factor software (Steve Thompson's concerns)

MICHAEL WEISE WEISE at bscf.uga.edu
Fri May 22 10:15:12 EST 1992

In <920520165309.20200935 at BOBCAT.CSC.WSU.EDU> THOMPSON at WSUVMS1.CSC.WSU.EDU writes:

> Fellow Netlanders--
> In reply to a discussion on software for accessing Dr. Ghosh's Transcription
> Factor Database Michael Weise writes:
> 	{text deleted}
> >
> >	The TFD is available in the file  SITEDATA.GCG  (available via anon.
> >ftp from  ncbi.nlm.nih.gov  in  /repository/TFD/datasets).  Feedback from tech
> >
> 	{much stuff deleted}
> >necessary to first create a  TFD.Patterns  file (with a format like that in
> >GCG's  Prosite.Patterns ) and a set of  .TFdoc  files using the information
> >found in  SITEDATA.GCG  (while the GCG package has a  TFsites.DAT  file, it
> >doesn't contain all the information found in SITEDATA.GCG).  In creating 
> 	{more stuff deleted}
> However, when I ftp'ed SITEDATA.GCG over and compared it to our own GCG version
> of TFsites.DAT I didn't recognize any differences.

{ results of VMS dif deleted }

> Yet Michael claims that TFsites.DAT doesn't have as much information as
> SITEDATA.GCG.  What's going on?  Might it be that Michael's version of
> TFsites.DAT is not current?  Regardless, thanks for the tips; we will pursue
> the modifications and use MOTIFS as Michael suggests.
> 							Steve Thompson
>                               Steven M. Thompson
>             Consultant in Molecular Genetics and Sequence Analysis
> VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
>            Washington State University, Pullman, WA 99164-1224, USA
>           AT&Tnet:  (509) 335-0533 or 335-3179  FAX:  (509) 335-0540
>                   BITnet:  THOMPSON at WSUVMS1 or STEVET at WSUVM1
>                    INTERnet:  THOMPSON at wsuvms1.csc.wsu.edu

Well, when we set up TF_Motifs, we compared the GCG v.7 file TFsites.DAT to
SITEDATA.GCG and found them to be different (the SITEDATA file contained names
of transcription factors associated with sites, whereas the .DAT file didn't).
In GCG v. 7.1, the two files are identical in content, so Steve is correct in
what he sees with his dif.  However, just having an updated TFsites.DAT does
not provide the capability of using Motifs to analyze for TF sites in NT
sequences.  It is still necessary to have our program read the info in this
file and create the .PATTERNS file and set of .TFdoc files.

Sorry if this has caused problems.  My to-do list DOES have an upgrade to 7.1 as
an item; it's just been difficult getting down to it.


PS.  The fun part of all this is that Help at GCG.COM told me that Motifs wouldn't
be able to analyze NT sequences (the first info I've gotten from them that
missed the mark).  What makes it work is: 1) A, T, G, C are valid symbols in both
the NT and AA alphabets, and 2) you can expand TFsites.DAT patterns [ex, GGAKGA,
where K means G or T] so that they contain no ambiguous NT codes, which makes
them look like Prosite patterns [ex, GGA(G,T)GA] that Motifs can readily use
(this is one of the things our program does).  In that way, Motifs doesn't
know - or care - that it's working with an NT seq instead of an AA seq.
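The pattern expansion described in the PS can be sketched as follows.  This is
a minimal Python illustration, assuming only the standard IUPAC nucleotide
ambiguity codes; it is NOT the TF_Motifs program, and the function names
(expand_site, to_regex, find_sites) are hypothetical.  The small matcher just
shows why an expanded pattern can be scanned against an NT sequence with no
special nucleotide handling; it says nothing about how Motifs works internally.

```python
import re

# Standard IUPAC nucleotide codes -> the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_site(site: str) -> str:
    """Rewrite an IUPAC site such as GGAKGA as GGA(G,T)GA."""
    parts = []
    for code in site.upper():
        bases = IUPAC.get(code)
        if bases is None:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
        # Unambiguous bases stay as-is; ambiguous ones become (X,Y,...).
        parts.append(bases if len(bases) == 1 else "(" + ",".join(bases) + ")")
    return "".join(parts)


def to_regex(pattern: str) -> str:
    """Translate the (X,Y) alternative syntax into a regex class [XY]."""
    return re.sub(
        r"\(([A-Z](?:,[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace(",", "") + "]",
        pattern,
    )


def find_sites(pattern: str, seq: str) -> list[int]:
    """Return 1-based start positions of all (possibly overlapping) matches."""
    # A zero-width lookahead lets overlapping sites be reported.
    rx = re.compile("(?=" + to_regex(pattern) + ")")
    return [m.start() + 1 for m in rx.finditer(seq.upper())]


if __name__ == "__main__":
    pat = expand_site("GGAKGA")
    print(pat)                                   # GGA(G,T)GA
    print(find_sites(pat, "ttGGAGGAccGGATGA"))   # [3, 11]
```

Because every ambiguity code is spelled out as an explicit set of A/C/G/T
alternatives, the expanded pattern matches exactly the same sites as the
original; only the notation changes.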

   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 / Michael J. Weise, Ph.D.    \  Univ.of Ga. BioScience Computing Facility \
(   weise at bscf.uga.edu         \   Dept.of Genetics  UGa, Athens  GA  30602 )
 \ _ _ _'Tis_only_me_speak'n._ _\_ _ _ _ _ _ _ (706) 542-1409_ _ _ _ _ _ _ /
