MICHAEL WEISE WEISE at bscf.uga.edu
Fri May 22 10:15:12 EST 1992

> In reply to a discussion on software for accessing Dr. Ghosh's Transcription
> Factor Database Michael Weise writes:
> >	The TFD is available in the file  SITEDATA.GCG  (available via anon.
> >ftp from  ncbi.nlm.nih.gov  in  /repository/TFD/datasets).  Feedback from tech
> >necessary to first create a  TFD.Patterns  file (with a format like that in
> >GCG's  Prosite.Patterns ) and a set of  .TFdoc  files using the information
> >found in  SITEDATA.GCG  (while the GCG package has a  TFsites.DAT  file, it
> >doesn't contain all the information found in SITEDATA.GCG).  In creating 
> However, when I ftp'ed SITEDATA.GCG over and compared it to our own GCG version
> of TFsites.DAT I didn't recognize any differences.

> Yet Michael claims that TFsites.DAT doesn't have as much information as
> SITEDATA.GCG.  What's going on?  Might it be that Michael's version of
> TFsites.DAT is not current?  Regardless, thanks for the tips; we will pursue
> the modifications and use MOTIFS as Michael suggests.

Well, when we set up TF_Motifs, we compared the GCG v. 7 file TFsites.DAT to
SITEDATA.GCG and found them to be different (the SITEDATA file contained the
names of the transcription factors associated with sites, whereas the .DAT
file didn't).  In GCG v. 7.1, the two files are identical in content, so Steve
is correct in what he sees with his diff.  However, just having an updated
TFsites.DAT does not provide the capability of using Motifs to analyze NT
sequences for TF sites.  It is still necessary to have our program read the
info in this file and create the .PATTERNS file and the set of .TFdoc files.

Sorry if this has caused problems.  My todo list DOES have an upgrade to 7.1 as
an item; it's just been difficult getting down to it.


PS.  The fun part of all this is that Help at GCG.COM told me that Motifs wouldn't
be able to analyze NT sequences (the first info I've gotten from them that
missed the mark).  What makes it work is: 1) A, T, G, and C are valid symbols
in both the NT and AA alphabets, and 2) you can expand TFsites.DAT patterns
[e.g., GGAKGA] so that they don't contain any ambiguous NTs, and thus make them
look like Prosite patterns [e.g., GGA(G,T)GA], which Motifs can readily use
(this is one of the things our program does).  In that way, Motifs doesn't
know - or care - that it's working with an NT seq instead of an AA seq.
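
For anyone who wants to see the expansion step concretely, here is a minimal
sketch in Python.  It is only an illustration, not the TF_Motifs program
itself: the function name expand_ambiguity and the table name IUPAC are made
up for this example, the table is the standard IUPAC nucleotide ambiguity
code, and the output style (alternatives in parentheses, separated by commas)
is simply copied from the GGA(G,T)GA example above.  How N should best be
written for Motifs is an assumption here; the sketch just lists all four
bases.

```python
# Standard IUPAC nucleotide ambiguity codes mapped to the plain bases they
# stand for. The names IUPAC and expand_ambiguity are hypothetical.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # assumption: written out as all four bases
}


def expand_ambiguity(site):
    """Rewrite an NT site so that it uses only A, C, G and T.

    Unambiguous symbols are copied through; an ambiguous symbol becomes a
    parenthesized, comma-separated list of the bases it allows.
    """
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]  # raises KeyError for symbols not in the table
        if len(bases) == 1:
            parts.append(bases)
        else:
            parts.append("(" + ",".join(bases) + ")")
    return "".join(parts)


print(expand_ambiguity("GGAKGA"))  # GGA(G,T)GA
```

Since the expanded pattern contains only A, C, G and T, every symbol in it is
also a valid AA symbol, which is the point made in 1) above.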

