Fellow Netlanders--
In reply to a discussion on software for accessing Dr. Ghosh's Transcription
Factor Database, Michael Weise writes:
{text deleted}
>> The TFD is available in the file SITEDATA.GCG (available via anon.
>ftp from ncbi.nlm.nih.gov in /repository/TFD/datasets). Feedback from tech
> {much stuff deleted}
>necessary to first create a TFD.Patterns file (with a format like that in
>GCG's Prosite.Patterns ) and a set of .TFdoc files using the information
>found in SITEDATA.GCG (while the GCG package has a TFsites.DAT file, it
>doesn't contain all the information found in SITEDATA.GCG). In creating
{more stuff deleted}
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> / Michael J. Weise, Ph.D. \ Univ.of Ga. BioScience Computing Facility \
>( weise at bscf.uga.edu \ Dept.of Genetics UGa, Athens GA 30602 )
> \ _ _ _'Tis_only_me_speak'n._ _\_ _ _ _ _ _ _ (706) 542-1409_ _ _ _ _ _ _ /
>
However, when I ftp'ed SITEDATA.GCG over and compared it to our own GCG version
of TFsites.DAT, I could not see any meaningful differences. VMS DIF reported
only these:
************
File DISK03:[THOMPSON.TXTFILES.BIONET]SITEDATA.GCG;1
4 Dr. David Ghosh of the National Center for Biotechnology Information,
5 National Library of Medicine, National Institutes of Health, maintains
6 the relational database, TFD, from which the information in this file
7 is derived.
8
******
File GENCOREDISK:[GCGCORE.DATA.MOREDATA]TFSITES.DAT;2
4 This file is derived from the SITES table of TFD, a database
5 of transcription factors maintained by Dr. David Ghosh at the
6 National Center for Biotechnology Information, National Library
7 of Medicine, National Institutes of Health.
8
************
************
File DISK03:[THOMPSON.TXTFILES.BIONET]SITEDATA.GCG;1
1263 Nir_box 0 CGCCATCTGC 0 ! InsfE Mol Cell Biol 8: 2
1264 Nod_box 0 ATCCAAACAATCRATTTTACCAATC 0 ! NodD Genes Dev 2: 28
******
File GENCOREDISK:[GCGCORE.DATA.MOREDATA]TFSITES.DAT;2
1263 Nir_box 0 CGCCATCTGC 0 ! InsfE Mol Cell Biol 8: 2
1264 Nod_box 0 ATCCAAACAATCRATTTTACCAATC 0 ! NodD Genes Dev 2: 28
************
************
File DISK03:[THOMPSON.TXTFILES.BIONET]SITEDATA.GCG;1
1908
******
File GENCOREDISK:[GCGCORE.DATA.MOREDATA]TFSITES.DAT;2
************
Number of difference sections found: 3
Number of difference records found: 6
DIFFERENCES /IGNORE=()/MERGED=1/OUTPUT=DISK03:[THOMPSON.TXTFILES.BIONET]SITES.DIF;1-
DISK03:[THOMPSON.TXTFILES.BIONET]SITEDATA.GCG;1-
GENCOREDISK:[GCGCORE.DATA.MOREDATA]TFSITES.DAT;2
Yet Michael claims that TFsites.DAT doesn't have as much information as
SITEDATA.GCG. What's going on? Might it be that Michael's version of
TFsites.DAT is not current? Regardless, thanks for the tips; we will pursue
the modifications and use MOTIFS as Michael suggests.
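
The second difference section in the DIF output above lists records 1263 and
1264 that look identical as printed, so the mismatch there may be nothing more
than trailing blanks or similar invisible characters (that is a guess, not
something the listing confirms). A minimal sketch for checking this in Python,
assuming both files have been copied to a machine with Python available; the
function names and the file names in the usage note are illustrative only:

```python
import difflib


def normalized_lines(text):
    """Split text into lines and strip trailing whitespace from each one."""
    return [line.rstrip() for line in text.splitlines()]


def substantive_diff(text_a, text_b, name_a="a", name_b="b"):
    """Return the unified-diff lines that remain once trailing whitespace
    is ignored. An empty list means the two texts match line for line."""
    return list(
        difflib.unified_diff(
            normalized_lines(text_a),
            normalized_lines(text_b),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
            n=0,  # no context lines, only the records that differ
        )
    )


def compare_files(path_a, path_b):
    """Read two text files and return their whitespace-insensitive diff."""
    with open(path_a) as file_a, open(path_b) as file_b:
        return substantive_diff(file_a.read(), file_b.read(), path_a, path_b)
```

Calling compare_files("SITEDATA.GCG", "TFSITES.DAT") and printing each returned
line would then show only the records that still differ after trailing
whitespace is set aside, such as the reworded header paragraph.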
Steve Thompson
Steven M. Thompson
Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
Washington State University, Pullman, WA 99164-1224, USA
AT&Tnet: (509) 335-0533 or 335-3179 FAX: (509) 335-0540
BITnet: THOMPSON at WSUVMS1 or STEVET at WSUVM1
INTERnet: THOMPSON at wsuvms1.csc.wsu.edu