> I am trying to change the eukaryotic promoter database (epd29.seq)
> from a FASTA format to a GCG format so GCG can perform FASTA
> searches.
>
The EPD.DAT file distributed by EMBL does not contain sequence information
at all. So it cannot be "formatted" for GCG.
EPD is a list of database entries which are eukaryotic promoters.
The first entry is:
XX
FP Pv snRNA U1 :+S PLN:PVUG1 1+ 352; 17001.098
XX
DO Experimental evidence: 4
DO Expression/Regulation: housekeeping gene
RF PNAS84:9094
The entry code is PLN:PVUG1. In GCG you can obtain a copy of this entry using
$ FETCH EMBL:PVUG1
If you wish to use the whole EPD "database" with GCG you should do the
following:
a) Make a file of sequence names (FOSN) from the EPD.DAT file. This file
holds all the entry codes. You could write a simple program to parse
the EPD.DAT file to extract all the codes.
e.g. call the following file EPD.LIS:
This is a FOSN for the Eukaryotic Promoter Database
..
EM:PVUG1
etc
etc
b) Use the FOSN for database searches, e.g. with FASTA, when it asks for the
database to search:
Search for query in what sequence(s) (* GenEMBL:* *) ? @EPD.LIS
c) Or - use DATASET to make a separate GCG-readable database. Again, when
it asks for the data:
Assemble DATASET from what sequence(s) ? @EPD.LIS
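The "simple program" mentioned in step a) could look something like the
following Python sketch. It is only an illustration, not a tested tool: it
assumes that every entry has an FP line like the one shown above, and that the
entry code is the first whitespace-separated field on that line of the form
DIV:NAME (three capital letters, a colon, then the entry name). The function
names and the default file names are made up for this example. Check the
output against your own copy of EPD.DAT before relying on it.

```python
import re

# Assumed shape of an entry code on an FP line: a three-letter division,
# a colon, then the entry name (e.g. PLN:PVUG1). This is an assumption
# based on the single example entry, not on the EPD format specification.
ENTRY_CODE = re.compile(r"[A-Z]{3}:([A-Z0-9_]+)")


def extract_entry_names(lines):
    """Return entry names found on FP lines, in order, without duplicates."""
    names = []
    seen = set()
    for line in lines:
        if not line.startswith("FP"):
            continue
        for token in line.split()[1:]:
            match = ENTRY_CODE.fullmatch(token)
            if match:
                name = match.group(1)
                if name not in seen:
                    seen.add(name)
                    names.append(name)
                break  # use only the first matching field on each FP line
    return names


def make_fosn(names, comment="This is a FOSN for the Eukaryotic Promoter Database"):
    """Build the FOSN text: a comment, the '..' divider, one EM:name per line."""
    body = "\n".join(f"EM:{name}" for name in names)
    return f"{comment}\n..\n{body}\n"


def write_fosn(epd_path="EPD.DAT", fosn_path="EPD.LIS"):
    """Read the EPD file, write the FOSN, and return the number of names."""
    with open(epd_path) as src:
        names = extract_entry_names(src)
    with open(fosn_path, "w") as out:
        out.write(make_fosn(names))
    return len(names)
```

Calling write_fosn() with EPD.DAT in the current directory should leave an
EPD.LIS that can be given to FASTA or DATASET as @EPD.LIS. The EM: prefix
follows the example FOSN above; change it if your site uses a different
logical name for the EMBL database.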
BEWARE:
If you use the GCG-provided databases, then EMBL is only a subset of
the full EMBL database: entries that duplicate GenBank are left out. You will
need a full copy of the EMBL database for the above to work correctly.
(Alternatively, you will need to cross-reference the accession numbers
to identify all the GenBank entry codes from the EMBL codes.
This is not difficult if you start off with a full copy of EMBL and GenBank
and use Peter Rice's GBONLY facility (or modify it) in the GCGUNSUPPORTED
set.)
regards
Cary O'Donnell
*****************************************************************************
AFRC Computing Division JANET : AFRC.ARCB::ODONNELL
West Common INTERNET: ODONNELL at ARCB.AFRC.AC.UK
Harpenden Tel: (+44) 582 762271 ext 229
Herts AL5 2JE Fax: (+44) 582 761710
U.K. (AFRC = Agricultural & Food Research Council)
-----------------------------------------------------------------------------
============================================================================
Here is an extract from the EMBL release notes:
4.2 Eukaryotic Promoter Database (EPD)
EPD provides additional information about eukaryotic promoters which are present
in the main nucleotide sequence database. EPD is maintained and distributed
concurrently with the EMBL nucleotide sequence database.