Readseq version 2 may or may not help:
http://iubio.bio.indiana.edu/soft/molbio/readseq/java/
It will extract the sequence of any annotated feature from GenBank or EMBL
sequence files. If you feed readseq a file of many GenBank/EMBL records, it
will produce output with the feature sequences separated by record. If you
feed it one large sequence record (e.g., a chromosome) with many feature
annotations, it will join all the feature sequences into one output record
(for that chromosome).
If you convert to EMBL or GenBank format while selecting a feature, the
output annotation lists the join() location it used to build the output
sequence from the extracted features (the extracted_range feature in the
example below).
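For anyone not used to join() locations: a location like
join(335..649,1807..2169) means "bases 335 to 649, followed by bases 1807 to
2169", with 1-based, inclusive coordinates, and complement(...) means the
reverse complement of what it encloses. Below is a rough Python sketch of that
reading. It is not readseq code; the function names are made up for
illustration, and it only handles plain ranges, join() and complement() (no
fuzzy ends like <1 or >100, no references to other entries). If you copy a
location out of EMBL output, strip the "FT" prefixes from the continuation
lines first.

```python
import re

# Sketch only, not readseq code: the names below are made up for this example.
# Handles "start..end", "complement(...)" and "join(...)"; fuzzy ends (<, >),
# single-base locations and references to other entries are not supported.

_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")
_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text):
    """Split text on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    parts.append("".join(current))
    return parts


def extract(location, seq):
    """Return the bases a feature location refers to.

    Coordinates are 1-based and inclusive, as in GenBank/EMBL feature tables.
    """
    location = "".join(location.split())  # drop spaces/newlines from wrapped lines
    if location.startswith("complement(") and location.endswith(")"):
        return revcomp(extract(location[len("complement("):-1], seq))
    if location.startswith("join(") and location.endswith(")"):
        inner = location[len("join("):-1]
        return "".join(extract(part, seq) for part in split_top_level(inner))
    m = _RANGE.fullmatch(location)
    if m is None:
        raise ValueError("unsupported location: " + location)
    start, end = int(m.group(1)), int(m.group(2))
    return seq[start - 1:end]


if __name__ == "__main__":
    chrom = "aaaccctttggg"
    print(extract("join(1..3,7..9)", chrom))                 # aaattt
    print(extract("join(1..3, complement(10..12))", chrom))  # aaaccc
```

The recursive descent keeps nested forms such as complement(join(...)) working
without a separate grammar; anything it does not recognise raises ValueError
rather than guessing.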
E.g.
fetch ftp://ncbi.nlm.nih.gov/genbank/genomes/S_cerevisiae/Chr01/yst_1.gbk.Z
jre -cp readseq.jar run format=fasta features=gene yst_1.gbk
kalo% jre -cp readseq.jar run format=fasta features=gene -pipe yst_1.gbk
Readseq version 2.0.8 (18 Jan 2000)
>NC_001133 Saccharomyces cerevisiae chromosome I, complete chromosome sequence. 145659 bp
atgatcgtaaataacacacacgtgcttaccctaccactttataccaccaccacatgccat
actcaccctcacttgtatactgattttacgtacgcacacggatgctacagtatataccat
...
kalo% jre -cp readseq.jar run format=embl features=gene -pipe yst_1.gbk
Readseq version 2.0.8 (18 Jan 2000)
ID NC_001133 standard; DNA; PLN; 145659 BP.
XX
..
FH Key Location/Qualifiers
FT gene 335..649
FT /gene="YAL069W"
FT gene 1807..2169
FT /gene="YAL068C"
..
FT extracted_range join(335..649,1807..2169,7236..9017,10092..10400,
FT 11566..11952,12047..12427,21526..21852,24001..27969,
FT 31568..32941,33449..34702,35156..36304,36510..37148,
..
SQ Sequence 145659 BP; 69830 A; 44641 C; 45763 G; 69969 T; 0 other;
atgatcgtaa ataacacaca cgtgcttacc ctaccacttt ataccaccac cacatgccat 60
actcaccctc acttgtatac tgattttacg tacgcacacg gatgctacag tatataccat 120
..
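As for the original question (one FASTA record per ORF): as described above,
readseq joins the features of one large record into a single output record.
If separate records are wanted, one option is to post-process the gene lines
of the EMBL feature table yourself. A rough Python sketch follows. Again, it is
not readseq code and the names are invented; it treats each gene feature as
the unit (a gene feature is not necessarily the same thing as an ORF), handles
only start..end and complement(start..end) locations, and splits lines on
whitespace, so it works on the column-aligned EMBL layout as well as on the
squeezed lines shown above.

```python
import re

# Sketch only, not a readseq feature: names are made up for this example.
# Reads EMBL-style "FT gene <location>" lines followed by a /gene="..." qualifier
# and writes one FASTA record per gene. Only "start..end" and
# "complement(start..end)" locations are handled; join() and fuzzy ends are not.

_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")
_PLAIN = re.compile(r"(\d+)\.\.(\d+)")
_COMP = re.compile(r"complement\((\d+)\.\.(\d+)\)")


def gene_features(ft_lines):
    """Yield (name, location) for each gene feature that carries a /gene name."""
    location = None
    for line in ft_lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] != "FT":
            continue  # skip XX, FH, SQ and other non-feature lines
        if fields[1] == "gene" and len(fields) >= 3:
            location = fields[2]
        elif fields[1].startswith("/gene=") and location is not None:
            name = " ".join(fields[1:])[len("/gene="):].strip('"')
            yield name, location
            location = None
        elif not fields[1].startswith("/"):
            location = None  # another feature key or a wrapped location line


def feature_seq(location, seq):
    """Return the bases for a simple 1-based, inclusive location."""
    m = _COMP.fullmatch(location)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        return seq[start - 1:end].translate(_COMPLEMENT)[::-1]
    m = _PLAIN.fullmatch(location)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        return seq[start - 1:end]
    raise ValueError("unsupported location: " + location)


def to_fasta(ft_lines, seq, width=60):
    """Multi-FASTA text with one record per gene, wrapped at `width` bases."""
    out = []
    for name, location in gene_features(ft_lines):
        bases = feature_seq(location, seq)
        out.append(">%s %s" % (name, location))
        out.extend(bases[i:i + width] for i in range(0, len(bases), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    ft = ['FT gene 1..3', 'FT /gene="G1"',
          'FT gene complement(4..6)', 'FT /gene="G2"']
    print(to_fasta(ft, "aaaccctttggg"), end="")
    # >G1 1..3
    # aaa
    # >G2 complement(4..6)
    # ggg
```

The chromosome sequence itself would come from the SQ block (or a FASTA
conversion) with the spaces and position numbers removed.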
Readseq will read NCBI's genome-section .gbk files, but currently chokes on
the large ones (I think I know the solution).
In article <38E3826A.6A8CE0AA at nospam.net>, <nospam at nospam.net> wrote:
>Is there an "out of the box" method for producing separate sequence
>lines for each ORF in a genomic sequence? Something suitable for making
>a multisequence fasta file would be nice.
>Thanks,
>Mike Holloway
>holloway-1 at medctr.osu.edu
--
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu