Readseq, version 2, will extract (or remove) any set of features or
fields from GenBank or EMBL format sequence files.
See http://iubio.bio.indiana.edu/soft/molbio/readseq/java
One caveat: this only works for feature ranges within the
given sequence. Where a feature's range includes parts of other
sequence records, those parts are not included in the extraction.
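
To see in advance which features this caveat affects, you could scan the
location strings before extraction. A minimal sketch in Python, assuming the
usual GenBank/EMBL feature-table convention that a span in another record is
written as accession:location (for example J00194.1:100..202). The function
name has_remote_reference is made up for illustration and is not part of
readseq:

```python
import re

# Assumption: a remote span is an accession (optionally with a .version
# suffix) followed by a colon, as in J00194.1:100..202.
REMOTE_REF = re.compile(r"[A-Za-z][A-Za-z0-9_]*(?:\.\d+)?:")


def has_remote_reference(location: str) -> bool:
    """Return True if the location string points into another record."""
    return REMOTE_REF.search(location) is not None


if __name__ == "__main__":
    for loc in ("join(<31..63,64..177)", "join(1..100,J00194.1:100..202)"):
        print(loc, has_remote_reference(loc))
```

Features flagged this way would presumably come out shorter than their full
range, since only the part within the given sequence is extracted.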
Example to pull all CDS entries from a GenBank format file:
jre -cp readseq.jar run -feature=CDS -format=gb out=gbinvcds.gb data/gbinv1a.seq
Output example:
LOCUS       AAAAGC        147 bp    mRNA            INV       28-NOV-1994
FEATURES             Location/Qualifiers
     CDS             join(<31..63,64..177)
                     /codon_start=1
                     /product="alpha globin"
                     /db_xref="PID:g402359"
                     /translation="INRKISGDAFGSIIEPMKETLKARMGSYYSDDVAGAWAALIGVVQAAL"
     extracted_range join(<31..63,64..177)
                     /note="Range of sequence extracted from original, due to feature
                     selection. Feature locations are not valid for this
                     sequence, but for original."
ORIGIN
        1 atcaacagga aaatcagcgg tgacgcattc gggtcaatca ttgaaccaat gaaggagaca
       61 ctgaaggcca ggatgggcag ttattacagt gatgatgtcg ctggagcatg ggccgctctg
      121 attggtgtag ttcaggctgc tttgtaa
//
LOCUS       AAABDA        224 bp    DNA             INV       05-AUG-1992
FEATURES             Location/Qualifiers
     CDS             1016..1239
                     /partial
                     /gene="abd-A"
                     /codon_start=3
                     /product="abdominal-A homologue"
                     /db_xref="PID:g5554"
                     /db_xref="SWISS-PROT:P29552"
                     /translation="PNGCPRRRGRQTYTRFQTLELEKEFHFNHYLTRRRRIEIAHALCLTERQIKIWFQN
                     RRMKLKKELRAVKEINEQ"
     extracted_range 1016..1239
                     /note="Range of sequence extracted from original, due to feature
                     selection. Feature locations are not valid for this
                     sequence, but for original."
ORIGIN
        1 gtcccaacgg atgcccgcgt cgacgaggcc ggcaaacgta cacccgcttc cagacgctcg
       61 agctggagaa agagttccac ttcaaccact acctgacccg gcgacggagg atcgaaattg
      121 cgcacgccct gtgtctgacc gagcggcaga tcaaaatctg gttccaaaat cgccggatga
      181 agctgaagaa ggaactgcgg gcggtgaagg aaattaacga acag
//
...
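
As the extracted_range note says, the feature locations still refer to the
original sequence. One way to sanity-check the output is to confirm that the
total length of the extracted_range spans equals the length on the new LOCUS
line. A rough sketch in Python, assuming locations built only from local
start..end spans (single-base positions and spans in other records are not
handled). The function name location_length is hypothetical, not anything
readseq provides:

```python
import re

# Matches start..end spans, tolerating the partial markers < and >.
SPAN = re.compile(r"[<>]?(\d+)\.\.[<>]?(\d+)")


def location_length(location: str) -> int:
    """Sum the lengths of all start..end spans in a location string."""
    total = 0
    for start, end in SPAN.findall(location):
        total += int(end) - int(start) + 1  # spans are inclusive
    return total


if __name__ == "__main__":
    # The two extracted_range values from the output above.
    print(location_length("join(<31..63,64..177)"))  # 147, as in LOCUS AAAAGC
    print(location_length("1016..1239"))             # 224, as in LOCUS AAABDA
```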
--
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu