robison1 at husc10.harvard.edu (Keith Robison) writes:
: >Does someone already have a program to convert the results of a
: >NCBI Sequence Server Query into FASTA format?
:
: I realize now that I omitted a key qualifier. GenBank queries
: come back in GenBank format, PIR queries in something that looks
: like PIR format but with some extra linefeeds separating field
: labels from data. But the Swiss-Prot returns not only have the
: extra line-feeds, but use different heading names than the
: Swiss-Prot distribution (full names rather than abbreviations).
: I guess the real question is whether these deviations will cause
: problems for various programs designed to read/convert PIR and
: Swiss-Prot formats.
:
: Keith Robison
: Harvard University
: Department of Cellular & Developmental Biology
: Department of Genetics / HHMI
:
: robison at ribo.harvard.edu
Keith -- You've pointed out something I want to fix over the next couple
of weeks. The only reason the PIR, Swiss-Prot (and EMBL) entries come out
looking a little unfamiliar is that you are seeing them as formatted for
IRX text retrieval.
GenBank entries have been passed through a filter to turn them back into
flat-file style records. I just need to write the filters for the other
databases. So please, if you can be patient a little longer, don't
write converters for the present output.
I also wonder whether, as an interim solution, a FASTA-type output option
would take care of some of your needs, i.e., do you need all of a PIR
record in PIR format, or is your principal need to get an identifier line
and the sequence itself? It would be easy to have a field in the mail
message, 'FASTA yes' or just 'FASTA', and have the program return just
the FASTA-formatted sequence for any of the databases.
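To make the idea concrete, here is a rough sketch of what "an identifier
line and the sequence itself" could look like for a GenBank flat-file
record. It is only an illustration, not the server's code: the function
name genbank_to_fasta, the choice of accession plus definition for the
identifier line, and the 60-column line width are all assumptions.

```python
import re


def genbank_to_fasta(text, width=60):
    """Convert GenBank flat-file records in `text` to FASTA text.

    Hypothetical helper for illustration; the header layout
    (accession followed by the definition) is an assumption.
    """
    out = []
    accession, definition, seq_parts = "", [], []
    in_definition = in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of record: emit the header, then the wrapped sequence.
            sequence = "".join(seq_parts)
            if sequence:
                out.append(">" + " ".join([accession or "unknown"] + definition))
                out.extend(sequence[i:i + width]
                           for i in range(0, len(sequence), width))
            accession, definition, seq_parts = "", [], []
            in_definition = in_origin = False
        elif in_origin:
            # Sequence lines carry a position number and spaces; keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ORIGIN"):
            in_origin, in_definition = True, False
        elif line.startswith("DEFINITION"):
            definition = [line[len("DEFINITION"):].strip()]
            in_definition = True
        elif in_definition and line.startswith(" "):
            # Indented continuation line of a multi-line DEFINITION.
            definition.append(line.strip())
        else:
            in_definition = False
            if line.startswith("ACCESSION"):
                fields = line.split()
                if len(fields) > 1:
                    accession = fields[1]
    return "\n".join(out) + ("\n" if out else "")
```

Given a record with ACCESSION X00001, a two-line DEFINITION and a short
ORIGIN block, this returns one ">X00001 ..." header line followed by the
sequence in lines of at most `width` letters. PIR and Swiss-Prot entries
would need their own small parsers along the same lines.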
Regards,
Dennis Benson
NCBI