In article <897062296snz at pdchem.demon.co.uk>, Paul at pdchem.demon.co.uk wrote:
> In article <6l7588$j4n$1 at netnews.upenn.edu>
> alwang at blue.seas.upenn.edu "Al Wang" writes:
> > Does anyone know what's a good way to take a Brookhaven PDB file, extract
> > the sequence information, and save it in SwissPROT format? If it can
> > also extract the secondary structure features, even better.
> >
> The program STRIDE by Frishman and Argos, which I obtained from
> http://www.embl-heidelberg.de/argos/stride/stride_info.html
> will generate secondary structure assignments from the
> atomic coordinates in a PDB file
[snip]
Similarly, you can load the structure into RasMol and then type
"show sequence" on the command line to have the sequence displayed
(three-letter format, not SwissProt) and you can then cut'n'paste
to your heart's content. Alas, I don't think there's an easy
way to grab the secondary structure assignments - you'll need a
standalone version of the embedded DSSP algorithm.
Personally I echo a previous post to the effect that it is
much easier to use links to the appropriate protein database.
I believe the Brookhaven WWW site has hot links from their PDB
entries to GenBank/SwissProt or whatever, and NCBI's ENTREZ browser
links the 3D structures to sequence files. Many protein sequence
files now seem to incorporate PDB-derived secondary structural
information in their remarks. Of course, you don't have this
luxury if this is a novel PDB file.
When playing with PDB files generated by CHARMM I have used
AWK scripts to extract the residue names from the PDB file lines
containing the C-alpha (CA) atoms and then to convert the three-letter
codes to one-letter codes. However, as pointed out in previous posts, most
automatic methods are fooled by multiple chains or gaps, so manual
intervention is almost always needed.
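
For anyone who wants a starting point, here is a rough sketch of the CA-atom
approach described above. It is written in Python rather than AWK and is not
the original script. It assumes the standard fixed-column PDB layout (atom name
in columns 13-16, residue name in 18-20, chain ID in 22, residue number in
23-26). The function name ca_sequences, the "_" label for a blank chain ID, the
"X" for unrecognised residues and the "-" gap marker are all choices made for
this illustration. The HSD/HSE/HSP entries cover the histidine names that
CHARMM writes. It only produces the one-letter string per chain; it does not
write a SwissProt entry.

```python
import sys

# Standard three-letter to one-letter amino acid codes, plus the
# CHARMM histidine variants (HSD, HSE, HSP).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "HSD": "H", "HSE": "H", "HSP": "H",
}


def ca_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from C-alpha ATOM records.

    A '-' is inserted where the residue number jumps by more than one
    (a possible gap), and unrecognised residue names become 'X'.
    """
    chains = {}      # chain ID -> list of one-letter codes
    last_seen = {}   # chain ID -> (residue number, insertion code) of last CA
    for line in pdb_lines:
        # ATOM records only, so calcium ions in HETATM records are ignored.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        # Keep only the first alternate location, if any are present.
        if line[16:17] not in ("", " ", "A"):
            continue
        res_name = line[17:20].strip()
        chain_id = line[21:22].strip() or "_"
        res_num = int(line[22:26])
        key = (res_num, line[26:27])
        if last_seen.get(chain_id) == key:
            continue  # same residue listed twice
        seq = chains.setdefault(chain_id, [])
        if chain_id in last_seen and res_num - last_seen[chain_id][0] > 1:
            seq.append("-")  # numbering jumped: flag a possible gap
        seq.append(THREE_TO_ONE.get(res_name, "X"))
        last_seen[chain_id] = key
    return {cid: "".join(seq) for cid, seq in chains.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ca_seq.py model.pdb
    with open(sys.argv[1]) as handle:
        for cid, seq in ca_sequences(handle).items():
            print(f">chain {cid}\n{seq}")
```

The gap check only looks at residue numbering, so it will also flag files that
are simply numbered oddly, and it says nothing about how many residues are
missing. You still need to eyeball the output, which is rather the point made
above.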
Good luck,
Bernard
--
Bernard Murray, PhD
Dept. Cell. Mol. Pharmacol., UCSF, San Francisco, USA