> Are there any books or references dealing with the efficient representation
> of a protein in a program?
I don't know of any. There are many different choices of representation,
and the right one depends on what you want to do with the protein.
For instance, consider "bonds". In MD, bonds are simple harmonic
potentials with an equilibrium distance and a spring constant. In other
systems, bonds are characterized only as "single", "double", etc.
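To spell out the MD case: each bond contributes a harmonic energy term, so an MD bond record must carry an equilibrium distance r_0 and a spring constant k_b in addition to the two atoms it joins, while a chemistry-style bond needs only the two atoms and a bond order. Conventions differ slightly. CHARMM and AMBER write the term as below, absorbing the factor of 1/2 into k_b, while many texts write (1/2) k_b instead:

```latex
E_{\text{bond}} = \sum_{\text{bonds } (i,j)} k_b \left( r_{ij} - r_0 \right)^2,
\qquad r_{ij} = \lVert \mathbf{x}_i - \mathbf{x}_j \rVert
```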
You can take an atom-centric approach and record each bond in the data
structure of both atoms it connects, which doubles the number of bonds
stored, or keep a single list of bonds, which makes it slower to look up
the bonds coming off a given atom.
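To make the trade-off concrete, here is a minimal C++17 sketch of the two layouts. The names (`BondList`, `AtomCentric`, `neighbors`) are hypothetical, and atoms are assumed to be identified by their index in an array:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Option A: a single bond list. Each bond is stored once as a pair of atom indices.
struct BondList {
    std::vector<std::pair<std::size_t, std::size_t>> bonds;

    void add_bond(std::size_t a, std::size_t b) {
        assert(a != b && "an atom cannot bond to itself");
        bonds.emplace_back(a, b);
    }

    // Finding one atom's neighbours means scanning every bond: O(number of bonds).
    std::vector<std::size_t> neighbors(std::size_t atom) const {
        std::vector<std::size_t> out;
        for (const auto& [a, b] : bonds) {
            if (a == atom) out.push_back(b);
            else if (b == atom) out.push_back(a);
        }
        return out;
    }
};

// Option B: atom-centric. Each atom keeps its own neighbour list, so every
// bond is recorded twice (once per atom), but lookup is a direct index.
struct AtomCentric {
    std::vector<std::vector<std::size_t>> bonded_to;  // indexed by atom

    explicit AtomCentric(std::size_t natoms) : bonded_to(natoms) {}

    void add_bond(std::size_t a, std::size_t b) {
        assert(a < bonded_to.size() && b < bonded_to.size() && a != b);
        bonded_to[a].push_back(b);
        bonded_to[b].push_back(a);
    }

    const std::vector<std::size_t>& neighbors(std::size_t atom) const {
        assert(atom < bonded_to.size());
        return bonded_to[atom];
    }

    // Total neighbour entries stored: twice the number of bonds.
    std::size_t stored_entries() const {
        std::size_t n = 0;
        for (const auto& v : bonded_to) n += v.size();
        return n;
    }
};
```

Which layout wins depends on the workload: code that mostly walks bonded neighbours favours option B, while code that mostly iterates over all bonds (as an MD force loop does) is served well by option A.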
I've seen some papers on how to store molecular information
"efficiently" in terms of space, but I didn't find them very useful:
they were designed for the days when memory was expensive, so they traded
simplicity for space.
You'll also have to consider how dynamic you want things to be.
In our C++ visualization code (VMD, http://www.ks.uiuc.edu/Research/vmd/,
free and comes with full source) you can't change the number of atoms
present, their names, or their bonds. The reason is that we do a lot of
precomputation to figure out things like where the protein backbone is
located. If one of the peptide bonds could be removed at any time, we
would have to recompute these properties every time something changed.
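The cost of allowing changes can be sketched as a lazily computed cache that every topology edit must invalidate. This is a hypothetical C++17 illustration, not how VMD is implemented; in particular it picks out backbone atoms simply by the names N, CA and C, whereas real code would also use the bond connectivity:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Atom {
    std::string name;  // e.g. "N", "CA", "C", "O"
    int residue;       // residue number
};

// Hypothetical molecule class: derived data (the backbone atom indices) is
// computed on demand and cached; any topology change discards the cache.
class Molecule {
public:
    void add_atom(const Atom& a) {
        atoms_.push_back(a);
        backbone_.reset();  // topology changed: cached result is stale
    }

    void remove_atom(std::size_t i) {
        assert(i < atoms_.size());
        atoms_.erase(atoms_.begin() + static_cast<std::ptrdiff_t>(i));
        backbone_.reset();  // indices shifted: cached result is stale
    }

    // Indices of backbone atoms, recomputed only when the cache is empty.
    const std::vector<std::size_t>& backbone() {
        if (!backbone_) {
            ++recomputations;
            std::vector<std::size_t> idx;
            for (std::size_t i = 0; i < atoms_.size(); ++i) {
                const std::string& n = atoms_[i].name;
                if (n == "N" || n == "CA" || n == "C") idx.push_back(i);
            }
            backbone_ = std::move(idx);
        }
        return *backbone_;
    }

    int recomputations = 0;  // how many times the cache has been rebuilt

private:
    std::vector<Atom> atoms_;
    std::optional<std::vector<std::size_t>> backbone_;
};
```

The more derived properties a program caches this way, the more work each edit triggers, which is one reason to forbid edits altogether in a viewer.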
Similarly, our simulation code (NAMD, http://www.ks.uiuc.edu/Research/namd/)
deals only with parallel MD, so the data structure must maintain information
about the angle, dihedral, electrostatics, etc. forces but be easy to
OTOH, for some tasks, like structure building, you'll need something
more dynamic and simpler. I've written Perl 4 code that treats
molecular structures as a simple list of atoms, and that proved quite
useful in those cases.
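With a plain list of atoms, a structure-building step is just list manipulation, with no topology to keep consistent. The sketch below is in C++ rather than Perl, and the `Atom` fields and the `truncate_to_ala` operation (cutting a side chain back to alanine) are illustrative assumptions:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical atom record for a structure builder: names, a residue number
// and coordinates only. Nothing is precomputed, so nothing needs invalidating.
struct Atom {
    std::string name;     // atom name, e.g. "CA"
    std::string resname;  // residue name, e.g. "SER"
    int resid;            // residue number
    double x, y, z;       // coordinates
};

using Structure = std::vector<Atom>;

// Cut residue `resid` back to alanine: keep N, CA, C, O and CB, delete every
// other atom of that residue, and rename it. (Starting from glycine, a real
// builder would also have to add a CB; this sketch does not.)
void truncate_to_ala(Structure& s, int resid) {
    assert(std::any_of(s.begin(), s.end(),
                       [resid](const Atom& a) { return a.resid == resid; }) &&
           "residue not present");
    const auto keep = [](const std::string& n) {
        return n == "N" || n == "CA" || n == "C" || n == "O" || n == "CB";
    };
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&](const Atom& a) { return a.resid == resid && !keep(a.name); }),
            s.end());
    for (Atom& a : s)
        if (a.resid == resid) a.resname = "ALA";
}
```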
> Useful suggestions or programming tips are welcome too.
My tip would be to look at the packages that already exist and see
whether you could use one of them instead. Are you sure you want to delve
into code? It might be easier to use programs that already do something
similar to what you want. Here are some of the ones I know about:
Our visualization program, VMD (http://www.ks.uiuc.edu/Research/vmd/), has an
interactive scripting language based on Tcl that lets you analyze the
molecule and modify its coordinates, so you could, for instance, flip a peptide bond.
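One common way to model a peptide flip is as a 180 degree rotation of the peptide-plane atoms (C, O, N, H) about the axis through the two flanking CA atoms. The C++ sketch below shows only that geometry, using Rodrigues' rotation formula; it is independent of VMD's Tcl interface, and the helper names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 add(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
Vec3 scale(const Vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

// Rotate point p by theta radians about the axis through points a and b
// (Rodrigues: v' = v cos t + (k x v) sin t + k (k . v)(1 - cos t)).
Vec3 rotate_about_axis(const Vec3& p, const Vec3& a, const Vec3& b, double theta) {
    Vec3 k = sub(b, a);
    const double len = std::sqrt(dot(k, k));
    assert(len > 0.0 && "axis endpoints must differ");
    k = scale(k, 1.0 / len);   // unit axis direction
    const Vec3 v = sub(p, a);  // p relative to a point on the axis
    const double c = std::cos(theta), s = std::sin(theta);
    const Vec3 r = add(add(scale(v, c), scale(cross(k, v), s)),
                       scale(k, dot(k, v) * (1.0 - c)));
    return add(r, a);
}

// Flip a peptide plane: rotate its atoms by 180 degrees about CA(i)-CA(i+1).
void flip_peptide(std::vector<Vec3>& plane_atoms, const Vec3& ca_i, const Vec3& ca_next) {
    const double pi = std::acos(-1.0);
    for (Vec3& p : plane_atoms) p = rotate_about_axis(p, ca_i, ca_next, pi);
}
```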
The only free structure builder I know of is NAMOT from Los Alamos,
but it was designed for nucleic acid work. Perhaps someone else in
this group could point to one for proteins?
There are several programs that can do structure building for a modest
fee. Some are: WHATIF at http://swift.embl-heidelberg.de/whatif/
(~250 for an academic license), NAOMI from
http://www.ocms.ox.ac.uk/~smb/Software/N_details/naomi.html, and some
MD programs like XPLOR (http://xplor.csb.yale.edu/xplor-info/xplor-info.html)
and CHARMm (http://yuri.harvard.edu/charmm/charmm.html). Of all of these,
the only one I've really used is XPLOR.
You could also go commercial and get a program like Quanta or Insight,
but I don't think they offer the same level of versatility as the
programs above. There may also be other commercial programs I haven't heard of.
That said, if you do want to write structure code, here are a couple of
places you might want to look at as well:
libpdb++ is a C++ PDB reader from UCSF's CGL
(http://www.cgl.ucsf.edu/cgl/software.html). It is one of the cleanest
PDB readers I've used (including the two I wrote :).
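For a sense of what a PDB reader has to handle: ATOM and HETATM records use fixed columns (atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-38, 39-46 and 47-54). The following minimal C++17 sketch reads only those fields. It is not the libpdb++ API, and it ignores alternate locations, insertion codes, occupancy and B-factors:

```cpp
#include <cassert>
#include <optional>
#include <string>

struct PdbAtom {
    int serial;           // atom serial number, columns 7-11
    std::string name;     // atom name, columns 13-16 (spaces trimmed)
    std::string resname;  // residue name, columns 18-20
    char chain;           // chain identifier, column 22
    int resseq;           // residue sequence number, columns 23-26
    double x, y, z;       // coordinates in angstroms, columns 31-54
};

// Strip leading and trailing spaces.
std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Parse one ATOM/HETATM line by fixed column position (0-based offsets below).
// Returns std::nullopt for other record types or lines too short for x, y, z.
// std::stoi/std::stod throw std::invalid_argument on malformed numeric fields.
std::optional<PdbAtom> parse_atom_line(const std::string& line) {
    if (line.size() < 54) return std::nullopt;
    const std::string rec = line.substr(0, 6);
    if (rec != "ATOM  " && rec != "HETATM") return std::nullopt;
    PdbAtom a;
    a.serial = std::stoi(line.substr(6, 5));
    a.name = trim(line.substr(12, 4));
    a.resname = trim(line.substr(17, 3));
    a.chain = line[21];
    a.resseq = std::stoi(line.substr(22, 4));
    a.x = std::stod(line.substr(30, 8));
    a.y = std::stod(line.substr(38, 8));
    a.z = std::stod(line.substr(46, 8));
    return a;
}
```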
Andrew Martin has a good collection of C code for manipulating structures
at http://www.biochem.ucl.ac.uk/~martin/cdoc/bioplib.html which may prove
useful to you.
If you really want C++ code, the only package I've heard of (besides our
own two) is Chimera, which is under development at UCSF's CGL and isn't yet available.
dalke at ks.uiuc.edu