Making alignments

Warren Gallin wgallin at gpu.srv.ualberta.ca
Thu Jan 15 12:00:04 EST 1998

In Article <69lc4r$sor at net.bio.net>, "Dr. J.P. Clewley"
<jclewley at hgmp.mrc.ac.uk> wrote:
>What are the pros and cons of using in-frame nucleotide sequence alignments
>compared with using non in-frame nucleotide alignments for making
>phylogenetic trees?
>For example, I have some sequences with indels, but with conserved motifs
>at either end. Using Megalign (Lasergene) I can make an amino acid
>alignment and then convert it to the equivalent nucleotide alignment,
>export it and use DNAdist (Phylip) to work out the rate of change
>at each codon position, and then use those parameters in DNAdist
>again to give a distance matrix for Neighbor.
>Alternatively, I can align the nucleotide sequences with e.g. Clustal,
>perhaps using the treefile from the above analysis as a guide tree,
>to give an input dataset for DNAdist, and then go to Neighbor.
>The alignments made by these two methods are similar, but not the same.
>If the non in-frame alignment has more aligned positions (columns) is
>it 'better' than the in-frame alignment, which could be argued to be
>more biologically 'real'?

I would argue that if you are looking at coding sequence it is better to
template the nucleic acid sequence onto the amino acid sequence.  Your aim
is to get an alignment in which the residues in each sequence are aligned
with homologous positions in the other sequences.  The amino acid sequence
is generally more robustly aligned, unless one of the proteins evolved by
frameshifts (not usual but you have to keep it in mind).  This becomes more
of an issue the more the sequences have diverged.
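
As a rough illustration of what "templating" the nucleic acid onto the amino acid alignment means, here is a minimal Python sketch. It is not how any particular program does it; the function name thread_codons and the toy sequences are made up for the example, and it assumes each coding sequence starts in frame at the first aligned residue and uses "-" as the gap character.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Lay an unaligned coding sequence onto its gapped protein sequence.

    Each residue takes the next codon from cds; each protein gap becomes
    three nucleotide gaps. Anything left over in cds (for example a stop
    codon) is ignored.
    """
    n_residues = sum(1 for c in aligned_protein if c != gap)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than 3 x residue count")
    pieces = []
    pos = 0  # index of the next unused nucleotide in cds
    for c in aligned_protein:
        if c == gap:
            pieces.append(gap * 3)
        else:
            pieces.append(cds[pos:pos + 3])
            pos += 3
    return "".join(pieces)


# Toy data (hypothetical): seqA lacks the third residue found in seqB.
aligned = {"seqA": "MK-LV", "seqB": "MKTLV"}
coding = {"seqA": "ATGAAACTGGTT", "seqB": "ATGAAAACCCTGGTT"}
for name in aligned:
    print(name, thread_codons(aligned[name], coding[name]))
```

The sketch does not check that each codon actually translates to the residue it is placed under; a real tool should, since a mismatch usually means the wrong coding sequence or a frame problem.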

If you align nucleotides and do not get a purely in-frame alignment, and you
use the nucleic acid sequence for phylogeny reconstruction, you are
implicitly assuming that the apparently conserved amino acid sequences in
fact evolved independently by (possibly radically) different pathways and
processes.  That is not, I think, the most parsimonious case, but if you do
the analysis that way you need to be aware of that inherent assumption.
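
One way to see whether a nucleotide alignment is in frame is to test each row codon by codon. The Python sketch below uses a deliberately strict definition that is my assumption, not a standard one: column 1 of the alignment is taken to be a first codon position, and every column triplet in a row must be either all nucleotides or all gaps. The function name is_codon_aligned is hypothetical.

```python
def is_codon_aligned(row: str, gap: str = "-") -> bool:
    """Return True if gaps in this aligned row fall only on whole codons.

    Assumes the first alignment column is a first codon position.
    """
    if len(row) % 3 != 0:
        return False
    for i in range(0, len(row), 3):
        n_gaps = row[i:i + 3].count(gap)
        if n_gaps not in (0, 3):  # a partly gapped triplet splits a codon
            return False
    return True


print(is_codon_aligned("ATGAAA---CTGGTT"))  # gap covers one whole codon
print(is_codon_aligned("ATGAA-ACTGGTT--"))  # single-base gap splits a codon
```

Under this definition a three-base gap that starts in the middle of a codon is also flagged, even though it does not shift the downstream frame.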

>(As an aside, are there any public domain alignment programs that can
>make in-frame alignments as the commercial Megalign does?)

I do the amino acid alignments first and then just thread the corresponding
nucleotide sequences onto that backbone using DNA Stacks, a freely available
Hypercard program written by Doug Eernisse. The last web page address that I
have for this program is:

Warren Gallin
Department of Biological Sciences
University of Alberta
Edmonton,  Alberta     T6G 2E9
wgallin at gpu.srv.ualberta.ca

