
how to treat gaps in alignments for distance calculations?

Jerry Learn learn at u.washington.edu
Thu Oct 31 12:56:50 EST 2002


As someone who regularly deals with gappy sequences, I would caution against
blindly removing gaps from sequences with the automatic gap-deletion function
that some software provides. Gaps in alignments often result from
reiterations of repeated elements. Assigning positional homology among
repeated elements (which may frequently be lost as well as gained) cannot be
done with any confidence.

For example, in the following:

atcagatagatagatcgagatcgatcagatcgtttagata
ttcagatagatag-------------agatcgtttagata
atcagatagatagatc-------atcagatcgtttagata
atcagataaatagatc-------atcagatcgaatagata
ttcaaatagatagatc-------atcagatcgtttagata
            <----------------->
             <===========>

The safest procedure would be to remove all of the repeated elements from
this particular alignment. Thus the whole region marked by the single-lined,
double-headed arrow (<->) should be deleted, not merely the gapped columns
marked by the <=> arrow.
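To make the difference between the two deletions concrete, here is a minimal
Python sketch using the five example sequences. It assumes that an automatic
gap-deletion function removes every column in which any sequence has a gap
(complete deletion), and it takes the <-> region to be 1-based columns 13-31,
read off the arrow as it renders here. `gapped_columns`, `delete_columns` and
`to_ranges` are hypothetical helpers, not functions from any package. Column
lists are printed in the range style that PAUP's exclude command takes
(e.g. `exclude 14-26;`).

```python
# The five aligned sequences from the example above ('-' marks a gap).
ALIGNMENT = [
    "atcagatagatagatcgagatcgatcagatcgtttagata",
    "ttcagatagatag-------------agatcgtttagata",
    "atcagatagatagatc-------atcagatcgtttagata",
    "atcagataaatagatc-------atcagatcgaatagata",
    "ttcaaatagatagatc-------atcagatcgtttagata",
]


def gapped_columns(alignment):
    """Return the 0-based indices of columns where any sequence has a gap."""
    length = len(alignment[0])
    return [i for i in range(length) if any(seq[i] == "-" for seq in alignment)]


def delete_columns(alignment, columns):
    """Return a copy of the alignment with the given 0-based columns removed."""
    drop = set(columns)
    return ["".join(c for i, c in enumerate(seq) if i not in drop) for seq in alignment]


def to_ranges(columns):
    """Collapse 0-based column indices into 1-based 'start-end' strings."""
    ranges = []
    for col in sorted(columns):
        pos = col + 1
        if ranges and ranges[-1][1] == pos - 1:
            ranges[-1][1] = pos  # extend the current run
        else:
            ranges.append([pos, pos])  # start a new run
    return [f"{a}-{b}" if a != b else str(a) for a, b in ranges]


# Automatic gap deletion: removes only the columns that contain a gap (<=>).
auto = gapped_columns(ALIGNMENT)

# Deleting the whole repeat region (<->).
# ASSUMPTION: 1-based columns 13-31, i.e. 0-based 12..30, read off the arrow.
repeat_region = list(range(12, 31))

for label, cols in [("automatic", auto), ("repeat region", repeat_region)]:
    trimmed = delete_columns(ALIGNMENT, cols)
    print(f"{label}: exclude {' '.join(to_ranges(cols))}; ({len(trimmed[0])} columns left)")
```

Automatic deletion leaves in place the repeat copies next to the gaps, whose
positional homology is just as doubtful; deleting the whole <-> span removes
them too, at the cost of fewer sites for the distance calculation.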

Jerry Learn
Learn at u.washington.edu

Dept. of Microbiology
University of Washington
Seattle, WA  98195-8070  USA



in article app9ek$qrb$1 at mercury.hgmp.mrc.ac.uk, Mackenzie, BAS at
basm101 at york.ac.uk wrote on 10/30/02 10:46:

> Hello,
>
> As I understand it, at present there is no reliable way of taking gaps
> into account in an analysis. In the gap regions the alignment is more
> uncertain; for this reason most people choose to remove the gap regions
> from their dataset. You can do this manually in a program like BioEdit,
> or you can use PAUP and the exclude command to specify which columns you
> want to exclude from your dataset.
>
> Hope that helps,
>
> basm101
> University of York
>
> Tilman Lamparter wrote:
>
>> How are gaps to be treated when aligned protein sequences are taken to
>> obtain distance matrices? Should the regions be excised in all sequences?
>> I use the Phylip protdist program with Jones-Taylor-Thornton model or
>> Dayhoff PAM matrix. I always get different results when alignments with
>> and without gaps are compared.
>>
>> --
>> Tilman Lamparter
>> Freie Universitaet Berlin, Pflanzenphysiologie
>> Koenigin Luise Str. 12-16, D-14195 Berlin
>> e-mail lamparte at zedat.fu-berlin.de
>>
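The original question notes that distances differ between alignments with and
without gaps. One reason is easy to see with a simple p-distance (the
proportion of differing sites), used here only as a stand-in: this sketch does
not implement the Jones-Taylor-Thornton or Dayhoff PAM corrections, and it
makes no claim about how protdist itself handles gaps. With pairwise deletion
each pair ignores only the sites where one of the two has a gap; with complete
deletion (excising the gap regions from all sequences) every pair loses every
gapped column. The toy alignment and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b, skip):
    """Proportion of differing sites between a and b, ignoring columns in
    `skip` and any column where either sequence has a gap."""
    compared = diffs = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in skip or x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def distance_matrix(alignment, complete_deletion):
    """Pairwise p-distances. With complete_deletion=True, every column with a
    gap in ANY sequence is dropped for ALL pairs; otherwise gaps are handled
    pair by pair (pairwise deletion)."""
    skip = set()
    if complete_deletion:
        skip = {i for i in range(len(alignment[0]))
                if any(seq[i] == "-" for seq in alignment)}
    n = len(alignment)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(alignment[i], alignment[j], skip)
    return d


# Hypothetical toy protein alignment (not from the thread).
TOY = [
    "MKV-LLAG",
    "MKVQLIAG",
    "MRVQLIAG",
]

for label, flag in [("pairwise deletion", False), ("complete deletion", True)]:
    d = distance_matrix(TOY, complete_deletion=flag)
    print(label, [[round(x, 3) for x in row] for row in d])
```

Sequences 2 and 3 contain no gaps, yet their distance moves from 1/8 to 1/7,
because complete deletion also drops the column where sequence 1 has a gap.
Removing gap regions therefore changes the sites every pair is compared on,
not only the pairs that involve a gapped sequence.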





More information about the Mol-evol mailing list

Send comments to us at biosci-help [At] net.bio.net