In article <1992Jun23.171414.4745 at jhunix.hcf.jhu.edu>, johnk at jhunix.hcf.jhu.edu (John J Kuszewski) writes:
> I remember a few posts (before the current thread began) about how to
> calculate the "cost" of an insertion during sequence comparisons. In
> general, the cost is of the form a + bx, where a is a constant cost
> for making any insertion, b is the cost per inserted residue, and x is
> the number of residues inserted.
>
> This doesn't seem to make a whole lot of sense. An insertion at a site
> on the surface of a protein should be a lot less expensive to make than
> at an interior site.
>
> Are there programs that, given a particular structure, will find sequence
> relatives (ala David Eisenberg), using a cost function that varies with the
> three-dimensional position of the insertion? Since residue exposure is
> one of Eisenberg's environment criteria, this should be straightforward to
> implement.
Well, forgive me if I have this upside down, but let's look at the false
positive rate. In the limit, you have a sequence that is *completely
unrelated* to the sequence/structure you are comparing it with. The region
you are scanning is inside, but you are moving it past the outside region
of the test structure. Consequently, your gap weight & extension weight
(or other parameters, however defined) go low. Shall we say, for argument's
sake, that they go to zero? Then you'll get a perfect match between regions
(by inserting lots of *long* gaps) that are completely unrelated.
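
To make the limiting case concrete, here is a minimal sketch, not anyone's actual program. It scores a global alignment with the a + bx gap cost described in the quoted post, using a standard three-state (Gotoh-style) recurrence. The function names, the +1/-1 match/mismatch scores, and the test sequences are all assumptions made up for illustration.

```python
NEG = float("-inf")

def gap_cost(x, a, b):
    """Cost of one gap of x residues: a (per gap) + b * x (per residue)."""
    return a + b * x

def align_score(s, t, a, b, match=1.0, mismatch=-1.0):
    """Best global alignment score of s against t with affine gap cost a + b*x."""
    n, m = len(s), len(t)
    # M: column aligns s[i-1] with t[j-1]
    # X: column aligns s[i-1] with a gap; Y: column aligns t[j-1] with a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_cost(i, a, b)
    for j in range(1, m + 1):
        Y[0][j] = -gap_cost(j, a, b)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs a + b; extending an open gap costs b.
            X[i][j] = max(M[i - 1][j] - a - b, X[i - 1][j] - b, Y[i - 1][j] - a - b)
            Y[i][j] = max(M[i][j - 1] - a - b, Y[i][j - 1] - b, X[i][j - 1] - a - b)
    return max(M[n][m], X[n][m], Y[n][m])
```

With a = b = 0 a mismatch is never worth taking, because a pair of free gaps is always available instead. Every aligned column is then an identity, and the score collapses to the length of the longest common subsequence, which is well above zero even for unrelated sequences. Comparing align_score(s, t, 0, 0) with align_score(s, t, 2, 1) on any unrelated pair shows the effect.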
Plus (go look at a structure) outside is a *very* temporary thing on a protein
surface. For instance: on a beta-strand on the edge of the protein, every
second amino acid could be an "inside" amino acid. Or in an alpha-helix
that's mostly buried, ~1/4 of the amino acids will be outside.
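
A rough geometric sketch of how quickly inside/outside alternates along the chain. The assumptions are mine and purely illustrative: side chains rotate about 180 degrees per residue along a strand and about 100 degrees per residue along an alpha-helix, and a residue counts as "outside" when it points into a solvent-facing arc of a chosen width. The function name and the arc widths are hypothetical.

```python
def exposure_pattern(n, turn_deg, arc_deg):
    """Return a list of booleans, True where residue i points into the exposed arc.

    turn_deg: rotation of the side-chain direction per residue (assumed ideal).
    arc_deg:  width of the solvent-facing arc, centered on residue 0's direction.
    """
    half = arc_deg / 2
    pattern = []
    for i in range(n):
        angle = (i * turn_deg) % 360
        # Angular distance from the center of the exposed arc (0 degrees).
        offset = min(angle, 360 - angle)
        pattern.append(offset <= half)
    return pattern

# Edge strand: exposure flips at every residue.
strand = exposure_pattern(8, 180, 120)

# Mostly buried helix with a 90-degree exposed arc, over 18 residues (5 turns).
helix = exposure_pattern(18, 100, 90)
helix_fraction = sum(helix) / len(helix)
```

Under these assumptions the strand alternates outside/inside at every position and the helix exposes roughly a quarter of its residues, scattered every three or four positions. A gap cost tied to exposure would therefore swing between its extremes over a span of one or two residues.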
>
> It might even make for better alignments.

All in all, I don't think so.

Adrian Goldman
Goldman at MBCL.Rutgers.Edu