Dear all,
When doing a LogDet analysis, it is desirable to use a dataset in which
the rate of substitution does not vary excessively across sites. The
main difficulty is generally held to be the constant/invariable
sites problem. The original descriptions of the method by Lockhart et
al. used parsimony-informative sites only, as this is one way around the
problem. Another way is to reduce the diagonal entries of the 4x4
matrix (Fxy) mathematically, i.e. by subtracting a certain amount
(usually the maximum-likelihood estimate of the proportion of
invariable sites). A third way is to go through the dataset,
physically remove the requisite number of constant sites, and then
analyse the new dataset.
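To make the second (mathematical) approach concrete, here is a minimal
sketch of what I have in mind. Several things in it are assumptions and
not taken from any particular program: the function names are made up,
the invariable sites are removed from the diagonal in proportion to the
base composition of the observed constant sites, and the distance uses
one commonly quoted normalised form of LogDet, -(1/4)[ln det F -
(1/2) ln(det Px det Py)], where Px and Py are the diagonal matrices of
row and column sums of F.

```python
import math

BASES = "ACGT"

def divergence_counts(seq_x, seq_y):
    """Return the 4x4 count matrix F: F[i][j] is the number of sites
    with base i in sequence x and base j in sequence y."""
    F = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq_x, seq_y):
        if a in BASES and b in BASES:  # skip gaps and ambiguity codes
            F[BASES.index(a)][BASES.index(b)] += 1.0
    return F

def constant_site_counts(alignment):
    """Count columns with the same base in every sequence, per base."""
    counts = [0.0] * 4
    for column in zip(*alignment):
        if column[0] in BASES and len(set(column)) == 1:
            counts[BASES.index(column[0])] += 1.0
    return counts

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def logdet_distance(seq_x, seq_y, const_counts, p_inv=0.0):
    """LogDet distance after subtracting a proportion p_inv of sites
    (assumed invariable) from the diagonal of F.

    Assumption: the removed sites are split among A, C, G, T in
    proportion to const_counts (the observed constant sites)."""
    F = divergence_counts(seq_x, seq_y)
    n_inv = p_inv * sum(map(sum, F))
    total_const = sum(const_counts)
    if n_inv > total_const:
        raise ValueError("p_inv implies more invariable than constant sites")
    if n_inv > 0:
        for i in range(4):
            F[i][i] -= n_inv * const_counts[i] / total_const
    total = sum(map(sum, F))
    F = [[v / total for v in row] for row in F]  # counts -> proportions
    row_sums = [sum(row) for row in F]
    col_sums = [sum(F[i][j] for i in range(4)) for j in range(4)]
    d = det(F)
    if d <= 0 or min(row_sums) <= 0 or min(col_sums) <= 0:
        raise ValueError("LogDet distance is undefined for this pair")
    log_freqs = sum(math.log(v) for v in row_sums + col_sums)
    return -0.25 * (math.log(d) - 0.5 * log_freqs)
```

The value passed as p_inv would come from a separate maximum-likelihood
analysis; nothing in the sketch estimates it.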
My particular problem does not arise until you bootstrap the dataset.
If you use the mathematical method, then you will have to re-calculate
the estimated number of invariable sites for each replicate (using
maximum likelihood) and then remove the requisite number. It would be
wrong to remove, say, 15% of constant sites in all replicates, as this
would be too much in some replicates and not enough in others (i.e. some
replicates would be left with too many invariable sites and some with
too few).
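The bootstrap issue can be illustrated with a small sketch. It resamples
alignment columns with replacement and records, for each replicate, the
number of constant sites together with a freshly estimated proportion of
invariable sites. The names are hypothetical, and estimate_pinv in
particular is a placeholder for whatever maximum-likelihood estimation
you would actually run on each replicate; the sketch does not implement
it.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment is a list of equal-length sequence strings."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def count_constant(alignment):
    """Number of columns in which every sequence has the same character."""
    return sum(1 for column in zip(*alignment) if len(set(column)) == 1)

def per_replicate_pinv(alignment, estimate_pinv, n_reps=100, seed=1):
    """Return one (constant-site count, estimated p_inv) pair per replicate.

    estimate_pinv is a hypothetical user-supplied callable that takes a
    replicate alignment and returns its estimated proportion of
    invariable sites, e.g. by calling out to an ML program."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_reps):
        replicate = bootstrap_replicate(alignment, rng)
        results.append((count_constant(replicate), estimate_pinv(replicate)))
    return results
```

Because the constant-site count changes from replicate to replicate,
any single fixed figure applied to all of them will overshoot in some
and undershoot in others, which is why the estimate has to be redone
inside the loop.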
My feeling is that the third option, removing the constant sites before
you start, is best. Is this right or wrong? There don't seem to be any
directions in the literature.
Regards,
James