For simplicity, suppose we consider a single gene (the hypothetical
data sets still contain all the other sequences). Species 1 has two
transcripts, A1 and A2, and species 2 has two transcripts, a1 and a2.
A1 is longer than A2, and a1 is longer than a2.
If we remove the redundancy before running BLAST (keeping only the
longest transcript per gene), the data sets will contain A1 and a1 for
this gene, and reciprocal BLAST may identify the A1-a1 match.
But if we do not remove the redundancy before running BLAST,
reciprocal BLAST may instead identify the A1-a2 and A2-a1 matches, and
we then have to choose one pair from the two.
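
To make the two scenarios concrete, here is a minimal Python sketch of
the toy example. The lengths and bit scores are invented purely for
illustration (they are not real BLAST output), and the helper names
(longest_per_gene, reciprocal_best_hits) are hypothetical, not part of
any BLAST tool.

```python
# Toy illustration only: all lengths and bit scores below are invented.
lengths = {"A1": 1500, "A2": 1200, "a1": 1400, "a2": 1100}
gene_of = {"A1": "A", "A2": "A", "a1": "a", "a2": "a"}
species1 = ["A1", "A2"]
species2 = ["a1", "a2"]

# Hypothetical bit scores, chosen so that the best hits cross over.
scores = {
    ("A1", "a1"): 900,
    ("A1", "a2"): 950,
    ("A2", "a1"): 920,
    ("A2", "a2"): 800,
}


def score(x, y):
    """Look up a score regardless of the order of the two sequences."""
    return scores.get((x, y), scores.get((y, x), 0))


def best_hit(query, subjects):
    """Return the subject with the highest score against the query."""
    return max(subjects, key=lambda s: score(query, s))


def reciprocal_best_hits(set1, set2):
    """Keep pairs where each sequence is the other's best hit."""
    pairs = []
    for q in set1:
        s = best_hit(q, set2)
        if best_hit(s, set1) == q:
            pairs.append((q, s))
    return pairs


def longest_per_gene(ids):
    """Remove redundancy by keeping the longest transcript of each gene."""
    best = {}
    for t in ids:
        g = gene_of[t]
        if g not in best or lengths[t] > lengths[best[g]]:
            best[g] = t
    return sorted(best.values())


# Redundancy removed BEFORE the reciprocal search.
filtered_first = reciprocal_best_hits(
    longest_per_gene(species1), longest_per_gene(species2)
)

# Redundancy NOT removed: all transcripts take part in the search.
unfiltered = reciprocal_best_hits(species1, species2)

print(filtered_first)
print(unfiltered)
```

With these made-up scores, filtering first leaves only the A1-a1 pair,
while searching all transcripts yields A1-a2 and A2-a1, so the pair
that ends up in any downstream calculation depends on the order of the
two steps.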
My concern is that when calculating Ka, Ks, and similar statistics,
the pairs A1-a1, A1-a2, and A2-a1 would each give different values, so
the results of the analysis are sensitive to the details of how the
computations were carried out.
This is a very simple example, but the difference (removing
redundancy before vs. after BLAST) can result in potentially very
different sets of sequence pairs. Indeed, the results based on real
data are different.
I am new to bioinformatics, so it is likely that I am misunderstanding
something.
Thank you for your help.