Hi all,
I hope someone can help me.
A colleague recently asked me whether he should expect different
E-values from running blastx with ESTs as the query against a
protein database versus the reverse, running tblastn with the individual
proteins as queries against the EST database.
I said yes, because the database size affects the E-value, which I
understand. He then mentioned that the bit scores were also different, which
after a lot of thought still has me stumped...
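For reference, here is a minimal sketch of my understanding of why the E-value moves with database size while a given bit score should not. It assumes the standard relation E = m * n * 2^(-S'), where S' is the bit score and m and n are the (effective) query and database lengths. The function name and the lengths are made-up placeholders, not values from my searches.

```python
def expect_from_bits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score, assuming E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search-space sizes (placeholders, not from my searches).
small = expect_from_bits(122, query_len=150, db_len=1_000_000)
large = expect_from_bits(122, query_len=150, db_len=10_000_000)

# Same bit score, ten times the database length, ten times the E-value.
print(small, large, large / small)
```

So under this relation a different E-value for the same hit is expected, but the bit score itself is an input and has no dependence on the search space.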
I presumed that the bit scores would be the same, since they are
calculated from the HSP itself, which should be identical in both
directions, shouldn't it? However, I found examples where it was not.
I've pasted below the HSPs produced from a blastx and a tblastn search,
respectively.
1.a.) Why is the alignment longer in the latter, when surely the extra
region would also match in the blastx search?
1.b.) At the start of the alignment, why is the Leu-Leu identity picked
up in the first HSP but not the second?
2.) Why are the frames different? I presumed that they referred to the
frame in which the nucleotide sequence is translated, whether it is the
query or subject sequence, so shouldn't they be the same?
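To make question 2 concrete, this is how I assumed the frame is derived for a plus-strand hit: from the 1-based start coordinate of the nucleotide sequence, as (start - 1) % 3 + 1. This convention is my assumption rather than something taken from the BLAST documentation, and the helper name is made up. The start coordinates are the nucleotide starts from the two HSPs pasted below.

```python
def plus_strand_frame(nt_start):
    """Reading frame (1, 2 or 3) for a plus-strand translation that starts
    at the 1-based nucleotide position nt_start (assumed convention)."""
    return (nt_start - 1) % 3 + 1


# Nucleotide start coordinates taken from the two HSPs in this post.
print(plus_strand_frame(17))  # first HSP, Sbjct starts at 17 -> 2
print(plus_strand_frame(9))   # second HSP, Query starts at 9 -> 3
```

Under that assumption the frame depends only on where the nucleotide coordinates start, so two different start positions can give two different frames.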
Many thanks to anyone who can clear this up for me...
>AL0010
Length = 749
Score = 122 bits (307), Expect = 1e-30
Identities = 65/146 (44%), Positives = 88/146 (60%), Gaps = 2/146 (1%)
Frame = +2
Query: 5   LITLVVSAFLIPEVLADPCGDSNWRYFPQTNSCYKLIDENLPWTIAEFKCLFQGAHHVSI 64
           L+TL  SA     V A  C D  WRY P T  CY+  D +  W  AEF CLF+G H +S+
Sbjct: 17  LLTLTFSA-----VAASRC-DPGWRYSPFTRKCYRFYDHDTMWPSAEFSCLFKGGHLISL 178

Query: 65  DSPEENQFVHELSRWSE-IWTGAAFFGKDQHYVNSDGSRYGNFENWKDGRKPPMNRARRC 123
            S  +N+F  EL+R +E +W G A FG    Y+ SD + Y NFENW + ++P   + R C
Sbjct: 179 HSNADNRFAIELARGAETVWLGNAQFGSSTEYIWSDHTTY-NFENWPNRKRPDKIKNRPC 355

Query: 124 IKMD-GNGEWFQSCCKKKTFTICEKK 148
            K++  +GEWFQSCCK+ +  ICEK+
Sbjct: 356 TKLNTTSGEWFQSCCKEPSPYICEKE 433
------------------------------------------------------------------------
>F25B4.9 CE09627 status:Confirmed TR:Q22966 protein_id:AAB37085.1
Length = 173
Score = 126 bits (316), Expect = 1e-29
Identities = 67/156 (42%), Positives = 93/156 (59%), Gaps = 7/156 (4%)
Frame = +3
Query: 9   LTLTFSA-----VAASRC-DPGWRYSPFTRKCYRFYDHDTMWPSAEFSCLFKGGHLISLH 170
           +TL  SA     V A  C D  WRY P T  CY+  D +  W  AEF CLF+G H +S+
Sbjct: 6   ITLVVSAFLIPEVLADPCGDSNWRYFPQTNSCYKLIDENLPWTIAEFKCLFQGAHHVSID 65

Query: 171 SNADNRFAIELARGAETVWLGNAQFGSSTEYIWSDHTTY-NFENWPNRKRPDKIKNRPCT 347
           S  +N+F  EL+R +E +W G A FG    Y+ SD + Y NFENW + ++P   + R C
Sbjct: 66  SPEENQFVHELSRWSE-IWTGAAFFGKDQHYVNSDGSRYGNFENWKDGRKPPMNRARRCI 124

Query: 348 KLNTTSGEWFQSCCKEPSPYICEKEVSASNANYRNS 455
           K++  +GEWFQSCCK+ +  ICEK+ + S ++Y  S
Sbjct: 125 KMD-GNGEWFQSCCKKKTFTICEKKAAYSASSYSGS 159