In article <3mjnnd$p8o at castle.york.ac.uk>,
richard harrop <rh9 at unix.york.ac.uk> wrote:
>Could anybody out there with experience of sequence analysis please lend a hand!
>I have recently become involved in the Schistosoma mansoni genome project and am
>using the technique of differential display to identify stage-specifically
>expressed gene products. I am producing quite a large number of sequences and
>performing several searches - BlastN, FASTA, TFASTA and BlastX. I realise that it is difficult to give a
>hard and fast figure to be able to say that you have identified a gene, but
>do people have a rule of thumb for any of these searches by which they believe they may have identified a gene?
>For BlastX for example, does a P value of <0.05 identify a gene??
> Any comments would be gratefully received.
> Many thanks,
> Richard Harrop
>Email: rh9 at unix.york.ac.uk
A P value of 0.05 means that if you compare a similar sequence to a
similar database, about 1 time in 20 you will find a match as good or
better purely by chance. Thus a P value of 0.05 is far from significant.
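
To put rough numbers on that when you are running many searches, here
is a minimal Python sketch. It assumes the searches are independent of
one another, which is a simplification, and the figure of 200 query
sequences is a made-up example rather than anything from the project.
The function names are my own.

```python
def expected_chance_hits(p_value: float, n_searches: int) -> float:
    """Expected number of searches that give a match this good by chance."""
    return p_value * n_searches


def prob_at_least_one_chance_hit(p_value: float, n_searches: int) -> float:
    """Probability that at least one search gives a chance match.

    Assumes the searches are independent (a simplification).
    """
    return 1.0 - (1.0 - p_value) ** n_searches


if __name__ == "__main__":
    n = 200  # hypothetical number of query sequences
    print(expected_chance_hits(0.05, n))          # about 10 chance matches
    print(prob_at_least_one_chance_hit(0.05, n))  # very close to 1
```

So with a few hundred sequences, some matches at P = 0.05 are to be
expected even when nothing real is there.
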
Also remember that the BLAST P values only tell us the mathematical
significance; you must decide the biological significance. If you are
trying to identify a putative homolog of a gene sequenced in another
organism, of course the percentage match, and thus the P value, will
depend on the degree of conservation and the distance between the two
species. With that said, I generally don't consider matches with P
values greater than 10e-20. But you still have to carefully consider
the specifics of the match: is a repeated element present, or is it
just a particular motif that you have identified, like a homeobox? I
try to look at the gene structure too: are the introns in the same
place, etc.? If
you only match one exon of a multi-exon gene then maybe you haven't
identified the gene.
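
As a first pass, the cutoff above can be applied mechanically before
looking at matches by hand. The following is only a sketch in Python:
the `Hit` record, the function name and the example hits are all
invented for illustration, and it does not parse real BLAST output.
Anything that passes still needs the manual checks described above
(repeats, isolated motifs, gene structure).

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one database match."""
    subject: str
    p_value: float


# Cutoff as written in the post; note that 10e-20 is the same as 1e-19.
P_CUTOFF = 10e-20


def worth_inspecting(hits: List[Hit], cutoff: float = P_CUTOFF) -> List[Hit]:
    """Keep hits with a P value at or below the cutoff, best first."""
    kept = [h for h in hits if h.p_value <= cutoff]
    return sorted(kept, key=lambda h: h.p_value)


if __name__ == "__main__":
    # Made-up hits for illustration only, not real search results.
    example_hits = [
        Hit("hypothetical_hit_1", 2e-21),
        Hit("hypothetical_hit_2", 0.04),
        Hit("hypothetical_hit_3", 3e-45),
    ]
    for hit in worth_inspecting(example_hits):
        print(hit.subject, hit.p_value)
```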