It is becoming common for people to want to run BLAST searches
with query sequences of 200 Kb and upwards.
This causes problems with memory use and search time, and the many
strong matches can push interesting weak matches in other regions of the
query sequence off the bottom of the list of output alignments.
I am interested in how other sites are approaching this problem.
My initial thought is that the query sequence should be split
into lengths of maybe 50 Kb with an overlap of maybe 1 Kb. The results
can then be processed to produce a composite MSPcrunch format file which
can be searched with existing scripts. Display scripts can then
reintegrate the alignment results from two or more output files in the
region of interest.
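To make the splitting idea concrete, here is a minimal Python sketch of cutting a query into overlapping pieces and mapping hit coordinates back onto the full sequence. The function names split_query and to_query_coords are hypothetical, the 50 Kb and 1 Kb defaults are just the tentative figures suggested above, and hit coordinates are assumed to be 1-based within each piece. It does not run BLAST or write MSPcrunch output.

```python
from typing import Iterator, Tuple


def split_query(seq: str, chunk_len: int = 50_000,
                overlap: int = 1_000) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs covering seq.

    offset is the 0-based position of the chunk's first base in seq.
    Consecutive chunks share `overlap` bases.
    """
    if overlap >= chunk_len:
        raise ValueError("overlap must be smaller than chunk_len")
    step = chunk_len - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_len]
        # Stop once this chunk reaches the end of the sequence.
        if start + chunk_len >= len(seq):
            break
        start += step


def to_query_coords(offset: int, hit_start: int,
                    hit_end: int) -> Tuple[int, int]:
    """Map 1-based hit coordinates within a chunk back to the full query."""
    return offset + hit_start, offset + hit_end


if __name__ == "__main__":
    query = "A" * 200_000  # placeholder sequence, not real data
    for offset, chunk in split_query(query):
        print(offset, len(chunk))
```

Two caveats with this approach: a hit that lies entirely inside an overlap region will be reported from both neighbouring pieces and needs to be de-duplicated when the results are merged, and an alignment longer than the overlap that crosses a boundary will come back as two fragments.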
Are there any publicly available solutions to doing BLAST searches with
large sequences, then viewing and manipulating the results?
Has anyone found ways to do BLAST searches with large sequences without
What other problems (and solutions) do people encounter with large
Gary Williams Tel: +44 1223 494522 Fax: +44 1223 494512
mailto:G.Williams at hgmp.mrc.ac.uk http://www.hgmp.mrc.ac.uk/
Bioinformatics, MRC HGMP Resource Centre, Hinxton, Cambridge, CB10 1SB, UK