Heuristic searches

Dr. Andrew G. McArthur mcarthur at onyx.si.edu
Tue Apr 7 13:50:54 EST 1998


Colleagues & Dr. Sikes,

This is indeed a valid and useful method, one that I have been using for
some time.  The bonus in speed is wonderful.  By limiting each replicate to
a small number of trees held in memory (I use 20), the random-replicates
search gathers a good sampling of the various parsimonious islands.  These
islands have not been searched extensively: a tree beyond the limit (tree
#21, say) might have swapped to a more parsimonious topology.  But, by keeping the trees in
memory and doing exhaustive TBR swapping on them in a second heuristic
search, you ensure that PAUP will find all parsimonious trees instead of
just getting stuck in a single island.  This approach gives you good
coverage of the parsimony surface while still being robust.
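
The two-stage idea can be sketched on a toy problem.  The Python below is
not PAUP and does not handle real trees; it is only an illustration under
stated assumptions: "trees" are integers on a line, a "swap" moves to an
adjacent integer, lower scores are better, and the names swap_search,
two_stage_search and max_trees are hypothetical.  Stage one runs capped
searches from several starts; stage two swaps without a cap on the best
trees retained.

```python
from collections import deque

def swap_search(score, neighbors, starts, max_trees=None):
    """Hill-climb from `starts`, holding equally good states as they are found.

    At most `max_trees` states are held at once (None means no limit).
    Held states are still swapped on, so a better state can still be found.
    """
    best = min(score(s) for s in starts)
    held = []
    for s in starts:
        if score(s) == best and s not in held:
            held.append(s)
    queue = deque(held)
    while queue:
        current = queue.popleft()
        for n in neighbors(current):
            if score(n) < best:
                # Strictly better state: discard everything held so far.
                best, held = score(n), [n]
                queue = deque(held)
                break
            if score(n) == best and n not in held:
                if max_trees is None or len(held) < max_trees:
                    held.append(n)
                    queue.append(n)
    return best, held

def two_stage_search(score, neighbors, starts, max_trees):
    """Stage 1: one capped search per start.  Stage 2: uncapped swapping."""
    pooled_best, pooled = None, []
    for s in starts:
        best, held = swap_search(score, neighbors, [s], max_trees)
        if pooled_best is None or best < pooled_best:
            pooled_best, pooled = best, list(held)
        elif best == pooled_best:
            pooled.extend(t for t in held if t not in pooled)
    # Restart from the retained states, this time with no limit.
    return swap_search(score, neighbors, pooled, max_trees=None)

# Toy landscape (an assumption): three plateaus ("islands") scoring 5, 3
# and 4, separated by barriers scoring 7.
scores = [5, 5, 5, 5, 5, 5, 7, 3, 3, 3, 3, 3, 3, 3, 3, 7, 4, 4, 4, 4]

def score(i):
    return scores[i]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]

best, trees = two_stage_search(score, neighbors, starts=[2, 10, 17], max_trees=2)
print(best, sorted(trees))
```

In this toy, stage one holds only two states per start, and stage two then
recovers the whole best plateau.  The cap can still hide an improvement
that is reachable only through a state that was never held, which is the
"tree #21" caveat above.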

Cheers!
Andrew McArthur
--------------------------------------------
Dr. Andrew G. McArthur, Guest Researcher
University of Western Ontario, Canada
mcarthur at onyx.si.edu, Fax: (519) 439-5981
http://www.geocities.com/CapeCanaveral/8431/

At 10:16 AM 4/7/98 -0500, you wrote:
>To those experienced with heuristic searching using PAUP*:
>
>I have stumbled upon what seems to be a way to find the shortest trees
>far more quickly, and with high confidence, than by the method I
>previously used.  First the details: I have a large enough data set (76
>taxa) that I cannot do an exhaustive or branch-and-bound search.  I was
>taught that the most rigorous options for heuristic searching
>should include at least 100 replicates of randomly started independent
>searches (with other settings at the PAUP defaults).
>
>This, even on fast computers, would take many days.  I noticed, however, 
>that much of the time was spent "swapping" tree branches on "islands" of 
>suboptimal length (e.g. I knew that a tree of length 200 was possible but 
>many of the islands found yielded trees no better than 210 or 220 etc.).  
>
>So I examined the PAUP heuristic searching options and one option allows 
>users to limit the number of trees found for a single replicate before 
>starting another replicate.  Therefore, if one's dataset allows many MPTs
>(most parsimonious trees, which take hours to days to swap), one can set a
>limit of, say, 100 or 50 trees saved per "island".  With this setting PAUP
>can crank through 100 or even 200 replicates in a few hours (and, of
>course, if one wants to wait a few days, one can get a few thousand
>replicates completed).
>
>PAUP will save the shortest trees in memory, and once the run is complete
>you can start the search again, specifying that PAUP use the "trees in
>memory" as the starting trees.  This will result in complete swapping on
>the "best" island(s), yielding all the MPTs of that length (and maybe
>some shorter...).
>
>The reason I am posting this is that it seems too good to be true:
>shaving 90% of the time off a large-dataset search must have some serious
>cons, and I'd like to hear from someone who knows what they are (and
>whether this has been rigorously tested in a systematic fashion and published)...
>
>Thank You,
>
>Derek Sikes




More information about the Mol-evol mailing list
