test data set wanted

Brian Foley
Mon Sep 21 13:29:04 EST 1998

Oswaldo Trelles wrote:
> ... we have developed a new method to parallel implementation
>  of the DNAml program, a method to construct phylogenetic tree 
>  from DNA sequences (please, see references). We have applied 
>  successfully to the analysis of ribosomal RNA data. Here we 
>  have the interest to apply it to analyse other interesting 
>  data set consisting of large number of DNA
>  sequences and construct phylogenetic tree.
>  Who can point out to us such data sets?

There are over 30,000 sequences from primate immunodeficiency
viruses (HIV-1, HIV-2 and various SIVs).  The complete genomes
are roughly 10,000 bp in length.  All are apparently derived
from a common ancestor, a lentivirus.

The Gag, Pol, Env and other genes from the primate
lentiviruses can be reasonably aligned.  The LTR and
other non-coding regions cannot be unambiguously aligned.

Attempting to build a phylogenetic tree from 30,000 or
even 3,000 sequences would be a bit ridiculous.  But
it is not uncommon for us to want to build a tree from
50 to 300 sequences.  For example, we have nearly 90 complete 
genomes from HIV-1 isolates.  (The LTRs from within one type
of immunodeficiency virus can be aligned; it is only
aligning HIV-1 to HIV-2 or to SIVs that is difficult.)
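
To give a rough sense of why very large trees get out of hand: the
number of distinct unrooted bifurcating topologies for n sequences is
(2n-5)!!, a standard combinatorial result.  The Python sketch below is
only an illustration; the function names are made up here, and it says
nothing about how DNAml (or any parallel version of it) actually
searches tree space.

```python
import math


def count_unrooted_trees(n_taxa):
    """Exact number of unrooted bifurcating topologies, (2n-5)!!, for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def log10_unrooted_trees(n_taxa):
    """log10 of the same count, summed term by term so huge n stays cheap."""
    if n_taxa < 3:
        raise ValueError("need at least 3 sequences")
    return sum(math.log10(k) for k in range(3, 2 * n_taxa - 4, 2))


if __name__ == "__main__":
    # Data set sizes mentioned in this message.
    for n in (50, 300, 3000, 30000):
        digits = math.floor(log10_unrooted_trees(n)) + 1
        print(f"{n} sequences: topology count has about {digits} digits")
```

No program enumerates all of these, of course; heuristic searches look
at a tiny fraction, but the search space still grows very quickly with
the number of sequences.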

For an example of a data set, see the alignments at:

Select HIV-1 env DNA to see an alignment of the envelope 
genes from about 220 different HIV-1 isolates.  The alignments
can be downloaded in IntelliGenetics or FASTA formats, or
viewed as printable text.
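
Once an alignment has been downloaded in FASTA format, a quick sanity
check before handing it to a tree-building program might look like the
Python sketch below.  This is a minimal sketch, not part of any
database tool: the function names and the file name in the usage
comment are hypothetical, and it assumes plain FASTA with ">" header
lines and gaps written as "-".

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into (name, sequence) pairs."""
    records = []
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def is_aligned(records):
    """True if every sequence has the same length (gaps included)."""
    lengths = {len(seq) for _, seq in records}
    return len(lengths) <= 1


# Usage, with a hypothetical file name:
#   with open("hiv1_env.fasta") as handle:
#       records = read_fasta(handle)
#   print(len(records), "sequences; aligned:", is_aligned(records))
```

If the lengths differ, the file is probably a set of unaligned
sequences rather than an alignment, and would need to be aligned first.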

