Software for identifying new members of gene families

William R. Pearson wrp at alpha0.bioch.virginia.edu
Tue Nov 2 08:54:30 EST 1999

We have developed a simple computational/graphical strategy for
screening the nightly updates of GenBank (in practice we run the
screens weekly, on the 7 accumulated nightly updates) for new members
of large protein

     Retief, J. D., Lynch, K. R., and Pearson, W. R. (1999)
     Panning for genes - a visual strategy for identifying novel
     gene orthologs and paralogs. Genome Res. 9:373-382.

The software is available free for academic users from

The strategy uses tfastx3 to search DNA databases with 20-60 protein
query sequences that represent the different known branches of the
protein family.  The 20-60 tfastx3 search results are then scanned,
rearranged, and summarized graphically in a way that greatly
simplifies identification of new family members.  For an example,
see:


If you view these pages with the Acrobat plug-in for Netscape or IE,
you can click on each panel to see the underlying alignments.
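
To make the search-then-summarize idea above concrete, here is a rough
Python sketch.  It is not the code from our package: the function
names, the (query, library sequence, E-value) tuples, the 1e-3
threshold, and the text grid are all assumptions made for
illustration, and parsing of the tfastx3 output is left out.  The
command lines assume tfastx3 is called as "tfastx3 -q query library",
with -q used for non-interactive runs.

```python
from collections import defaultdict


def build_search_commands(query_files, library, program="tfastx3"):
    """Return one command line (as an argv list) per protein query.

    Assumes the usual FASTA calling convention of
    'program -q query_file library_file'.
    """
    return [[program, "-q", query, library] for query in query_files]


def summarize_hits(hits, threshold=1e-3):
    """Regroup per-query hits by library sequence.

    hits: iterable of (query, subject, evalue) tuples, assumed to have
    been parsed from the individual search results already.
    Returns a list of (subject, {query: best_evalue}) pairs, ordered
    so that the most significant library sequences come first.
    """
    table = defaultdict(dict)
    for query, subject, evalue in hits:
        if evalue > threshold:
            continue  # drop hits that are not significant
        best = table[subject].get(query)
        if best is None or evalue < best:
            table[subject][query] = evalue
    return sorted(table.items(), key=lambda item: min(item[1].values()))


def render_grid(summary, queries):
    """Draw one text row per library sequence, one column per query.

    '#' marks a significant hit by that query, '.' marks none.
    """
    lines = []
    for subject, by_query in summary:
        marks = "".join("#" if q in by_query else "." for q in queries)
        lines.append(f"{subject:<12}{marks}")
    return "\n".join(lines)
```

A library sequence that is hit by queries from only one branch of the
family, or that is hit only weakly by every branch, stands out as its
own pattern of marks in such a grid, which is the kind of pattern the
graphical summary is meant to make easy to spot.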

I mention this because several investigators in this field were
unaware of the approach and were scanning for new gene family members
in a more cumbersome fashion.  The program is relatively easy to set
up if you are getting your data from the NCBI (e.g. est_human or
genbank nightly updates) and you have the fasta3 package of programs
from the University of Virginia.  People have had more difficulty
with other databases, and it will not work with the GCG version

Bill Pearson
