CLEANUP v 1.8
CLEANUP 1.8 uses an algorithm based on an "approximate string matching"
procedure to determine the overall degree of similarity between each pair of
sequences in a nucleotide sequence database, and automatically generates
nucleotide sequence collections purged of redundancy.
Redundancy is a key issue when comparing sequence collections. Producing
non-redundant sequence collections is very useful both for statistical
analyses and for speeding up extensive database searches on nucleotide
sequences. Publicly available databases contain multiple entries of identical
or nearly identical sequences, and statistical analysis of such biased data
carries a high risk of assigning high significance to non-significant
patterns. Unbiased statistical analysis and more efficient database searching
therefore require sequence data purged of redundancy. Since an unambiguous
definition of redundancy is impracticable for biological sequence data,
CLEANUP uses a quantitative description of redundancy based on a measure of
sequence similarity: a sequence is considered redundant if its degree of
similarity and overlap with a longer sequence in the database exceeds a
threshold fixed by the user.
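The redundancy rule above can be illustrated with a small C sketch. It is not CLEANUP's actual algorithm: as a stand-in for the approximate string matching score, it uses k-mer containment (the fraction of the shorter sequence's length-k words that also occur in the longer one), which crudely folds similarity and overlap into a single number. The function names `kmer_containment` and `flag_redundant`, the word length `k`, and the tie-break for equal-length sequences are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical similarity measure (not CLEANUP's): the fraction, from 0 to 1,
   of the length-k words of `shorter` that occur anywhere in `longer`.
   Naive search, O(len(shorter) * len(longer) * k). */
static double kmer_containment(const char *shorter, const char *longer,
                               size_t k)
{
    size_t ns = strlen(shorter), nl = strlen(longer);
    if (k == 0 || ns < k || nl < k)
        return 0.0;

    size_t total = ns - k + 1, found = 0;
    for (size_t i = 0; i < total; i++) {
        for (size_t j = 0; j + k <= nl; j++) {
            if (memcmp(shorter + i, longer + j, k) == 0) {
                found++;
                break; /* one occurrence is enough for this word */
            }
        }
    }
    return (double)found / (double)total;
}

/* Sets flags[i] to 1 if sequence i scores above `threshold` against some
   longer sequence, else 0, and returns the number of flagged sequences.
   Assumed tie-break: among equal-length sequences, the one with the lower
   index counts as "longer", so of two similar equal-length sequences only
   the later one is flagged. */
size_t flag_redundant(const char *const *seqs, size_t n, size_t k,
                      double threshold, int *flags)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        size_t li = strlen(seqs[i]);
        flags[i] = 0;
        for (size_t j = 0; j < n && !flags[i]; j++) {
            size_t lj = strlen(seqs[j]);
            int is_longer = lj > li || (lj == li && j < i);
            if (is_longer && kmer_containment(seqs[i], seqs[j], k) > threshold) {
                flags[i] = 1;
                count++;
            }
        }
    }
    return count;
}
```

For example, with k = 4 and a threshold of 0.9, the sequences ACGTACGTAAGG, ACGTACGTAA, TTTTCCCCGGGG and a second copy of ACGTACGTAAGG yield two flagged entries (the second and the fourth), because every 4-mer of each occurs in the first sequence. The all-pairs loop is deliberately naive; a program meant to process whole databases would need an indexed or otherwise faster search.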
CLEANUP 1.8 runs both within a GCG environment and as a GCG-independent
program. It has been compiled and tested with the DEC Alpha AXP C compiler and
the Borland C++ 4.0 compiler, and it is portable to any machine with an ANSI C
compiler.
Giorgio Grillo, Marcella Attimonelli, Sabino Liuni and Graziano Pesole
(1996) CLEANUP: a fast computer program for removing redundancies from
nucleotide sequence databases. CABIOS 12, 1-8
The CLEANUP 1.8 program is available by anonymous ftp:
address: area.ba.cnr.it (184.108.40.206)
For support and further information, contact:
Giorgio Grillo: giorgio at area.ba.cnr.it
Sandra Brunetta: areasb16 at area.ba.cnr.it
Sabino Liuni: sabino at area.ba.cnr.it
Dr. Sabino LIUNI
Administrator for the Italian EMBnet Node
Area di Ricerca CNR
Via Amendola 166/5
70126 Bari (Italy)
tel. +39-80-5482176/5482130
Fax. +39-80-5484467
E-mail: sabino at area.ba.cnr.it