Announcement: CLEANUP v 1.8 released

Sabino Liuni sabino at area.ba.cnr.it
Fri Dec 6 08:21:56 EST 1996

                        CLEANUP v 1.8

CLEANUP 1.8 uses an algorithm based on an "approximate string matching"
procedure to determine the overall degree of similarity between each pair
of sequences in a nucleotide sequence database, and automatically generates
nucleotide sequence collections purified of redundancies.
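
As a rough illustration of pairwise similarity by approximate string
matching, here is a minimal Python sketch. It is not the CLEANUP algorithm:
it assumes plain Levenshtein edit distance normalised by the longer
sequence length, and the function names (edit_distance, similarity,
pairwise_similarity) are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (rolling-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def pairwise_similarity(seqs):
    """Map each unordered pair of sequence names to its similarity."""
    return {(x, y): similarity(seqs[x], seqs[y])
            for x, y in combinations(seqs, 2)}


if __name__ == "__main__":
    demo = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTTTTTTT"}
    for pair, score in pairwise_similarity(demo).items():
        print(pair, round(score, 3))
```

The all-against-all comparison above is quadratic in the number of
sequences and in their length, so it is only practical for small
collections.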

A key issue in comparing sequence collections is redundancy. Sequence
collections cleaned of redundancy are useful both for statistical analyses
and for speeding up extensive database searches on nucleotide sequences.
Publicly available databases contain multiple entries of identical or almost
identical sequences, and statistical analysis of such biased data carries a
high risk of assigning high significance to non-significant patterns.
Unbiased statistical analysis and more efficient database searches therefore
require sequence data purified of redundancy. Since an unambiguous
definition of redundancy is impracticable for biological sequence data,
CLEANUP uses a quantitative description of redundancy based on a measure of
sequence similarity: a sequence is considered redundant if its degree of
similarity and overlap with a longer sequence in the database is greater
than a threshold set by the user.
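
The redundancy rule above can be sketched in Python as follows. This is an
illustration under stated assumptions, not the CLEANUP implementation:
difflib.SequenceMatcher stands in for the approximate string matching step,
similarity and overlap get separate thresholds (min_similarity, min_overlap),
sequences of equal length are compared as well as strictly longer ones, and
each sequence is checked only against the sequences already kept. All
function and parameter names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_and_overlap(shorter, longer):
    """Return (similarity, overlap) of `shorter` against `longer`, in [0, 1]."""
    matcher = SequenceMatcher(None, shorter, longer, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return 0.0, 0.0
    matched = sum(b.size for b in blocks)
    first, last = blocks[0], blocks[-1]
    span_a = last.a + last.size - first.a   # region covered in `shorter`
    span_b = last.b + last.size - first.b   # region covered in `longer`
    similarity = matched / max(span_a, span_b)
    overlap = span_a / len(shorter)
    return similarity, overlap


def remove_redundant(seqs, min_similarity=0.9, min_overlap=0.9):
    """Greedy filter: visit sequences longest first, drop redundant ones.

    Returns (kept, removed), where `removed` maps each dropped name to the
    kept sequence that made it redundant.
    """
    by_length = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    kept, removed = [], {}
    for name in by_length:
        for ref in kept:
            sim, ovl = similarity_and_overlap(seqs[name], seqs[ref])
            if sim > min_similarity and ovl > min_overlap:
                removed[name] = ref
                break
        else:
            kept.append(name)
    return kept, removed


if __name__ == "__main__":
    demo = {
        "long": "ACGTACGTACGTACGTACGT",
        "dup": "ACGTACGTACGTACGT",      # fully contained in "long"
        "other": "TTTTGGGGCCCCAAAA",    # unrelated sequence
    }
    kept, removed = remove_redundant(demo)
    print("kept:", kept)
    print("removed:", removed)
```

Processing the longest sequences first means a shorter entry is only ever
dropped in favour of one at least as long, which matches the rule that
redundancy is judged against a longer sequence.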

CLEANUP 1.8 runs both within a GCG environment and on GCG-independent
platforms. It has been compiled and tested with the DEC Alpha AXP C compiler
and the Borland C++ 4.0 compiler, and it is portable to any machine with an
ANSI C standard compiler.

Giorgio Grillo, Marcella Attimonelli, Sabino Liuni and Graziano Pesole
(1996) CLEANUP: a fast computer program for removing redundancies from
nucleotide sequence databases. CABIOS 12, 1-8

The CLEANUP 1.8 program is available by anonymous ftp:
	 address:    area.ba.cnr.it   (
         path:       pub/embnet/software/Cleanup        

For support or further information, contact:
	Giorgio Grillo :  giorgio at area.ba.cnr.it
	Sandra Brunetta:  areasb16 at area.ba.cnr.it
	Sabino Liuni:     sabino at area.ba.cnr.it

 Dr. Sabino LIUNI            Area di Ricerca CNR         ^          
 Administrator for the       Via Amendola 166/5          ^          
 Italian EMBnet Node         70126 Bari (Italy)          ^       
                             tel. +39-80-5482176/5482130 ^
                             Fax. +39-80-5484467         ^
              E_mail:sabino at area.ba.cnr.it               ^ 
