Fri Jan 27 13:29:55 EST 1995

		Measuring physical map quality using bootstrap resampling


        This program helps measure the quality of a reconstructed physical
        map using bootstrap resampling.

	The theory for reconstruction of chromosomes or chromosome fragments
        ("contig mapping") from a clonal library can be found in

        Cuticchia, A.J., Arnold, J., and W.E. Timberlake. (1992a). The
        use of simulated annealing in chromosome reconstruction experiments
        based on binary scoring. Genetics 132: 591-601

        The program orders DNA fragments based on the similarity of
        the binary profiles assigned to clones in a library by one of
        several experimental approaches. The algorithm has been used
        to map the Schizosaccharomyces pombe genome, the Aspergillus
        nidulans genome, and a region of human chromosome IX.
        DNA fragments with a high degree of overlap are expected
        to show a high degree of similarity in their profiles.
        The ordering process minimizes the sum of the linking
        distances between clones as a function of their ordering along the
        chromosome.
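
        The quantity being minimized can be illustrated with a short C
        sketch. It is not taken from boot.c: the function names are
        hypothetical, and it assumes that the linking distance between two
        clones is the Hamming distance between their binary profiles and
        that the sum runs over neighbouring clones in the ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hamming distance between two binary hybridization profiles, each
   stored as n_probes values of 0 or 1 (assumed distance measure). */
static int profile_distance(const unsigned char *a, const unsigned char *b,
                            size_t n_probes)
{
    int d = 0;
    for (size_t p = 0; p < n_probes; p++)
        d += (a[p] != b[p]);
    return d;
}

/* Sum of distances between neighbouring clones in a proposed ordering.
   profiles is a row-major n_clones x n_probes matrix; order[i] is the
   index of the clone placed at position i along the map. */
int total_linking_distance(const unsigned char *profiles, size_t n_probes,
                           const size_t *order, size_t n_clones)
{
    assert(profiles != NULL && order != NULL);
    int total = 0;
    for (size_t i = 0; i + 1 < n_clones; i++)
        total += profile_distance(profiles + order[i] * n_probes,
                                  profiles + order[i + 1] * n_probes,
                                  n_probes);
    return total;
}
```

        An ordering algorithm would evaluate this sum for candidate
        orderings and keep those that lower it.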

        The minimization algorithm used here is a new one called random
        cost. It is detailed in:

	Wang, Y., Prade, R. A., Griffith, J., Timberlake, W.E., and Arnold, J.
	(1994) A fast random cost algorithm for physical mapping. PNAS,
	91, 11094-11098

        In bootstrap resampling, probes are randomly resampled with
        replacement many times (1000 by default). This program calculates
        how often the links in the original reconstructed map reappear in
        maps built under bootstrap resampling. Three such frequencies are
        calculated:
        (i) C1: how often two clones appear together;
        (ii) C2: how often the two clones, or clones equivalent to them in
        hybridization profile, appear next to each other;
        (iii) C3: how often two clones are within the same island.
        The bootstrap resampling procedure for assessing the reliability
        of a physical map is described in:

        Wang, Y., Prade, R.A., Griffith, J., Timberlake, W.E., and Arnold, J.
        (1994) ODS_BOOTSTRAP: assessing the statistical reliability
        of a physical map by bootstrap resampling.  CABIOS 10: 625-634
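
        The resampling step and the check for a reappearing link can be
        sketched in C as below. This is an illustration only, not code
        from boot.c: the function names are hypothetical, rand() stands in
        for whatever generator the program uses, and "link" is read here
        as two clones being neighbours in an ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Draw n_probes column indices uniformly at random, with replacement.
   rand() is used for brevity; the modulo introduces a slight bias. */
void resample_probes(size_t *picked, size_t n_probes)
{
    assert(n_probes > 0);
    for (size_t j = 0; j < n_probes; j++)
        picked[j] = (size_t)rand() % n_probes;
}

/* Build the resampled matrix: column j of dst is column picked[j] of src.
   Both matrices are row-major, n_clones x n_probes. */
void apply_resample(const unsigned char *src, unsigned char *dst,
                    size_t n_clones, size_t n_probes, const size_t *picked)
{
    for (size_t c = 0; c < n_clones; c++)
        for (size_t j = 0; j < n_probes; j++)
            dst[c * n_probes + j] = src[c * n_probes + picked[j]];
}

/* Return 1 if clones a and b are neighbours in the given ordering. */
int link_present(const size_t *order, size_t n_clones, size_t a, size_t b)
{
    for (size_t i = 0; i + 1 < n_clones; i++)
        if ((order[i] == a && order[i + 1] == b) ||
            (order[i] == b && order[i + 1] == a))
            return 1;
    return 0;
}
```

        In each bootstrap run one would resample the probes, reorder the
        clones on the resampled matrix, and count the run if
        link_present() returns 1; the count divided by the number of runs
        gives a frequency of the kind reported by the program.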



	A typical batch file is given as follows:

	$set def [wang.cm.bootstrap]
	$run boot

        The first line sets the default directory.

        The second line runs this program.

        The third line is the input file name. In the input hybridization
        file, the first line should give the number of probes, the number
        of clones, and the number of bootstrap runs. On all other lines,
        the first ten columns are reserved for the clone name, and the
        hybridization data may start at any column after the 10th. The
        maximum length of each line is defined by MAX_BUFFER in the
        program. The example is the clone/probe hybridization matrix for
        Chromosome IV of A. nidulans.
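
        A data line in this layout could be read as in the following C
        sketch. It is not the parser in boot.c: parse_clone_line and
        NAME_COLS are hypothetical names, and the sketch assumes that the
        scores are written as the characters '0' and '1', with any other
        character after column 10 treated as a separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_COLS 10   /* columns reserved for the clone name */

/* Parse one data line. Copies the clone name (trailing blanks trimmed)
   into name, which must hold NAME_COLS + 1 bytes, and stores up to
   max_probes scores in profile. Returns the number of scores read,
   or -1 if the line is shorter than the name field. */
int parse_clone_line(const char *line, char *name,
                     unsigned char *profile, size_t max_probes)
{
    assert(line != NULL && name != NULL && profile != NULL);
    if (strlen(line) < (size_t)NAME_COLS)
        return -1;

    memcpy(name, line, NAME_COLS);
    name[NAME_COLS] = '\0';
    for (int i = NAME_COLS - 1; i >= 0 && name[i] == ' '; i--)
        name[i] = '\0';

    size_t n = 0;
    for (const char *p = line + NAME_COLS; *p != '\0' && n < max_probes; p++)
        if (*p == '0' || *p == '1')   /* skip blanks, tabs, newline */
            profile[n++] = (unsigned char)(*p - '0');
    return (int)n;
}
```

        A caller would compare the returned count with the number of
        probes given on the first line of the file.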

        The fourth line is the name of the probe names file. It contains
        the probe names in the same order as the columns of the input file.

        The fifth line is the output file name. The final reconstructed
        physical map, the confidence statistics from the bootstrap runs,
        and other statistics are written to this file.

        The sixth line is the name of the file that stores the total linking
        distance for each bootstrap run. This file is for tracking the
        progress of the program.

        The seventh line is the seed for the random number generator.


        The number of clones must be between 1 and 600 on a DEC
        VAXstation 4000.

        Filenames (with directory path, if specified) must be
        no longer than 40 characters.

        If the number of bootstrap runs is not given in the input binary
        hybridization data file, it is set to the default of 1000.
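
        The handling of the first line of the input file, including this
        default, might look like the C sketch below. The names
        parse_header and DEFAULT_BOOTSTRAPS are hypothetical, and the
        sketch assumes the three counts are whitespace-separated integers
        in the order probes, clones, bootstrap runs.

```c
#include <assert.h>
#include <stdio.h>

#define DEFAULT_BOOTSTRAPS 1000   /* default stated in this document */

/* Parse the header line: probes, clones, and optionally bootstrap runs.
   Returns 0 on success, -1 if fewer than two counts could be read. */
int parse_header(const char *line, int *n_probes, int *n_clones, int *n_boot)
{
    assert(line != NULL);
    int got = sscanf(line, "%d %d %d", n_probes, n_clones, n_boot);
    if (got < 2)
        return -1;
    if (got == 2)                 /* bootstrap count omitted */
        *n_boot = DEFAULT_BOOTSTRAPS;
    return 0;
}
```

        With this sketch, a header of "115 593" yields 1000 bootstrap
        runs, while "115 593 200" yields 200.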


        The program assembled a physical map of 593 clones probed
        with 115 probes in less than 2 minutes on a DEC VAXstation
        4000. A run of 1000 bootstraps takes 30 CPU hours on this
        workstation.


        The software is distributed only over the Internet by EMAIL.
        Please send an EMAIL request to:

                    ARNOLD at BSCR.UGA.EDU

        if you wish to receive copies of the program. I will EMAIL you:

        1) a C program, boot.c;

        2) this documentation file, boot.DOC;

        3) a test input file, boot.dat;

        4) a test probe name file, probe.dat;

        5) an example output file, boot.out; and

        6) a command file, boot.COM.

        This last file is what you would use to submit a batch job in
        the VAX/VMS operating system to generate the file boot.out.


        If you have questions about the programs, please contact
        Yuhong Wang, currently at the University of Georgia:

                    wang at bscr.uga.edu
        or myself at
                    arnold at bscr.uga.edu


        The programs have been run without modification on VAXstations,
        a DECstation 3100, a Silicon Graphics IRIS 4D70/GT workstation,
        and an IBM RISC 6000 workstation.

  . - - - - - - - - - - - Jonathan Arnold - - - - - - - - - - - - - - - .
  |                       Dept. of Genetics,                            |
  |                       University of Georgia                         |
  |                       Athens, Georgia 30602                         |
  | Phone:       (706) 542-1449                                         |
  | messages:    (706) 542-8000                                         |
  | FAX:         (706) 542-3910                                         |
  | Internet:    ARNOLD at BSCR.UGA.EDU                                    |
  . - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - .
