Measuring physical map quality using bootstrap resampling
PROGRAM DESCRIPTION:
This program assists in measuring the quality of a reconstructed
physical map using bootstrap resampling.
The theory for reconstruction of chromosomes or chromosome fragments
("contig mapping") from a clonal library can be found in
Cuticchia, A.J., Arnold, J., and W.E. Timberlake. (1992a). The
use of simulated annealing in chromosome reconstruction experiments
based on binary scoring. Genetics 132: 591-601
The program orders DNA fragments according to the similarity of the
binary profiles assigned to clones in a library by one of several
experimental approaches. The algorithm has been used to map the
Schizosaccharomyces pombe genome, the Aspergillus nidulans genome,
and a region of human chromosome IX.
DNA fragments with a high degree of overlap are expected
to show a high degree of similarity in their profiles.
The ordering process is based on minimizing the sum of the linking
distances between clones as a function of their ordering along the
chromosome.
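As an illustration of the quantity being minimized, the following is a
minimal sketch in C, not the code in boot.c. It assumes that the linking
distance between two neighbouring clones is the Hamming distance between
their binary profiles; the distance actually used by the program may
differ. The names hamming_distance and total_linking_distance are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed distance: number of probes on which two binary profiles differ. */
static int hamming_distance(const unsigned char *a, const unsigned char *b,
                            size_t n_probes)
{
    int d = 0;
    for (size_t j = 0; j < n_probes; j++)
        d += (a[j] != b[j]);
    return d;
}

/* Sum of the distances between neighbouring clones for a given ordering.
   profiles is a row-major n_clones x n_probes matrix of 0/1 values;
   order[i] is the row index of the clone placed at position i. */
int total_linking_distance(const unsigned char *profiles, size_t n_probes,
                           const size_t *order, size_t n_clones)
{
    assert(profiles != NULL && order != NULL);
    int total = 0;
    for (size_t i = 0; i + 1 < n_clones; i++)
        total += hamming_distance(profiles + order[i] * n_probes,
                                  profiles + order[i + 1] * n_probes,
                                  n_probes);
    return total;
}
```

An ordering that places clones with similar profiles side by side gives
a smaller total than one that separates them, which is what the
minimization exploits.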
The minimization algorithm used here is a new one called random
cost. It is detailed in
Wang, Y., Prade, R. A., Griffith, J., Timberlake, W.E., and Arnold, J.
(1994) A fast random cost algorithm for physical mapping. PNAS,
91, 11094-11098
In bootstrap resampling, the probes are randomly resampled with
replacement many times (1000 by default). This program calculates how
often the links in the original reconstructed map reappear in the maps
built from the bootstrap samples. Three such frequencies are calculated:
(i) C1: how often two clones appear together;
(ii) C2: how often a clone, or one with an equivalent hybridization
profile, appears next to the other clone;
(iii) C3: how often two clones fall within the same island.
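The resampling step and a C1-style frequency can be sketched in C as
follows. This is an illustrative sketch under stated assumptions, not
the code in boot.c: it assumes that "appear together" means that the two
clones are neighbours in a bootstrap map, and it assumes the bootstrap
maps have already been built. The names resample_probes, are_adjacent
and link_frequency are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Draw n_probes probe (column) indices with replacement.
   rand() % n has a slight modulo bias; acceptable for a sketch. */
void resample_probes(size_t *sample, size_t n_probes)
{
    assert(sample != NULL && n_probes > 0);
    for (size_t j = 0; j < n_probes; j++)
        sample[j] = (size_t)rand() % n_probes;
}

/* Return 1 if clones a and b are neighbours in the given ordering. */
int are_adjacent(const size_t *order, size_t n_clones, size_t a, size_t b)
{
    for (size_t i = 0; i + 1 < n_clones; i++) {
        if ((order[i] == a && order[i + 1] == b) ||
            (order[i] == b && order[i + 1] == a))
            return 1;
    }
    return 0;
}

/* Fraction of bootstrap maps in which the link (a, b) reappears.
   maps is a row-major n_maps x n_clones array of clone orderings. */
double link_frequency(const size_t *maps, size_t n_maps, size_t n_clones,
                      size_t a, size_t b)
{
    assert(maps != NULL && n_maps > 0);
    size_t hits = 0;
    for (size_t m = 0; m < n_maps; m++)
        hits += (size_t)are_adjacent(maps + m * n_clones, n_clones, a, b);
    return (double)hits / (double)n_maps;
}
```

In a full run, each resampled set of probe columns would be passed to
the ordering algorithm to produce one bootstrap map, and the frequency
would be computed for every link in the original map.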
The bootstrap resampling procedure for assessing the reliability
of a physical map is described in:
Wang, Y., Prade, R.A., Griffith, J., Timberlake, W.E., and Arnold, J.
(1994) ODS_BOOTSTRAP: assessing the statistical reliability
of a physical map by bootstrap resampling. CABIOS 10: 625-634
PROGRAM INPUT:
A typical batch file is given as follows:
$set def [wang.cm.bootstrap]
$run boot
boot.dat
probe.dat
boot.out
boot.rec
0
The first line sets the default directory.
The second line runs the program.
The third line is the input file name. In the input hybridization file,
the first line should give the number of probes, the number of clones,
and the number of bootstrap runs. On all other lines, the first ten
columns are reserved for the clone name, and the hybridization data
may start at any column after the 10th. The maximum length of each
line is defined by MAX_BUFFER in the program. The example is
the clone/probe hybridization matrix for Chromosome IV of
A. nidulans.
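The layout of a data line can be illustrated with the following C
sketch. It is not taken from boot.c. It assumes that the hybridization
data are written as the characters 0 and 1, optionally separated by
blanks; the encoding actually expected by the program may differ. The
names parse_clone_line and NAME_COLUMNS are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_COLUMNS 10   /* columns reserved for the clone name */

/* Split one data line into a clone name and a 0/1 profile.
   Returns the number of profile entries read, or -1 if the line is
   too short, holds too many entries, or contains other characters. */
int parse_clone_line(const char *line, char name[NAME_COLUMNS + 1],
                     unsigned char *profile, size_t max_probes)
{
    assert(line != NULL && name != NULL && profile != NULL);
    size_t len = strlen(line);
    if (len <= NAME_COLUMNS)
        return -1;

    /* Copy the name field and trim its trailing blanks. */
    memcpy(name, line, NAME_COLUMNS);
    name[NAME_COLUMNS] = '\0';
    for (int i = NAME_COLUMNS - 1; i >= 0 && name[i] == ' '; i--)
        name[i] = '\0';

    /* Everything after column 10 is hybridization data. */
    size_t n = 0;
    for (size_t c = NAME_COLUMNS; c < len; c++) {
        char ch = line[c];
        if (ch == '0' || ch == '1') {
            if (n == max_probes)
                return -1;
            profile[n++] = (unsigned char)(ch - '0');
        } else if (ch != ' ' && ch != '\t' && ch != '\n' && ch != '\r') {
            return -1;
        }
    }
    return (int)n;
}
```

A caller would typically compare the returned count with the number of
probes given on the first line of the file.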
The fourth line is the name of the probe names file. It contains the
probe names in the same order as the columns of the input file.
The fifth line is the output file name. The final reconstructed
physical map, the confidence statistics from the bootstrap run,
and other statistics are written to this file.
The sixth line is the name of the file that stores the total linking
distance for each bootstrap run. This file is useful for tracing the
progress of the program.
The seventh line is the seed for the random number generator.
PROGRAM INPUT LIMITATIONS:
The number of clones must be between 1 and 600 on a DEC
VAXstation 4000.
Filenames (including the directory path, if specified) must be
no longer than 40 characters.
If the number of bootstrap runs is not given in the input binary
hybridization data file, it is set to the default of 1000.
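The handling of the first line of the input file, including the default
number of bootstrap runs, might look like the following C sketch. It is
an illustration only, not the code in boot.c; it assumes the three
counts are whitespace-separated integers in the order probes, clones,
bootstrap runs. The names parse_header and DEFAULT_BOOTSTRAP_RUNS are
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define DEFAULT_BOOTSTRAP_RUNS 1000

/* Read "probes clones [runs]" from the first line of the input file.
   Returns 0 on success, -1 if a required count is missing or invalid. */
int parse_header(const char *line, int *n_probes, int *n_clones, int *n_runs)
{
    assert(line != NULL && n_probes != NULL && n_clones != NULL &&
           n_runs != NULL);
    int got = sscanf(line, "%d %d %d", n_probes, n_clones, n_runs);
    if (got < 2)
        return -1;                          /* probes and clones required */
    if (got == 2)
        *n_runs = DEFAULT_BOOTSTRAP_RUNS;   /* run count omitted */
    if (*n_probes < 1 || *n_clones < 1 || *n_runs < 1)
        return -1;
    return 0;
}
```

A machine-specific upper bound on the number of clones, such as the 600
quoted above, could be checked by the caller after this function
returns.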
PROGRAM SPEED:
The program assembled a physical map of 593 clones
probed with 115 probes in less than 2 minutes on a DEC
VAXstation 4000. A run of 1000 bootstraps takes about
30 CPU hours on this workstation.
OBTAINING THE SOFTWARE:
The software is distributed only via
Internet using EMAIL. Please send an EMAIL request to:
ARNOLD at BSCR.UGA.EDU
if you wish copies of the program. I will EMAIL you:
1) a C program, boot.c;
2) this documentation file, boot.DOC;
3) a test input file, boot.dat;
4) a test probe name file, probe.dat;
5) an example output file, boot.out; and
6) a command file, boot.COM.
This last file is what you would use to submit a batch job in
the VAX/VMS operating system to generate the file boot.out.
SOFTWARE SUPPORT IN THE USE OF THE PROGRAMS:
If you have questions about
the programs, please contact Yuhong Wang, currently
at the University of Georgia:
wang at bscr.uga.edu
or myself at
arnold at bscr.uga.edu
HARDWARE LIMITATIONS:
The programs have been run without modification on VAXstations,
a DECstation 3100, a Silicon Graphics IRIS 4D70/GT workstation,
and an IBM RISC 6000 workstation.
. - - - - - - - - - - - Jonathan Arnold - - - - - - - - - - - - - - - .
| Dept. of Genetics, |
| University of Georgia |
| Athens, Georgia 30602 |
| Phone: (706) 542-1449 |
| messages: (706) 542-8000 |
| FAX: (706) 542-3910 |
| Internet: ARNOLD at BSCR.UGA.EDU |
. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - .