A fast random cost algorithm for physical mapping
PROGRAM DESCRIPTION:
This program assists in constructing a physical map
of a whole chromosome or chromosome fragment from a
binary clone/probe hybridization matrix.
The theory for the physical mapping method can be
found in
Cuticchia, A.J., J. Arnold, and W.E. Timberlake (1992).
The use of simulated annealing in chromosome reconstruction
experiments based on binary scoring. Genetics 132: 591-601.
The input data is a binary clone/probe hybridization matrix (with
a '1' indicating hybridization and a '0' indicating no hybridization)
of n clones (rows) and m probes (columns). Each row of
the clone/probe hybridization matrix is the digital 'call number'
of a particular clone. When clones overlap, they tend
to hybridize to the same probes, and their digital call numbers
tend to be similar. Probes (columns) link together clones
into contiguous blocks or contigs by their shared pattern
of clonal hybridization. This program orders cloned DNA
fragments into a 'contig map' by permuting the rows
of the clone/probe hybridization matrix so that clones
with similar call numbers appear next to each other.
DNA fragments with a high degree of overlap are expected
to show a high degree of similarity in their profiles.
The distance between two clones is defined as the number
of probes at which their digital call numbers differ.
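As an illustration, the distance between two clones can be computed as shown here. This is a minimal sketch in C, the language of cost.c, and not code taken from cost.c; the function name clone_distance and the layout of one byte per hybridization call are assumptions.

```c
#include <stddef.h>

/* Hypothetical helper (not taken from cost.c): the distance between two
 * clones is the number of probes at which their 0/1 call numbers differ.
 * a and b each hold m calls, one byte (0 or 1) per probe. */
int clone_distance(const unsigned char *a, const unsigned char *b, size_t m)
{
    int d = 0;
    for (size_t k = 0; k < m; k++)
        if (a[k] != b[k])   /* probe k hybridizes to one clone but not the other */
            d++;
    return d;
}
```

For example, the call numbers 10110 and 11100 differ at the second and fourth probes, so clone_distance returns 2 for them.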
The ordering process is based on minimizing the sum of the linking
distances between clones as a function of their ordering along the
chromosome.
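Assuming that only clones adjacent in the ordering contribute to the total, the total linking distance of a candidate ordering can be sketched as follows. The names linking_distance and clone_distance, and the row-major storage of the n x m matrix, are hypothetical and chosen for illustration only.

```c
#include <stddef.h>

/* Number of probes at which clones a and b (m calls each) differ. */
static int clone_distance(const unsigned char *a, const unsigned char *b, size_t m)
{
    int d = 0;
    for (size_t k = 0; k < m; k++)
        d += (a[k] != b[k]);
    return d;
}

/* Hypothetical sketch: total linking distance of the ordering order[0..n-1],
 * i.e. the sum of distances between clones that are neighbours in the order.
 * matrix is the n x m clone/probe matrix stored row by row. */
long linking_distance(const unsigned char *matrix, size_t m,
                      const int *order, size_t n)
{
    long total = 0;
    for (size_t i = 0; i + 1 < n; i++)
        total += clone_distance(matrix + (size_t)order[i] * m,
                                matrix + (size_t)order[i + 1] * m, m);
    return total;
}
```

For three clones A = 1100, B = 0011 and C = 0110, the order A, B, C has total linking distance 4 + 2 = 6, while A, C, B has 2 + 2 = 4, so the order in which C bridges the other two clones is preferred.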
The algorithm has been used to map the fourth chromosome of
Aspergillus nidulans, the whole genome of Schizosaccharomyces
pombe, and a region of human chromosome 9.
The algorithm used here to minimize the total linking distance
is a new one, called the random cost algorithm. It is described in
Wang, Y., Prade, R. A., Griffith, J., Timberlake, W.E., and Arnold, J.
(1994). A fast random cost algorithm for physical mapping. PNAS
91: 11094-11098.
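The random cost algorithm itself is given in Wang et al. (1994) and is not reproduced here. Whatever rule an optimizer uses to accept or reject a rearrangement of the clones, it has to evaluate candidate orderings quickly. As a hedged illustration, assuming a segment-reversal move (a common choice for ordering problems; whether cost.c uses this move is not stated in this document), the change in total linking distance needs only the two pairs at the ends of the reversed block, because the distance is symmetric. The function name reversal_delta is hypothetical.

```c
#include <stddef.h>

/* Number of probes at which clones a and b (m calls each) differ. */
static int clone_distance(const unsigned char *a, const unsigned char *b, size_t m)
{
    int d = 0;
    for (size_t k = 0; k < m; k++)
        d += (a[k] != b[k]);
    return d;
}

/* Hypothetical move evaluation: change in total linking distance if the block
 * order[i..j] (0 <= i <= j < n) were reversed. Pairs inside the block keep
 * their distances (the distance is symmetric); only the boundary pairs
 * (i-1, i) and (j, j+1), where they exist, are replaced. */
long reversal_delta(const unsigned char *matrix, size_t m,
                    const int *order, size_t n, size_t i, size_t j)
{
    long delta = 0;
    const unsigned char *first = matrix + (size_t)order[i] * m;
    const unsigned char *last  = matrix + (size_t)order[j] * m;

    if (i > 0) {                       /* left neighbour now touches 'last' */
        const unsigned char *left = matrix + (size_t)order[i - 1] * m;
        delta += clone_distance(left, last, m) - clone_distance(left, first, m);
    }
    if (j + 1 < n) {                   /* right neighbour now touches 'first' */
        const unsigned char *right = matrix + (size_t)order[j + 1] * m;
        delta += clone_distance(first, right, m) - clone_distance(last, right, m);
    }
    return delta;
}
```

A negative result means the reversal shortens the map. Evaluating a move this way costs two or four distance computations instead of n - 1, which matters when each map is rebuilt many times.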
PROGRAM INPUT:
A typical batch file is given as follows:
$set def [wang.cm.bootstrap]
$run cost
cost.dat
probe.dat
cost.map
rec
0
The first line sets the default directory.
The second line runs this program.
The third line is the name of the input file. In the input hybridization file,
the first line should give the number of probes, the number of clones, and the
number of bootstrap runs. In every other line, the first ten columns are reserved
for the clone name, and the hybridization data may start at any
column after the 10th column. The total length of each
line is limited by MAX_BUFFER in the program.
The fourth line is the name of the probe name file. This file lists the probe
names in the same order as the probe columns of the input file.
The fifth line is the name of the output file. The final reconstructed
physical map, the confidence statistics from the bootstrap
runs, and other statistics are written to this file.
The sixth line is the name of the file that stores the total linking
distance for each bootstrap run. This file can be used to trace the
progress of the program.
The seventh line is the seed for the random number generator.
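The input file layout described above can be read with something like the following sketch. It assumes that each hybridization call is a single '0' or '1' character, with or without blanks between calls, and that each line has already been read into a buffer (the real limit, MAX_BUFFER in cost.c, is not modeled here). The function names parse_header and parse_row are hypothetical; the default of 1000 bootstrap runs follows the rule stated under PROGRAM INPUT LIMITATIONS.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define NAME_COLS 10             /* columns 1-10 hold the clone name */
#define DEFAULT_BOOTSTRAPS 1000  /* used when the header omits the run count */

/* Parse the first line: number of probes, number of clones and, optionally,
 * number of bootstrap runs. Returns 0 on success, -1 if fewer than two
 * numbers are present. */
int parse_header(const char *line, int *probes, int *clones, int *boots)
{
    int n = sscanf(line, "%d %d %d", probes, clones, boots);
    if (n < 2)
        return -1;
    if (n == 2)
        *boots = DEFAULT_BOOTSTRAPS;
    return 0;
}

/* Parse one clone line. The first NAME_COLS characters are copied into name
 * (which must hold NAME_COLS + 1 chars) with trailing blanks removed; then up
 * to m '0'/'1' characters after column 10 are stored in calls as 0/1 bytes,
 * skipping anything else. Returns the number of calls read. */
int parse_row(const char *line, char *name, unsigned char *calls, int m)
{
    size_t len = strlen(line);
    size_t ncopy = len < NAME_COLS ? len : NAME_COLS;

    memcpy(name, line, ncopy);
    name[ncopy] = '\0';
    while (ncopy > 0 && isspace((unsigned char)name[ncopy - 1]))
        name[--ncopy] = '\0';

    int k = 0;
    for (size_t c = NAME_COLS; c < len && k < m; c++)
        if (line[c] == '0' || line[c] == '1')
            calls[k++] = (unsigned char)(line[c] - '0');
    return k;
}
```

Because the name is taken from columns 1 to 10 only, digits inside a clone name such as W1H12 (a made-up name) are never read as hybridization data. A caller would typically compare the return value of parse_row with the number of probes from the header to catch short lines.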
PROGRAM INPUT LIMITATIONS:
The number of clones must be between 1 and 600 on a DEC
VAXstation 4000.
Filenames (with directory path, if specified) must be
no longer than 40 characters.
If the number of bootstrap runs is not given in the input binary
hybridization data file, it is set to the default of 1000.
PROGRAM SPEED:
The program assembled a physical map of 593 clones
probed with 115 probes in less than 2 minutes on a DEC
VAXstation 4000. A full run of 1000 bootstrap replicates takes
30 CPU hours on this workstation.
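This documentation does not spell out how each bootstrap replicate is formed. A common scheme for a clone/probe matrix, assumed here and not confirmed for cost.c, is to draw m probe columns with replacement, rebuild the map from the resampled matrix, and repeat. Since every replicate requires a new map, 1000 replicates at just under 2 minutes each is consistent with the 30 CPU hours quoted above. The hypothetical function bootstrap_probes builds one replicate using the C library rand(), which would be seeded once with srand(), for example from the seed on the seventh line of the batch file.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of one bootstrap replicate: draw m probe columns
 * uniformly with replacement from the n x m matrix src (row-major) and write
 * them to dst (also n x m). picked[k] records which original probe became
 * column k. Call srand(seed) once before the first replicate. */
void bootstrap_probes(const unsigned char *src, unsigned char *dst,
                      size_t n, size_t m, size_t *picked)
{
    for (size_t k = 0; k < m; k++) {
        size_t col = (size_t)rand() % m;   /* slight modulo bias ignored here */
        picked[k] = col;
        for (size_t r = 0; r < n; r++)
            dst[r * m + k] = src[r * m + col];
    }
}
```

Rebuilding the map for each replicate and counting how often each pair of clones ends up adjacent is one possible confidence summary; the statistics actually written by cost.c may be defined differently.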
OBTAINING THE SOFTWARE:
The software is distributed only via the
Internet by EMAIL. Please send an EMAIL request to:
ARNOLD at BSCR.UGA.EDU
if you wish copies of the program. I will EMAIL you:
1) a C program cost.c;
2) this documentation file, cost.DOC;
3) a test input file, cost.dat;
4) a test probe name file, probe.dat;
5) an example output file, map.cost; and
6) a command file, cost.COM.
This last file is used to submit a batch job under
the VAX/VMS operating system to generate cost.MAP.
SOFTWARE SUPPORT IN THE USE OF THE PROGRAMS:
If you have questions about
the programs, please contact Yuhong Wang, currently located
at the University of Georgia:
wang at bscr.uga.edu
or me.
HARDWARE LIMITATIONS:
The programs have been run without modification on VAXstations,
a DECstation 3100, a Silicon Graphics IRIS 4D70/GT workstation,
and an IBM RISC System/6000 workstation.
. - - - - - - - - - - - Jonathan Arnold - - - - - - - - - - - - - - - .
| Dept. of Genetics, |
| University of Georgia |
| Athens, Georgia 30602 |
| Phone: (706) 542-1449 |
| messages: (706) 542-8000 |
| FAX: (706) 542-3910 |
| Internet: ARNOLD at BSCR.UGA.EDU |
| Alternate: ARNOLD at BSCF.UGA.EDU |
. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - .