BIONET.MOLBIO.GENE-LINKAGE FREQUENTLY ASKED QUESTIONS (FAQ) AS OF
1996/04/29
1.0) FAQ ADMINISTRATIVE INFORMATION [1995/05/18]
1.1) Where can I obtain and/or access the bionet.molbio.gene-linkage
FAQ? [1995/03/01]
1.2) Who created the bionet.molbio.gene-linkage FAQ? [1995/03/01]
1.3) How can I help improve this FAQ? [1995/03/01]
1.4) Contributors to this FAQ. [1995/09/09]
1.5) When was the FAQ last updated? [1996/04/29]
2.0) INFORMATION RESOURCES
2.1) What anonymous FTP sites have programs/utilities useful for
linkage analysis? [1995/03/01]
2.2) What books are helpful when learning about linkage analysis?
[1995/03/01]
2.3) What WWW sites have useful linkage information? [1996/01/02]
2.4) What gopher sites have useful linkage information? [1995/03/01]
2.5) What "linkage centers" make information and assistance available
to researchers? [1995/12/11]
2.6) What journals are useful for linkage analysis? [1995/06/02]
2.7) What courses are offered in linkage analysis? [1995/09/09]
3.0) GENE-LINKAGE SOFTWARE OVERVIEW
3.1) What database management programs do people use for linkage data?
[1995/05/31]
3.2) What programs are available for pedigree drawing? [1995/04/01]
3.3) What linkage analysis helper programs are available? [1996/04/29]
3.4) Why are some programs used primarily for chromosome mapping,
while others are used for disease mapping? [1995/03/01]
3.5) What programs are used for physical mapping? [1995/11/30]
3.6) What programs are used for disease gene mapping? [1995/09/07]
3.7) What programs are available for running genetic simulations?
[1995/11/30]
3.8) What programs are available to help detect errors in linkage
data? [1995/11/30]
3.9) What programs help me recode genetic markers? [1995/03/01]
4.0) LINKAGE PACKAGE SPECIFIC INFORMATION
4.1) How do I get my CEPH data into CRI-MAP format? [1995/03/01]
4.2) How do you calculate MAXHAP? [1995/09/09]
4.3) When should you use binary coding instead of numeric allele
coding? [1995/03/01]
4.4) What do you do when allele frequencies do not add up to 1; for
example, when alleles are not present in a pedigree under study?
[1995/03/01]
4.5) I use LINKAGE and/or FASTLINK. Which references should I include
in my papers? [1995/03/01]
4.6) What is recoding of alleles all about anyway? [1995/03/01]
4.7) What do you do when you get thetas greater than 0.5 when using
linkage? [1996/01/22]
5.0) COMPUTER ADMINISTRATION AND OPTIMIZATION
5.1) How can I increase the speed of the LINKAGE/FASTLINK package on
my workstation? [1995/05/18]
6.0) MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS
6.1) What screening sets are available for linkage analysis?
[1995/09/14]
1.0) FAQ ADMINISTRATIVE INFORMATION
1.1) Where can I obtain the bionet.molbio.gene-linkage FAQ? [1995/03/01]
It is available by anonymous FTP from lenti.med.umn.edu in
/pub/linkage. The best way to view the FAQ is via the WWW, from
http://lenti.med.umn.edu/linkage/linkage.html. The FAQ is also
available via gopher at lenti.med.umn.edu in /Biologically Related
Information/Linkage Analysis. The FAQ will also be posted in the
USENET groups bionet.molbio.gene-linkage and news.answers the 1st and
15th of each month.
1.2) Who created the bionet.molbio.gene-linkage FAQ? [1995/03/01]
Darrell Root (rootd at ohsu.edu) originally started the
bionet.molbio.gene-linkage FAQ in May of 1994 in an attempt to share
information and experiences that may be of use to other people
involved in linkage analysis. I am Dean Flanders
(dean at lenti.med.umn.edu), the current maintainer of the FAQ, and began
my tenure in December of 1994. The FAQ will never serve as a short
course in linkage analysis, but instead it will ideally be a place to
help beginners get started in the area and to help experts not make
the same mistakes as others. The information in this FAQ does not come
solely from Darrell or me, but from a large number of people who work
in the area of linkage analysis. Their names are listed at the end of
this section of the FAQ.
1.3) How can I help improve this FAQ? [1995/03/01]
Feel free to send any information that you think would be beneficial
for other people who are just beginning in linkage or have been doing
linkage for years to linkage at lenti.med.umn.edu. Also, if there is
information you would like to see, or if you find errors in this FAQ,
please let us know by sending email to linkage at lenti.med.umn.edu. If
you would like to see something changed or added to the FAQ, please
send it in a format that can be quickly incorporated into the FAQ, for
example by correcting the relevant section of the FAQ and emailing it
back to the FAQ maintainer.
1.4) Contributors to this FAQ. [1995/09/09]
David Adler, John Attwood, Michael Boehnke, Marcia Brott, Don Bowden,
Michael Braverman, Lucien Bachner, Young B Choi, Kevin Crawford, Dave
Curtis, Peter Doris, Bennett Dyke, David Featherstone, Dean Flanders,
Jonathan Haines, Rob Harper, Pierre Janssens, David Kikuchi, Wentian
Li, Tim Little, Tara Matise, Eli Meir, Mike Miller, Jurg Ott, Darrell
Root, Alex Schaffer, Robert Stodola, Frank Visser, Dan Weeks, Ellen
Wijsman, Scott Wildenberg, Matthias Wjst, and Kim Worley.
1.5) When was the FAQ last updated? [1996/04/29]
The last update of the FAQ was on 1996/04/29. All sections should
indicate the date on which they were last updated. In addition, one
can go to the list of updates that is maintained at
http://lenti.med.umn.edu/linkage/gefaqup.html. This is a list in
chronological order of updates with direct links to the updates in the
FAQ.
2.0) INFORMATION RESOURCES
2.1) What anonymous FTP sites have programs/utilities useful for
linkage analysis? [1995/03/01]
At present no single site serves as a repository for all linkage
software, so the best way to find FTP site information is to read the
software package descriptions below, which should provide all of the
necessary FTP information.
2.2) What books are helpful when learning about linkage analysis?
[1995/03/01]
Bishop, M. J. "Guide to Human Genome Computing." Academic Press, 1994.
Davies, K. E. "Human Genetic Diseases - A Practical Approach." IRL
Press, Oxford England and Washington, D.C., 1986.
Dracopoli, N. C., Haines, J. L., Korf, B. R., Moir, D.T., Morton, C.
C., Seidman, C. E., Seidman, J. G., Smith, D. R. "Current Protocols in
Human Genetics." John Wiley and Sons, Inc., USA, 1994.
Khoury, M. J., Beaty, T. H., and Cohen, B. H. "Fundamentals of Genetic
Epidemiology." Oxford University Press, 1993.
Ott, J. "Analysis of Human Genetic Linkage." Johns Hopkins University
Press, 1991.
Terwilliger, J. D. and Ott, J. "Handbook of Human Genetic Linkage."
Johns Hopkins University Press, 1994.
Thompson, E. A. "Pedigree Analysis in Human Genetics." Johns Hopkins
University Press, Baltimore and London, 1986.
2.3) What WWW sites have useful linkage information? [1996/01/02]
This is in no way an attempt to list the explosion of WWW sites of
biological interest on the Internet, but it is a listing of some of
the major ones and ones of particular interest in linkage analysis.
http://www.yahoo.com/Science/Biology/Genetics/, this is a list of
sites related to genetics that is kept very up to date.
http://www.gdb.org/Dan/DOE/intro.html, this is a short course of sorts
that gives some very basic information on how to go about gene
mapping.
http://lenti.med.umn.edu/linkage/linkage.html, which serves as the
linkage analysis home page, will have links to all of the WWW sites
listed here as well as gopher servers and a hypertext version of the
FAQ.
http://www.genethon.fr, the Genethon Center, Genethon's home page.
http://www.chlc.org, the Cooperative Human Linkage Center, CHLC's home
page.
http://gdbwww.gdb.org has a version of GDB available and access to
OMIM.
http://www.pathology.washington.edu has human and mouse standard
idiograms. The idiograms are useful for making illustrations for gene
mapping and for constructing abnormal chromosomes. The PostScript
idiograms can be manipulated band by band with illustration software
such as Adobe Illustrator, Aldus FreeHand, Canvas, and Altsys
Virtuoso.
http://www.gene.ucl.ac.uk/~john/programs.html contains software by
John Attwood.
http://www.gene.ucl.ac.uk/packages/dcurtis/ contains software by Dave
Curtis.
http://linkage.cpmc.columbia.edu has a lot of useful information on
linkage analysis; in particular it offers information on software, the
course offered by J. Ott, and the Linkage Newsletter.
2.4) What gopher sites have useful linkage information? [1995/03/01]
There is one that will be maintained with links to other gophers of
interest in linkage analysis, as well as links to other gopher servers
of biologically related information. It is at lenti.med.umn.edu, and
the path to it is Biologically Related Information/Genetic Linkage
Analysis.
2.5) What "linkage centers" make information and assistance available
to researchers? [1995/11/11]
One such center is the Cooperative Human Linkage Center (CHLC). The
goal of this center is to generate a high resolution map of the human
genome and rapidly distribute this information to the genome
community. They are in the process of identifying more human markers
and developing high resolution framework maps. One can obtain
information about CHLC via gopher from gopher.chlc.org, or from
http://www.chlc.org, ftp://ftp.chlc.org, info-server at chlc.org, or
help at chlc.org. Among other things, CHLC provides primer selection and
linkage analysis via email. Information on those services can be found
by sending email to primer-server at chlc.org and
linkage-server at chlc.org.
David Featherston (davidf at caos.kun.nl) from the Dutch EMBnet Node is
starting a linkage analysis service: software availability,
support/advice initially, possibly training, and perhaps consultancy.
At present they have MapMaker/EXP 3.0b, MapMaker/QTL 1.1, Lathrop and
Lalouel's LINKAGE package, and Schaffer's FASTLINK package. This means
that if users have Genomics Package accounts at the CAOS/CAMM Center,
they can use these programs on their fast computers to analyze their
data sets. Please contact David Featherston if you are interested in
more information about such an account.
A major European center is the Human Genome Mapping Project Resource
Centre in Hinxton, England. It is funded by the Medical Research
Council, and has a broad range of software and databases available,
mainly focused on the Human Genome Project. In the area of linkage
analysis it has the following programs available: FASTLINK, CRIMAP,
MAP, MAPMAKER, HOMOZ, PEDPACK, APM, SIMLINK, FASTMAP, COMDS, DOLINK &
QDB, HANDLINK, GAS and Jurg Ott's collection of programs. The aim is
to have all major (Unix-based) gene linkage packages available for our
users. The Centre also gives courses on linkage analysis. More
information about the Centre can be obtained from its home page:
http://www.hgmp.mrc.ac.uk/. If you want to register as a user, send
e-mail to admin at hgmp.mrc.ac.uk for a registration form. For more
information about the gene-linkage services you can contact Frank
Visser (fvisser at hgmp.mrc.ac.uk).
INFOBIOGEN: This is the French GDB node, which also offers a linkage
server and assistance in the process of linkage analysis. It uses
LINKAGE, FASTLINK and other programs running on a Sparc Center 2000E
with 1 GB of RAM, 4 GB of swap, and 6 CPUs. For further information
contact Lucien Bachner at bachner at infobiogen.fr or look at the
following web site: http://www.infobiogen.fr/.
2.6) What journals are useful for linkage analysis? [1995/06/02]
American Journal of Human Genetics, Annals of Human Genetics, Computer
Applications in Biosciences (CABIOS), Genomics, Genetic Epidemiology,
Human Genome News (available by gopher from gopher.gdb.org), Human
Genome Project Journal, Human Heredity, Journal of Computational
Biology, Nature Genetics.
2.7) What courses are offered on linkage analysis? [1995/09/09]
There are three primary courses offered throughout the year on human
linkage analysis. One is a four-day course offered once per year by
Drs. Margaret Pericak-Vance and Jonathan Haines. The next course will
be offered in late April, 1996 in Boston. The focus of the course is
on the overall design of a human disease gene mapping study, with
particular emphasis on the problems of common/complex disorders. The
course covers clinical classification; pedigree ascertainment,
collection, and follow-up; basic linkage techniques; linkage and
association analysis for complex disorders; laboratory techniques for
genotyping; and gene characterization. The course emphasizes the global
decision-making process, rather than details of specific techniques.
For more information write to Genetic Methods Course; c/o Dr. Margaret
Pericak-Vance; Division of Neurology, Box 2900; Duke University
Medical Center; Durham, NC 27710, or you can send e-mail to
genclass at genemap.mc.duke.edu. The remaining two courses are both
offered by Jurg Ott on the software used for human linkage. One is a
beginner's course, and the other an advanced course for those familiar
with the linkage analysis software. These courses are offered several
times throughout the year and you can get more information by
contacting Katherine Montague/Jurg Ott; Columbia University, Unit 58;
722 West 168th Street; New York, NY 10032. In addition you can fax to
(212)568-2750 or call (212)960-2507 or email km165 at columbia.edu for
more information.
A new beginner's level linkage course will be offered in French on
October 24-25, 1995 by INFOBIOGEN, in Villejuif, a southern suburb of
Paris. It is free for all academic institutions. For further
information contact Lucien Bachner at bachner at infobiogen.fr or
linkage at infobiogen.fr.
3.0) GENE-LINKAGE SOFTWARE OVERVIEW
3.1) What database management programs do people use for linkage data?
[1995/05/31]
Note that some pedigree drawing software can also serve as a database
in addition to drawing pedigrees; see the next question in the FAQ for
a description of those packages.
CEPH DBMS: The CEPH DataBase Management System is specifically
designed for chromosome mapping with CEPH style pedigrees. It can
output data in ped.out format for the LINKAGE package. This program
can now be picked up via anonymous FTP from ftp.cephb.fr in
pub/ceph_genotype_db.
DOLINK: This custom database program by D. Curtis manages genetic
data and sets up input files for linkage analysis. It is available
from ftp.gene.ucl.ac.uk in DOS and Windows versions, together with the
C++ source, which allows compilation on a Unix host running X and
possibly on a Macintosh.
File Express: This is a DOS shareware database which can be used to
hold data for DOLINK (largely superseded by QDB). It is available as
fe51-a/b/c.zip via FTP from ftp.gene.ucl.ac.uk in
/pub/packages/dcurtis.
LABMAN and LINKMAN: These are linkage analysis databases for holding
linkage data and exporting it in various formats for linkage analysis.
They are available via anonymous FTP from lenti.med.umn.edu in
/pub/linkage/labman. These databases were developed by P. Adams of
Columbia University.
LINKSYS: This custom-made database program was written by J. Attwood
and S. Bryant. Although they continue to use it, J. Attwood suggests
using DOLINK instead. LINKSYS is not currently available at any FTP
sites.
Map Manager: This is a program for the Macintosh which helps analyze
the results of genetic mapping experiments using backcrosses,
intercrosses, or recombinant inbred strains. It also has tools for
statistical analysis of experiments. The program was created by K. F.
Manly at the Roswell Park Cancer Institute and is available via FTP
from mcbio.med.buffalo.edu in /pub/MapMgr.
QDB: This is a database program available as DOS and Windows versions
and with C++ source allowing compilation for X and possibly Macintosh.
It is available as qdb16a.zip via FTP from ftp.gene.ucl.ac.uk in
/pub/packages/dcurtis.
3.2) What programs are available for pedigree drawing? [1995/04/01]
One of the tricks of managing individuals in a mapping study is trying
to get the database you are using to export your family data in a
format acceptable for input into pedigree drawing programs. The
marriage between these two can be of great assistance. However, some
pedigree drawing programs have databases as a part of the package.
CYRILLIC: This is a pedigree editor for Windows with facilities for
including marker data, from which it can write the input files for
LINKAGE. It is Windows-based, so input of the pedigree is very
efficient. You also have a data form associated with each
individual where you can store names and other pertinent data. It also
has the ability to interface with most standard PC databases. This
program is not public domain and is available from Cherwell Scientific
Publishing. If you would like more information send email to
csp at sable.ox.ac.uk and they would be very happy to send you a demo of
the program. Version 2 of Cyrillic should be coming out late summer of
1995.
FTREE: This is a DOS pedigree program written by R. Go at the
University of Alabama.
GENETREE: GeneTree 1.0 is a DOS package which provides a convenient
way to draw family tree diagrams suitable for genetics or genealogy.
The package consists of the GeneTree program, which draws pedigree
diagrams using a command language, and SC, a menu-driven program that
facilitates creation of GeneTree commands. GeneTree and SC are
made available with program manuals, examples of family tree diagrams,
and a GeneTree Quick Reference Guide. GeneTree is written in C. Note
that it is a DRAWING program and does not compute genetic parameters.
The GeneTree program is available from wijsman at max.u.washington.edu at
a price of $125 (because of licensing fees from a private company
which wrote one of the drivers used in the program).
KINDRED: This new DOS database program, distributed by Epicenter
Software, is specifically designed for linkage analysis. A free demo
is available by calling (818)-304-9487. In addition to database
duties, this program will draw pedigrees, haplotype marker data, and
can output data in LINKAGE format.
PEDPACK: This package is designed to handle large datasets for animals.
The package was written and distributed by Alan Thomas, who is in
Bath, England. The software is not public domain and must be
purchased.
Pedigree/Draw: This is a Macintosh-based program written by B. Dyke,
P. Mamelka, and J. MacCluer. It is available from
bdyke at darwin.sfbr.org or Pedigree/Draw; Department of Genetics;
Southwest Foundation for Biomedical Research; P.O. Box 28147; San
Antonio, TX 78228-0147. The current version is 4.4. An upgrade from a
previous version is $10, printed documentation costs $10, and the full
package including documentation costs $45. There is a script which
converts linkage
format to Pedigree/Draw available via anonymous FTP at ftp.ee.pdx.edu
in /pub/users/cat/rootd/convert.new.
PEDRAW: This is a pedigree drawing program written by D. Curtis for
DOS and available via FTP from ftp.gene.ucl.ac.uk in
/pub/packages/dcurtis. The most current version is called
pedraw16.zip. A companion program, PEDHELP, provides pop-up help for
PEDRAW.
PAP: The Pedigree Analysis Package (PAP) is a set of FORTRAN 77
programs for computing likelihoods and simulating phenotypes of
genetic models on pedigrees. It is available via gopher from
corona.med.utah.edu in Publicly Accessible Software, probes(sts),
etc./software/pap.
3.3) What linkage analysis helper programs are available? [1996/04/29]
CEPH2CRI: This program converts the output from the CEPH DBMS into the
format usable in CRI-MAP. It can be found at ftp.gene.ucl.ac.uk in
/pub/packages/linkage_utils.
EASISTAT: This is a DOS statistics package. It contains EASIGRAF, which
draws graphs of lod scores from the output of FASTMAP. The lod scores
first need to be run through the TABLE utility, which is included in
the DOLINK and FASTMAP packages. It is available as estat21.zip via
anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
FIRSTORD: A demonstration of a method for preliminary ordering of loci
based on two-point lod scores. It is available as DOS executable and C
source called first11.zip from ftp.gene.ucl.ac.uk in
/pub/packages/dcurtis.
LINKMEND: A program for converting LINKAGE-format files to
MENDEL-format files. It is available by anonymous FTP from
watson.hgen.pitt.edu as linkmend.tar.Z.
MAP: A program to convert LINKMAP output into a table of multipoint
lod scores. It is available by anonymous FTP from watson.hgen.pitt.edu
as map.tar.Z.
PEDPREP: A program for converting a MENDEL-format pedigree file
('pedm.dat') to a Pedigree/Draw file for graphical display on a
Macintosh. It is available by anonymous FTP from watson.hgen.pitt.edu
as pedprep.tar.Z.
RECODE: A program for recoding character or sized-allele data into
numbered-allele data. It is available by anonymous FTP from
watson.hgen.pitt.edu as recode.tar.Z.
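As a rough illustration of what this kind of recoding involves, here
is a minimal Python sketch. It is not the RECODE program itself: the
function name, the use of 0 for a missing allele, and the sample
allele sizes are all assumptions made for the example. Each distinct
allele size seen at a marker is mapped to a consecutive allele number,
smallest size first.

```python
def recode_alleles(genotypes, missing=0):
    """Recode sized alleles (e.g. fragment lengths in bp) as 1..n.

    genotypes: list of (allele_a, allele_b) tuples for one marker.
    missing:   value that marks an untyped allele (assumed to be 0 here).
    Returns (recoded_genotypes, mapping), where mapping is size -> number.
    """
    # Collect every distinct allele size that was actually observed.
    observed = sorted({a for pair in genotypes for a in pair if a != missing})
    # Number the alleles consecutively, starting from 1.
    mapping = {size: number for number, size in enumerate(observed, start=1)}
    # Missing alleles stay coded as 0.
    recoded = [tuple(mapping.get(a, 0) for a in pair) for pair in genotypes]
    return recoded, mapping


# Hypothetical sized genotypes for four individuals at one marker.
sizes = [(152, 148), (148, 148), (0, 0), (156, 152)]
recoded, mapping = recode_alleles(sizes)
print(mapping)   # {148: 1, 152: 2, 156: 3}
print(recoded)   # [(2, 1), (1, 1), (0, 0), (3, 2)]
```

Remember that allele frequencies must be renumbered in the same way as
the genotypes, or the analysis will use the wrong frequencies.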
3.4) Why are some programs used primarily for human chromosome
mapping, while others are used for human disease mapping? [1995/03/01]
Any family can be used for chromosome mapping, so CEPH has picked a
particular family "shape" and generated a large database with these
families. Programs designed for chromosome mapping can be optimized
for using these families, reducing the time needed for calculations.
Only families afflicted with a disease can be used for disease gene
mapping. As a result, programs designed for disease gene mapping need
to be able to deal with arbitrary pedigrees. In addition, these
programs need to be able to handle incomplete penetrance.
3.5) What programs are used for physical mapping? [1995/11/30]
CLINKAGE: This is the special version of the LINKAGE programs for
3-generation CEPH pedigrees and codominant markers. The PC and VAX
versions are available by FTP from linkage.cpmc.columbia.edu. The Unix
version is available from corona.med.utah.edu.
CHROMLOOK: This is a program for generating haplotypes of marker data
in nuclear pedigrees with all individuals genotyped. It identifies
both the maternal and paternal recombination events, and provides the
resulting haplotypes and recombinants in an easy-to-read format. It
should be available via FTP server sometime this summer. It was
written by Jonathan Haines and he can be contacted at
haines at helix.mgh.harvard.edu.
CINTMAX: This program is an extensively modified version of CILINK. It
uses map functions to model the transmission of gametes from parent to
child. Some of these map functions are multilocus feasible, and so can
be used with more than 3 loci at a time. It is available by anonymous
FTP from watson.hgen.pitt.edu as cintmax.tar.Z.
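For readers new to map functions, the following minimal Python sketch
shows two standard textbook examples, Haldane's and Kosambi's, which
convert between map distance (in Morgans) and recombination fraction.
This is only an illustration: the FAQ does not say which map functions
CINTMAX implements, and the function names here are made up for the
example.

```python
import math


def haldane_theta(d):
    """Recombination fraction for map distance d (Morgans), Haldane."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def haldane_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def kosambi_theta(d):
    """Recombination fraction for map distance d (Morgans), Kosambi."""
    return 0.5 * math.tanh(2.0 * d)


def kosambi_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))


# A map distance of 10 cM (0.10 Morgans) under each function.
print(f"{haldane_theta(0.10):.4f}")   # 0.0906
print(f"{kosambi_theta(0.10):.4f}")   # 0.0987
```

Both functions give a recombination fraction close to the map distance
for small distances and approach 0.5 as the distance grows.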
CRI-MAP: This program has been used for chromosome mapping for years.
It has options which can generate maps, calculate order probabilities,
and print out recombination data. It works on .gen files with data from
CEPH style families. It is written in K&R-style C code, and the author
Phil Green has successfully run it on Unix, DOS, VMS, and Macintosh
systems. It is not available via anonymous FTP. Phil Green distributes
CRI-MAP freely ONLY to academics/academic institutions. Contact him
at: Phil Green; Molecular Biotechnology Dept., FJ-20; Fluke Hall on
Mason Rd.; Univ. of Washington; Seattle, WA 98195; USA; Phone (206)
685-4341; Fax (206) 685-7344; or email phg at u.washington.edu.
FASTMAP: This program produces a quick approximation to multipoint lod
scores. It is available as a DOS executable and C source as
fstmap11.zip from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
MULTIMAP: This LISP based expert system uses a customized version of
CRI-MAP to create a chromosome map. It is available via anonymous FTP
from chimera.gene.cwru.edu. The authors T. Matise, M. Perlin, and A.
Chakravarti continue to improve the code, add new functions, and
provide excellent support. When used with the CRI-MAP chrompic option
(to find double-recombinations to identify possible errors), it is
incredibly useful. This is Unix-only (supported for DEC-Ultrix,
HP9000, and Suns). The customized CRI-MAP version (called LISPCRI) is
distributed at the FTP site, but was not meant to be used
independently of MULTIMAP.
MAPMAKER: Dr. Eric Lander; Whitehead Institute; 9 Cambridge Center;
Cambridge, MA 02142; mapm%mitwibr at mitvma.mit.edu. MAPMAKER is
available via FTP at genome.wi.mit.edu in /pub/mapmaker3.
RHMAP: It is a set of three FORTRAN 77 programs that provide the means
for a complete statistical analysis of RH mapping data. RH2PT is a
program for data description and two-point analysis. It provides
estimates of locus-specific retention probabilities and pairwise
breakage probabilities, two-point lod scores for linkage of the
various marker pairs, and linkage groups. RHMAP is now also available
at the following URL http://www.sph.umich.edu/group/statgen/software.
If you would like email notification of updates please send email to
boehnke at umich.edu.
3.6) What programs are used for disease gene mapping? [1995/09/07]
APM: The Affected Pedigree Member Method distribution contains the new
APM programs, a new file conversion utility, and a
histogram/statistics generator. To build the entire distribution, you
need C, Pascal, and FORTRAN compilers, and a make utility is also
helpful. The programs which are built include: APM, a program to
calculate the single locus statistic over one or several marker loci;
SIM, a program to simulate pedigrees and, using output files of APM,
test for asymptotic normality of the null distribution; APMMULT, a
program to generate the multilocus statistic; SIMMULT, a program like
SIM but which simulates recombination and uses the output of APMMULT;
CHAPM, a program to convert LINKAGE files to APM files, or APM files
of one format to APM files of another format; and HIST, a program to
compute various statistical figures, plot a histogram, and compute
empirical p-values. The APMember package by D. Weeks is available via
anonymous FTP from watson.hgen.pitt.edu. Additionally, there are
pre-compiled executables of the APM programs for Sun-OS and
Sun-Solaris available as newapm.sunos.tar.Z and newapm.solaris.tar.Z.
CLUMP: A Monte Carlo method for assessing significance of a
case-control association study with a multi-allelic marker, available
as DOS executable and C source. It is available as clump.zip via
anonymous FTP from ftp.gene.ucl.ac.uk in /pub/packages/dcurtis.
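The general idea of a Monte Carlo significance test for a case-control
table with a multi-allelic marker can be sketched in Python as below.
This is not CLUMP's code and does not reproduce the particular
statistics CLUMP reports; it is a plain permutation test on the
ordinary chi-square statistic, and the function names and the counts
used in the example are assumptions made for illustration.

```python
import random


def chi_square(table):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip alleles that nobody carries
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(case_counts, control_counts, n_sim=1000, seed=0):
    """Empirical p-value from tables simulated with the same margins."""
    rng = random.Random(seed)
    k = len(case_counts)
    observed = chi_square([case_counts, control_counts])
    col_totals = [case_counts[j] + control_counts[j] for j in range(k)]
    # One entry per allele copy; shuffling reassigns copies to cases/controls.
    pool = [j for j in range(k) for _ in range(col_totals[j])]
    n_case_alleles = sum(case_counts)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(pool)
        sim_cases = [0] * k
        for allele in pool[:n_case_alleles]:
            sim_cases[allele] += 1
        sim_controls = [col_totals[j] - sim_cases[j] for j in range(k)]
        if chi_square([sim_cases, sim_controls]) >= observed - 1e-12:
            hits += 1
    # Add 1 to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical allele counts for a four-allele marker.
stat, p = monte_carlo_p([30, 12, 5, 3], [18, 20, 9, 3], n_sim=1000)
print(f"chi-square = {stat:.2f}, empirical p = {p:.3f}")
```

The advantage over looking up the chi-square in a table is that the
empirical p-value remains valid when some alleles are rare and the
expected counts are small.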
ESPA: This is a program used for extended sib pair analysis. It comes
in a DOS version and can only look at markers containing 5 alleles. It
was written by Lodewijk Sandkuijl and can be obtained by writing to him
at Voorstraat 27; Delft 2611 JK; THE NETHERLANDS.
ERPA: A program for carrying out nonparametric linkage analysis,
available as DOS executable and C source. It is available as
erpa12.zip via anonymous FTP from ftp.gene.ucl.ac.uk in
/pub/packages/dcurtis.
FASTLINK: This is a much faster implementation of the main programs in
LINKAGE (LODSCORE, ILINK, MLINK, LINKMAP) in C. The code is faster due
to the use of new and better algorithms for the time intensive parts
of the computation. FASTLINK is distributed by A. A. Schaffer from the
FTP site softlib.cs.rice.edu (cd pub/fastlink). Version 1 of FASTLINK
was instigated by R. W. Cottingham Jr. with implementation done by R.
M. Idury and A. A. Schaffer. Version 2 of FASTLINK includes further
improvements implemented by A. A. Schaffer, S. K. Gupta, and K.
Shriram, with guidance from R. W. Cottingham Jr. Version 2 includes
the capability to recover gracefully from a crash of the computer on
which FASTLINK is running. FASTLINK was initially intended for UNIX
machines, but the distribution now includes instructions for porting
to VMS as well as a version for DOS. FASTLINK allows you to compile in
"fast" or "slow" mode (the slow version of FASTLINK is still much
faster than the old LINKAGE programs). The "fast" version uses lots of
memory, but uses the extra memory to contain some of the intermediate
results which are repetitively recalculated in the "slow" version (and
the old linkage package). Best speed can be obtained by setting up 300
megs of virtual memory on a Unix workstation and using the "fast"
version. Schaffer maintains a mailing list of fastlink users
(fastlink-list at cs.rice.edu) to answer queries and keep users up to
date. Schaffer, Gupta, and other colleagues at Rice University have
implemented parallel versions of FASTLINK for either a shared-memory
multiprocessor or a network of UNIX workstations. This version is now
available as FASTLINK 2.3P at the above mentioned FTP site. Write to
schaffer at cs.rice.edu for more information.
GAS: It provides facilities for reading, writing, sectioning and
performing statistical analyses on phenotypic and genotypic data and
one of its features is sib pair analysis. It has been developed within
the Department of Medicine at Oxford University and is available via
FTP from well.ox.ac.uk in the directory pub/genetics/gas.
GREGOR: It is a piece of DOS based software for producing simulated
genetic data. It does not perform linkage analysis, but it may be
useful for testing methods or assumptions about linkage analysis.
GREGOR is operated by a series of hierarchical menus that permit the
user to define hypothetical genetic scenarios (gene positions and
effects) and produce simulated data-sets for a variety of population
structures. GREGOR is available by FTP from the site
sifon.cc.mcgill.ca in pub/McGill-Contrib. Questions should be directed
to the authors tinker at agradm.lan.mcgill.ca or
mather at agradm.lan.mcgill.ca.
LINKAGE: This package of programs was developed by M. Lathrop with
help from J. M. Lalouel, C. Julier, and J. Ott. The LINKAGE package
consists of several analysis and several utility programs. Versions
are available for DOS, OS/2, VAX, and Unix platforms. Here are some of
the analysis programs: MLINK: 2-point lod-score calculations at fixed
recombination distances; LINKMAP: multipoint lod score calculations at
fixed distances; ILINK: calculates the recombination distance with the
highest lod-score. Unix versions are available via gopher from
corona.med.utah.edu in Publicly Accessible Software, probes(sts),
etc./software/linkage. DOS and VMS versions are available from
linkage.cpmc.columbia.edu, or on floppy disks when you write to:
Katherine Montague/Jurg Ott; Columbia University, Unit 58; 722 West
168th Street; New York, NY 10032. Send pre-formatted DOS disks if you
request linkage by mail. You can send email to km165 at columbia.edu if
you need more information regarding mail requests for the LINKAGE
package.
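To make the idea of a two-point lod score at fixed recombination
fractions concrete, here is a minimal Python sketch for the simplest
possible case: phase-known, fully informative meioses, where one can
simply count recombinants. This is an assumption made for the example
and not how the LINKAGE programs work internally; they compute
likelihoods over whole pedigrees with unknown phase, penetrances, and
allele frequencies. The function name is made up.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for k recombinants in n phase-known meioses.

    lod(theta) = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = recombinants, meioses
    if theta == 0.0 and k > 0:
        return float("-inf")  # any recombinant rules out theta = 0
    log_lik = (n - k) * math.log10(1.0 - theta)
    if k > 0:
        log_lik += k * math.log10(theta)
    return log_lik - n * math.log10(0.5)


# Lod scores at a grid of fixed thetas: 1 recombinant in 10 meioses.
for theta in (0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"theta = {theta:4.2f}   lod = {lod_score(1, 10, theta):7.3f}")

# Maximum likelihood estimate of theta: k/n, but never more than 0.5.
theta_hat = min(1 / 10, 0.5)
print(f"maximum lod at theta = {theta_hat}")
```

In this simple setting the lod score is zero at theta = 0.5 by
definition, and the estimate of theta is capped at 0.5 because larger
values have no biological meaning for linked or unlinked loci.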
LIPED: This DOS program written by J. Ott calculates probabilities for
linkage between disease markers and genetic markers. Its input file
differentiates between phenotypes and genotypes. As a result, this
program is easiest to use when your data is from "old-style"
genetic-markers (such as blood phenotype data). This was one of the
first programs to do linkage analysis calculations, but the LINKAGE
package is more commonly used now.
MIM: Multipoint IBD Method: mimintro.txt, mimsetup.txt, mim.txt,
changes.txt, testa.dat, and testa.out.
SAGE: Statistical Analysis Package for Genetic Epidemiology is
composed of 18 programs: AGEON: Estimating the Distribution of
Age-of-Onset, ASSOC: marker-trait Associations in Pedigree Data,
BCROSS: Genetic Hypothesis for Quantitative Data on Inbred strains,
their F1 and Backcross(es), CLUSTR: Power Transformation to Obtain
Normality and Homoscedasticity from Clustered Data, FCOR: Family
Correlations, FSP: Family Structure Program, LODLINK: Lod Score
Linkage Analysis, MAPLOC: Mapping a Disease Related Trait Relative to
a Set of Linked markers, MAXFUN: Function maximization Subroutine,
REGC,REGD,REGTL,REGTN: Segregation Analysis Programs, RELATE:
Relationship to Proband, SIBPAL: Sib-Pair Linkage Analysis, and
DBSORT, RENUM, SPLIT: Toolkit Programs. The author is Dr. R.C. Elston;
the address is Department of Biometry and Genetics; Louisiana State
University Medical Center; 1901 Perdido Street; New Orleans, Louisiana
70112, USA. The email contact address is sage at haldne.biogen.lsumc.edu.
It is available for the following operating systems: VAX, SunOS 4.1.x,
Apple Macintosh II, and DOS. This program is not shareware and must be
bought.
X-LINKED APM: X-linked version of the APM programs (single-marker),
see APM above for more information on APM. It is available by
anonymous FTP from watson.hgen.pitt.edu as xlinkapm.tar.Z. Also,
xlinkapm.readm is available there, which is a readme about the
X-linked version of the APM programs.
3.7) What programs are available for running linkage simulations?
[1995/11/30]
FASTSLINK: This program is just like SLINK (see SLINK below), but
it utilizes the enhancements incorporated into FASTLINK. It is
available via anonymous FTP from watson.hgen.pitt.edu.
SIMAPM: This is the SLINK-based simulation program for the APM
package. It is a hacked-together package which only runs under a Unix
system. You will need FORTRAN, Pascal, and C compilers to use this
package. It is available via anonymous FTP from watson.hgen.pitt.edu.
SIMLINK: This FORTRAN program developed by L. Ploughman and M. Boehnke
simulates linkage analysis on a family, and gives you an estimate of
the probability, or power, of detecting linkage in a given family. It
allows the researcher to determine whether a family has sufficient
informativeness to detect linkage. SIMLINK requires large quantities
of memory. It was written for DOS, but has been ported to many
platforms. It is available from: Michael Boehnke; Department of
Biostatistics; School of Public Health; University of Michigan; Ann
Arbor, MI 48109-2029. No postage-money or blank disks are necessary to
get SIMLINK sent to you. SIMLINK may be available via anonymous FTP
soon. For further information send email to boehnke at umich.edu. SIMLINK
is now also available at the following URL
http://www.sph.umich.edu/group/statgen/software. If you would like
email notification of updates please send email to boehnke at umich.edu.
SLINK: It is a Pascal program developed by D. Weeks, M. Lathrop, and
J. Ott. It is similar to SIMLINK. It is more general than SIMLINK in
that it allows for partial marker typing at the locus to be generated,
but it runs slower than SIMLINK. It is available from
linkage.cpmc.columbia.edu and watson.hgen.pitt.edu or on floppies (use
the same address as for LINKAGE).
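As a rough illustration of what "power to detect linkage" means, the
following Python sketch simulates a much simpler situation than
SIMLINK or SLINK handle: a fixed number of phase-known, fully
informative meioses. The function names, the lod threshold of 3, and
the binomial model of recombination are assumptions made for this
example; none of them is taken from either program.

```python
import math
import random

def max_lod(recombinants, meioses):
    """Maximum lod score for phase-known meioses, theta restricted to [0, 0.5]."""
    theta_hat = min(recombinants / meioses, 0.5)
    log_lik = 0.0
    # Skip zero-count terms so that 0 * log10(0) is never evaluated.
    if recombinants > 0:
        log_lik += recombinants * math.log10(theta_hat)
    if meioses - recombinants > 0:
        log_lik += (meioses - recombinants) * math.log10(1.0 - theta_hat)
    # Subtract the log likelihood under no linkage (theta = 0.5).
    return log_lik - meioses * math.log10(0.5)

def estimate_power(true_theta, meioses, replicates=1000, threshold=3.0, seed=0):
    """Fraction of simulated replicates whose maximum lod reaches the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        # Each meiosis is recombinant with probability true_theta.
        recombinants = sum(rng.random() < true_theta for _ in range(meioses))
        if max_lod(recombinants, meioses) >= threshold:
            hits += 1
    return hits / replicates

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, "meioses: estimated power", estimate_power(0.05, n))
```

Real pedigrees involve unknown phase, missing genotypes, and
incomplete penetrance, which is why dedicated simulation programs are
needed in practice.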
3.8) What programs are available to help detect errors in linkage
data? [1995/11/30]
Typically the linkage packages themselves will detect errors in
linkage data that are obvious, such as impossible phenotypes and
genotypes, and obvious errors in pedigrees. The programs will usually
just grind to a halt, allowing you to fix the error and try again
until you finally succeed. However, errors that "make sense" to
linkage programs will not be detected.
GENO: It is a genotype entry/edit tool that will allow you to easily
enter and manipulate genotyping data. You can also check the quality
of your data with the built-in Mendelian inheritance checker. The
author of the program is Matt Stephenson, who can be reached at
stephenm at bioimage.mfldclin.edu. The program is available via FTP from
dgabby.mfldclin.edu in /pub/geno.
GENOCHECK: It is an error checking program designed to identify
individuals and loci that are likely to contain errors. The
statistical method was designed to identify typing errors, but is
general enough to pinpoint any unlikely genotype still consistent with
Mendelian inheritance. The author is Dr. Margaret Gelder Ehm. The FTP
site is softlib.cs.rice.edu and the program is in /pub/GenoCheck. It is
written for Unix.
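The basic idea behind a Mendelian inheritance check can be sketched in
a few lines of Python. This is a minimal illustration for one
autosomal codominant marker in a fully typed father-mother-child trio;
it is not the method used by GENO or GENOCHECK, and the function name
and genotype representation are assumptions made for the example.

```python
def mendel_consistent(child, father, mother):
    """Return True if the child's genotype can arise from the parents.

    Each genotype is a pair of allele labels, e.g. (1, 3). Untyped
    individuals are not handled in this sketch.
    """
    a, b = child
    # One allele must come from the father and the other from the mother.
    return (a in father and b in mother) or (b in father and a in mother)

if __name__ == "__main__":
    print(mendel_consistent((1, 3), (1, 2), (3, 4)))  # consistent
    print(mendel_consistent((1, 1), (1, 2), (3, 4)))  # mother cannot give allele 1
```

Note that a mistyped child genotype such as (2, 3) in the first trio
would still pass this check, which is exactly the kind of error that
"makes sense" to a linkage program and goes undetected.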
3.9) What programs help me recode genetic markers? [1995/03/01]
DOLINK can downcode alleles automatically. However, the main use of
DOLINK is to prepare files for LINKAGE from a database. In addition,
P. Adams' packages LABMAN and LINKMAN have features for the recoding
of alleles.
4.0) LINKAGE PACKAGE SPECIFIC INFORMATION
4.1) How do I get my CEPH data into CRI-MAP format? [1995/03/01]
You can output the file in linkage format and use link2gen in CRI-MAP.
The disadvantage here is that your marker names are separated from
your data and it's easy to make a mistake and get them mixed up. You
can output the file in ped.out format and use CEPH2CRI mentioned above
in the FAQ to do the conversion as well.
4.2) How do you calculate MAXHAP? [1995/09/09]
MAXHAP is the maximum possible number of haplotypes in your analysis.
You multiply together the number of alleles at each locus used in a
particular run; not all loci in your dataset, just the loci you are
using in that particular calculation. Remember that the affection
status counts as two alleles, regardless of the number of liability
classes. For example, suppose a dataset contains affection status with
liability classes, marker A with 3 alleles, marker B with 4 alleles,
and marker C with 5 alleles. If your run is a LINKMAP run between
affection status, marker A, and marker B, then your MAXHAP must be at
least 2*3*4=24.
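The arithmetic can be sketched in Python as follows. The function name
and the dictionary of loci are hypothetical and only restate the
example above; this is not part of LINKAGE or FASTLINK.

```python
from math import prod

def min_maxhap(allele_counts):
    """Product of the allele counts of the loci used in one run."""
    return prod(allele_counts)

# Affection status counts as 2 alleles regardless of liability classes.
# Marker C (5 alleles) is left out because it is not part of this run.
run_loci = {"affection status": 2, "marker A": 3, "marker B": 4}
required = min_maxhap(run_loci.values())
print(required)  # 24
```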
FASTLINK 2.3P includes an auxiliary program called ofm (optimize for
maxhap) which can be used to automatically recompile the desired
program with the ideal value of maxhap under the following
assumptions: you are using UNIX or VMS (not DOS); you are running
ILINK, LINKMAP, or MLINK (not LODSCORE); the main script is produced by
the LINKAGE auxiliary program LCP; and the locus file is produced by
the LINKAGE auxiliary program PREPLINK. See README.ofm in the FASTLINK
distribution.
4.3) When should you use binary coding instead of numeric allele
coding? [1995/03/01]
Usually there is no advantage to coding disease loci as either binary
or numeric using liability classes. Generally, binary coding is more
complex in that we humans often have a hard time thinking that way.
Some of the codominant phenotypes lend themselves to binary coding;
for example, ABO blood types: A (101), B (011), O (001), AB (111), and
unknown (000). Since you cannot distinguish AO from AA at the
phenotype level you code both genotypes as (101), presence of A and O.
In reality O represents absence of both A and B. However, do not code
it as (000), since that would mean unknown. Use of binary codes has
decreased since DNA markers came into use, because they allow one to
type an individual with respect to genotype. You can use binary codes
if you have phenotypic data which do not allow the underlying genotype
to be determined exactly, and which can be coded as the presence (1)
or absence (0) of factors such as the A and B antigens. Binary codes
allow the representation of loci with codominant and dominant modes of
inheritance, while allele number notation is good only for codominant
loci. Few people use binary factor notation. They
either use allele numbers for codominant loci, or affection status
notation for dominant loci. The main reason why binary factor notation
is still currently used is that CEPH's database is in that notation.
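The ABO codes above can be reproduced with a short Python sketch. The
per-allele factor table below is an assumption inferred from the
phenotype codes given in the text (each phenotype code is the
factor-by-factor OR of the two alleles, and the third factor is
present on every allele so that OO is not coded as unknown); the
function name is hypothetical.

```python
from itertools import combinations_with_replacement

# Assumed factor pattern carried by each allele (inferred, see lead-in).
ALLELE_FACTORS = {"A": (1, 0, 1), "B": (0, 1, 1), "O": (0, 0, 1)}

def phenotype_code(allele1, allele2):
    """Binary factor code observed for a genotype: OR of the two alleles."""
    f1 = ALLELE_FACTORS[allele1]
    f2 = ALLELE_FACTORS[allele2]
    return "".join(str(x | y) for x, y in zip(f1, f2))

if __name__ == "__main__":
    for genotype in combinations_with_replacement("ABO", 2):
        print("".join(genotype), phenotype_code(*genotype))
```

Genotypes AA and AO both give 101, and BB and BO both give 011, which
shows why the phenotype does not determine the genotype exactly.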
4.4) What do you do when allele frequencies do not add up to 1, for
example, when alleles are not present in a pedigree under study?
[1995/03/01]
The best approach is to specify n+1 alleles, where there are n alleles
actually observed in the pedigree. Use the correct allele frequencies
for the n alleles, and for the n+1 allele, use 1 minus the sum of the
frequencies of the observed alleles.
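A minimal Python sketch of this bookkeeping is shown below. The
function name and the example frequencies are made up for
illustration.

```python
def add_remainder_allele(observed_freqs):
    """Append one extra allele carrying the frequency of all unobserved alleles."""
    remainder = 1.0 - sum(observed_freqs)
    if remainder < -1e-9:
        raise ValueError("observed frequencies sum to more than 1")
    # Clamp tiny negative values caused by floating-point rounding.
    return list(observed_freqs) + [max(remainder, 0.0)]

# Three observed alleles (n = 3) with hypothetical population frequencies.
freqs = add_remainder_allele([0.30, 0.25, 0.20])
print(freqs)
```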
4.5) I use LINKAGE and/or FASTLINK, what references should I cite in
my papers? [1995/03/01]
FASTLINK users should cite:
Cottingham, R. W. Jr., Idury, R. M., and Schaffer, A. A. "Faster
Sequential Linkage Computations." American Journal of Human Genetics.
53:252-263, 1993.
Schaffer, A. A., Gupta, S. K., Shriram, K., and Cottingham, R. W. Jr.
"Avoiding Recomputation in Linkage Analysis". Human Heredity.
44(4):225-37, 1994 Jul-Aug.
In addition, all FASTLINK and LINKAGE users should also cite the
LINKAGE papers:
Lathrop, G.M., Lalouel, J.M., Julier, C., and Ott, J. "Strategies for
Multilocus Analysis in Humans." PNAS. 81:3443-3446, 1984.
Lathrop, G.M. and Lalouel, J.M., "Easy Calculations of LOD Scores and
Genetic Risks on Small Computers." American Journal of Human Genetics.
36:460-465, 1984.
Lathrop, G.M., Lalouel, J.M., and White, R.L. "Construction of Human
Linkage Maps: Likelihood Calculations for Multilocus Analysis."
Genetic Epidemiology. 3:39-52, 1986.
4.6) What is recoding of alleles all about anyway? [1995/03/01]
One of the problems with highly polymorphic markers is that they can
increase the computational requirements of the computers by several
orders of magnitude due to the large number of alleles present. This
can put the computation of some lod scores out of reach for DOS
computers and take many days on higher end systems. So it is important
to use methods that reduce the number of alleles, and recoding will
reduce the number of alleles in your calculations.
The method of recoding of alleles described by J. Ott in the Annals of
Human Genetics, 42:255-257 (1978) works very well, but can only be
done when the mode of inheritance of the disease is known. An article
inspired by Ott's original work, written by M. Braverman in Computers
and Biomedical Research, 18:24-36 (1985), extends the recoding of alleles
in two ways: 1) it allows for pedigrees of arbitrary structure, and 2)
it allows for missing/partially known marker phenotypes. It is usually
possible to recode marker alleles to some extent even if the mode of
inheritance of the disease is not known since what is still desired
with respect to the marker is a labeling which preserves the available
information about the source of each marker allele. It is important,
however, where the full ancestry of alleles cannot be traced in a
pedigree, that the recoded alleles maintain the allele frequencies
appropriate to the original alleles. In a complex disorder, this may
not be possible.
Another method applies when the marker in question has 14 alleles in
the general population but only 9 alleles in the study population: it
is possible to collapse the functional number of alleles to 9 or 10.
For the former, one usually adjusts the allele frequencies to sum to 1
by dividing each allele frequency by the sum of the (observed) allele
frequencies. For the latter, all the observed allele frequencies remain
the same, but the unobserved alleles are collapsed into a single
allele (and frequency). Rescaling the frequencies of the 9 observed
alleles will not produce quite correct results. Consider the unlikely
example of a huge pedigree with only the most recent generation
observed, in which the 9 observed alleles all have very low and equal
frequency. If distantly related affected relatives share an allele,
there is some reasonable support for linkage since the alleles are
rare. But if we rescale frequencies to 1/9 per allele, then sharing of
alleles isn't so unlikely. Coding the marker with 10 alleles produces
correct results, as it will produce the same lod scores as would
coding the marker with 14 alleles.
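The difference between the two codings can be made concrete with a
small Python sketch. The frequency of 0.01 per observed allele is a
hypothetical value chosen to match the "rare alleles" example, and the
function names are made up; the quantity computed (the chance that two
independently drawn chromosomes carry the same observed allele) is
only a crude stand-in for how surprising allele sharing is, not a lod
score.

```python
def rescale(observed):
    """9-allele coding: divide each frequency by the sum of observed frequencies."""
    total = sum(observed)
    return [f / total for f in observed]

def collapse(observed):
    """10-allele coding: keep observed frequencies, pool the rest into one allele."""
    return list(observed) + [1.0 - sum(observed)]

def chance_same_observed_allele(freqs, n_observed):
    """Probability that two independently drawn chromosomes carry the same observed allele."""
    return sum(f * f for f in freqs[:n_observed])

observed = [0.01] * 9  # hypothetical: 9 rare, equally frequent alleles

print(chance_same_observed_allele(rescale(observed), 9))   # about 0.111
print(chance_same_observed_allele(collapse(observed), 9))  # about 0.0009
```

Under rescaling, chance sharing looks more than a hundred times more
likely than it really is, so evidence from shared rare alleles is
understated.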
4.7) What do you do when you get thetas greater than 0.5 when using
LINKAGE? [1996/01/22]
This seems to occur when the GEMINI optimization procedure prefers to
go for a local optimum of a theta greater than 0.5 as a result of the
starting theta values being too high in a LINKAGE run using ILINK or
LODSCORE. This can easily be fixed by modifying the starting theta
directly with LCP or editing the LCP generated script. One can also
modify the starting value with PREPLINK or by editing the data file
containing allele and disease frequencies. This can be an iterative
process and one should change theta values by an order of magnitude
until reasonable thetas are obtained. One must also be careful of
having initial thetas too low, as this can also cause problems in the
form of erroneous values. One can also run MLINK to examine what is
happening at different thetas to determine the best starting theta.
5.0) COMPUTER ADMINISTRATION AND OPTIMIZATION
5.1) How can I increase the speed of the LINKAGE/FASTLINK package on
my workstation? [1995/05/18]
1. Use FASTLINK, which is the C version of the LINKAGE package with a
few algorithmic improvements. It can increase the speed of your
calculations by an order of magnitude.
2. Set up lots of paging space, which uses the hard drive as
virtual memory (300 megs is usually plenty). Note that paging space is
the same as swap space. Then use the "fast" versions of FASTLINK.
3. Use GCC, which is the GNU/Free Software Foundation C compiler, to
compile FASTLINK. GCC produces machine language that is about 10%
faster than Sun's C compiler.
4. Install the generic small kernel instead of the generic kernel. The
generic kernel has device files for almost everything, and can slow
the system down. The generic small kernel is configured for a system
without many devices and without many users. Installing a generic
small kernel is an option during system installation on Sun
workstations.
5. Reconfigure your kernel so it has only devices you need. This
should give you a small improvement in overall system speed, but if
you are already running the generic-small kernel, additional
improvement may be so small that it's not worth the trouble. If the
generic small kernel is insufficient for your system this step is a
must. The generic kernel will slow down your workstation significantly
and most of the device support is unnecessary.
6. Don't run your linkage analyses in the background, because running
programs in the background gives them a lower priority. Either do the
runs in the foreground or you can use the root password to nice the
pedin process by -3 to compensate (negative nice values give a higher
priority). If you need to log out, you can use the screen command and
"detach" a session so you can log out without programs terminating.
Later you can log back in and "reattach" the session, which continued
to run while you were logged out. The screen command is available at
prep.ai.mit.edu and is also on the O'Reilly Unix Power Tools CD-ROM.
According to the Sun documentation, nicing below -10 can interfere
with the operating system and actually reduce the process' speed.
Running them at the standard default level of 0 is usually sufficient.
Some people recommend running a background job using nice +19 (!).
In this way, the job will not interfere with other normal processes
like login.
7. Runs with 100% penetrance can run faster than runs with incomplete
penetrance. Of course, if you have an unaffected obligate carrier,
this won't work. In addition, incomplete penetrance runs may be
necessary for your research to be "good".
8. Change the block size of your file system. One can increase
performance of a file system by increasing the block size, thus
decreasing the number of read-write operations. A block device, such
as a hard disk, usually accesses a block of data simultaneously. Thus,
if one is expecting to use large files, having large blocks will be an
advantage. However, one usually pays for this in bytes lost to
partially filled fragments, since one has to increase the fragment
size to a number larger than 1024, for example 2048. That is, each
file or part of a file occupies 2048 bytes, so a file of 100 bytes
will still occupy 2048 bytes. Therefore, bigger blocks give faster
access, but come with bigger fragments and more lost space.
9. It has been noted that you can increase the speed of programs which
create/access large files in the /tmp directory by creating a tmpfs
file system.
10. Of course, buying more RAM will increase your speed. It's been
said that increasing RAM from 16 to 32 megs will result in a large
increase in speed and increasing RAM from 32 to 64 megs will result in a
significant increase. However, increasing beyond 64 megs is not
particularly helpful.
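The space cost mentioned in item 8 can be sketched in Python. This is
a simplified model that assumes every file is stored in whole
fragments and ignores all other file system overhead; the function
name is hypothetical.

```python
def allocated_bytes(file_size, fragment_size):
    """Disk space used by a file stored in whole fragments (simplified model)."""
    # Round the file size up to the next multiple of the fragment size.
    fragments = -(-file_size // fragment_size)
    return fragments * fragment_size

if __name__ == "__main__":
    for fragment_size in (1024, 2048):
        used = allocated_bytes(100, fragment_size)
        print(fragment_size, "byte fragments:", used, "bytes used,",
              used - 100, "bytes lost")
```

For a few large data files the lost space is negligible, but a
directory full of small files will waste noticeably more space with
larger fragments.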
6.0) MOLECULAR BIOLOGY ISSUES IN LINKAGE ANALYSIS
6.1) What screening sets are available for linkage analysis?
[1995/09/14]
For humans there are the Weber lab screening sets: 3, 3A, 4, 4A, 5,
5A, and 6. Primers for the markers within these sets are available
from Research Genetics, both in unlabeled and fluorescent
dye-conjugated forms. The information on these screening sets can be
downloaded via FTP from dgabby.mfldclin.edu; the files are in /pub.
EOF