> From: martin at tutor.oc.chemie.th-darmstadt.de (Martin Kroeker)
> Subject: Re: Linux MolBio Software
>9224076 at ul.ie wrote:
> : I'm posting this to ascertain whether anyone out there has compiled any
> : sort of list of molbio/biology software that runs under Linux (the
> : freeware unix clone for intel pc's). I've found this to be a powerful
> : operating system for numerous applications including items such as
> : Rasmol, DNAWorkBench and Raster3d.
> XtalView, WhatIf, Mage... - any generic list of unix mol/bio/chem software
> should do. At the current state of development, i would expect almost any
> molbio software for unix systems to run under linux. Perhaps a list of
> programs that _do_not_ work would be more useful, though i assume that it
> would mainly contain either hardware (e.g. SGI graphics)- specific things
> or commercial programs where the vendor does not provide either source code
> or a linux binary.
yes, that would probably be the case ... but a list of known packages
where people have bothered to port it may be useful for the
molbio-Linux community ...
another one to add to the List is Entrez ... I append the README_2 for
those of you who are still using version 3.x or earlier ... version 4.x
has much more to offer!
regards,
francis
--
| B.F. Francis Ouellette
| GenBank
||francis at ncbi.nlm.nih.gov
#############################################################
Last update to this document:
March 15, 1996
README_2 for the beta version of Network Entrez.
This document describes the software that can be obtained by
anonymous FTP from ncbi.nlm.nih.gov
in this directory:
/entrez/network/
please note that Network Entrez users should read
these documents
ftp://ncbi.nlm.nih.gov/entrez/network/README
ftp://ncbi.nlm.nih.gov/entrez/network/README_2
#############################################################
National Center for Biotechnology Information (NCBI)
National Library of Medicine
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894, USA
tel: (301) 496-2475
fax: (301) 480-9241
e-mail: info at ncbi.nlm.nih.gov
ver 4.000 Oct. 05, 1995
ver 4.010 Oct. 11, 1995
ver 4.012 Nov. 08, 1995
ver 4.013 Nov. 30, 1995 (Unix)/Dec 02, 1995 (mswin)/Dec 03, 1995 (Mac)
ver 4.017 Feb. 06, 1996 (Unix/mswin)/Feb. 05, 1996 (Mac)
ver 4.022 Mar. 15, 1996 (Unix/mswin)
***** README *****
This README presents some of the highlights of the beta version of
Network Entrez which incorporates a new genomes division and also
presents MMDB (Molecular Modeling DataBase).
This directory, and the ones below, include the latest version
of the network Entrez program which will run on various platforms:
/entrez/network/
Nentrez.hqx                    : Mac - binhexed
alphaOSF1.tar.Z                : alpha - compressed
linux.tar.Z                    : Linux - compressed
mswin/
    win32/nentrezZ.exe         : NT/Win95 (32 bit) self extracting
    winsock1.1/nentrezZ.exe    : Win 3.1 (16 bit) self extracting
sgi.tar.Z                      : SGI - compressed (IRIX 4.0)
sgi5.3.tar.Z                   : SGI - compressed (IRIX 5.3)
solaris.tar.Z                  : Sun - Solaris - compressed
sun.tar.Z                      : SunOS - compressed
You need only copy (in binary mode) the executable version
for the platform you need and the two README files (README and
README_2). This file is README_2. If you need a binary of
Network Entrez for a platform not listed above, you can ask us, and
we will see if we can compile one for you, on an unsupported basis.
Please send these requests to:
toolbox at ncbi.nlm.nih.gov
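
For those who prefer to script the transfer, here is a minimal sketch in
Python of the "copy in binary mode" step described above. The host,
directory and file names are taken from this README; the dictionary keys
(such as "irix5") and the function names are made up for the example,
and the Windows executables under mswin/ are left out. Treat it as an
illustration, not a supported tool.

```python
from ftplib import FTP

HOST = "ncbi.nlm.nih.gov"      # host named in this README
BASE_DIR = "/entrez/network/"  # directory named in this README

# Archive names as listed in this README; the keys are hypothetical labels.
ARCHIVES = {
    "mac": "Nentrez.hqx",
    "alpha": "alphaOSF1.tar.Z",
    "linux": "linux.tar.Z",
    "irix4": "sgi.tar.Z",
    "irix5": "sgi5.3.tar.Z",
    "solaris": "solaris.tar.Z",
    "sunos": "sun.tar.Z",
}


def files_for_platform(platform):
    """Return the archive for one platform plus the two README files."""
    return [ARCHIVES[platform], "README", "README_2"]


def fetch(platform, host=HOST, base_dir=BASE_DIR):
    """Download the files over anonymous FTP, always in binary mode."""
    ftp = FTP(host)
    try:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(base_dir)
        for name in files_for_platform(platform):
            with open(name, "wb") as out:
                # retrbinary sets binary (image) transfer before RETR
                ftp.retrbinary("RETR " + name, out.write)
    finally:
        ftp.close()
```

Calling fetch("linux") would try to retrieve linux.tar.Z, README and
README_2 into the current directory, provided the server still offers
anonymous FTP at that location.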
You should already be familiar with Entrez (Network or CD-ROM)
if you are reading this document, but if you are not, you may
want to read the first README document presented in this directory,
as well as the Entrez user manual present in BinHexed format
(Macintosh) in this file:
/entrez/docs/entrzdoc.hqx
This new version of Network Entrez (present in
/entrez/network/) should still be considered "beta" and users
should be aware that although we have taken every precaution to ensure
that this program will work without any problems, it may not always
perform as expected. We are also still modifying the code to add new
features, and you should visit this directory to make sure you have the
most up to date version.
The build dates at the time of writing were:
Feb. 05    Nentrez.hqx
Mar. 15    alphaOSF1.tar.Z
Mar. 15    linux.tar.Z
           mswin/
Mar. 15        win32/nentrezZ.exe
Mar. 15        winsock1.1/nentrezZ.exe
Mar. 15    sgi.tar.Z
Mar. 15    sgi5.3.tar.Z
Mar. 15    solaris.tar.Z
Mar. 15    sun.tar.Z
The current version is: 4.022
We are also changing the ergonomics and layout of some of the
features of this new program, and suggestions and feedback are very
welcome. These should be sent to this e-mail address:
toolbox at ncbi.nlm.nih.gov
The text which follows explains some of the new features of
this version of Network Entrez, and how to navigate to them. Again,
we are assuming that you are already familiar with Entrez, and with the
general way of going from one information space to another by linking
or neighboring.
=================================================================
**** Network Entrez: From Genome to Structure ****
The NCBI made a major new release of Entrez available in
October 1995. The new release adds graphical access to a new "genomes"
division of GenBank as well as graphical views of standard Entrez
sequence records. The new release also provides a database of
3-dimensional structures derived from the PDB crystallographic database.
** Graphical Views of Sequences.
A tabbed-folder sequence viewer has been added to Entrez. This
allows the quick selection of alternate report formats for a sequence
entry, including GenBank, EMBL, and a graphical representation. The
viewer is resizable, and permits easy visualization of complex
annotations such as segmented sequences or alternative splicing in
coding regions.
********************************************
** Complete Genomes in Entrez and GenBank **
********************************************
Network Entrez now offers a new "genomes" division which
presents genome level views of a large number of complete chromosomes,
from organelle, through virus and phage, to completely sequenced
chromosomes from yeast or bacteria, to integrated genetic and physical
maps and contiged sequence islands from eukaryotes such as Human, mouse
and Drosophila. Following the Entrez tradition, the chromosome views
are tightly linked to DNA and protein sequence records, MEDLINE
citations, and the new three dimensional structure division described
below.
** Small Genomes
A number of chromosomes from viruses or organelles have been
completely sequenced and have been available from GenBank for some time.
However, there are often multiple versions of these sequences, partial
sequences, or population variants. NCBI has selected a
reference sequence in these cases, then searched the database and
aligned the other versions of sequence from the same chromosome with
the reference sequence. In the genomes division of Entrez, selecting
such an entry will bring up a graphical map showing the coordinate
system of the whole chromosome. Selecting all or part of this map with
the mouse designates the region to be displayed in the other viewers.
This is done by "rubber-banding" (clicking and dragging over) the area of
interest. Once you have selected the area of interest in the map view,
you must then choose one of the other views (e.g. Graphic) by clicking
on the appropriate "TAB". Choosing the Graphic view shows the detailed
feature table of the selected region of the reference sequence and the
positions of other GenBank records that align to it. Vertical black
lines below the aligned sequences indicate insertions relative to the
reference sequence, while black lines within the sequence indicate
gaps. In the Alignment view, sequences aligned to the reference are
retrieved over the network and a new display is constructed which shows
the coding region features on the reference sequence AND those on the
aligned sequences, permitting comparison of annotation between entries
within the alignment. In addition, red lines now show mismatches
between the aligned sequences as well as insertions and deletions as
before. If you click on one of these records, you will see its
GenBank flatfile view, but if you "rubber-band" an area of
interest, you will see the alignment of the sequence you
selected.
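
To make the markings described above concrete (insertions, gaps and
mismatches relative to the reference), here is a small Python sketch that
labels each column of a gapped pairwise alignment. It is only an
illustration of the idea: the function name, the "-" gap character and
the label strings are assumptions for this example, and this is not the
code Entrez uses to draw its display.

```python
def classify_columns(reference, aligned, gap="-"):
    """Label every column of a pairwise alignment against the reference.

    Both rows must already be aligned, i.e. have the same length and
    use `gap` wherever one row has no residue.
    """
    if len(reference) != len(aligned):
        raise ValueError("aligned rows must have equal length")
    labels = []
    for ref_base, aln_base in zip(reference, aligned):
        if ref_base == gap and aln_base == gap:
            labels.append("empty")
        elif ref_base == gap:
            # residue present only in the aligned sequence
            labels.append("insertion")
        elif aln_base == gap:
            # residue missing from the aligned sequence (a deletion)
            labels.append("gap")
        elif ref_base.upper() != aln_base.upper():
            labels.append("mismatch")
        else:
            labels.append("match")
    return labels
```

For the rows "AC-GT" (reference) and "ACTG-" (aligned), the third column
is labelled an insertion and the last column a gap.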
** Chromosomes from Contigs
Larger chromosomes have recently been completely sequenced from
yeast and Haemophilus. These records exist in GenBank as many smaller
overlapping records, as required by the international guidelines for
sequence data exchange to ensure compatibility with existing software
tools, and to provide convenient units of data for updating or detailed
analysis (See NCBI newsletter: Sept. 1995). Entrez provides a view for
these chromosomes which presents a virtual sequence representing the
whole chromosome, with bands of alternating colors to indicate where it
is made from different GenBank records. The Graphic and Alignment
views use the same display, but also show the details of overlaps of
the pieces, as well as the features and alignments described above.
Once you have located a region of interest in the genome view, you can
readily retrieve the appropriate constituent record with a double click
of the mouse. As larger chromosomes become available (and this is true
for all the examples in the next section, "Integrated Maps") and larger
amounts of data are requested, you may select more of a chromosome than
the system can display at once. In that case you will simply see a
warning message, and a region of the maximum default size will be
selected instead, so you can still "TAB" to the Graphic view.
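
The size limit on a selection can be pictured with the following Python
sketch. It is a guess at the general behaviour only: this README does not
state the actual limit or which part of an oversized selection is kept,
so the max_span parameter and the choice to keep the left edge are
assumptions made for the example.

```python
def clamp_selection(start, end, chromosome_length, max_span):
    """Return (start, end, was_truncated) for a rubber-banded region.

    Coordinates are positions along the chromosome. If the requested
    region is longer than max_span, it is cut down to max_span and the
    caller is told so that a warning can be shown.
    """
    if start > end:
        start, end = end, start  # allow dragging from right to left
    start = max(0, start)
    end = min(chromosome_length, end)
    truncated = (end - start) > max_span
    if truncated:
        end = start + max_span  # assumption: keep the left edge
    return start, end, truncated
```

With a 1000-unit chromosome and a hypothetical limit of 500, selecting
100 to 900 yields the region 100 to 600 and a truncation flag.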
** Integrated Maps
In the higher eukaryotes, relatively small parts of chromosomes
have been sequenced. In these cases, the NCBI has collected various
genetic and physical maps for a particular organism, mapped them onto a
common coordinate system, and aligned any markers they share. The
beginning of a sequence map for the chromosome is made by assembling
contigs of sequence from the same region and organism, then placing the
composite sequence onto the coordinate system provided by the integration
of the maps. For Human, these composites are known as the "UniGene" set
and are being used as mapping reagents by collaborating groups. As more of
the UniGene-derived markers are placed on the maps, more and more
sequence records will also be placed. For Human, the Map view shows
the integrated map which includes Genethon (as derived from MIT), the
MIT physical map, the CHLC framework map, the GDB cytogenetic map, the
Stanford radiation hybrid map (at present just for chromosome 4), and
the NCBI sequence map.
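
One simple way to picture placing a map onto a common coordinate system
is linear interpolation between markers that the two maps share. The
Python sketch below shows that idea only. It is not a description of how
NCBI builds the integrated map; the function name and the anchor list
format are assumptions for the example.

```python
from bisect import bisect_right


def project_position(pos, anchors):
    """Project a position on one map onto the common coordinate system.

    anchors is a list of (position_on_this_map, position_on_common_map)
    pairs for shared markers, sorted by strictly increasing first value.
    Positions between two shared markers are interpolated linearly;
    positions outside the anchors are extrapolated from the nearest pair.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two shared markers")
    own_positions = [own for own, _ in anchors]
    # index of the anchor segment that contains (or is nearest to) pos
    i = bisect_right(own_positions, pos) - 1
    i = max(0, min(i, len(anchors) - 2))
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)
```

For example, with shared markers at (0, 0), (10, 100) and (20, 400), a
locus at 5 on the original map lands at 50 on the common scale, and one
at 15 lands at 250.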
************************************
** 3D Structure in Network Entrez **
************************************
Network Entrez now includes an explicit 3D structure database,
based on crystallographic and NMR structure determinations. Structure
data can provide a wealth of information on the biological function and
mechanism of action of macromolecules. By adding the structure
database to Entrez we hope to make this information easily accessible
to biologists.
The structure data comprise a new database from NCBI called MMDB
(Molecular Modeling DataBase), derived from the Brookhaven Protein
DataBank 3-dimensional structures (currently over 3,000 biomolecules).
MMDB is a database of ASN.1-formatted records, not PDB formatted
records. MMDB is capable of archiving conventional structure data as
well as future descriptions of biomolecules, such as those generated by
electron microscopy (surface models).
** Searching Structures in MMDB
The structure database may be queried directly, using specific
fields such as author names, or text terms occurring anywhere in the
structure description. One may in this way check for structure data on
a specific protein or nucleic acid. A more powerful approach, however,
is to identify the molecule of interest in the sequence or MEDLINE
databases, identify its sequence neighbors (homologues), and then, by
linking to the structure database, ask whether structure data is
available for any of the members of this family. The structure database
is smaller than the protein or nucleotide databases, but very many
sequenced proteins have homologues in this set, and one may often learn
more about a protein by examining the 3D structure of its homologues.
Soon Entrez
will include 3D structural neighbors, which can help link protein
neighbors in the "twilight zone" of sequence homology.
** Viewing Structures
Structure data from Entrez may be viewed in 3D, with real-time
rotation, using the public domain graphics programs RasMol or Kinemage.
Both are freely available for many platforms, including Mac, Windows
and UNIX. Entrez itself simply writes structure "documents" in the
format required by these and other programs, including "PDB" format.
Future versions of Entrez will invoke an integrated 3D structure
viewer.
You'll need to save a structure file and run another structure
viewer program that will read in the saved structure. The Structure
database provides you with a means of saving either a PDB-formatted
file or a Kinemage formatted file. You must first obtain these or
similar programs in order to *see* these structures.
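
If you want to check a saved PDB-formatted file before loading it into a
viewer, the Python sketch below reads the atom records from it. It relies
on the fixed column layout of ATOM and HETATM records in the PDB format;
the function names and the dictionary keys are made up for this example,
and it has not been tested against files written by Entrez.

```python
def parse_atom_line(line):
    """Parse one ATOM or HETATM record; return None for other records.

    PDB records are fixed-width, so fields are taken by column position
    (for example, x, y and z occupy columns 31-38, 39-46 and 47-54).
    """
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21].strip(),
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def read_atoms(path):
    """Return a list of parsed atom records from a saved PDB file."""
    with open(path) as handle:
        atoms = (parse_atom_line(line) for line in handle)
        return [atom for atom in atoms if atom is not None]
```

Counting the entries returned by read_atoms() is a quick sanity check
that the saved file actually contains coordinates.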
RasMol can be obtained from the author (Roger Sayle) by anonymous FTP
to: ftp.dcs.ed.ac.uk
in the directory:
/pub/rasmol/
or via your favorite WWW browser at this URL:
http://www.dcs.ed.ac.uk/generated/package-links/rasmol.
Kinemage can be obtained by anonymous FTP from the authors (Robert M.
Weiss and David C. Richardson) at:
suna.biochem.duke.edu
in one of these directories
/pub/MACprograms
/pub/PCprograms (use Windows MAGE_3_3.EXE, avoid MAGE_4_2.EXE)
/pub/UNIXprograms
/pub/LINUXprograms
Please familiarize yourself with these programs and try them out
first with their own demonstration files before using them to view
Entrez-generated structure files. NCBI does not offer user support for
these programs; however, we will be supporting an integrated Entrez 3D
structure viewer which is scheduled to be completed by the end of 1995.
** MMDB Content and Updates
As new PDB data becomes available from Brookhaven, MMDB is
updated. The PDB data are also changed in both form and content. PDB
data are checked and validated for consistency in the purported
chemistry, the sequence, and the 3D coordinates. Entrez users may
occasionally notice, for example, that the sequence of a PDB-derived
entry differs slightly from the PDB file, since all non-standard or
chemically modified residues, as judged by their 3D structure, are
explicitly identified as such in MMDB. These changes to PDB data are
intended to support computational applications such as homology
modeling and structure comparison. The MMDB database also differs from
PDB in that it provides pre-computed "views" of structures, containing
increasing levels of detail. MMDB also has explicit secondary
structure information in addition to any provided by PDB, and this
information is used to create vector models for the purposes of
structural comparison and alignment. As you can see, MMDB is a "value-
added" structure database.
------------------------------------------------------------------
** Availability
Network Entrez clients are available for Macintosh, MS Windows, MS
Windows 95, MS Windows NT, most UNIX machines under X11, and VMS under
X11, among others, via anonymous FTP from ncbi.nlm.nih.gov. We are
still refining the ergonomics and presentation and we welcome comments
and suggestions. This software should be considered to be in "beta" test.
There will be a series of software updates throughout the rest of
the year. We welcome comments, suggestions, and offers of curatorial
assistance with reference sequences. You can reach NCBI's Entrez
development team by sending e-mail to:
toolbox at ncbi.nlm.nih.gov.
** Credits
The genomes division and the graphical viewers for Entrez have been
built by: Jonathan Kans, Jinghui Zhang, Alex Smirnov,
Jonathan Epstein, Greg Schuler, Tatiana Tatusov, John Kuzio,
Colombe Chappey and Jim Ostell
The structure database for Entrez (MMDB) has been a joint project of
Hitomi Ohkawa, Christopher Hogue, Steve Bryant, Jonathan Kans,
Jonathan Epstein, Greg Schuler and Jim Ostell
The NCBI would like to acknowledge the following sources for their
contributions to the map information in the GenBank Genomes division:
Drosophila Physical Map       Bill Gelbart and Wayne Rindone, Harvard
Human Genetic Map             Ken Buetow, CHLC
Human Physical Map            Lincoln Stein, Whitehead Institute, MIT
Human Radiation Hybrid Map    Kathleen McKusick and David Cox, Stanford
Mouse Genetic Map             Prakash Nadkarni, Yale, and Janan Eppig,
                              Jackson Labs
==================================================================
Comments and suggestions are welcome! A FAQ (Frequently Asked
Questions) document about using Entrez, MMDB and Kinemage can be
obtained by e-mail request to:
info at ncbi.nlm.nih.gov.
If you want to be added to the mailing list for our free newsletter
which will announce our new developments and projects, please send your
complete postal address to the e-mail address above.
DOCUMENT REVISION HISTORY:
Date | Change
======================================================================
12-12-95| Add revision history, and updated version dates for
| SGI compiled for IRIX 5.3
-----------------------------------------------------------------------
02-06-96| Updated this document to show new version installed (4.017)
-----------------------------------------------------------------------
03-15-96| Updated this document to show new version installed (4.022)
| added Linux version.
-----------------------------------------------------------------------