Linux MolBio Software (long message)

francis at NCBI.NLM.NIH.GOV francis at NCBI.NLM.NIH.GOV
Sun Mar 17 12:02:50 EST 1996

> From: martin at tutor.oc.chemie.th-darmstadt.de (Martin Kroeker)
> Subject: Re: Linux MolBio Software

> 9224076 at ul.ie wrote:
> : I'm posting this to ascertain whether anyone out there has compiled any 
> : sort of list of molbio/biology software that runs under Linux (the 
> : freeware unix clone for intel pc's). I've found this to be a powerful 
> : operating system for numerous applications including items such as 
> : Rasmol, DNAWorkBench and Raster3d. 

> XtalView, WhatIf, Mage... - any generic list of unix mol/bio/chem software 
> should do. At the current state of development, i would expect almost any 
> molbio software for unix systems to run under linux. Perhaps a list of 
> programs that _do_not_ work would be more useful, though i assume that it 
> would mainly contain either hardware (e.g. SGI graphics)- specific things 
> or commercial programs where the vendor does not provide either source code
> or a linux binary.

yes, that would probably be the case ... but a list of known packages
where people have bothered to port it may be useful for the
molbio-Linux community ...

another one to add to the List is Entrez ... I append the README_2 for
those of you who are still using version 3.x or earlier ... version 4.x
has much more to offer!



| B.F. Francis Ouellette  
| GenBank
| francis at ncbi.nlm.nih.gov   


Last update to this document: 

March 15, 1996

README_2 for the beta version of Network Entrez.

This document describes the software that can be obtained by 
anonymous FTP  from ncbi.nlm.nih.gov

in this directory:


please note that Network Entrez users should read
these documents



       National Center for Biotechnology Information (NCBI)
             National Library of Medicine
             National Institutes of Health
             8600 Rockville Pike
             Bethesda, MD 20894, USA
             tel: (301) 496-2475
             fax: (301) 480-9241
             e-mail: info at ncbi.nlm.nih.gov

ver 4.000 Oct. 05, 1995
ver 4.010 Oct. 11, 1995
ver 4.012 Nov. 08, 1995
ver 4.013 Nov. 30, 1995 (Unix)/Dec 02, 1995 (mswin)/Dec 03, 1995 (Mac)
ver 4.017 Feb. 06, 1996 (Unix/mswin)/Feb. 05, 1996 (Mac)
ver 4.022 Mar. 15, 1996 (Unix/mswin)

                     ***** README *****

This README presents some of the highlights of the beta version of 
Network Entrez which incorporates a new genomes division and also 
presents MMDB (Molecular Modeling DataBase).

This directory, and the ones below, include the latest version
of the network Entrez program which will run on various platforms:


Nentrez.hqx                :Mac - binhexed 
alphaOSF1.tar.Z            :alpha - compressed 
linux.tar.Z                :Linux - compressed
   win32/nentrezZ.exe      :NT/Win95 (32 bit) self extracting 
   winsock1.1/nentrezZ.exe :Win 3.1 (16 bit) self extracting 
sgi.tar.Z                  :SGI - compressed (IRIX 4.0)
sgi5.3.tar.Z               :SGI - compressed (IRIX 5.3)
solaris.tar.Z              :Sun - Solaris - compressed
sun.tar.Z                  :SunOS - compressed

	You need only copy (in binary mode) the executable version 
for the platform you need and the two README files (README and
README_2).  This file is README_2.  If you desire binaries of 
Network Entrez for a platform not presented above, you can ask us, and 
we will see if we can compile one for you, on an unsupported basis.  
Please send these requests to:

toolbox at ncbi.nlm.nih.gov
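
	As a rough illustration of the "binary mode" advice above, the 
following Python sketch picks the Unix archive for a platform and 
fetches it, together with the two README files, using the standard 
ftplib module.  The archive names come from the list above; the short 
platform keys and the function names are invented for this example, 
and the directory is an assumption based on the path mentioned later 
in this README.

```python
from ftplib import FTP

# Archive names copied from the file list in this README.
# The short platform keys are invented for this sketch.
ARCHIVES = {
    "alpha": "alphaOSF1.tar.Z",
    "linux": "linux.tar.Z",
    "irix4": "sgi.tar.Z",
    "irix5": "sgi5.3.tar.Z",
    "solaris": "solaris.tar.Z",
    "sunos": "sun.tar.Z",
}

HOST = "ncbi.nlm.nih.gov"
DIRECTORY = "/entrez/network/"  # assumption: path mentioned later in this README


def fetch_binary(ftp, remote_name, local_path):
    """Download remote_name over an open FTP session in binary mode."""
    with open(local_path, "wb") as out:
        # retrbinary transfers raw bytes, so the compressed archive is
        # not damaged by text-mode line-ending translation.
        ftp.retrbinary("RETR " + remote_name, out.write)


def download(platform, local_dir="."):
    """Fetch the archive and both README files for one platform."""
    names = [ARCHIVES[platform], "README", "README_2"]
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(DIRECTORY)
        for name in names:
            fetch_binary(ftp, name, local_dir + "/" + name)
    return names
```

Under these assumptions, download("linux") would leave linux.tar.Z, 
README and README_2 in the current directory.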

You should already be familiar with Entrez (Network or CD-ROM)
if you are reading this document, but if you are not, you may 
want to read the first README document presented in this directory, 
as well as the Entrez user manual present in BinHexed format
(Macintosh) in this file:


	This new version of Network Entrez (present in 
/entrez/network/) should still be considered "beta" and users 
should be aware that although we have taken every precaution to ensure 
that this program will work without any problems, it may not always 
perform as expected.  We are also still modifying the code to add new 
features, and you should visit this directory to make sure you have the 
most up to date version.  

The build dates at the time of writing were:

Feb. 05  Nentrez.hqx
Mar. 15  alphaOSF1.tar.Z
Mar. 15  linux.tar.Z
Mar. 15      win32/nentrezZ.exe
Mar. 15      winsock1.1/nentrezZ.exe
Mar. 15  sgi.tar.Z
Mar. 15  sgi5.3.tar.Z
Mar. 15  solaris.tar.Z
Mar. 15  sun.tar.Z

The current version is: 4.022

	We are also changing the ergonomics and layout of some of the 
features in this new program, and suggestions and feedback are very 
welcome.  These should be sent to this e-mail address:

toolbox at ncbi.nlm.nih.gov

	The text which follows explains some of the new features of 
this version of Network Entrez, as well as how to navigate to them.  
Again, we are assuming that you are already familiar with Entrez, and 
with the general way of going from one information space to another by 
linking or neighboring.

   **** Network Entrez: From Genome to Structure ****

	The NCBI has made a major new release of Entrez available in 
October 1995.  The new release adds graphical access to a new "genomes" 
division of GenBank as well as graphical views of standard Entrez 
sequence records.  The new release also provides a database of 3-
dimensional structures derived from the PDB crystallographic database.  

** Graphical Views of Sequences.

	A tabbed-folder sequence viewer has been added to Entrez.  This 
allows the quick selection of alternate report formats for a sequence 
entry, including GenBank, EMBL, and a graphical representation.  The 
viewer is resizable, and permits easy visualization of complex 
annotations such as segmented sequences or alternative splicing in 
coding regions.

** Complete Genomes in Entrez and GenBank **

	Network Entrez now offers a new "genomes" division which 
presents genome level views of a large number of complete chromosomes,
from organelle, through virus and phage, to completely sequenced
chromosomes from yeast or bacteria, to integrated genetic and physical
maps and contiged sequence islands from eukaryotes such as human, mouse
and Drosophila.  Following the Entrez tradition, the chromosome views
are tightly linked to DNA and protein sequence records, MEDLINE
citations, and the new three-dimensional structure division described
below.

** Small Genomes

	A number of chromosomes from viruses or organelles have been 
completely sequenced and have been available from GenBank for some time.  
However, there are often multiple versions of these sequences, parts of 
the sequences, or of population variants.  NCBI has selected a 
reference sequence in these cases, then searched the database and 
aligned the other versions of sequence from the same chromosome with 
the reference sequence.  In the genomes division of Entrez, selecting 
such an entry will bring up a graphical map showing the coordinate 
system of the whole chromosome.  Selecting all or part of this map with 
the mouse designates the region to be displayed in the other viewers.  
This is done by "rubber-banding" or "click & dragging" the area of 
interest.  Once you have selected the area of interest in the map view, 
you must then choose one of the other views (e.g. Graphic) by clicking 
on the appropriate "TAB".  Choosing the Graphic view shows the detailed 
feature table of the selected region of the reference sequence and the 
positions of other GenBank records that align to it.  Vertical black 
lines below the aligned sequences indicate insertions relative to the 
reference sequence, while black lines within the sequence indicate 
gaps.  In the Alignment view, sequences aligned to the reference are 
retrieved over the network and a new display is constructed which shows 
the coding region features on the reference sequence AND those on the 
aligned sequences, permitting comparison of annotation between entries 
within the alignment.  In addition, red lines now show mismatches 
between the aligned sequences as well as insertions and deletions as 
before. If you click on these records, you will see the
GenBank flatfile view, but if you 'rubber-band' an area of
interest, you will see the alignment of the sequence you
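
	To make the three kinds of difference (mismatches, insertions 
and gaps) concrete, here is a small Python sketch that walks over one 
pairwise alignment and reports each of them relative to the reference.  
The gapped-string representation (with "-" for a gap) and the function 
name are assumptions for illustration; Entrez stores and draws its 
alignments in its own way.

```python
def classify_differences(reference, aligned):
    """Compare two equal-length gapped strings column by column.

    Returns a list of (reference_position, kind) tuples, where kind is
    "mismatch", "insertion" (extra base in the aligned sequence) or
    "gap" (base missing from the aligned sequence).  Positions are
    1-based coordinates on the ungapped reference; an insertion is
    reported at the reference position it follows (0 if it comes
    before the first reference base).
    """
    if len(reference) != len(aligned):
        raise ValueError("gapped sequences must have the same length")
    differences = []
    ref_pos = 0
    for ref_base, aln_base in zip(reference, aligned):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == "-" and aln_base == "-":
            continue  # padding column, nothing to report
        if ref_base == "-":
            differences.append((ref_pos, "insertion"))
        elif aln_base == "-":
            differences.append((ref_pos, "gap"))
        elif ref_base.upper() != aln_base.upper():
            differences.append((ref_pos, "mismatch"))
    return differences


print(classify_differences("ACG-TAC", "ATGGT-C"))
# [(2, 'mismatch'), (3, 'insertion'), (5, 'gap')]
```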

** Chromosomes from Contigs

	Larger chromosomes have recently been completely sequenced from 
yeast and Haemophilus.  These records exist in GenBank as many smaller 
overlapping records, as required by the international guidelines for 
sequence data exchange to ensure compatibility with existing software 
tools, and to provide convenient units of data for updating or detailed 
analysis (See NCBI newsletter: Sept. 1995).  Entrez provides a view for 
these chromosomes which presents a virtual sequence representing the 
whole chromosome, with bands of alternating colors to indicate where it 
is made from different GenBank records.  The Graphic and Alignment 
views use the same display, but also show the details of overlaps of 
the pieces, as well as the features and alignments described above.  
Once you have located a region of interest in the genome view, you can 
readily retrieve the appropriate constituent record with a double click 
of the mouse.  As larger chromosomes become available (and this is true 
for all the examples in the next section: "Integrated Maps") and larger 
amounts of data are requested you may reach the upper limit of the 
system where too much of a chromosome was selected.  You will simply 
see a warning message, and the maximum default size will have been 
selected, so you can now "TAB" to the Graphical view.
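
	The idea of a virtual sequence assembled from overlapping 
records can be sketched in a few lines of Python.  The record names 
and coordinates below are invented for illustration, and the data 
layout is an assumption, not the way Entrez represents chromosomes 
internally.

```python
from collections import namedtuple

# One constituent record placed on the chromosome (1-based, inclusive).
Segment = namedtuple("Segment", "record start end")


def overlaps(segments):
    """Return (left_record, right_record, overlap_length) for neighbours."""
    ordered = sorted(segments, key=lambda s: s.start)
    found = []
    for left, right in zip(ordered, ordered[1:]):
        shared = left.end - right.start + 1
        if shared > 0:
            found.append((left.record, right.record, shared))
    return found


def records_at(segments, position):
    """Names of the records that cover a chromosome coordinate."""
    return [s.record for s in segments if s.start <= position <= s.end]


def band_colors(segments, colors=("light", "dark")):
    """Assign alternating band colors along the chromosome."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [(s.record, colors[i % len(colors)]) for i, s in enumerate(ordered)]


# Hypothetical tiling of a 2,500 bp region by three records.
tiling = [
    Segment("REC_A", 1, 1000),
    Segment("REC_B", 901, 1800),
    Segment("REC_C", 1751, 2500),
]
print(overlaps(tiling))         # [('REC_A', 'REC_B', 100), ('REC_B', 'REC_C', 50)]
print(records_at(tiling, 950))  # ['REC_A', 'REC_B']
```

Looking up a coordinate with records_at mirrors the step of going from 
a region of interest in the genome view to its constituent record.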

** Integrated Maps

	In the higher eukaryotes, relatively small parts of chromosomes 
have been sequenced.  In these cases, the NCBI has collected various 
genetic and physical maps for a particular organism, mapped them onto a 
common coordinate system, and aligned any markers they share.  The 
beginning of a sequence map for the chromosome is made using contigs of 
sequence from the same region and organism, then placing the composite 
sequence onto the coordinate system provided by the integration of the 
maps.  For Human, these composites are known as the "UniGene" set and 
are being used as mapping reagents by collaborating groups.  As more of 
the UniGene-derived markers are placed on the maps, more and more 
sequence records will also be placed.  For Human, the Map view shows 
the integrated map which includes Genethon (as derived from MIT), the 
MIT physical map, the CHLC framework map, the GDB cytogenetic map, the 
Stanford radiation hybrid map (at present just for chromosome 4), and 
the NCBI sequence map.  
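
	This README does not say how the maps are placed on a common 
coordinate system.  One simple way to picture it is piecewise-linear 
interpolation between markers that two maps share; the Python sketch 
below uses that approach purely as an assumption, with made-up marker 
positions and units.

```python
from bisect import bisect_right


def to_common(position, anchors):
    """Convert a position on one map to the common coordinate system.

    anchors is a list of (map_position, common_position) pairs for the
    shared markers, sorted by map_position, with distinct map positions
    and at least two entries.  Positions outside the anchored range are
    extrapolated from the nearest pair of anchors.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two shared markers")
    xs = [a[0] for a in anchors]
    # Index of the anchor pair that brackets the position.
    i = bisect_right(xs, position) - 1
    i = max(0, min(i, len(anchors) - 2))
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (position - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical shared markers: genetic position (cM) -> common coordinate.
shared = [(0.0, 0.0), (10.0, 40.0), (30.0, 60.0)]
print(to_common(5.0, shared))   # 20.0
print(to_common(20.0, shared))  # 50.0
```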

** 3D Structure in Network Entrez **

	Network Entrez now includes an explicit 3D structure database, 
based on crystallographic and NMR structure determinations.  Structure 
data can provide a wealth of information on the biological function and 
mechanism of action of macromolecules.  By adding the structure 
database to Entrez we hope to make this information easily accessible 
to biologists.

	The structure data comprise a new database from NCBI called MMDB 
(Molecular Modeling DataBase), derived from the Brookhaven Protein 
DataBank 3-dimensional structures (currently over 3,000 biomolecules).  
MMDB is a database of ASN.1-formatted records, not PDB formatted 
records.  MMDB is capable of archiving conventional structure data as 
well as future descriptions of biomolecules, such as those generated by 
electron microscopy (surface models).  

** Searching Structures in MMDB

	The structure database may be queried directly, using specific 
fields such as author names, or text terms occurring anywhere in the 
structure description.  One may in this way check for structure data on 
a specific protein or nucleic acid.  A more powerful approach, however, 
is to identify the molecule of interest in the sequence or MEDLINE 
databases, identify its sequence neighbors (homologues), and then, by 
linking to the structure database, ask whether structure data is 
available for any of the members of this family.  The structure 
database is smaller than the protein or nucleotide databases, but very 
many sequenced proteins have homologues in this set, and one may often 
learn more about a protein by examining the 3D structure of its 
homologues.  Soon Entrez will include 3D structural neighbors, which 
can help link protein neighbors in the "twilight zone" of sequence 
homology.

** Viewing Structures

	Structure data from Entrez may be viewed in 3D, with real-time 
rotation, using the public domain graphics programs RasMol or Kinemage.  
Both are freely available for many platforms, including Mac, Windows 
and UNIX.  Entrez itself simply writes structure "documents" in the 
format required by these and other programs, including "PDB" format.  
Future versions of Entrez will invoke an integrated 3D structure viewer.

	You'll need to save a structure file and run another structure 
viewer program that will read in the saved structure.  The Structure 
database provides you with a means of saving either a PDB-formatted 
file or a Kinemage formatted file.  You must first obtain these or 
similar programs in order to *see* these structures.
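
	Once a PDB-formatted file has been saved from Entrez, it can 
be useful to take a quick look at it before loading it into a viewer.  
The Python sketch below reads the fixed-column ATOM and HETATM records 
of the PDB format and summarises the chains.  The function names are 
illustrative, only a few columns are parsed, and no claim is made about 
exactly which records an Entrez-written file contains.

```python
def read_atoms(lines):
    """Parse ATOM/HETATM records from the lines of a PDB-format file."""
    atoms = []
    for line in lines:
        if line[:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other records
        atoms.append({
            "name": line[12:16].strip(),     # columns 13-16, atom name
            "residue": line[17:20].strip(),  # columns 18-20, residue name
            "chain": line[21],               # column 22, chain identifier
            "resseq": int(line[22:26]),      # columns 23-26, residue number
            "xyz": (float(line[30:38]),      # columns 31-38, x
                    float(line[38:46]),      # columns 39-46, y
                    float(line[46:54])),     # columns 47-54, z
        })
    return atoms


def chain_summary(atoms):
    """Return {chain: (atom_count, residue_count)} for parsed atoms."""
    summary = {}
    for atom in atoms:
        entry = summary.setdefault(atom["chain"], {"atoms": 0, "residues": set()})
        entry["atoms"] += 1
        entry["residues"].add(atom["resseq"])
    return {c: (e["atoms"], len(e["residues"])) for c, e in summary.items()}
```

For a file saved under the hypothetical name saved.pdb, one could call 
chain_summary(read_atoms(open("saved.pdb"))) to see how many atoms and 
residues each chain holds.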

RasMol can be obtained from the author (Roger Sayle) by anonymous FTP 
to:  ftp.dcs.ed.ac.uk 

in the directory:


or via your favorite WWW browser at this URL:


Kinemage can be obtained by anonymous FTP from the authors (Robert M. 
Weiss and David C. Richardson) at:  


in one of these directories 

/pub/PCprograms (use Windows MAGE_3_3.EXE, avoid MAGE_4_2.EXE) 

	Please familiarize yourself with these programs and try them out 
first with their own demonstration files before using them to view 
Entrez-generated structure files.  NCBI does not offer user support for 
these programs; however, we will be supporting an integrated Entrez 3D 
structure viewer which is scheduled to be completed by the end of 1995.

** MMDB Content and Updates

	As new PDB data become available from Brookhaven, MMDB is 
updated.  The PDB data are also changed in both form and content.  PDB 
data are checked and validated for consistency in the purported 
chemistry, the sequence, and the 3D coordinates.  Entrez users may 
occasionally notice, for example, that the sequence of a PDB-derived 
entry differs slightly from the PDB file, since all non-standard or 
chemically modified residues, as judged by their 3D structure, are 
explicitly identified as such in MMDB.  These changes to PDB data are 
intended to support computational applications such as homology 
modeling and structure comparison.  The MMDB database also differs from 
PDB in that it provides pre-computed "views" of structures, containing 
increasing levels of detail.  MMDB also has explicit secondary 
structure information in addition to any provided by PDB, and this 
information is used to create vector models for the purposes of 
structural comparison and alignment.  As you can see, MMDB is a "value-
added" structure database.


** Availability

	Network Entrez clients are available for Macintosh, MS Windows, MS 
Windows 95, MS Windows NT, most UNIX machines under X11, and VMS under 
X11, among others, via anonymous FTP from ncbi.nlm.nih.gov.  We are 
still refining the ergonomics and presentation and we welcome comments 
and suggestions.  This software should be considered in "beta" test.  
There will be a series of software updates throughout the rest of 
the year.  We welcome comments, suggestions, and offers of curatorial 
assistance with reference sequences.  You can reach NCBI's Entrez 
development team by sending e-mail to:

toolbox at ncbi.nlm.nih.gov.

** Credits

The genomes division and the graphical viewers for Entrez have been 
built by: Jonathan Kans, Jinghui Zhang, Alex Smirnov, 
Jonathan Epstein, Greg Schuler, Tatiana Tatusov, John Kuzio,
Colombe Chappey and Jim Ostell

The structure database for Entrez (MMDB) has been a joint project of 
Hitomi Ohkawa, Christopher Hogue, Steve Bryant, Jonathan Kans, 
Jonathan Epstein, Greg Schuler and Jim Ostell

The NCBI would like to acknowledge the following sources for their 
contributions to the map information in the GenBank Genomes division:

Drosophila Physical Map    Bill Gelbart and Wayne Rindone, 
Human Genetic Map          Ken Buetow, CHLC
Human Physical Map         Lincoln Stein, Whitehead Institute, 
Human Radiation Hybrid Map Kathleen McKusick and David Cox, 
Mouse Genetic Map          Prakash Nadkarni, Yale, and Janan 
                           Eppig, Jackson Labs


Comments and suggestions are welcome!  A FAQ (Frequently Asked 
Questions) document about using Entrez, MMDB and Kinemage can be 
obtained by e-mail request to:

info at ncbi.nlm.nih.gov.

If you want to be added to the mailing list for our free newsletter 
which will announce our new developments and projects, please send your 
complete postal address to the e-mail address above.


 Date    | Change
 12-12-95| Added revision history, and updated version dates for
         | SGI compiled for IRIX 5.3
 02-06-96| Updated this document to show new version installed (4.017)
 03-15-96| Updated this document to show new version installed (4.022)
         | added Linux version.
