FW: Announcing RefSeq Release 1

Pruitt, Kim (NIH/NLM/NCBI) pruitt at ncbi.nlm.nih.gov
Wed Jul 2 19:25:39 EST 2003


There is a typo in the FTP site links. The correct URL is:

                       all lower case

Sorry for any inconvenience this may have caused.  

-----Original Message-----

To: 'genbankb at net.bio.net'
Sent: 7/2/2003 5:03 PM
Subject: Announcing RefSeq Release 1

This announcement is being provided to the GenBank
newsgroup because GenBank users are likely to be
interested in the RefSeq database.

ANNOUNCING: RefSeq Release 1

RefSeq Release 1, the first full release of all NCBI RefSeq records, 
is now available by anonymous FTP at:

The NCBI RefSeq project is an ongoing effort to provide a curated, 
non-redundant collection of reference sequences, representative 
of the central dogma (genomes, transcripts, protein), for each major 

This first release includes all of the sequence data that we have 
collected at this time. Although the RefSeq collection is not yet 
complete, its value as a non-redundant dataset has reached a level 
that justifies providing full releases.  

This full release, Release 1, incorporates genomic, transcript, and 
protein data available as of June 30, 2003 and includes over 
785,000 proteins and sequences from 2005 different organisms.

The release is provided in several directories as a complete dataset and
also as divided by logical groupings. The number of species represented
in each Release sub-directory, determined by counting distinct tax IDs,
is as follows:

        complete                2005
        fungi                   27
        invertebrate            80
        microbial               334
        mitochondrion           417
        plant                   30
        plasmid                 36
        plastid                 31
        protozoa                39
        vertebrate_mammalian    74
        vertebrate_other        206
        viral                   1179
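
As an illustration of how counts like these could be derived, here is a
minimal Python sketch that counts distinct tax IDs per sub-directory. The
record layout (pairs of directory name and tax ID) and the sample values
are hypothetical; they are not taken from the release catalog.

```python
from collections import defaultdict


def count_species(records):
    """Count distinct tax IDs per directory.

    `records` is an iterable of (directory, tax_id) pairs. This layout is
    an assumption made for illustration, not the actual catalog format.
    """
    seen = defaultdict(set)
    for directory, tax_id in records:
        # A set keeps each tax ID once, however many records mention it.
        seen[directory].add(tax_id)
    return {directory: len(ids) for directory, ids in seen.items()}


# Hypothetical sample records (directory, tax ID).
sample = [
    ("complete", 9606), ("vertebrate_mammalian", 9606),
    ("complete", 9606), ("mitochondrion", 9606),
    ("complete", 10090), ("vertebrate_mammalian", 10090),
]
counts = count_species(sample)
print(counts)
```

Note that the per-directory counts in the table sum to more than 2005,
which is consistent with one organism being counted in more than one
sub-directory.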

The total number of accessions and the total length (number of nucleotides 
or amino acids), per type of molecule, are as follows:

   Type         Accessions        Length
   Genomic           64729    4339114280
   RNA              211803     333757669
   Protein          785143     263588685
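
For a rough sense of scale, the totals above can be turned into a mean
length per accession. The following Python sketch uses only the figures
from the table; the variable and function names are illustrative.

```python
# (accessions, total length) per molecule type, copied from the table above.
totals = {
    "Genomic": (64729, 4339114280),
    "RNA": (211803, 333757669),
    "Protein": (785143, 263588685),
}


def mean_length(accessions, length):
    """Return the mean length per accession (nucleotides or amino acids)."""
    return length / accessions


means = {name: mean_length(*pair) for name, pair in totals.items()}
for name, value in means.items():
    print(f"{name}: {value:.1f}")
```

These are simple averages over each whole category and say nothing about
how lengths are distributed within it.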

RefSeq Release 1 is available by anonymous FTP at:

Release notes documenting the scope and contents of the release are
provided at:

A catalog documenting the contents of the release is available at:

Release statistics are available at:

Additional information about the RefSeq project is available at:

  1. The NCBI RefSeq Web Site:   
  2. The NCBI Handbook 
     The Reference Sequence (RefSeq) Project. 
     Available from:  

Please send questions, comments, and suggestions concerning the RefSeq
release or the RefSeq project to:

        info at ncbi.nlm.nih.gov

- gttaacaattaaagagtgtttatcgaaattcattatatagtggtttatatagaccacttc
- GenBank newsgroup see: http://www.bio.net/hypermail/genbankb/       
- GENBANKB e-mail: messages sent to genbankb at net.bio.net
- subscribe: e-mail biosci-server at net.bio.net with: subscribe genbankb
- unsub: e-mail biosci-server at net.bio.net with: unsubscribe genbankb      
- GenBank on the WWW, see:  http://www.ncbi.nlm.nih.gov/Genbank/
- problems with GENBANKB? E-mail moderator: francis at cmmt.ubc.ca                  
