PC/Mainframe - data vs. storage

frist at ccu.umanitoba.ca
Tue Jun 23 09:44:11 EST 1992


In article <1992Jun19.122907.12041 at athena.cs.uga.edu>, russell at dogwood.botany.uga.edu writes:

>Last I heard, GenBank was increasing at ~25% of total size per quarterly
>release.  The specific figure for the current release of EMBL (31) is 14%.
>(Both GenBank and EMBL should grow by the same amount, with a difference in
>rate determined only by the difference in size of the base pool of sequences
>which haven't been exchanged with each other yet.)  I don't have figures at my
>fingertips for protein sequence databases, but I expect that the amounts of
>growth are similar, especially if one considers predicted sequences derived
>from nucleic acid sequences.

Contrary to common wisdom, the growth rates are lower than that. GenBank
grows at an average rate of about 10% per quarter, so it takes between
seven and eight releases (a little under two years) to double.
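
The doubling time at a constant growth factor can be checked with a short
calculation. This is a minimal Python sketch, assuming the factor stays at
1.10 per release and that there are four releases per year; the function
name is illustrative only.

```python
import math


def releases_to_double(growth_per_release: float) -> float:
    """Number of releases needed to double at a constant growth factor."""
    # Solve growth_per_release ** n == 2 for n.
    return math.log(2) / math.log(growth_per_release)


n = releases_to_double(1.10)
# Four releases per year is an assumption taken from the quarterly schedule.
print(f"{n:.2f} releases, about {n / 4:.2f} years")
# prints "7.27 releases, about 1.82 years"
```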
 
Here's the data I've been using to justify my requests for additional
space on our system:

GenBank
-------
Releases are issued quarterly. Database size is given either as the total
size of all files or as the number of nucleotides. Not all figures are
available for all releases. However, either figure seems to give an
accurate estimate of growth rate. Growth rate is the ratio of a release's
size to that of the previous release.

Release  ASCII files   growth rate  nucleotides   growth rate
                       (files)        x10^6       (nucleotides)
63.0            ---     ----            40.1            ----            
64.0            116     ----            42.5            1.06
65.0            131     1.13            49.2            1.16
66.0            ---     ----            51.3            1.04
67.0            150     ----            55.2            1.08
68.0            177     1.18            65.9            1.19
69.0            193     1.09            71.9            1.09
70.0            221     1.15            77.3            1.08
71.0            244     1.10            83.9            1.09
72.0            267     1.09            92.2            1.10

Average growth rate per release: 1.10
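
The per-release ratios and their average can be recomputed from the
nucleotide column of the table. The sketch below is one way to do it in
Python; the variable names are illustrative. It also computes the geometric
mean, which is not part of the original figures but is the more natural
average for compounding growth.

```python
from statistics import mean

# Nucleotide counts (x10^6) for GenBank releases 63.0 through 72.0,
# copied from the table above.
nucleotides = [40.1, 42.5, 49.2, 51.3, 55.2, 65.9, 71.9, 77.3, 83.9, 92.2]

# Growth rate of each release relative to the previous one.
ratios = [later / earlier for earlier, later in zip(nucleotides, nucleotides[1:])]

# Arithmetic mean of the ratios, as used for the average quoted above.
arithmetic = mean(ratios)

# Geometric mean: the constant factor that would take release 63.0
# to release 72.0 in the same number of steps.
geometric = (nucleotides[-1] / nucleotides[0]) ** (1 / len(ratios))

print([f"{r:.2f}" for r in ratios])
print(f"arithmetic mean {arithmetic:.2f}, geometric mean {geometric:.2f}")
```

Both averages round to 1.10, in agreement with the figure above.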


PIR
---
Release     residues     growth rate
              x10^6
27.0              7.6            ---
28.0              8.1            1.07
29.0              9.1            1.12
30.0              9.7            1.07
31.0             10.4            1.07 
32.0             11.8            1.13

(PIR releases are not always quarterly.)

What is interesting to me is the consistency of the growth rates for both
databases over the last few years. Since this is an exponential rate of
growth, the total amount of sequence data produced worldwide per month or
per year is itself increasing by a fairly constant factor; that is, each
year we generate more sequence data than in the previous year, and the
proportional increase stays about the same.
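
To illustrate the point about exponential growth, here is a minimal Python
sketch of how much data would be added each year if the 1.10 factor simply
continued to hold. It is an extrapolation under that assumption, not
measured data; the starting size is the release 72.0 nucleotide count, and
the function name is illustrative.

```python
def yearly_additions(start, growth_per_release, releases_per_year, years):
    """Amount added in each successive year under constant compounding."""
    sizes = [
        start * growth_per_release ** (releases_per_year * year)
        for year in range(years + 1)
    ]
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


# Assumptions: start at 92.2 x10^6 nucleotides (release 72.0),
# a constant factor of 1.10 per release, four releases per year.
added = yearly_additions(92.2, 1.10, 4, 3)
for year, amount in enumerate(added, start=1):
    print(f"year {year}: +{amount:.1f} x10^6 nucleotides")
```

Each year's addition is larger than the last by the same factor, 1.10 to
the fourth power, or about 1.46.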

===============================================================================
Brian Fristensky                |  
Department of Plant Science     | "Ya don't have to be a rocket surgeon
University of Manitoba          |  ta know who's who!" 
Winnipeg, MB R3T 2N2  CANADA    |
frist at ccu.umanitoba.ca          | 
Office phone:   204-474-6085    |  - the incomparable Don Cherry 
FAX:            204-261-5732    |
===============================================================================




More information about the Bio-soft mailing list