johnp at worf.molbiol.ox.ac.uk (John Peden) writes:
> Problem on Digital UNIX 4.0 and SRS
>
> I have been having trouble with a memory leak in getz for several months. The leak was present in SRS4 and is still
> present in SRS5.05. The problem occurs when multiple sequence entries are read from disk and written to standard
> output. Memory is allocated to hold each sequence when it is read, but it is not released after the
> sequence has been written to standard output. This process repeats until memory is exhausted.
>
> getz -sf fasta -f "id des seq" "[libs={genbank embl}-sl#45:] ! ([libs-org:escherichia coli] | [libs-org:metazoa])"
>
> After the query is processed and the first sequence is output, my memory usage is approximately 34 Mb.
> However, on my system memory usage continues to grow until it hits either the soft or the hard memory limit
> (the hard limit is 600 Mb).
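The failure mode in the quoted report (each entry is allocated when it is read but never released after it is written) can be illustrated with a minimal Python sketch. It is not SRS or getz code, and every name in it (read_fasta, write_leaky, write_streaming, peak_bytes, NullSink, make_fasta_lines) is hypothetical. Python has no explicit free(), so here an entry "leaks" when a reference to it is kept after writing; the standard tracemalloc module reports the peak allocation for each variant.

```python
import tracemalloc
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs one entry at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_leaky(records: Iterable[Record], out, kept: List[Record]) -> None:
    # Hypothetical bug: each entry stays referenced after it is written,
    # so memory grows with the number of entries (like a missing free()).
    for header, seq in records:
        kept.append((header, seq))
        out.write(f">{header}\n{seq}\n")


def write_streaming(records: Iterable[Record], out) -> None:
    # Fixed pattern: once written, the entry is no longer referenced and
    # can be reclaimed before the next one is read.
    for header, seq in records:
        out.write(f">{header}\n{seq}\n")


class NullSink:
    """Discards output; stands in for stdout so output is not held in memory."""

    def write(self, text: str) -> int:
        return len(text)


def make_fasta_lines(n_records: int, seq_len: int, width: int = 60) -> List[str]:
    """Build synthetic FASTA lines (test data only)."""
    lines = []
    seq = "ACGT" * (seq_len // 4)
    for i in range(n_records):
        lines.append(f">seq{i} synthetic entry\n")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width] + "\n")
    return lines


def peak_bytes(run) -> int:
    """Peak traced allocation, in bytes, while run() executes."""
    tracemalloc.start()
    try:
        run()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    lines = make_fasta_lines(2000, 1000)
    sink = NullSink()
    kept: List[Record] = []
    leaky = peak_bytes(lambda: write_leaky(read_fasta(lines), sink, kept))
    flat = peak_bytes(lambda: write_streaming(read_fasta(lines), sink))
    print(f"leaky peak: {leaky} bytes, streaming peak: {flat} bytes")
```

With the leaky writer the peak grows with the number of entries written, which is the pattern reported above; the streaming writer's peak stays at roughly one entry's worth of memory however many entries are processed.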
I tried this (using emblnew in place of genbank) on our Digital UNIX
4.0 system. "top" showed "SIZE: 45M RES: 29M" when the output
started. I guess the exact sizes depend on the databases being used.
The SIZE and RES figures did indeed creep up as the sequences were
written. That's a lot of sequences for one query, so I didn't wait for
the end :-)
Could the leak be somewhere in the parser, for example while the
sequence is being read by Icarus?
We use EMBL and EMBLNEW in the original flatfile format, but we have
made some changes to the embl.is file because FASTA output had no ID
for flatfile-format databases (it was all a question of forcing the
ID to be parsed).
--
----------------------------------------------------------------------
Peter Rice | Informatics Division, The Sanger Centre,
E-mail: pmr at sanger.ac.uk | Wellcome Trust Genome Campus,
Tel: (44) 1223 494967 | Hinxton, Cambridge, CB10 1SA, England
Fax: (44) 1223 494919 | URL: http://www.sanger.ac.uk/Users/pmr/