Database integration with sequence analysis software - how?

Piotr Kozbial piotrk at ibb.waw.pl
Tue Jul 6 14:14:00 EST 1999

I am interested in testing several ideas about organization of genomic

Could you please send me references about:

1. Sequence management in relational databases.
Databases, as far as I know, store data in tables and rows, but sequences
seem to be stored in flat files (e.g. in FASTA format). Is it a good idea
to chop the sequences up and transfer them into a relational database?
Some kinds of sequences are well suited to storage in a relational
database (e.g. protein and cDNA sequences), but genomic sequences are not.
Is it a good idea to cut genomic sequences into fragments containing ORFs
with their upstream and downstream sequences, and with some positioning
information (e.g. the IDs of the upstream and downstream ORFs)? With each
ORF in the database it would be possible to store additional information
(computed or taken from the literature), such as:
-cDNA sequence, 
-IDs of known amino acid motifs,
-IDs of known conserved structural domains,
-IDs of interacting proteins,
-precomputed information about structural, sequence, and functional
homologies (similar to "neighbors" in NCBI databases),
-all other information (especially raw experimental data).
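
To make the layout in point 1 concrete, here is a minimal sketch using
Python's built-in sqlite3 module. It is only an illustration under stated
assumptions, not a recommended design: the table and column names (orf,
orf_annotation, upstream_orf_id, ...) are hypothetical, and the sequences
and the motif ID are made-up toy values.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an illustrative
# assumption, not taken from any existing database.
SCHEMA = """
CREATE TABLE orf (
    orf_id            INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    upstream_seq      TEXT,
    coding_seq        TEXT NOT NULL,
    downstream_seq    TEXT,
    cdna_seq          TEXT,
    upstream_orf_id   INTEGER REFERENCES orf(orf_id),
    downstream_orf_id INTEGER REFERENCES orf(orf_id)
);
CREATE TABLE orf_annotation (
    orf_id INTEGER NOT NULL REFERENCES orf(orf_id),
    kind   TEXT NOT NULL,   -- e.g. 'motif', 'domain', 'interactor'
    value  TEXT NOT NULL    -- ID of the motif, domain, or protein
);
"""


def build_demo_db():
    """Create an in-memory database holding two made-up neighbouring ORFs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO orf (orf_id, name, upstream_seq, coding_seq,"
        " downstream_seq, upstream_orf_id, downstream_orf_id)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            # Toy sequences, invented for this example only.
            (1, "orfA", "TTGACA", "ATGAAATAA", "GGCC", None, 2),
            (2, "orfB", "GGCC", "ATGCCCTGA", "TATA", 1, None),
        ],
    )
    conn.execute(
        "INSERT INTO orf_annotation (orf_id, kind, value) VALUES (?, ?, ?)",
        (1, "motif", "M0001"),  # hypothetical motif ID
    )
    conn.commit()
    return conn


def downstream_neighbor(conn, orf_id):
    """Return the name of the ORF directly downstream, or None if absent."""
    row = conn.execute(
        "SELECT n.name FROM orf AS o"
        " JOIN orf AS n ON n.orf_id = o.downstream_orf_id"
        " WHERE o.orf_id = ?",
        (orf_id,),
    ).fetchone()
    return row[0] if row else None
```

Calling downstream_neighbor(build_demo_db(), 1) walks the positioning
link from one ORF to the next. Keeping annotations in a separate table
lets one ORF carry any number of motif, domain, or interactor IDs.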

2. There are lots of tools for sequence analysis written in Perl, C,
C++, etc.
How should the interface between the database and the tools be designed?
Are there any examples?
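
One possible shape for such an interface, sketched here only as an
assumption, is a thin export layer: sequences are pulled out of the
database and written as FASTA text, which most existing tools already
read. The sketch below uses Python's sqlite3 module; the table name
"sequence", its columns, and the function names are hypothetical, and the
sequences are toy values.

```python
import io
import sqlite3


def export_fasta(conn, out, width=60):
    """Write every stored sequence to `out` as FASTA; return the record count.

    Assumes a hypothetical table sequence(name, seq).
    """
    count = 0
    query = "SELECT name, seq FROM sequence ORDER BY name"
    for name, seq in conn.execute(query):
        out.write(f">{name}\n")
        # Wrap the sequence into lines of at most `width` characters.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
        count += 1
    return count


def demo():
    """Build a tiny in-memory database and return its FASTA export as text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequence (name TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sequence (name, seq) VALUES (?, ?)",
        [("orfA", "ATGAAATAA"), ("orfB", "ATGCCCTGA")],  # made-up sequences
    )
    buf = io.StringIO()
    export_fasta(conn, buf, width=5)
    return buf.getvalue()
```

Because export_fasta writes to any file-like object, the same function
could fill a temporary file or a pipe feeding an external program, so the
analysis tools themselves would not need to know about the database.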


