Complex query retrieval question

Mark Addinall maddinall at iprimus.com.au
Thu Jan 31 08:53:06 EST 2002

Przemko Tylzanowski <przemko at med.kuleuven.ac.be> wrote in message
news:3C4EAA7E.2842EADA at med.kuleuven.ac.be...
> Hi!
> A long time ago there was a program called TargetFinder; it was used to
> identify sequences of interest (e.g. targets for transcription factors)
> in promoters. Its advantage over EPD was that it was not limited to
> 600 bp of the promoter. Anyway, the Italians took it offline
> (after publishing it!). So, I am stuck now.
> What I would like to do is the following. Identify in GenBank (or
> EMBL, it does not matter) all sequences containing promoters, enhancers
> and/or sequences upstream of TATA of mouse or human origin (at this
> point forget about TATA-less). This bit is easy. I can do it using SRS
> (the funny part is that it works on the server in England but not in
> Brussels...). But here the problems start. What I get as output is the
> Feature Sequence (I ask for it) but also the rest of the gene. In cases
> of large genomic sequences this is VERY PAINFUL... What I would like to
> do is to extract from these initial hits (between 2000 and 4500,
> depending on the selection of databases) ONLY the sequences containing
> the promoter part (IT WAS POSSIBLE IN SRS4, command line). There I could
> say: get me the feature and all sequence from -2000 to +100 relative to
> it. Then I would like to build a database and then run Findpatterns or
> something like that. So, I guess I need a combination of SRS and
> So, HOW DO I DO IT IN SRS6? I know that I could probably write something
> in Perl; the problem is I don't really know it.
> Any suggestions or solutions are welcome!

You can probably do it with awk.

I have an idea of what you need.  I'm asking permission first (now):
if you need a hand implementing this, just yell. I hate banging
on mailboxes unannounced.

I'd like to make the solution open source if possible.
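
For illustration only, here is a minimal sketch of the windowing step
described in the question (the feature plus 2000 bases upstream and 100
downstream). It is written in Python rather than awk or Perl, and it is
not SRS functionality. The function names, the 1-based inclusive
coordinates (GenBank feature-table style) and the strand handling are all
assumptions made for the example; parsing the feature table out of the
database entries is left out.

```python
# Hypothetical helper names; nothing here is part of SRS or GCG.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(seq, start, end, upstream=2000, downstream=100, strand=1):
    """Return the feature plus flanking sequence.

    start/end are 1-based inclusive positions of the feature on the
    forward strand (assumed GenBank-style). The window is clamped to the
    ends of the entry, so short entries simply give a shorter result.
    For strand=-1, "upstream" lies at higher coordinates and the result
    is reverse-complemented so it reads 5' to 3'.
    """
    if strand == 1:
        lo = max(1, start - upstream)
        hi = min(len(seq), end + downstream)
        return seq[lo - 1:hi]
    lo = max(1, start - downstream)
    hi = min(len(seq), end + upstream)
    return reverse_complement(seq[lo - 1:hi])
```

With the window cut out per hit, the resulting sequences could be written
to a FASTA file and used as the small database to run a pattern search on.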


> Przemko
> --
> Przemko Tylzanowski Ph.D.
> LSD & Joint
> O & N
> University of Leuven
> Herestraat 49
> 3000 Leuven
> Belgium
> phone: (32-16)34-61-96
> fax  : (32-16)34-62-00

More information about the Bio-soft mailing list
