IUBio

[Bio-software] Re: sequence partitioning

Mikhail Fursov via bio-soft%40net.bio.net (by mike.fursov from gmail.com)
Thu Feb 4 12:54:01 EST 2010


Hi Michal,

Have you read about suffix arrays or suffix trees?

http://en.wikipedia.org/wiki/Suffix_array

The 'suffix' theory could help you in your task probably.


On Feb 4, 12:38 am, Michal Walicki <Michal.Wali... from ii.uib.no> wrote:
> Gentle people,
>   I started to work with sequences a couple of months ago and
> encountered a problem which may be different from those faced by
> biological sequence analysis but, hopefully, you can give me an
> answer or, at least, point me in the right direction.
>
> The problem:
> partition a given sequence S into a minimal number of patterns.
>
> (We assume some finite alphabet and, of course, S is finite. The
> problem gets a bit different flavour when S may not have a partition,
> so let us assume that a partitioning of S is possible.)
>
> A pattern P is any sequence with all symbols distinct and length at least 2.
> P occurs in a sequence S if P is a subsequence of S (not necessarily
> a substring).
> A subsequence of S is an
> - do(P), disjoint occurence of P, if it consists of (at least 2)
> disjoint occurrences of pattern P
> and it is an
> - mdo(P), maximal do(P), if it is do(P) and there are no occurrences
> of P in the rest of S, i.e., in
> S \ mdo(P) = the subsequence of S obtained by removing the subsequence mdo(P).
>
> A partition (exact cover) of a sequence S is given by
>    a) a set of mutually distinct (not necessarily disjoint) patterns
> P1...Pn, with
>    b) mutually disjoint, nonempty subsequences do(P1)...do(Pn) of S which,
>    c) together, constitute the whole S.
> It is minimal if there is no partition with a smaller set a) (with
> fewer patterns).
> It is MDO if all do(Pi) in point b) are, actually, mdo(Pi).
>
> Question 1.
> The problem seems - obviously - NP-hard but does anybody know of a
> proof of that?
>
> Question 2.
> Suppose that
>     i) there is a minimal partition with N patterns, and
>    ii) there is some MDO partition (obviously, with at least N patterns).
> Is there, then, an MDO partition with N patterns?
>
> Conjecture 2 (the answer Yes to question 2) is quite intriguing
> since, on the one hand, it seems plausible - a minimal partition
> should contain mdo's of involved patterns but, on the other hand,
> there seems to be no reason why it should happen to be the case. (If
> we drop the assumption ii), it is, in fact, wrong, i.e., there are
> sequences which can be partitioned with N patterns, but which have no
> MDO partition.) We have run tens of tests (on random sequences (with
> exact partitions) up to 20-30 symbols) and never met a counterexample
> to Conjecture 2.
>
> Any hints are appreciated
>
>         Michal
> --
>
> --------------------
> Assoc. Prof. Michal Walicki
> Institute of Informatics
> University of Bergen
> 5020 Bergen, Norwayhttp://www.ii.uib.no/~michal
> (on leave at IST, Lisbon, until 8.2010)




More information about the Bio-soft mailing list

Send comments to us at biosci-help [At] net.bio.net