Have you read about suffix arrays or suffix trees?
The 'suffix' theory could help you in your task probably.
On Feb 4, 12:38 am, Michal Walicki <Michal.Wali... from ii.uib.no> wrote:
> Gentle people,
> I started to work with sequences a couple of months ago and
> encountered a problem which may be different from those faced by
> biological sequence analysis but, hopefully, you can give me an
> answer or, at least, point me in the right direction.
>> The problem:
> partition a given sequence S into a minimal number of patterns.
>> (We assume some finite alphabet and, of course, S is finite. The
> problem gets a bit different flavour when S may not have a partition,
> so let us assume that a partitioning of S is possible.)
>> A pattern P is any sequence with all symbols distinct and length at least 2.
> P occurs in a sequence S if P is a subsequence of S (not necessarily
> a substring).
> A subsequence of S is an
> - do(P), disjoint occurence of P, if it consists of (at least 2)
> disjoint occurrences of pattern P
> and it is an
> - mdo(P), maximal do(P), if it is do(P) and there are no occurrences
> of P in the rest of S, i.e., in
> S \ mdo(P) = the subsequence of S obtained by removing the subsequence mdo(P).
>> A partition (exact cover) of a sequence S is given by
> a) a set of mutually distinct (not necessarily disjoint) patterns
> P1...Pn, with
> b) mutually disjoint, nonempty subsequences do(P1)...do(Pn) of S which,
> c) together, constitute the whole S.
> It is minimal if there is no partition with a smaller set a) (with
> fewer patterns).
> It is MDO if all do(Pi) in point b) are, actually, mdo(Pi).
>> Question 1.
> The problem seems - obviously - NP-hard but does anybody know of a
> proof of that?
>> Question 2.
> Suppose that
> i) there is a minimal partition with N patterns, and
> ii) there is some MDO partition (obviously, with at least N patterns).
> Is there, then, an MDO partition with N patterns?
>> Conjecture 2 (the answer Yes to question 2) is quite intriguing
> since, on the one hand, it seems plausible - a minimal partition
> should contain mdo's of involved patterns but, on the other hand,
> there seems to be no reason why it should happen to be the case. (If
> we drop the assumption ii), it is, in fact, wrong, i.e., there are
> sequences which can be partitioned with N patterns, but which have no
> MDO partition.) We have run tens of tests (on random sequences (with
> exact partitions) up to 20-30 symbols) and never met a counterexample
> to Conjecture 2.
>> Any hints are appreciated
> Assoc. Prof. Michal Walicki
> Institute of Informatics
> University of Bergen
> 5020 Bergen, Norwayhttp://www.ii.uib.no/~michal> (on leave at IST, Lisbon, until 8.2010)