[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081218085524.12622.qmail@science.horizon.com>
Date: Thu, 18 Dec 2008 03:55:24 -0500
From: "George Spelvin" <linux@...izon.com>
To: srostedt@...hat.com, peterz@...radead.org, linux@...izon.com
Cc: tj@...nel.org, linux@...izon.com, linux-kernel@...r.kernel.org,
andi@...stfloor.org
Subject: Re: [RFC] globmatch() helper function
Eureka!
Never mind all of this angst; I've figured out a non-recursive way
to do it. Thanks for pushing me to think a bit harder and find it.
(I was actually looking for an example of the exponential pathology I
kept referring to when it dawned on me that it's actually impossible
without the additional expressive power of regexps.)
Somewhat abusing TeX's line-breaking terminology, consider a shell glob to
consist of a series of "boxes" and "glue" A box is a run of character
classes (of which literal characters and ? are degenerate cases). The
point is, a box has a well-defined width, the number of characters in the
string that it must match.
A *, on the other hand, is infinitely elastic "glue" between boxes that
matches an unknown number of characters.
So consider a pattern of the form box1*box2*box3*...
Assuming box1 has matched, we search forward, trying various widths of
the glue *, to find a point where box2 matches. Then we start searching
for the width of the second *. But! If we can't match the tail of the
pattern box3*... at any position to the right of the end of box2, there
is no point trying to move box2 forward and search again. Any solution
where box2*box3*... would match starting k characters later in the string
would also be a match with box2 (only) shifted k characters left, i.e.
in its original position.
So we can match boxes greedily: once we've found the leftmost place
where one matches (that is still to the right of all previous boxes),
we never need to try any other positions. Moving a box to the right
can never produce a match which doesn't exist in its previous position.
I'll rewrite it to be non-recusrive and quadratic in the worst case.
Thanks; I feel stupid for not maving looked this up, but smart for having
thought of it myself.
(For a discussion of how to get exponential backtracking in a regular
expression, see http://swtch.com/~rsc/regexp/regexp1.html)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists