lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081218085524.12622.qmail@science.horizon.com>
Date:	Thu, 18 Dec 2008 03:55:24 -0500
From:	"George Spelvin" <linux@...izon.com>
To:	srostedt@...hat.com, peterz@...radead.org, linux@...izon.com
Cc:	tj@...nel.org, linux@...izon.com, linux-kernel@...r.kernel.org,
	andi@...stfloor.org
Subject: Re: [RFC] globmatch() helper function

Eureka!

Never mind all of this angst; I've figured out a non-recursive way
to do it.  Thanks for pushing me to think a bit harder and find it.
(I was actually looking for an example of the exponential pathology I
kept referring to when it dawned on me that it's actually impossible
without the additional expressive power of regexps.)

Somewhat abusing TeX's line-breaking terminology, consider a shell glob to
consist of a series of "boxes" and "glue"  A box is a run of character
classes (of which literal characters and ? are degenerate cases).  The
point is, a box has a well-defined width, the number of characters in the
string that it must match.

A *, on the other hand, is infinitely elastic "glue" between boxes that
matches an unknown number of characters.

So consider a pattern of the form box1*box2*box3*...
Assuming box1 has matched, we search forward, trying various widths of
the glue *, to find a point where box2 matches.  Then we start searching
for the width of the second *.  But!  If we can't match the tail of the
pattern box3*... at any position to the right of the end of box2, there
is no point trying to move box2 forward and search again.  Any solution
where box2*box3*... would match starting k characters later in the string
would also be a match with box2 (only) shifted k characters left, i.e.
in its original position.

So we can match boxes greedily: once we've found the leftmost place
where one matches (that is still to the right of all previous boxes),
we never need to try any other positions.  Moving a box to the right
can never produce a match which doesn't exist in its previous position.

I'll rewrite it to be non-recusrive and quadratic in the worst case.


Thanks; I feel stupid for not maving looked this up, but smart for having
thought of it myself.

(For a discussion of how to get exponential backtracking in a regular
expression, see http://swtch.com/~rsc/regexp/regexp1.html)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ