[rt-users] fun with Klez

Smylers smylers at gbdirect.co.uk
Tue Aug 13 11:29:26 EDT 2002


Vivek Khera wrote:

> >>>>> "BC" == Bruce Campbell <bruce_campbell at ripe.net> writes:
>
> BC> 	    if ($RTLoop =~ /^\s*$RT::rtname\s*$/ ) {
>
> I'm using it but modified a bit for efficiency:
>
>     if ($RTLoop =~ m/\s?$RT::rtname\s?/) {
>
> since the * is greedy.

What's wrong with that?  I was always under the impression that greedy
quantifiers are more efficient than non-greedy versions because the
greedy versions usually involve less backtracking.

Also, expressions that are anchored are much more efficient than those
that aren't.  In Bruce's regexp Perl just has to start looking at the
start of the string.  It goes along one character at a time until it
finds something that isn't whitespace.  Then it looks to see if that
character is the first character of $RT::rtname, and so on.  If it finds
a non-matching character the whole match _must_ have failed -- there is
no other 'start of string' for it to look -- so it fails quickly.

A non-greedy version could look like this:

  if ($RTLoop =~ m/\s*?$RT::rtname\s*?/) {

Perl would first have to iterate through the string till it finds a
whitespace character.  Then it checks whether $RT::rtname matches.  If
it doesn't, it has to go back to the whitespace character it's already
found, see if the following character is also whitespace, and possibly
try $RT::rtname again.  Once it's exhausted all lengths of whitespace
starting from the first stretch of whitespace it tries again starting at
the second stretch of whitespace.  That's considerably more effort.

However what you wrote isn't non-greedy -- it just uses ? instead of *,
replacing one greedy quantifier with another.  So you're greedily
matching either one or zero spaces before and after $RT::rtname.  Since
the expression isn't anchored, that's completely pointless; these two
statements are equivalent:

  1 Does the string contain "ham" somewhere in its length?

  2 Does the string contain "ham" somewhere in its length, with that
    "ham" preceded by either at least one character of whitespace or no
    whitespace at all?

\s? matches at most one whitespace character, but because the pattern
isn't anchored there is nothing to stop the character just before the
matched space (which is not itself part of the match) also being a
space.
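
To make that concrete, here is a minimal sketch.  The helper name
matched_text and the sample string are made up for illustration, and
"ham" stands in for $RT::rtname; it just reports which part of the
string the unanchored pattern consumed:

```perl
use strict;
use warnings;

# Hypothetical helper: return the text consumed by the unanchored
# pattern, or undef if the string does not match at all.
sub matched_text {
    my ($string) = @_;
    return $string =~ /(\s?ham\s?)/ ? $1 : undef;
}

# Two spaces before "ham": \s? takes only the second space; the first
# space is skipped over like any other non-matching character.
my $got = matched_text("x  ham");
print defined $got ? "matched '$got'\n" : "no match\n";
```

The match still succeeds with two leading spaces, which is why the \s?
adds no constraint here.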

Note in particular that both of the above would match "sham" and
"shamed" as well as "ham" and " ham ", so your regexp matches many more
things than Bruce's.
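
A quick way to check this is to run the patterns side by side.  This is
only a sketch: $name is a stand-in for $RT::rtname, and the sub names
and test strings are invented for the example.

```perl
use strict;
use warnings;

my $name = "ham";    # stand-in for $RT::rtname

# Bruce's anchored pattern, the unanchored \s? variant, and a bare
# substring test, each returning 1 or 0.
sub anchored   { my ($s) = @_; return $s =~ /^\s*$name\s*$/ ? 1 : 0 }
sub unanchored { my ($s) = @_; return $s =~ /\s?$name\s?/   ? 1 : 0 }
sub plain      { my ($s) = @_; return $s =~ /$name/         ? 1 : 0 }

for my $s ("ham", " ham ", "sham", "shamed", "spam") {
    printf "%-8s anchored=%d unanchored=%d plain=%d\n",
        "'$s'", anchored($s), unanchored($s), plain($s);
}
```

Only "ham" and " ham " pass the anchored test, while the unanchored and
plain versions agree with each other on every string.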

I suspect that \Q is needed in there somewhere though, in case
$RT::rtname contains any special characters (such as a . in a domain
name):

  if ($RTLoop =~ /^\s*\Q$RT::rtname\E\s*$/ ) {
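
As a rough illustration of why the \Q matters, assuming a made-up
rtname of "rt.example.com" (the sub names are also just for this
sketch):

```perl
use strict;
use warnings;

my $rtname = "rt.example.com";    # hypothetical value for $RT::rtname

# Without \Q each "." in the name is a wildcard; with \Q...\E the name
# is matched literally.
sub loose  { my ($s) = @_; return $s =~ /^\s*$rtname\s*$/     ? 1 : 0 }
sub quoted { my ($s) = @_; return $s =~ /^\s*\Q$rtname\E\s*$/ ? 1 : 0 }

for my $s ("rt.example.com", "rtXexampleYcom") {
    printf "%-16s loose=%d quoted=%d\n", $s, loose($s), quoted($s);
}
```

The unquoted pattern accepts "rtXexampleYcom" as well as the real name;
the quoted one accepts only the real name.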

Smylers
-- 
GBdirect
http://www.gbdirect.co.uk/




