[Bps-public-commit] r16963 - Parse-BooleanLogic/lib/Parse

ruz at bestpractical.com ruz at bestpractical.com
Fri Nov 21 16:43:29 EST 2008


Author: ruz
Date: Fri Nov 21 16:43:28 2008
New Revision: 16963

Modified:
   Parse-BooleanLogic/lib/Parse/BooleanLogic.pm

Log:
* update doc with more info and examples of new features
* add init method that compiles required regular expressions for us
  and supports cusomizable operators and parens

Modified: Parse-BooleanLogic/lib/Parse/BooleanLogic.pm
==============================================================================
--- Parse-BooleanLogic/lib/Parse/BooleanLogic.pm	(original)
+++ Parse-BooleanLogic/lib/Parse/BooleanLogic.pm	Fri Nov 21 16:43:28 2008
@@ -3,7 +3,7 @@
 Parse::BooleanLogic - parser of boolean expressions
 
 =head1 SYNOPSIS
-    
+
     my $parser = new Parse::BooleanLogic;
     my $tree = $parser->as_array( string => 'x = 10' );
     $tree = $parser->as_array( string => 'x = 10 OR (x > 20 AND x < 30)' );
@@ -21,14 +21,30 @@
 
 =head1 DESCRIPTION
 
-This module is quite fast parser for boolean expressions. Original it's been writen for
-Request Tracker for parsing SQL like expressions and it's still capable to, but
+This module is quite fast parser for boolean expressions. Originally it's been writen for
+Request Tracker to parse SQL like expressions and it's still capable, but
 it can be used to parse other boolean logic sentences with OPERANDs joined using
 binary OPERATORs and grouped and nested using parentheses (OPEN_PAREN and CLOSE_PAREN).
 
-=cut
+Operand is not qualified strictly what makes parser flexible enough to parse different
+things, for example:
+
+    # SQL like expressions
+    (task.status = "new" OR task.status = "open") AND task.owner_id = 123
+
+    # Google like search syntax used in Gmail and other service
+    subject:"some text" (from:me OR to:me) label:todo !label:done
+
+    # Binary boolean logic expressions
+    (a | b) & (c | d)
 
+You can change literals used for boolean operators and parens. Read more
+about this in description of constructor's arguments.
 
+As you can see quoted strings are supported and based on delimited strings
+from L<Regexp::Common> with ' and " as delimiters.
+
+=cut
 
 use strict;
 use warnings;
@@ -55,11 +71,42 @@
 
 =head1 METHODS
 
-=head2 new
+=head2 Building parser
+
+=head3 new
+
+A constuctor, takes the following named arguments:
+
+=over 4
+
+=item operators, default is ['AND' 'OR']
+
+Pair of literal strings representing boolean operators AND and OR,
+pass it as array reference. For example:
+
+    # from t/custom_ops.t
+    my $parser = Parse::BooleanLogic->new( operators => [qw(& |)] );
+
+    # from t/custom_googlish.t
+    my $parser = Parse::BooleanLogic->new( operators => ['', 'OR'] );
+
+It's ok to have any operators and even empty.
+
+=item parens, default is ['(', ')']
+
+Pair of literal strings representing parentheses, for example it's
+possible to use curly braces:
+
+    # from t/custom_parens.t
+    my $parser = Parse::BooleanLogic->new( parens => [qw({ })] );
 
-Very simple constructor, returns a new object. Now takes no options and most
-methods can be executed as class methods too, however there are plans to
-change it and using this lightweight constructor is recommended.
+No matter which pair is used parens must be balanced in expression.
+
+=back
+
+This constructor compiles several heavy weight regular expressions
+so it's better avoid building object right before parsing, but instead
+use global or cached one.
 
 =cut
 
@@ -68,6 +115,64 @@
     return bless {@_}, ref($proto) || $proto;
 }
 
+=head3 init
+
+An initializer, called from the constructor. Compiles regular expressions
+and do other things with constructor's arguments. Returns this object back.
+
+=cut
+
+sub init {
+    my $self = shift;
+    my %args = @_;
+    if ( $args{'operators'} ) {
+        my @ops = map lc $_, @{ $args{'operators'} };
+        $self->{'operators'} = [ @ops ];
+        @ops = reverse @ops if length $ops[1] > length $ops[0];
+        foreach ( @ops ) {
+            unless ( length ) {
+                $_ = "(?<=\\s)";
+            }
+            else {
+                if ( /^\w/ ) {
+                    $_ = '\b'. "\Q$_\E";
+                }
+                else {
+                    $_ = "\Q$_\E";
+                }
+                if ( /\w$/ ) {
+                    $_ .= '\b';
+                }
+            }
+            $self->{'re_operator'} = qr{(?:$ops[0]|$ops[1])}i;
+        }
+    } else {
+        $self->{'operators'} = [qw(and or)];
+        $self->{'re_operator'} = qr{\b(?:AND|OR)\b}i;
+    }
+
+    if ( $args{'parens'} ) {
+        $self->{'parens'} = $args{'parens'};
+        $self->{'re_open_paren'} = qr{\Q$args{'parens'}[0]\E};
+        $self->{'re_close_paren'} = qr{\Q$args{'parens'}[1]\E};
+    } else {
+        $self->{'re_open_paren'} = qr{\(};
+        $self->{'re_close_paren'} = qr{\)};
+    }
+    $self->{'re_tokens'}  = qr{(?:$self->{'re_operator'}|$self->{'re_open_paren'}|$self->{'re_close_paren'})};
+# the next need some explanation
+# operand is something consisting of delimited strings and other strings that are not our major tokens
+# so it's a (delim string or anything until a token, ['"](start of a delim) or \Z) - this is required part
+# then you can have zero or more ocurences of above group, but with one exception - "anything" can not start with a token or ["']
+    $self->{'re_operand'} = qr{(?:$re_delim|.+?(?=$self->{re_tokens}|["']|\Z))(?:$re_delim|(?!$self->{re_tokens}|["']).+?(?=$self->{re_tokens}|["']|\Z))*};
+
+    foreach my $re (qw(re_operator re_operand re_open_paren re_close_paren)) {
+        $self->{"m$re"} = qr{\G($self->{$re})};
+    }
+
+    return $self;
+}
+
 
 =head2 Parsing expressions
 



More information about the Bps-public-commit mailing list