[orm] Explicit ASTs (also sent to the DBIC list)

Sun Sep 3 09:09:32 EDT 2006

David E. Wheeler wrote:
 > On Sep 2, 2006, at 13:53, Matt S Trout wrote:
 >
 >>    DWIM AST -> explicit AST -> SQL
 >
 > Agreed. This way, the AST does not have to be specific to any one 
 > back end (PostgreSQL, SQLite, MySQL, LDAP, a object iterator, etc.).

Precisely. And we can have as many front-ends as we want (we'll need at 
least an SQL::Abstract DWIM AST -> explicit AST for our and for Nate's 
usage for a start, and I think figuring out how to compile Jifty::Query 
to this is probably the way to enable its usage against DBIC, which I 
know audrey wanted to happen).

 >> I reckon the basic query structure can be a hashref since we have
 >> @lots_of_clauses (and since we're bound to miss something first 
 >> time round)
 >> but each clause should probably be in arrayref form, so we have 
 >> something like
 >>
 >>    foo = 'yay' AND bar LIKE 'wooho%'
 >
 > I'll just assume that you're using explicit SQL here, since we're not 
 > discussing the DWIM part. IOW, You're saying, to get the above, one 
 > has to create an AST like the below, yes?

Right.

 >> becoming
 >>
 >>    [ -op, 'AND',
 >>      [ -op, '=', [ -name, 'foo' ], [ -bind, 'yay' ] ],
 >>      [ -op, 'LIKE', [ -name, 'bar' ], [ -bind, 'wooho%' ] ],
 >>    ],
 >>
 >> and a function call like
 >>
 >>    COUNT( * )
 >>
 >> would become
 >>
 >>    [ -func, 'COUNT', '*' ]
 >
 > Sure, that works for me.
 >
 >> the basic principle here is that an op generally takes a specific 
 >> number of
 >> arguments, whereas a function is a containing clause that can take a
 >> potentially arbitrary number. So a complete statement might be 
 >> something like
 >>
 >>    SELECT my.a, my.b FROM my_table my WHERE spork = 1
 >>
 >> and that would become
 >>
 >>    {
 >>      select => [ [ -name, 'my', 'a' ], [ -name, 'my', 'b' ] ],
 >
 > The '-name' key might be unnecessary, no? Unless there's something 
 > else that might be included? Oh, wait, there is. It should probably 
 > be something like:
 >
 >    [ [ -name => 'my', -alias => 'a' ],
 >      [ -name => 'my', -alias => 'b' ],
 >    ]

Err, no, that's going the wrong way -

  [ -name, 'my', 'a' ] => my.a
  [ -alias, [ -name, 'my', 'a' ], 'a' ] => my.a AS a

I think.

 > If we're going to be precise (and pedantic). No? And since there 
 > multiple keys per token, it should actually probably be:
 >
 >    [ { -name => 'my', -alias => 'a' },
 >      { -name => 'my', -alias => 'b' },
 >    ]

No, not at all. The structure logically nests the way shown above - that 
should enable things like

  [ -alias,
    [ -func, 'COUNT',
      [ -func, 'DISTINCT',
        [ -name, 'my', 'a' ]
      ]
    ],
    'a_cnt'
  ]

for "COUNT(DISTINCT my.a) AS a_cnt"

 >>      from => [ [ -alias, 'my_table', 'my' ] ],
 >
 > That would have to be:
 >
 >        from => [ { -name => 'my_table', -alias => 'my' } ]

More

  [ [ -alias, [ -name, 'my_table' ], 'my' ] ]

to preserve the nesting (again that -name could be replaced with a 
subselect, or a function invocation for SELECT * FROM my_sproc(...), or 
etc.)

 >>      where => [ [ -op, '=', [ -name, 'spork' ], [ -bind, 1 ] ] ],
 >
 > And here:
 >
 >       where => [ [ -op, '=', { -name => 'spork', -bind => 1 } ] ],
 >
 > No?

No, definitely not, that restricts us to <column reference> = <value> 
which is one of our big bugbears with SQL::Abstract on its own - for 
example keeping it as a pair of arrayrefs allows

  [ [ -op, '=', [ -name, 'spork' ], [ -name, 'spoon' ] ] ]

to produce "spork = spoon" in the final SQL - or having a function on 
the LHS or whatever.

 >>    }
 >>
 >> which is ... spectacularly ugly, but I -think- explicit enough. Fun 
 >> will be
 >> handling specific things like
 >
 > Yes, I was making it even more explicit. Whether that's over the top 
 > or not I'm not sure. Also, I'm thinking that the -bind value ought to 
 > support a code ref, so that it could potentially bind a different 
 > value every time it's called.

I agree in places, but your hashref style results in data loss as 
expanded above so I'm pretty much certain that's wrong.

We'll have to think carefully about how -bind works; Scalar::Defer might 
be sufficient to the task but I think explicit support for subrefs and 
for named bind parameters somehow will also be worthwhile.

 >>    MATCH (title,body) AGAINST ('database')
 >>
 >> but I guess that's just
 >>
 >>    [ -func, 'MATCH',
 >>      [ [ -name, 'title' ], [ -name, 'body' ] ],
 >>      [ [ -bind, 'database' ] ]
 >>    ]
 >
 > Right.
 >
 >> and have something like "func_specific_$func" and "func_generic" 
 >> methods on
 >> the emitter, so MATCH gets handled by func_specific_MATCH whereas 
 >> something
 >> "normal" like COUNT can go through func_generic (we'd enumerate the 
 >> specific
 >> methods at instantiation time for performance, I think).
 >
 > Well, I think that COUNT() is only in the select or group by clauses, 
 > never in where. But I'd also like to explicitly include support for 
 > MATCH(), so that one can use regular expressions in WHERE clauses, 
 > just as one can use LIKE.

True re count, but the point here is more one of enabling easy writing 
of syntax generators, and COUNT and its ilk are a relatively standard 
syntax whereas MATCH isn't. The func_generic/func_specific stuff would 
be an implementation detail from an external POV.

As for supporting MATCH explicitly we'd have to do a run-through of what 
other DBs provide by way of regex syntax, but I guess you can already 
tell us that?

-- 
     Matt S Trout       Offering custom development, consultancy and support
  Technical Director    contracts for Catalyst, DBIx::Class and BAST. Contact
Shadowcat Systems Ltd.  mst (at) shadowcatsystems.co.uk for more information

+ Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ +