[cl-ppcre-devel] Quickly finding which alternative matched

Edi Weitz edi at agharta.de
Tue Feb 13 07:58:46 UTC 2007


On Mon, 12 Feb 2007 16:18:43 -0800, Harold Lee <harold at hotelling.net> wrote:

> For an :ALTERNATION, is there an O(1) way to tell which alternate
> matched? e.g. for a regexp (a)|(b)|(c)|(d)|... I don't want the O(n)
> performance of scanning the list returned by SCAN or
> SCAN-TO-STRINGS. Instead, I'd just like a number saying which one
> matched.

You could use filters:

  http://weitz.de/cl-ppcre/#filters

However, are you really sure that the O(n) operation of looping
through the registers is causing performance problems?  Have you
profiled the code?  This looks like premature optimization to me.

I'm pretty sure that whatever you'll do to "improve" this (filters or
changing CL-PPCRE internally to return more information for example)
you'll end up with something even slower.

> For a quick and dirty scanner (similar to lex) I thought I could use
> CL-PPCRE. The "rules" (in lex terminology) are just regular
> expressions, and so I combine them into a set of alternatives, e.g.
>
> \s+ -> whitespace
> [0-9]+ -> number
> [a-zA-Z][a-zA-Z0-9]* -> identifier
>
> => "(\\s+)|([0-9]+)|([a-zA-Z][a-zA-Z0-9]*)"
>
> Actually, I use CL-PPCRE::PARSE-STRING to get a parse tree for each
> expression, change any :REGISTER tags to :GROUP tags, and combine
> them under an :ALTERNATION tag:

That doesn't feel right.  I wouldn't create regex strings first just
to parse them into s-expressions afterwards.  Why don't you start with
s-expressions right away?

Cheers,
Edi.



More information about the Cl-ppcre-devel mailing list