[cl-ppcre-devel] Re: Does PPCRE cache scanners?

Edi Weitz edi at agharta.de
Wed Dec 22 00:04:30 UTC 2004


[Cc to mailing list]

On Tue, 21 Dec 2004 10:40:32 -0800, Dave Roberts <ldave at droberts.com> wrote:

> Right, okay. I guess that's what I meant. It's having to create an
> internal representation from the regex string each time you call
> SCAN with a string regex rather than something already created from
> CREATE-SCANNER.

Yes, except for constant regexes, see below.

> As an aside, why "CREATE-SCANNER" and not "MAKE-SCANNER?" Everything
> else in CL gets created with a MAKE-foo function of some sort
> (MAKE-INSTANCE, MAKE-struct, etc.). Anyway, I keep mistyping it. ;-)

Hehe, I don't know... :)

> Ah, interesting. So isn't every string a constant?

Every /literal/ string is constant.  The form "dave" is constant, but
the form (FORMAT NIL "~A" "dave") isn't.

> That is, even if I had typed a more complex regex as a constant
> string to SCAN, would that then be compiled at load time?

Yes, it doesn't depend on the complexity of the regex.  Literal
s-expression regexes will also be translated at load time, BTW.
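
As a quick illustration of which forms count as constant, here is what
the standard function CONSTANTP reports for the cases mentioned so far.
This is only a sketch; the result for the FORMAT form is what one would
typically expect, since an implementation is allowed to recognize more
forms as constant than the standard requires.

```lisp
;; A literal string is self-evaluating, so it is a constant form.
(print (constantp "dave"))

;; A quoted s-expression regex is a QUOTE form, hence also constant.
(print (constantp ''(:sequence "dave" (:greedy-repetition 1 nil #\!))))

;; A function call that computes the string at run time is typically
;; not recognized as constant.
(print (constantp '(format nil "~A" "dave")))
```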

> So (SCAN "a complex.*regex\\d+of +(some sort)+" "a string to scan")
>
> would be compiled at load time? The regex string is CONSTANTP.

Yep - see the compiler macros in api.lisp.  The form above will
basically be replaced by a form like

  (SCAN FOO "a string to scan")

where FOO is the load-time value of

  (CREATE-SCANNER "a complex.*regex\\d+of +(some sort)+")
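
The rewrite described above can be sketched with a toy example.  This
is not the code from api.lisp: TOY-SCAN, TOY-CREATE-SCANNER and
FIND-DAVE are hypothetical names, and the "scanner" is just a closure
around SEARCH, so that the example runs without CL-PPCRE.  It only
shows the general technique of a compiler macro combined with
LOAD-TIME-VALUE.

```lisp
;; Toy stand-ins for illustration only, not CL-PPCRE's actual code.
(defvar *scanners-created* 0
  "Counts how often TOY-CREATE-SCANNER has been called.")

(defun toy-create-scanner (pattern)
  "Return a closure that looks for the literal substring PATTERN."
  (incf *scanners-created*)
  (lambda (target) (search pattern target)))

(defun toy-scan (pattern target)
  "PATTERN is a string or a closure made by TOY-CREATE-SCANNER."
  (funcall (if (functionp pattern)
               pattern
               (toy-create-scanner pattern))
           target))

(define-compiler-macro toy-scan (&whole form pattern target)
  ;; Rewrite only calls whose pattern argument is a constant form;
  ;; everything else is left alone by returning the original form.
  (if (constantp pattern)
      `(toy-scan (load-time-value (toy-create-scanner ,pattern))
                 ,target)
      form))

(defun find-dave (target)
  ;; The literal "dave" makes this call eligible for the rewrite.
  (toy-scan "dave" target))

(print (find-dave "hi dave"))
```

If the compiler applies the compiler macro (it is allowed to ignore
it), the closure for "dave" is built once and repeated calls to
FIND-DAVE leave *SCANNERS-CREATED* unchanged.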

> If so, that's pretty neat. No penalty for putting the regex string
> directly into the function call. In other languages, I'm used to
> having to create a separate scanner once in the code and then use
> that everywhere. I had been doing that a bit with CL-PPCRE, too,
> having assumed it was similar.
>
> Just so I understand, if I'm constructing the string on the fly,
> that then would have to get parsed and constructed every time. So
> something like
>
> (SCAN (CONCATENATE 'STRING "part of my regex" "and the second half
> (a*b)+") "a string to scan")
>
> would result in creating the scanner afresh every time?

Yes.

> Assuming I understand things correctly, with the creation of the
> scanner at load time, I'm not sure you need to. The common cases are
> pretty well covered and you have hooks with CREATE-SCANNER for the
> advanced usage.  Declare victory and move on, as they say! ;-)

Hehe.  Well, you could need a caching scheme if you're using many
different regexes which are created at runtime.  However, then you
also have to think about how long scanners should stay in the cache,
how you can invalidate cache entries, etc.  As I said, it's kind of
orthogonal to CL-PPCRE's purpose.
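
For what it's worth, a minimal sketch of such a cache could look like
the code below.  *SCANNER-CACHE*, CACHED-SCANNER and
CLEAR-SCANNER-CACHE are hypothetical names and not part of CL-PPCRE.
The function that builds a scanner is passed in as an argument so that
the sketch stands on its own.  It is not thread-safe, never evicts
anything by itself, and ignores any options you might want to pass
when creating a scanner.

```lisp
;; Hypothetical helpers for illustration, not part of CL-PPCRE.
(defvar *scanner-cache* (make-hash-table :test #'equal)
  "Maps regex strings to scanners that have already been built.")

(defun cached-scanner (regex-string create-fn)
  "Return the scanner cached for REGEX-STRING.
On a cache miss, build it by calling CREATE-FN and remember it."
  (multiple-value-bind (scanner foundp)
      (gethash regex-string *scanner-cache*)
    (if foundp
        scanner
        (setf (gethash regex-string *scanner-cache*)
              (funcall create-fn regex-string)))))

(defun clear-scanner-cache ()
  "Crude invalidation: drop all cached scanners at once."
  (clrhash *scanner-cache*))
```

With CL-PPCRE loaded you would pass #'CL-PPCRE:CREATE-SCANNER as
CREATE-FN and hand the result to SCAN in place of the regex string.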

Happy Holidays,
Edi.


