[cl-ppcre-devel] Re: cl-ppcre new feature proposal

Ondrej Svitek ondrej.svitek at gmail.com
Fri Mar 23 23:29:10 UTC 2007


Hello Edi,

I've followed your list of suggestions and am sending you the patch [note:
ASDF recognizes the new system as :cl-ppcre-testing]. The parser extension
is now user-controllable through the *ALLOW-NAMED-REGISTERS* switch, and the
changes are documented in the source and in the HTML doc.

I've also discovered a subtle problem. According to the *ALLOW-QUOTING*
documentation:

* (let ((cl-ppcre:*allow-quoting* t))
    (cl-ppcre:scan "^\\Qa+\\E$" "a+"))
0
2
#()
#()

but my SBCL simply returns NIL. What is happening becomes obvious from the
following code:

(let ((cl-ppcre:*allow-named-registers* t))
     (cl-ppcre:scan "(?<reg>.*)" "abc"))

=> error

...
;   (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?<reg>.*)"))
;
; caught ERROR:
;   (during EVAL of LOAD-TIME-VALUE)
;   Character 'r' may not follow '(?<' at position 3 in string "(?<reg>.*)"

; ==>
;   (CL-PPCRE:SCAN (LOAD-TIME-VALUE (CL-PPCRE:CREATE-SCANNER "(?<reg>.*)")) "abc")
...

The SCAN function has a compiler macro that precompiles constant regexes at
load time by wrapping them in LOAD-TIME-VALUE. But the LOAD-TIME-VALUE form
is evaluated before the surrounding LET ever runs, so it doesn't know about
any runtime bindings (of course) that affect the creation of the scanner
closure. Since compiler macros may or may not get expanded, what happens is
implementation-dependent. The code is likely to work in an interpreted REPL,
but less likely to work when compiled (SBCL compiles all forms by default,
hence it doesn't work here). The situation probably affects more special
variables than the two mentioned.
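
The same effect can be reproduced without CL-PPCRE. The following is only
a minimal sketch in plain Common Lisp: *FLAG* and PROBE are made-up names,
and the LOAD-TIME-VALUE form stands in for the scanner creation that the
compiler macro inserts. It is meant to be typed at the REPL or LOADed as
source.

```lisp
;; Hypothetical names: *FLAG* plays the role of a switch such as
;; *ALLOW-QUOTING*, PROBE that of a function calling SCAN with a
;; constant regex.
(defvar *flag* nil)

(defun probe ()
  (let ((*flag* t))
    ;; Evaluated once, when PROBE is compiled, i.e. before this LET
    ;; ever runs. It therefore sees the global value of *FLAG*.
    (load-time-value *flag*)))

;; Force compilation so the result doesn't depend on an interpreter.
(compile 'probe)

(probe) ; => NIL, although *FLAG* is bound to T around the form
```

A purely interpreting implementation is allowed to evaluate the
LOAD-TIME-VALUE form at call time instead, which is why the COMPILE call is
there.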

Again, this is a rather subtle problem, and an unsuspecting user can get
quite puzzled by it. I can think of the following remedies:

1. Clearly mention the pitfall in the doc and warn users to always
explicitly use CREATE-SCANNER when binding special variables affecting
closure generation. They can even use LOAD-TIME-VALUE, provided that it
contains the desired binding inside.

2. Don't use LOAD-TIME-VALUE in the SCAN compiler macro (I think there are
more similar places that would have to be fixed too, but I haven't
investigated them), but rather some kind of "FIRST-TIME-VALUE". I mean some
simple sort of memoization, which would compute a scanner closure when it is
needed for the first time and remember it afterwards. This would fix the
problem with binding of specials (safe only for constant values, though, as
only the binding encountered the first time would be remembered and take
effect). It would also have the effect of spreading closure creation over
the program's execution time. This could sometimes be seen as a benefit,
e.g. when a program uses lots of constant regexes whose compilation causes a
noticeable start-up pause at load time (hypothetically; I haven't run across
such a case).
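
Both remedies can be sketched roughly as follows. This is an illustration
only, not code from the patch: FIRST-TIME-VALUE, QUOTED-PLUS-P and
QUOTED-PLUS-P/LAZY are hypothetical names, and it is assumed that ASDF is
available through REQUIRE and can find CL-PPCRE. The macro is not
thread-safe and relies on the enclosing function being compiled, so that
its cache cell is created only once.

```lisp
;; Assumption: ASDF is available via REQUIRE and can locate CL-PPCRE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf))
(eval-when (:compile-toplevel :load-toplevel :execute)
  (asdf:load-system :cl-ppcre))

;; Remedy 1: create the scanner explicitly, with the binding *inside*
;; LOAD-TIME-VALUE, so that it is in effect while the regex is parsed.
(defun quoted-plus-p (target)
  (cl-ppcre:scan (load-time-value
                  (let ((cl-ppcre:*allow-quoting* t))
                    (cl-ppcre:create-scanner "^\\Qa+\\E$")))
                 target))

;; Remedy 2: a hypothetical FIRST-TIME-VALUE. Each use of the macro gets
;; its own cons cell (created once via LOAD-TIME-VALUE). The CAR records
;; whether FORM has been evaluated yet, the CDR caches its value.
(defmacro first-time-value (form)
  (let ((cell (gensym "CELL")))
    `(let ((,cell (load-time-value (cons nil nil))))
       (unless (car ,cell)
         (setf (cdr ,cell) ,form
               (car ,cell) t))
       (cdr ,cell))))

;; The scanner is now created during the first call, i.e. inside the
;; dynamic binding. Later calls reuse it, whatever the binding is then.
(defun quoted-plus-p/lazy (target)
  (let ((cl-ppcre:*allow-quoting* t))
    (cl-ppcre:scan (first-time-value
                    (cl-ppcre:create-scanner "^\\Qa+\\E$"))
                   target)))
```

With either definition, calling the function on "a+" should give 0 as the
first return value, as in the *ALLOW-QUOTING* documentation example.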

Maybe there are other possibilities as well; that's why I have only
mentioned this issue and haven't done anything to fix it.

I hope this helped.

Regards,

Ondrej

On 19/03/07, Edi Weitz <edi at agharta.de> wrote:
>
> [Cc to mailing list.]
>
> Hi Ondrej,
>
> On Sat, 17 Mar 2007 00:31:46 +0100, "Ondrej Svitek" <ondrej.svitek at gmail.com>
> wrote:
>
> > I've written a little extension to your wonderful CL-PPCRE library -
> > support for named registers and back-references. I don't know if
> > Perl has them (never used it), but ACL does and they proved useful
> > for me in certain situations.
> >
> > [...]
> >
> > Feel free to incorporate this change, if you like it. Or not, if not
> > :)
>
> Thanks for the code.  I'd be interested to incorporate this, but for
> that I'd like you to do the following:
>
> 1. Send a "unified diff" (diff -u) of your changes instead of a full
>    tarball.
>
> 2. Make sure to (if necessary) update all docstrings of functions that
>    changed their behaviour and to add docstrings for functions,
>    classes, or slots you added.
>
> 3. Add a user-visible switch to turn this new behaviour on or off, so
>    users can opt to have the old, Perl-compatible syntax instead.  The
>    default should be off.
>
> 4. Update the HTML documentation accordingly.
>
> Thanks in advance,
> Edi.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cl-ppcre-1.2.20-testing.diff.tar.gz
Type: application/x-gzip
Size: 16752 bytes
Desc: not available
URL: <https://mailman.common-lisp.net/pipermail/cl-ppcre-devel/attachments/20070324/6af11b12/attachment.bin>

