[cl-ppcre-devel] behavior of \w

Edi Weitz edi at agharta.de
Mon Mar 12 15:18:38 UTC 2012


If they insist on using "\w", there's no portable way to change this
except for patching the code.

Otherwise, they could of course use an explicit character class such
as [a-zA-Z0-9_] instead of "\w", or add their own property resolver
via *PROPERTY-RESOLVER*.
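
To illustrate the two alternatives, here is a minimal sketch. It assumes
CL-PPCRE 2.x is already loaded (for example through Quicklisp). The names
ASCII-WORD-CHAR-P, ASCII-PROPERTY-RESOLVER, the two scanner variables, and
the property name "AsciiWord" are made up for this example and are not part
of CL-PPCRE.

```lisp
;; Sketch only; assumes CL-PPCRE 2.x is loaded, e.g. (ql:quickload :cl-ppcre).

;; Alternative 1: spell out the ASCII word characters instead of using \w.
(defparameter *ascii-word-scanner*
  (cl-ppcre:create-scanner "[a-zA-Z0-9_]+"))

;; Alternative 2: a custom property resolver.
(defun ascii-word-char-p (char)
  "Hypothetical predicate: true only for a-z, A-Z, 0-9, and underscore."
  (and (find char "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_")
       t))

(defun ascii-property-resolver (name)
  "Hypothetical resolver: maps the made-up property name \"AsciiWord\"
to the predicate above and rejects every other name."
  (if (string= name "AsciiWord")
      #'ascii-word-char-p
      (error "Unknown property ~S" name)))

;; The resolver has to be in effect while the regex string is parsed,
;; so it is bound around an explicit CREATE-SCANNER call.
(defparameter *ascii-property-scanner*
  (let ((cl-ppcre:*property-resolver* #'ascii-property-resolver))
    (cl-ppcre:create-scanner "\\p{AsciiWord}+")))

;; Both scanners stop at the first non-ASCII character:
;; (cl-ppcre:scan-to-strings *ascii-word-scanner* "café_1")     ; first value "caf"
;; (cl-ppcre:scan-to-strings *ascii-property-scanner* "café_1") ; first value "caf"
```

The scanner is created inside the LET on purpose: a literal regex string
passed directly to SCAN may be compiled at load time, outside the dynamic
binding of *PROPERTY-RESOLVER*. Note that the second alternative still
means writing "\p{AsciiWord}" rather than "\w" in the regex itself.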

Cheers,
Edi.


On Mon, Mar 12, 2012 at 4:10 PM, Robert Brown <robert.brown at gmail.com> wrote:
> Some folks I work with are using cl-ppcre.  They've run into an
> incompatibility between cl-ppcre and the PCRE library that boils
> down to cl-ppcre's handling of \w.  The behavior is documented in
> cl-ppcre's manual:
>
>  CL-PPCRE uses ALPHANUMERICP to decide whether a character
>  matches Perl's "\w", so depending on your CL implementation you
>  might encounter differences between Perl and CL-PPCRE when
>  matching non-ASCII characters.
>
> This reliance on ALPHANUMERICP may be a misfeature.  It means
> that cl-ppcre behaves differently depending on the Lisp
> implementation it's running on.
>
> My co-workers desire compatibility between cl-ppcre on SBCL
> (where ALPHANUMERICP follows Unicode) and PCRE for matching
> Latin-1 encoded strings.  They patched the cl-ppcre code to make
> \w match a-z, A-Z, 0-9, and underscore.  Is there a better
> workaround for them?
>
> bob
