[cl-ppcre-devel] behavior of \w

Robert Brown robert.brown at gmail.com
Mon Mar 12 15:10:04 UTC 2012


Some folks I work with are using cl-ppcre.  They've run into an
incompatibility between cl-ppcre and the PCRE library that boils
down to cl-ppcre's handling of \w.  The behavior is documented in
cl-ppcre's manual:

  CL-PPCRE uses ALPHANUMERICP to decide whether a character
  matches Perl's "\w", so depending on your CL implementation you
  might encounter differences between Perl and CL-PPCRE when
  matching non-ASCII characters.

This reliance on ALPHANUMERICP may be a misfeature.  It means
that cl-ppcre behaves differently depending on the Lisp
implementation it's running on.
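
To make the implementation dependence concrete, here is a minimal
sketch in plain Common Lisp.  It does not use cl-ppcre itself.
ASCII-WORD-CHAR-P is a hypothetical helper, not part of cl-ppcre,
that mirrors the ASCII-only meaning of \w (a-z, A-Z, 0-9, and
underscore).  The sketch assumes the implementation's character
codes are ASCII-compatible and that code 233 maps to a character
(e-acute in Latin-1 and Unicode).

```lisp
;; Hypothetical helper (not part of cl-ppcre): an ASCII-only notion
;; of a "word" character, i.e. a-z, A-Z, 0-9, and underscore.
;; Assumes the three ranges are contiguous, as in ASCII and Unicode.
(defun ascii-word-char-p (char)
  (or (char<= #\a char #\z)
      (char<= #\A char #\Z)
      (char<= #\0 char #\9)
      (char= char #\_)))

;; Code 233 is e-acute in Latin-1 and Unicode (assumed to exist here).
(defparameter *e-acute* (code-char 233))

;; ALPHANUMERICP on a non-ASCII character is implementation-dependent:
;; the result may differ from one Lisp to another.
(format t "alphanumericp:     ~a~%" (alphanumericp *e-acute*))

;; The ASCII-only predicate gives the same answer everywhere the
;; assumptions above hold: e-acute is outside all three ranges.
(format t "ascii-word-char-p: ~a~%" (ascii-word-char-p *e-acute*))
```

Only the first result should vary across implementations; the
second depends solely on the character's code.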

My co-workers want cl-ppcre on SBCL (where ALPHANUMERICP follows
Unicode) to behave the same as PCRE when matching Latin-1 encoded
strings.  They patched the cl-ppcre code so that \w matches only
a-z, A-Z, 0-9, and underscore.  Is there a better workaround for
them?

bob