[cl-ppcre-devel] *regex-char-code-limit*

Edi Weitz edi at agharta.de
Sun Nov 26 22:09:04 UTC 2006


On Sun, 26 Nov 2006 14:55:21 +0200, "Alex Mizrahi" <alex.mizrahi at gmail.com> wrote:

> i have an implementation that reports char-code-limit less than
> actual -- it's ABCL (working on top of Java), only 256 codes are
> officially supported, but it uses Java strings, so there's no problem
> with handling Unicode strings -- i set *regex-char-code-limit* to
> some 10000 (thanks, Edi!).  however, there are characters like
> 0xFEFF (the BOM), so i should set *regex-char-code-limit* to
> 65535. i think it's overkill to do that -- i see ppcre creates array
> of that size to do matching.
>
> how do people cope with it on unicode-enabled lisps? (afaik
> SteelBank uses UCS-4 char codes, so there's definitely no sane
> char-code-limit)
>
> does ppcre create that for each scanner? if there's one global array
> that's ok, but array for each scanner is too much..
>
> does *use-bmh-matchers* affect usage of this array?

Yes.  If you set it to NIL, no BMH matchers are created, and the BMH
matchers are where the arrays are needed.
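
A minimal sketch of that setting, assuming CL-PPCRE is already loaded
and that both special variables are consulted when the scanner is
created (which is how the CL-PPCRE documentation describes them).  The
name *my-scanner* and the regex are hypothetical, chosen only for
illustration.

```lisp
;; Assumption: CL-PPCRE has been loaded beforehand (e.g. via ASDF).
;; Bind the variables around CREATE-SCANNER, not around the later SCAN
;; call, because the scanner is built while the bindings are in effect.
(defparameter *my-scanner*          ; hypothetical name
  (let ((cl-ppcre:*use-bmh-matchers* nil)          ; no BMH skip arrays
        (cl-ppcre:*regex-char-code-limit* 65536))  ; codes 0 .. 65535
    (cl-ppcre:create-scanner "foo.*bar")))

;; The scanner can then be reused without rebinding anything.
(cl-ppcre:scan *my-scanner* "xx foo yy bar")
```

Binding with LET rather than assigning globally keeps the setting
local to the scanners that actually need the larger limit.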

The limit is also used in a few cases related to hash tables for
character classes, but I think this is not really important.

> if so, would it be much slower if i disable it?

BMH matchers will only help you if your regular expression starts or
ends with constant strings (the longer, the better) /and/ if your
target strings are very long.
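
To make the trade-off concrete, here is a rough sketch of a
Boyer-Moore-Horspool search in plain Common Lisp.  It is not
CL-PPCRE's actual code; MAKE-SKIP-TABLE, BMH-SEARCH and the LIMIT
parameter (standing in for *regex-char-code-limit*) are names made up
for this illustration.  It assumes every character of PATTERN has a
code below LIMIT.

```lisp
;; Illustrative only, not taken from CL-PPCRE.
(defun make-skip-table (pattern limit)
  "Return a vector of LIMIT skip distances for PATTERN."
  (let* ((m (length pattern))
         (skip (make-array limit :element-type 'fixnum
                                 :initial-element m)))
    ;; For every pattern character except the last one, record its
    ;; distance from the end of the pattern.
    (loop for i from 0 below (1- m)
          do (setf (aref skip (char-code (char pattern i)))
                   (- m 1 i)))
    skip))

(defun bmh-search (pattern text &optional (limit char-code-limit))
  "Return the index of the first occurrence of PATTERN in TEXT, or NIL."
  (let ((m (length pattern))
        (n (length text)))
    (when (zerop m)
      (return-from bmh-search 0))
    (let ((skip (make-skip-table pattern limit)))
      (loop with pos = 0
            while (<= (+ pos m) n)
            do (when (string= pattern text :start2 pos :end2 (+ pos m))
                 (return pos))
               ;; Shift by the skip distance of the character aligned
               ;; with the end of the pattern.  Codes outside the table
               ;; cannot occur in PATTERN, so they allow a full shift.
               (let ((code (char-code (char text (+ pos m -1)))))
                 (incf pos (if (< code limit)
                               (aref skip code)
                               m)))))))
```

The table has LIMIT entries no matter how short the pattern is, and
one table is built per pattern, which is the memory cost asked about
above.  The largest possible shift equals the pattern length, so a
longer constant string means bigger jumps, and the saving only adds up
when the target string is long.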

HTH,
Edi.


