[cl-ppcre-devel] *regex-char-code-limit*

Alex Mizrahi alex.mizrahi at gmail.com
Sun Nov 26 20:30:20 UTC 2006


> Hmm ... no! I can't think of a single use case where i would need to
> treat the BOM as part of the content. Actually, i can only come to the
> conclusion that a BOM within the content would be a serious bug. After
> all, your application should _never_ deal with the binary representation,
> only with code points. What _code point_ do you get for BOM?

i just download HTML pages using Java functions into Java strings.
then i use CL-PPCRE to extract some information from it. certainly, i
don't care about BOM, but CL-PPCRE crashes on it trying to aref an array
beyond char-code-limit.
i can pre-filter data removing BOM, but i'm not guaranteed that i
won't get some other wild character.
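
A minimal sketch of the pre-filtering workaround mentioned above, assuming the offending characters can still be inspected with char-code. The helper name strip-wild-chars and its LIMIT parameter are hypothetical and not part of CL-PPCRE; the limit would presumably be whatever cl-ppcre:*regex-char-code-limit* is set to.

```lisp
;; Hypothetical helper, not part of CL-PPCRE.
;; Drops the BOM (U+FEFF) and any character whose code is not below LIMIT,
;; so that the regex engine never indexes past its character tables.
(defun strip-wild-chars (string &optional (limit char-code-limit))
  "Return a copy of STRING without a BOM and without characters
whose code is greater than or equal to LIMIT."
  (remove-if (lambda (ch)
               (let ((code (char-code ch)))
                 (or (= code #xFEFF)      ; byte order mark
                     (>= code limit))))   ; anything the tables cannot hold
             string))
```

With CL-PPCRE loaded, it could be used as (cl-ppcre:scan "<title>" (strip-wild-chars page cl-ppcre:*regex-char-code-limit*)), where page is assumed to hold the downloaded HTML. Note that this silently discards data, so it only suits cases where the dropped characters are irrelevant to the match.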

well, there are better ways to tokenize HTML, but i've made a quick and
dirty solution via CL-PPCRE :)


