[cl-ppcre-devel] Roles of scanner vs. parser vs. lexer?

Wed Aug 3 21:43:38 UTC 2005

Hi!

On Wed, 3 Aug 2005 11:59:48 -0700, Derek Peschel <dpeschel at eskimo.com> wrote:

> I've been reading the CL-PPCRE docs and code to get a clear
> specification of the syntax.

Uh, I think there is no clear specification of the syntax.  Your best
bets are probably `man perlre' and the Camel Book, but these are
moving targets.

> Ultimately I'd like to add syntax highlighting for CL-PPCRE regexps
> to the Climacs text editor.

Cool...

> But there seems to be a certain amount of defensive or sloppy
> programming (things being done in more than one place).

I wouldn't be surprised.

> The scanner knows something about skipping # comments but the lexer
> does too.

See below.

> The lexer has code to ignore \E markers but I get the impression the
> scanner removes them before the lexer starts.  If this kind of
> duplication does exist, is there a useful reason for it?

The \Q\E stuff (*allow-quoting*) was added pretty late, almost a year
after CL-PPCRE's first release.  The problem with \Q\E and friends is
that they're not really part of Perl's regex syntax either - they're
part of Perl's string syntax:

  edi at vmware:~$ perl -le '$a = "\Q*\E"; print $a'
  \*

That's why I ignored them at first and later implemented them as a
kind of "pre-parsing" of the regex string (which itself uses regular
expressions).  This pre-parsing can leave a dangling \E in the regex
string, which is why the lexer is instructed to ignore these
specifically.  Maybe this could be done in api.lisp as well, but at
the time it seemed easier to me to do it in the lexer.  (The lexer is
pretty ugly anyway because it has to cope with a very ugly syntax.)
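
To make the two steps concrete, here is a minimal sketch in Python.
It is an illustration only, not CL-PPCRE's actual Lisp code: the
names QUOTED_SECTION, quote_sections and drop_dangling_end_markers
are hypothetical, and the pattern is an assumption about what such a
pre-parsing step could look like.  The first function rewrites \Q...\E
sections with a regular expression; the second plays the role of the
lexer and skips any \E that is left over.

```python
import re

# Assumed pattern: "\Q", the shortest possible body, then "\E" or end of string.
QUOTED_SECTION = re.compile(r"\\Q(.*?)(?:\\E|\Z)", re.DOTALL)


def quote_sections(regex_string):
    """Pre-parsing step: replace each \\Q...\\E section by its escaped body."""
    return QUOTED_SECTION.sub(lambda m: re.escape(m.group(1)), regex_string)


def drop_dangling_end_markers(regex_string):
    """Lexer-like step: skip any \\E left over, keep all other escapes."""
    out = []
    i = 0
    while i < len(regex_string):
        if regex_string[i] == "\\" and i + 1 < len(regex_string):
            if regex_string[i + 1] != "E":
                out.append(regex_string[i:i + 2])  # ordinary escape, keep it
            i += 2
        else:
            out.append(regex_string[i])
            i += 1
    return "".join(out)


# Same input as the Perl one-liner above.
print(quote_sections(r"\Q*\E"))
# A \E with no matching \Q survives pre-parsing and is dropped afterwards.
print(drop_dangling_end_markers(quote_sections(r"a\Eb")))
```

The first call prints \* like the Perl example, and the second prints
ab.  Note that the regex-based step does not know about escaped
backslashes, so a literal backslash followed by Q would be misread; a
real implementation would have to deal with that.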

If you have a patch to make the code cleaner without breaking it I'd
be happy to incorporate it.

Cheers,
Edi.