From klaus at harbo.net Wed Jun 2 11:02:44 2004
From: klaus at harbo.net (Klaus Harbo)
Date: Wed, 02 Jun 2004 13:02:44 +0200
Subject: [cl-ppcre-devel] defpatt
Message-ID: <40BDB3D4.3080708@harbo.net>

Working with cl-ppcre, I have found that I increasingly use the s-expr
representation rather than the traditional string representation with
its infix operators.

To make it easier to work with the s-expressions, I've developed
'defpatt' - a package which implements a notation for defining and
referring to regular expressions in terms of cl-ppcre s-expressions. I
thought it might interest the readers of this list.

The package can be downloaded from

  http://www.harbo.net/downloads/defpatt-0.2.tar.gz .

Suggestions, comments, improvements are welcome.

best regards,

-Klaus.

------ defpatt examples (from defpatt.lisp): ------

#|

EXAMPLES

; If you want to try the examples, be sure to evaluate the
; expression below first - otherwise the other ones won't work.

> (defpatt:defpatt-set-default-macro-char)  ; Defines #\? as macro character
=> T

> (cl-ppcre:all-matches-as-strings ?(alt "a" "c" "f") "abcdefghi")  ; Note: Equivalent to "a|c|f"
=> ("a" "c" "f")

; That's all very well, but doesn't buy us very much.
; However `defpatt' (as per cl-ppcre's sexpr-based
; representation of REs) enables us to both document
; the patterns much better by letting us insert comments
; into REs...

> (cl-ppcre:scan-to-strings ?(seq digit+ ; used space
                                  ws+
                                  digit+ ; available space
                                  ws+
                                  digit+ ; remaining space
                                  )
                            "123 4567 7887")
; Note: `ws+' and `digit+' are defined above, in `defpatt-initialize'.
=> "123 4567 7887", #()

; ...as well as lets us capture data in a structured fashion...

> (cl-ppcre:register-groups-bind (used avail remain)
      (?(seq (reg digit+) ; used space
             ws+
             (reg digit+) ; available space
             ws+
             (reg digit+) ; remaining space
             )
       "123 4567 7887")
    (mapcar #'parse-integer (list used avail remain)))
; Note: `(reg ...)' creates a register binding
=> (123 4567 7887)

; ...but also lets us _FIRST_ define and document the abstraction...

> (defpatt match-nums ()
    ?(seq (reg digit+) ; used space
          ws+
          (reg digit+) ; available space
          ws+
          (reg digit+) ; remaining space
          ))
=> MATCH-NUMS

; ...and _THEN_ use it...

> (cl-ppcre:register-groups-bind (used avail remain)
      (?match-nums "123 4567 7887")
    (mapcar #'parse-integer (list used avail remain)))
=> (123 4567 7887)

; which is a lot more easily understood, as I am sure you will
; agree.

> (cl-ppcre:scan-to-strings ?(upto "efg") "abcdefghi")
=> "abcd", #()

> (cl-ppcre:scan-to-strings ?(upto+ "efg") "abcdefghi")
=> "abcdefg", #()

; To see the raw cl-ppcre expansion of a `defpatt' expression,
; simply enter it:

> ?(seq (reg digit+) ; used space
        ws+
        (reg digit+) ; available space
        ws+
        (reg digit+) ; remaining space
        )
=> (:SEQUENCE
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9))))
    (:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS)
    (:REGISTER (:GREEDY-REPETITION 1 NIL (:CHAR-CLASS (:RANGE #\0 #\9)))))

; To see _HOW_ `defpatt' expands an expression use `macroexpand':

> (macroexpand-1 '?(seq (reg digit+) ; used space
                        ws+
                        (reg digit+) ; available space
                        ws+
                        (reg digit+) ; remaining space
                        ))
=> (LABELS ((++ (PATT)
              (REP PATT 1 NIL))
            (UPTO (PATT)
              `(:SEQUENCE (:FLAGS :SINGLE-LINE-MODE-P)
                          (:GREEDY-REPETITION 0 NIL
                           (:SEQUENCE :EVERYTHING (:NEGATIVE-LOOKAHEAD ,PATT)))
                          :EVERYTHING))
            (?? (PATT)
              (REP PATT 0 1))
            (UPTO+ (PATT)
              `(:SEQUENCE ,(UPTO PATT) ,PATT))
            (ALT (&REST ARGS)
              `(:ALTERNATION ,@ARGS))
            (** (PATT)
              (REP PATT 0 NIL))
            (SEQ (&REST ARGS)
              `(:SEQUENCE ,@ARGS))
            (REG (&REST ARGS)
              `(:REGISTER ,@ARGS))
            (REP (PATT &OPTIONAL (MIN 0) (MAX NIL))
              `(:GREEDY-REPETITION ,MIN ,MAX ,PATT)))
     (SYMBOL-MACROLET ((WS+ '(:GREEDY-REPETITION 1 NIL :WHITESPACE-CHAR-CLASS))
                       (WS* '(:GREEDY-REPETITION 0 NIL :WHITESPACE-CHAR-CLASS))
                       (DIGIT '(:CHAR-CLASS (:RANGE #\0 #\9)))
                       (DIGIT+ (++ DIGIT))
                       (MATCH-NUMS
                        (DEFPATT-PATTERN
                         (SEQ (REG DIGIT+) WS+ (REG DIGIT+) WS+ (REG DIGIT+))))
                       (DIGIT* (** DIGIT)))
       (SEQ (REG DIGIT+) WS+ (REG DIGIT+) WS+ (REG DIGIT+))))

; `upto' and `upto+' are good examples of how having an abstraction
; mechanism helps keep REs maintainable and understandable. See
; their definitions above.

|#

From klaus at harbo.net Fri Jun 11 19:24:00 2004
From: klaus at harbo.net (Klaus Harbo)
Date: Fri, 11 Jun 2004 21:24:00 +0200
Subject: [cl-ppcre-devel] defpatt updated
Message-ID: <40CA06D0.9080204@harbo.net>

I have just posted version v0.2.1 of 'defpatt' - a mechanism for
defining and using regular expression abstractions with CL-PPCRE.

The update fixes a bothersome error afflicting certain types of
defpatt expressions. I strongly recommend the update to anyone looking
at or using defpatt.

'defpatt' can be downloaded from http://www.harbo.net/downloads.

best regards,

-Klaus.

From edi at agharta.de Sat Jun 12 14:39:50 2004
From: edi at agharta.de (Edi Weitz)
Date: Sat, 12 Jun 2004 16:39:50 +0200
Subject: [cl-ppcre-devel] Re: cl-ppcre
In-Reply-To: (Daniel Skarda's message of "Sat, 12 Jun 2004 15:54:11 +0200")
References:
Message-ID: <87n0384z7t.fsf@bird.agharta.de>

Hi Daniel!

On Sat, 12 Jun 2004 15:54:11 +0200, Daniel Skarda <0rfelyus at ucw.cz> wrote:

> today I explored the possibilities of regular expressions
> implementations in various Debian Common Lisp packages. I really
> liked your library - thank you for writing cl-ppcre library.

You're welcome.

> I also looked into elegant cl-lexer package built on top of
> cl-regex library. What I missed in cl-ppcre is a parse-tree node
> similar to cl-regex's 'success node, which defines return value of
> match/scan functions. With 'success node one can build `deflexer'
> macro on top of cl-ppcre as easy as on top of cl-regex package.
>
> Is it possible to extend cl-ppcre with similar feature?

I might look into this for a future version but see below.

> Footnote: In cl-lexer, deflexer macro
>
>   (deflexer foo
>      ("regexp" some action)              ; 0
>      ("another regexp" another action)   ; 1
>      ...))
>
> numbers each pair of regexp and action, then combine regexp parse
> trees into one big parse tree
>
>   `(alt
>      (seq (regexp tree) (success 0))
>      (seq (another regexp tree) (success 1))
>      ...)
>
> and use return value from match (ie regexp serial number) to select
> an action associated to matching regexp)

I've recently written demo code like this for another CL-PPCRE user
who also wanted to build a lexer:

  (in-package :cl-user)

  (eval-when (:compile-toplevel :load-toplevel :execute)
    (defmacro with-unique-names ((&rest bindings) &body body)
      ;; see
      `(let ,(mapcar #'(lambda (binding)
                         (check-type binding (or cons symbol))
                         (if (consp binding)
                           (destructuring-bind (var x) binding
                             (check-type var symbol)
                             `(,var (gensym ,(etypecase x
                                               (symbol (symbol-name x))
                                               (character (string x))
                                               (string x)))))
                           `(,binding (gensym ,(symbol-name binding)))))
                     bindings)
         ,@body)))

  (defmacro deflexer (name &body body)
    (with-unique-names (regex-table regex token sexpr-regex anchored-regex
                        string start scanner next-pos)
      `(let ((,regex-table
               (loop for (,regex . ,token) in (list ,@(loop for (regex token) in body
                                                           collect `(cons ,regex ,token)))
                     for ,sexpr-regex = (etypecase ,regex
                                          (function
                                            (error "Compiled scanners are not allowed here"))
                                          (string
                                            (cl-ppcre::parse-string ,regex))
                                          (list
                                            ,regex))
                     for ,anchored-regex = (cl-ppcre:create-scanner
                                             `(:sequence :modeless-start-anchor ,,sexpr-regex))
                     collect (cons ,anchored-regex ,token))))
         (defun ,name (,string &key ((:start ,start) 0))
           (loop for (,scanner . ,token) in ,regex-table
                 for ,next-pos = (nth-value 1 (cl-ppcre:scan ,scanner ,string :start ,start))
                 when ,next-pos
                   do (return (values ,token ,next-pos)))))))

You should be able to use it like this:

  * (deflexer mylexer
      ("'.*'" 'string)
      ("#.*$" 'comment)
      ("[ \t\r\f]+" 'ws)
      (":=" 'assign)
      ("[\[]" 'lbrack)
      ("[\]]" 'rbrack)
      ("[\,]" 'comma)
      ("[\:]" 'colon)
      ("[\;]" 'semicolon)
      ("[+-]?[0-9]*[\.][0-9]+([eE][+-]?[0-9]+)?" 'float)
      ("[+-]?[0-9]+" 'integer)
      ("[a-zA-Z0-9_]+" 'id)
      ("." 'unknown))
  ; Converted MYLEXER.

  MYLEXER
  * (mylexer "a:=123.4?")

  ID
  1
  * (mylexer "a:=123.4?" :start 1)

  ASSIGN
  3
  * (mylexer "a:=123.4?" :start 3)

  FLOAT
  8
  * (mylexer "a:=123.4?" :start 8)

  UNKNOWN
  9

This one only returns tokens but it should be trivial to change the
macro such that the newly-defined lexer invokes functions instead.

Wouldn't that already do what you want? I'm not sure what the approach
you sketched above would buy you compared to this one.

Cheers,
Edi.

PS: Please, if possible, continue this conversation on the mailing
    list. Thanks.
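
The session above calls the lexer once per token and feeds each
returned position back in as :start. That loop can be sketched as
follows. This is only a sketch under stated assumptions: TOKENIZE and
TOY-LEXER are hypothetical names, not part of CL-PPCRE or of the demo
code above. The only thing assumed about the lexer is that it takes a
string plus a :start keyword and returns the token and the next
position as two values, as MYLEXER does. TOY-LEXER is a hand-written
stand-in so the sketch runs without the library.

```lisp
;; Hypothetical driver loop; not part of CL-PPCRE or the demo code.
(defun tokenize (lexer string &key (start 0))
  "Call LEXER repeatedly on STRING and collect (token . text) pairs.
LEXER must accept (string &key start) and return two values: the
token and the position just past the match."
  (loop with pos = start
        while (< pos (length string))
        collect (multiple-value-bind (token next-pos)
                    (funcall lexer string :start pos)
                  ;; Signal an error on no match or on an empty match;
                  ;; either would otherwise make the loop spin forever.
                  (unless (and next-pos (> next-pos pos))
                    (error "No token at position ~D" pos))
                  (prog1 (cons token (subseq string pos next-pos))
                    (setf pos next-pos)))))

;; Hypothetical stand-in lexer with the same calling convention:
;; a run of digits is an INTEGER, any other single character is UNKNOWN.
(defun toy-lexer (string &key (start 0))
  (let ((end (or (position-if-not #'digit-char-p string :start start)
                 (length string))))
    (if (> end start)
        (values 'integer end)
        (values 'unknown (1+ start)))))

;; (tokenize #'toy-lexer "12+345")
;; => ((INTEGER . "12") (UNKNOWN . "+") (INTEGER . "345"))
```

With the lexer from the session, (tokenize #'mylexer "a:=123.4?")
should step through the same positions shown above (1, 3, 8, 9) and
pair ID, ASSIGN, FLOAT and UNKNOWN with the matched text.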