From ron at flownet.com Wed Mar 19 00:15:37 2014
From: ron at flownet.com (Ron Garret)
Date: Tue, 18 Mar 2014 17:15:37 -0700
Subject: Extending regexps to other kinds of sequences
Message-ID: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>

The theory of regular expressions can be applied to any kind of sequence, not just strings. This would be potentially useful for pattern-matching applications, where current approaches make it very cumbersome to say things like, "Match a list containing between three and five integers". This sort of thing is easy to express in CL-PPCRE tree-like notation as, e.g.

(:repetition (:type integer) 3 5)

My question is: how hard would it be to adapt the CL-PPCRE code to handle things like this? Is there a sequence-type-agnostic core in CL-PPCRE that could be easily re-used for this purpose, or is the assumption that regexps only apply to strings woven deeply into the code?

Thanks,
rg

From ron at flownet.com Wed Mar 19 11:06:35 2014
From: ron at flownet.com (Ron Garret)
Date: Wed, 19 Mar 2014 04:06:35 -0700
Subject: Extending regexps to other kinds of sequences
In-Reply-To:
References: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
Message-ID: <3B9F5C20-D409-4746-91A0-4263D37D0251@flownet.com>

No, I'm not sure, but I strongly suspect that optima isn't the right tool for my job. My use-case is a macro I'm trying to write called DEFINE-RELATION, which, as one might suspect from a macro with this name, defines relations between instances of classes. For example:

(define-relation window <-> layout)

This means that every instance of a WINDOW is associated with an instance of a LAYOUT, and vice versa. But not every relation can be defined simply by the classes.
For example, one might want to define familial relationships among people:

(define-relation person as father <->> person as child)

This means that a single instance of a PERSON in the role of a father is associated with multiple instances of PERSON in the role of a child. or...

(define-relation person as manager <->> person as employee)
(define-relation person as owner <->> animal as pet)
(define-relation person as owner <->> rock as pet)
(define-relation person as owner <->> rock as weapon)

The "pattern" is naturally expressed as a regex:

(define-class ($class (:optional as $role)) (:or <-> <->> <<-> <<->>) ($class2 (:optional as $role2)))

I don't see how to express this sort of thing in optima without enumerating all the possible cases, which rather defeats the purpose (one might as well just write a little parser at that point).

Then I want to be able to say things like this:

(define-relation user <->> time-period <->> goal)

which means that a user is associated with multiple time periods, and that each user-time-period pair is in turn associated with a number of goals. This generalization is easily expressed as a minor tweak to the above regex:

(define-class ($class (:optional as $role)) (:one-or-more (:or <-> <->> <<-> <<->>) ($classN (:optional as $roleN))))

but AFAICT this is entirely beyond what optima can do.

rg

On Mar 19, 2014, at 2:40 AM, Hans Hübner wrote:

> Ron,
>
> are you sure that a general-purpose pattern matching library like Optima (https://github.com/m2ym/optima) would not be better than a generalized regular expression matching library for what you need to do?
>
> -Hans
>
>
> 2014-03-19 1:15 GMT+01:00 Ron Garret:
> The theory of regular expressions can be applied to any kind of sequence, not just strings. This would be potentially useful for pattern-matching applications, where current approaches make it very cumbersome to say things like, "Match a list containing between three and five integers".
> This sort of thing is easy to express in CL-PPCRE tree-like notation as, e.g. (:repetition (:type integer) 3 5)
>
> My question is: how hard would it be to adapt the CL-PPCRE code to handle things like this? Is there a sequence-type-agnostic core in CL-PPCRE that could be easily re-used for this purpose, or is the assumption that regexps only apply to strings woven deeply into the code?
>
> Thanks,
> rg

From philipp at marek.priv.at Thu Mar 20 10:50:49 2014
From: philipp at marek.priv.at (Philipp Marek)
Date: Thu, 20 Mar 2014 11:50:49 +0100
Subject: Extending regexps to other kinds of sequences
In-Reply-To:
References: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
Message-ID:

> I'm pretty sure that someone already did this, i.e. they forked
> CL-PPCRE for arbitrary sequences. But I can't remember the details
> right now. You'll probably find a link hidden in the mailing list
> archives.

I tried quite some time ago to change the RE-compilation into a macro, so that the _whole_ needed code would be visible to the compiler in one compile unit. That should have enabled quite a few optimizations - starting from matching against a base-string, an (unsigned-byte 8) vector, any other sequence ...

But I didn't get that far ... I'd have had to reimplement most of the existing code, or rather convert everything to return forms. That got a bit messy, too.

So later on I decided that the expected performance improvements would be reached faster by waiting for 18 months (to let the CPUs catch up) than by trying to completely reinvent the wheel here.
Regards,

Phil

From edi at agharta.de Wed Mar 19 11:21:13 2014
From: edi at agharta.de (Edi Weitz)
Date: Wed, 19 Mar 2014 12:21:13 +0100
Subject: Extending regexps to other kinds of sequences
In-Reply-To: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
References: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
Message-ID:

I'm pretty sure that someone already did this, i.e. they forked CL-PPCRE for arbitrary sequences. But I can't remember the details right now. You'll probably find a link hidden in the mailing list archives.

Cheers,
Edi.

On Wed, Mar 19, 2014 at 1:15 AM, Ron Garret wrote:
> The theory of regular expressions can be applied to any kind of sequence, not just strings. This would be potentially useful for pattern-matching applications, where current approaches make it very cumbersome to say things like, "Match a list containing between three and five integers". This sort of thing is easy to express in CL-PPCRE tree-like notation as, e.g. (:repetition (:type integer) 3 5)
>
> My question is: how hard would it be to adapt the CL-PPCRE code to handle things like this? Is there a sequence-type-agnostic core in CL-PPCRE that could be easily re-used for this purpose, or is the assumption that regexps only apply to strings woven deeply into the code?
>
> Thanks,
> rg
>

From hans.huebner at gmail.com Wed Mar 19 09:40:08 2014
From: hans.huebner at gmail.com (Hans Hübner)
Date: Wed, 19 Mar 2014 10:40:08 +0100
Subject: Extending regexps to other kinds of sequences
In-Reply-To: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
References: <5085ABDE-0A5A-418B-B017-3B5BF62FAC9D@flownet.com>
Message-ID:

Ron,

are you sure that a general-purpose pattern matching library like Optima (https://github.com/m2ym/optima) would not be better than a generalized regular expression matching library for what you need to do?
-Hans

2014-03-19 1:15 GMT+01:00 Ron Garret:

> The theory of regular expressions can be applied to any kind of sequence,
> not just strings. This would be potentially useful for pattern-matching
> applications, where current approaches make it very cumbersome to say
> things like, "Match a list containing between three and five integers".
> This sort of thing is easy to express in CL-PPCRE tree-like notation as,
> e.g. (:repetition (:type integer) 3 5)
>
> My question is: how hard would it be to adapt the CL-PPCRE code to
> handle things like this? Is there a sequence-type-agnostic core in
> CL-PPCRE that could be easily re-used for this purpose, or is the
> assumption that regexps only apply to strings woven deeply into the code?
>
> Thanks,
> rg
>
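
For readers of the archive, the idea raised at the start of the thread (regexp-style operators applied to a list of objects rather than a string of characters) can be illustrated with a small backtracking matcher. What follows is only a sketch under stated assumptions: it is not CL-PPCRE code and does not use CL-PPCRE's API, the operators :type and :repetition follow the informal notation used in the thread, and the names MATCH-PATTERN, MATCHES-P and *RELATION-PATTERN* are hypothetical. It says nothing about how hard adapting CL-PPCRE itself would be.

```lisp
;;; Sketch only: a tiny backtracking matcher for regexp-like patterns
;;; over lists. All names and operators here are illustrative assumptions.

(defun match-pattern (pattern items k)
  "Match PATTERN against a prefix of the list ITEMS. For each way the
match can succeed, call the continuation K on the remaining items and
return the first non-NIL result; return NIL if no way succeeds."
  (if (atom pattern)
      ;; An atom is a literal: it matches one element that is EQL to it.
      (and (consp items)
           (eql (first items) pattern)
           (funcall k (rest items)))
      (ecase (first pattern)
        ;; (:type T) matches one element of type T.
        (:type
         (and (consp items)
              (typep (first items) (second pattern))
              (funcall k (rest items))))
        ;; (:sequence P1 P2 ...) matches the sub-patterns one after another.
        (:sequence
         (if (null (rest pattern))
             (funcall k items)
             (match-pattern (second pattern) items
                            (lambda (remaining)
                              (match-pattern (cons :sequence (cddr pattern))
                                             remaining k)))))
        ;; (:or P1 P2 ...) tries each alternative in order.
        (:or
         (some (lambda (alternative) (match-pattern alternative items k))
               (rest pattern)))
        ;; (:repetition P LO HI) matches P between LO and HI times, greedily.
        (:repetition
         (destructuring-bind (sub lo hi) (rest pattern)
           (labels ((try-more (n remaining)
                      (or (and (< n hi)
                               (match-pattern
                                sub remaining
                                (lambda (after)
                                  ;; Require progress so that a zero-width
                                  ;; sub-match cannot loop forever.
                                  (and (not (eq after remaining))
                                       (try-more (1+ n) after)))))
                          (and (>= n lo)
                               (funcall k remaining)))))
             (try-more 0 items)))))))

(defun matches-p (pattern items)
  "True if PATTERN matches the whole list ITEMS."
  (match-pattern pattern items #'null))

;; The two-sided DEFINE-RELATION form from the thread, written in this
;; sketch's notation; (:repetition ... 0 1) plays the role of :optional.
(defparameter *relation-pattern*
  '(:sequence (:type symbol)
              (:repetition (:sequence as (:type symbol)) 0 1)
              (:or <-> <->> <<-> <<->>)
              (:type symbol)
              (:repetition (:sequence as (:type symbol)) 0 1)))
```

With these definitions, (matches-p '(:repetition (:type integer) 3 5) '(1 2 3 4)) is true while the same pattern fails on '(1 2) and on '(1 2 3 4 5 6), and *RELATION-PATTERN* accepts both '(window <-> layout) and '(person as father <->> person as child). The matcher only answers yes or no; binding variables such as $class and $role, as in the patterns shown in the thread, would need an extra environment argument that this sketch leaves out.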