From edi at agharta.de Thu Dec 9 07:58:48 2004
From: edi at agharta.de (Edi Weitz)
Date: Thu, 09 Dec 2004 08:58:48 +0100
Subject: [cl-ppcre-devel] Re: cl-ppcre crash in openmcl
In-Reply-To: (Alan Ruttenberg's message of "Wed, 8 Dec 2004 21:41:56 -0500")
References:
Message-ID:

Hi Alan!

On Wed, 8 Dec 2004 21:41:56 -0500, Alan Ruttenberg wrote:

> This kills my lisp.
> (ppcre::scan "\\(.*\\)$" "")
>
> Yours?

No, neither CMUCL, nor SBCL, nor AllegroCL, nor LispWorks. I don't
have a Mac available so it'd be nice if you could investigate this
further.

Please send bug reports to the mailing list if possible.

Thanks,
Edi.

From edi at agharta.de Thu Dec 9 09:27:21 2004
From: edi at agharta.de (Edi Weitz)
Date: Thu, 09 Dec 2004 10:27:21 +0100
Subject: [cl-ppcre-devel] Re: [Openmcl-devel] cl-ppcre crash in openmcl
In-Reply-To: <20041209001341.R69803@clozure.com> (Gary Byers's message of "Thu, 9 Dec 2004 01:33:23 -0700 (MST)")
References: <20041209001341.R69803@clozure.com>
Message-ID:

On Thu, 9 Dec 2004 01:33:23 -0700 (MST), Gary Byers wrote:

> OpenMCL dies doing the moral equivalent of:
>
> (defun foo (s i)
>   (declare (optimize (speed 3) (safety 0)))
>   (schar s i))
>
> (foo "" -1)
>
> If I do:
>
> ? (let* ((policy
>           ;; Be conservative about generating unsafe code,
>           ;; regardless of OPTIMIZE declaration settings
>           (new-compiler-policy
>            :trust-declarations #'(lambda (env)
>                                    (declare (ignore env)) nil)
>            :inhibit-safety-checking #'(lambda (env)
>                                         (declare (ignore env)) nil)
>            :open-code-inline #'(lambda (env) (declare (ignore env)) nil))))
>     (set-current-compiler-policy policy)
>     (set-current-file-compiler-policy policy))
>
> and recompile cl-ppcre with those settings in effect, I get
>
> ? (ppcre::scan "\\(.*\\)$" "")
>> Error in process listener(1): Array index -1 out of bounds for "" .
>> While executing: #
>
> (The empty string that's being referenced is the value of
> CL-PPCRE::*STRING*, and -1 is the value of a local variable named
> CL-PPCRE::START-POS.)
>
> Why this code was trying to call SCHAR on an empty string (with -1
> for an index) isn't clear; it certainly -could- be a compiler bug or
> something similar, but it's also plausible to me that an unsafe
> (SCHAR "" -1) is a quieter error in other implementations.
>
> Wherever the bug is, it might be easier to isolate if the compiler's
> operating under a policy that discourages generation of unsafe code.

Good catch, thanks. This didn't occur to me because it doesn't seem to
happen with Lisps other than OpenMCL.

I've just uploaded a new version (0.9.3) of CL-PPCRE which hopefully
fixes this. Maybe someone using OpenMCL can try it.

Thanks again,
Edi.

From alanr-l at mumble.net Wed Dec 15 06:13:56 2004
From: alanr-l at mumble.net (Alan Ruttenberg)
Date: Wed, 15 Dec 2004 01:13:56 -0500
Subject: [cl-ppcre-devel] cl-ppcre speedup
Message-ID: <8081EE6F-4E60-11D9-83DA-000A95DA5F3C@mumble.net>

I was profiling the following expression

(cl-ppcre::scan "(\\S+)\\s*(.*)" "DE Halobacterium halobium ribosomal proteins, partial and complete")

and char=, char/= and char<= were coming up highest in the breakdown.

One way to conservatively fix (least number of edits to the source)
would be the following, which gets about a factor of 2x for the above
expression.
Arguably, this might be considered for inclusion in openmcl proper.

Similar could be done for char-equal etc.
-Alan

#+openmcl
(define-compiler-macro char<= (&whole form &environment env char &rest others)
  ""
  (if (and (= (ccl::speed-optimize-quantity env) 3)
           (= (ccl::safety-optimize-quantity env) 0))
      (cond ((= (length others) 1)
             `(ccl::%i<= (the fixnum (char-code (the character ,char)))
                         (the fixnum (char-code (the character ,(car others))))))
            ((= (length others) 2)
             `(let ((middle (char-code (the character ,(car others)))))
                (declare (fixnum middle))
                (and (ccl::%i<= (the fixnum (char-code (the character ,char))) middle)
                     (ccl::%i<= middle (the fixnum (char-code (the character ,(second others))))))))
            (t form))
      form))

#+openmcl
(define-compiler-macro char= (&whole form &environment env char &rest others)
  ""
  (if (and (= (ccl::speed-optimize-quantity env) 3)
           (= (ccl::safety-optimize-quantity env) 0))
      (cond ((= (length others) 1)
             `(eq ,char ,(car others)))
            (t form))
      form))

#+openmcl
(define-compiler-macro char/= (&whole form &environment env char &rest others)
  ""
  (if (and (= (ccl::speed-optimize-quantity env) 3)
           (= (ccl::safety-optimize-quantity env) 0))
      (cond ((= (length others) 1)
             `(not (eq ,char ,(car others))))
            (t form))
      form))

;; add the optimize declares in the lambdas below so the compiler optimization kicks in.
(defmethod create-matcher-aux ((char-class char-class) next-fn)
  (declare (type function next-fn))
  ;; insert a test against the current character within *STRING*
  (insert-char-class-tester (char-class (schar *string* start-pos))
    (if (invertedp char-class)
      (lambda (start-pos)
        (declare (type fixnum start-pos))
        (declare (optimize (speed 3) (safety 0)))
        (and (< start-pos *end-pos*)
             (not (char-class-test))
             (funcall next-fn (1+ start-pos))))
      (lambda (start-pos)
        (declare (type fixnum start-pos))
        (declare (optimize (speed 3) (safety 0)))
        (and (< start-pos *end-pos*)
             (char-class-test)
             (funcall next-fn (1+ start-pos)))))))

From alanr-l at mumble.net Wed Dec 15 07:49:56 2004
From: alanr-l at mumble.net (Alan Ruttenberg)
Date: Wed, 15 Dec 2004 02:49:56 -0500
Subject: [cl-ppcre-devel] cl-ppcre speedup
In-Reply-To: <8081EE6F-4E60-11D9-83DA-000A95DA5F3C@mumble.net>
References: <8081EE6F-4E60-11D9-83DA-000A95DA5F3C@mumble.net>
Message-ID:

Actually, closer to a 3x speedup.

(DOTIMES (I 1000000) (CL-PPCRE:SCAN "(\\S+)\\s*(.*)" "DE Halobacterium halobium ribosomal proteins, partial and complete"))
took 14,655 milliseconds (14.655 seconds) to run.

vs.

(DOTIMES (I 1000000) (CL-PPCRE:SCAN "(\\S+)\\s*(.*)" "DE Halobacterium halobium ribosomal proteins, partial and complete"))
took 4,989 milliseconds (4.989 seconds) to run.

-Alan

On Dec 15, 2004, at 1:13 AM, Alan Ruttenberg wrote:

> I was profiling the following expression
>
> (cl-ppcre::scan "(\\S+)\\s*(.*)" "DE Halobacterium halobium
> ribosomal proteins, partial and complete")
>
> and char=, char/= and char<= were coming up highest in the breakdown.
>
> One way to conservatively fix (least number of edits to the source)
> would be the following, which gets about a factor of 2x for the above
> expression.
> Arguably, this might be considered for inclusion in openmcl proper.
>
> Similar could be done for char-equal etc.
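
A note on the compiler-macro approach in the post above: the same idea
can be tried portably on a function of your own, which avoids defining
compiler macros on symbols in the COMMON-LISP package (something the
standard leaves undefined for conforming programs). What follows is
only a sketch under that assumption. FAST-CHAR<= is a hypothetical
name, not part of CL-PPCRE or OpenMCL, and whether the rewritten form
is actually faster depends on the implementation and the optimization
settings in effect.

```lisp
;; Hypothetical helper for illustration; not part of CL-PPCRE or OpenMCL.
;; The ordinary function keeps full CHAR<= semantics for any argument count.
(defun fast-char<= (char &rest others)
  (apply #'char<= char others))

;; The compiler macro rewrites two- and three-argument calls into integer
;; comparisons on character codes.  Arguments are still evaluated once,
;; left to right, because <= is an ordinary function call.
;; Any other argument count falls back to the original form.
(define-compiler-macro fast-char<= (&whole form char &rest others)
  (case (length others)
    (1 `(<= (char-code (the character ,char))
            (char-code (the character ,(first others)))))
    (2 `(<= (char-code (the character ,char))
            (char-code (the character ,(first others)))
            (char-code (the character ,(second others)))))
    (t form)))
```

Calling (compiler-macro-function 'fast-char<=) on a sample form is a
quick way to inspect what the compiler would see, without relying on
any implementation-specific optimization queries.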
From edi at agharta.de Wed Dec 15 08:12:48 2004
From: edi at agharta.de (Edi Weitz)
Date: Wed, 15 Dec 2004 09:12:48 +0100
Subject: [cl-ppcre-devel] cl-ppcre speedup
In-Reply-To: <8081EE6F-4E60-11D9-83DA-000A95DA5F3C@mumble.net> (Alan Ruttenberg's message of "Wed, 15 Dec 2004 01:13:56 -0500")
References: <8081EE6F-4E60-11D9-83DA-000A95DA5F3C@mumble.net>
Message-ID:

Hi Alan!

On Wed, 15 Dec 2004 01:13:56 -0500, Alan Ruttenberg wrote:

> I was profiling the following expression
>
> (cl-ppcre::scan "(\\S+)\\s*(.*)" "DE Halobacterium halobium
> ribosomal proteins, partial and complete")
>
> and char=, char/= and char<= were coming up highest in the breakdown.
>
> One way to conservatively fix (least number of edits to the source)
> would be the following, which gets about a factor of 2x for the
> above expression. Arguably, this might be considered for inclusion
> in openmcl proper.
>
> Similar could be done for char-equal etc.

That's an impressive speedup for so few lines of code, but I agree
with you that this should actually be included in OpenMCL. I hesitate
to add stuff to CL-PPCRE that's so specific to one implementation.

Thanks,
Edi.

From edi at agharta.de Sat Dec 18 04:20:25 2004
From: edi at agharta.de (Edi Weitz)
Date: Sat, 18 Dec 2004 05:20:25 +0100
Subject: [cl-ppcre-devel] New version 0.9.4
Message-ID:

Changelog:

Version 0.9.4
2004-12-18
Fixed bug in NORMALIZE-VAR-LIST (caught by Dave Roberts)

Download:

From edi at agharta.de Tue Dec 21 07:12:18 2004
From: edi at agharta.de (Edi Weitz)
Date: Tue, 21 Dec 2004 08:12:18 +0100
Subject: [cl-ppcre-devel] Re: Does PPCRE cache scanners?
In-Reply-To: <1103593445.13746.36.camel@linux.droberts.com> (Dave Roberts's message of "Mon, 20 Dec 2004 17:44:05 -0800")
References: <1103593445.13746.36.camel@linux.droberts.com>
Message-ID: <871xdki0q5.fsf@miles.agharta.de>

[Cc to mailing list]

On Mon, 20 Dec 2004 17:44:05 -0800, Dave Roberts wrote:

> Does CL-PPCRE cache scanners when you just pass in a string to scan
> or scan-to-strings? That is, if I just say (SCAN-TO-STRINGS "a
> regex" "string-to-scan for a regex"), is it compiling "a regex" to a
> scanner every time, or is it caching that expression for later?

Hi Dave!

There's more than one answer to this question:

1. CL-PPCRE never compiles scanners in the sense that the Lisp
   compiler is invoked. It just combines existing closures which
   means, e.g., that in delivered applications you can excise the
   compiler from the image and CL-PPCRE will still work.

2. Nevertheless, creating a scanner (as with CREATE-SCANNER) still
   takes some time because the regex has to be parsed and the chain of
   closures has to be created in memory.

3. If it encounters a constant (see CONSTANTP in the CLHS) regular
   expression, CL-PPCRE uses compiler macros to make sure the scanner
   is created only once - at load time. See this for an explanation:

   So this'll apply to your example above.

4. I briefly thought about generally caching scanners but realized
   that it is kind of orthogonal to the rest of CL-PPCRE so I'll leave
   it up to the user to do it if he needs it.

Cheers,
Edi.

From edi at agharta.de Wed Dec 22 00:04:30 2004
From: edi at agharta.de (Edi Weitz)
Date: Wed, 22 Dec 2004 01:04:30 +0100
Subject: [cl-ppcre-devel] Re: Does PPCRE cache scanners?
In-Reply-To: <1103654432.13746.54.camel@linux.droberts.com> (Dave Roberts's message of "Tue, 21 Dec 2004 10:40:32 -0800")
References: <1103593445.13746.36.camel@linux.droberts.com> <871xdki0q5.fsf@miles.agharta.de> <1103654432.13746.54.camel@linux.droberts.com>
Message-ID: <87mzw7tcz5.fsf@miles.agharta.de>

[Cc to mailing list]

On Tue, 21 Dec 2004 10:40:32 -0800, Dave Roberts wrote:

> Right, okay. I guess that's what I meant. It's having to create an
> internal representation from the regex string each time you call
> SCAN with a string regex rather than something already created from
> CREATE-SCANNER.

Yes, except for constant regexes, see below.

> As an aside, why "CREATE-SCANNER" and not "MAKE-SCANNER?" Everything
> else in CL gets created with a MAKE-foo function of some sort
> (MAKE-INSTANCE, MAKE-struct, etc.). Anyway, I keep mistyping it. ;-)

Hehe, I don't know... :)

> Ah, interesting. So isn't every string a constant?

Every /literal/ string is constant. The form

  "dave"

is constant, but the form

  (FORMAT NIL "~A" "dave")

isn't.

> That is, even if I had typed a more complex regex as a constant
> string to SCAN, would that then be compiled at load time?

Yes, it doesn't depend on the complexity of the regex. Literal
s-expression regexes will also be translated at load time, BTW.

> So (SCAN "a complex.*regex\\d+of +(some sort)+" "a string to scan")
>
> would be compiled at load time? The regex string is CONSTANTP.

Yep - see the compiler macros in api.lisp. The form above will
basically be replaced by a form like

  (SCAN FOO "a string to scan")

where FOO is the load-time value of

  (CREATE-SCANNER "a complex.*regex\\d+of +(some sort)+")

> If so, that's pretty neat. No penalty for putting the regex string
> directly into the function call. In other languages, I'm used to
> having to create a separate scanner once in the code and then use
> that everywhere. I had been doing that a bit with CL-PPCRE, too,
> having assumed it was similar.
>
> Just so I understand, if I'm constructing the string on the fly,
> that then would have to get parsed and constructed every time. So
> something like
>
> (SCAN (CONCATENATE 'STRING "part of my regex" "and the second half
> (a*b)+") "a string to scan")
>
> would result in creating the scanner afresh every time?

Yes.

> Assuming I understand things correctly, with the creation of the
> scanner at load time, I'm not sure you need to. The common cases are
> pretty well covered and you have hooks with CREATE-SCANNER for the
> advanced usage. Declare victory and move on, as they say! ;-)

Hehe. Well, you could need a caching scheme if you're using many
different regexes which are created at runtime. However, then you also
have to think about how long scanners should stay in the cache, how
you can invalidate cache entries, etc. As I said, it's kind of
orthogonal to CL-PPCRE's purpose.

Happy Holidays,
Edi.

From edi at agharta.de Wed Dec 22 16:04:02 2004
From: edi at agharta.de (Edi Weitz)
Date: Wed, 22 Dec 2004 17:04:02 +0100
Subject: [cl-ppcre-devel] Anniversary release 1.0.0
Message-ID:

Hi!

I realized that CL-PPCRE is two years old now (actually I missed its
birthday by two days because my laptop with the CVS tree was broken)
and as my birthday will be tomorrow I thought it might be a good idea
to finally give it a 1.x version number.

So, there you are, it's called 1.0.0 now. This doesn't mean I think
it's bug-free, just that it seems to be stable enough for most people
using it.

No new code this time:

Happy Holidays,
Edi.
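
On the user-side caching mentioned in the "Does PPCRE cache scanners?"
thread above: for regex strings that are only known at run time, a
small cache keyed on the string might look like the sketch below. All
names here (*SCANNER-CACHE*, CACHED-SCANNER, CLEAR-SCANNER-CACHE) are
hypothetical and not part of CL-PPCRE. The constructor is passed in as
an argument so that the sketch stands on its own; with CL-PPCRE loaded
one would presumably pass #'CL-PPCRE:CREATE-SCANNER. Eviction is
reduced to clearing the whole table, which sidesteps rather than
answers the questions about cache lifetime raised in the thread.

```lisp
;; Hypothetical user-level cache; not part of CL-PPCRE.
;; EQUAL is used so that two distinct but equal strings share one entry.
(defvar *scanner-cache* (make-hash-table :test #'equal)
  "Maps regex strings to the scanner objects built from them.")

(defun cached-scanner (regex-string constructor)
  "Return the scanner for REGEX-STRING, calling CONSTRUCTOR only on first use."
  (multiple-value-bind (scanner foundp)
      (gethash regex-string *scanner-cache*)
    (if foundp
        scanner
        ;; SETF of GETHASH returns the stored value.
        (setf (gethash regex-string *scanner-cache*)
              (funcall constructor regex-string)))))

(defun clear-scanner-cache ()
  "Drop every cached scanner (the crudest possible invalidation policy)."
  (clrhash *scanner-cache*))
```

Assuming CL-PPCRE is loaded, a call could then be written as
(cl-ppcre:scan (cached-scanner regex #'cl-ppcre:create-scanner) target).
For literal regex strings none of this is needed, since the compiler
macros described in the thread already create the scanner once at load
time.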