From yazicivo at ttnet.net.tr Thu Jul 5 22:37:56 2007
From: yazicivo at ttnet.net.tr (Volkan YAZICI)
Date: Fri, 06 Jul 2007 01:37:56 +0300
Subject: [cl-ppcre-devel] Weird :START Behaviour in CL-PPCRE:SCAN
Message-ID: <87ir8ymq7v.fsf@ttnet.net.tr>

Hi,

In a part of the program, I found out that the CL-PPCRE:SCAN call
below really slows down the whole operation:

  (cl-ppcre:scan
   "^\\[([^ ]{1,})+[ ]*(.{1,})?\\]"
   "foo [[Main]] [http://baz]''bold'''''''bar''''"
   :start 13)

When I remove the :START keyword and the leading `^' regex character,
SCAN finishes quite fast, as it should. What could be the problem
here? (A bug?) How can I fix this weird behaviour?

Regards.

From ctdean at sokitomi.com Thu Jul 5 23:31:04 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Thu, 05 Jul 2007 16:31:04 -0700
Subject: [cl-ppcre-devel] Weird :START Behaviour in CL-PPCRE:SCAN
In-Reply-To: <87ir8ymq7v.fsf@ttnet.net.tr> (Volkan YAZICI's message of "Fri, 06 Jul 2007 01:37:56 +0300")
References: <87ir8ymq7v.fsf@ttnet.net.tr>
Message-ID:

> (cl-ppcre:scan
> "^\\[([^ ]{1,})+[ ]*(.{1,})?\\]"
> "foo [[Main]] [http://baz]''bold'''''''bar''''"
> :start 13)

This is a regular expression that does lots of backtracking when it
fails. If you change that you'll most likely see a large performance
improvement.

A small change is to simplify the first grouping:

  "^\\[([^ ]{1,})[ ]*(.{1,})?\\]"

The reason that having :start is so much slower is that the regex
matches a different string that needs far less backtracking that
without the :start.

Cheers,
Chris Dean

From ctdean at sokitomi.com Thu Jul 5 23:51:16 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Thu, 05 Jul 2007 16:51:16 -0700
Subject: [cl-ppcre-devel] Weird :START Behaviour in CL-PPCRE:SCAN
In-Reply-To: (Chris Dean's message of "Thu, 05 Jul 2007 16:31:04 -0700")
References: <87ir8ymq7v.fsf@ttnet.net.tr>
Message-ID:

> The reason that having :start is so much slower is that the regex
> matches a different string that needs far less backtracking that
> without the :start.

Should be "... that needs to perform far more backtracking than the
version without the :start keyword."

Cheers,
Chris Dean

From yazicivo at ttnet.net.tr Fri Jul 6 07:55:53 2007
From: yazicivo at ttnet.net.tr (Volkan YAZICI)
Date: Fri, 06 Jul 2007 10:55:53 +0300
Subject: [cl-ppcre-devel] Re: Weird :START Behaviour in CL-PPCRE:SCAN
In-Reply-To: (Chris Dean's message of "Thu\, 05 Jul 2007 16\:31\:04 -0700")
References: <87ir8ymq7v.fsf@ttnet.net.tr>
Message-ID: <87myyaj792.fsf_-_@ttnet.net.tr>

Chris Dean writes:
>> (cl-ppcre:scan
>> "^\\[([^ ]{1,})+[ ]*(.{1,})?\\]"
>> "foo [[Main]] [http://baz]''bold'''''''bar''''"
>> :start 13)
>
> This is a regular expression that does lots of backtracking when it
> fails. If you change that you'll most likely see a large performance
> improvement.
>
> A small change is to simplify the first grouping:
>
> "^\\[([^ ]{1,})[ ]*(.{1,})?\\]"
>
> The reason that having :start is so much slower is that the regex
> matches a different string that needs far less backtracking that
> without the :start.

Next time, how can I tell when a regex will need that much
backtracking? I'd really appreciate it if you could explain the
pattern a little bit more.

By the way, what I'm trying to do is to parse string patterns like
`[href]' and `[href text]'. And as you can tell from the :START 13
keyword, I have already determined that there is a `[' at the 13th
character. Can you suggest any other method to parse such strings
more efficiently?

Regards.
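
A rough way to see where the backtracking in the pattern above comes
from: in `([^ ]{1,})+' both the inner `{1,}' and the outer `+' can
consume the same run of non-space characters, so when the rest of the
pattern fails, the engine retries many different ways of splitting
that run. From position 13 onward the target string contains no space
at all, which is consistent with the corrected remark that the :START
version has to backtrack far more. The sketch below is only an
illustration, assuming CL-PPCRE is already loaded; the names
*NESTED-REGEX*, *SIMPLE-REGEX*, *TARGET* and SCAN-BOUNDS are made up
for this note and are not part of CL-PPCRE.

```lisp
;; Illustration only; assumes CL-PPCRE is already loaded.
;; All names defined here are hypothetical helpers for this note.
(defparameter *nested-regex* "^\\[([^ ]{1,})+[ ]*(.{1,})?\\]"
  "Original pattern: a quantified group wrapped in a second quantifier.")

(defparameter *simple-regex* "^\\[([^ ]{1,})[ ]*(.{1,})?\\]"
  "Suggested pattern: the outer + on the first group is dropped.")

(defparameter *target* "foo [[Main]] [http://baz]''bold'''''''bar''''")

(defun scan-bounds (regex &optional (start 13))
  "Return the match start and end of REGEX in *TARGET* as a list,
or NIL if there is no match. Scanning begins at START."
  (multiple-value-bind (match-start match-end)
      (cl-ppcre:scan regex *target* :start start)
    (and match-start (list match-start match-end))))

;; Compare the run times at the REPL; the first form may take a while.
;; (time (scan-bounds *nested-regex*))
;; (time (scan-bounds *simple-regex*))
```

Timing both forms with TIME should make the difference visible
without stepping through the match by hand.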
From ctdean at sokitomi.com Fri Jul 6 09:04:49 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Fri, 06 Jul 2007 02:04:49 -0700
Subject: [cl-ppcre-devel] Re: Weird :START Behaviour in CL-PPCRE:SCAN
In-Reply-To: <87myyaj792.fsf_-_@ttnet.net.tr> (Volkan YAZICI's message of "Fri, 06 Jul 2007 10:55:53 +0300")
References: <87ir8ymq7v.fsf@ttnet.net.tr> <87myyaj792.fsf_-_@ttnet.net.tr>
Message-ID:

> By the way, what I'm trying to do is to parse string patterns like
> `[href]' and `[href text]'. And as you can tell from the :START 13
> keyword, I have already determined that there is a `[' at the 13th
> character. Can you suggest any other method to parse such strings
> more efficiently?

A regex seems like a fine way to me.

> Next time, how can I tell when a regex will need that much
> backtracking? I'd really appreciate it if you could explain the
> pattern a little bit more.

You can play around with the Regex Coach

  http://weitz.de/regex-coach/

and step through the matching.

If you really want a deeper understanding, Jeffrey Friedl's Mastering
Regular Expressions is very good. And most compiler textbooks will
cover regexes as well.

Maybe someone else has a simple backtracking explanation.

Cheers,
Chris Dean

From yazicivo at ttnet.net.tr Fri Jul 6 09:31:03 2007
From: yazicivo at ttnet.net.tr (Volkan YAZICI)
Date: Fri, 06 Jul 2007 12:31:03 +0300
Subject: [cl-ppcre-devel] Excessive Memory Usage by CREATE-SCANNER
Message-ID: <87644xswtk.fsf@ttnet.net.tr>

Hi,

While trying to build a parse tree list using CREATE-SCANNER, after
the first 2-3 attempts, the code below exhausts the whole system
memory and halts the system.

  (defparameter *markup-transformations*
    (loop for (element syntax) on
         (list
          ;; Link formatting.
          :link-internal "\\[\\["
          :link-external "\\[http://"
          :link-external "\\[https://"
          :link-external "\\[ftp://"
          ;; Text formatting.
          :text-italic-bold "'''''"
          :text-italic "'''"
          :text-bold "''"
          :text-underline "\\_\\_"
          :text-monospace "`"
          :text-superscript "\\^"
          :text-subscript ",,"
          ;; Formattings requiring a fresh line.
          :header-3 "\\n=== "
          :header-2 "\\n== "
          :header-1 "\\n= "
          :code-start "\\n{{{"
          :code-end "\\n}}}"
          :blockquote "\\n ")
         by #'cddr collect
         (list element (cl-ppcre:create-scanner
                        (string-append "^" syntax)))))

After 15 minutes, still no OOM reaction. (SBCL 1.0.6) So I needed to
restart the machine. (And therefore, I cannot narrow down the
problematic part of the code by bisecting.) Do you have any ideas
about the erroneous line?

Regards.

From ctdean at sokitomi.com Fri Jul 6 15:37:13 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Fri, 06 Jul 2007 08:37:13 -0700
Subject: [cl-ppcre-devel] Excessive Memory Usage by CREATE-SCANNER
In-Reply-To: <87644xswtk.fsf@ttnet.net.tr> (Volkan YAZICI's message of "Fri, 06 Jul 2007 12:31:03 +0300")
References: <87644xswtk.fsf@ttnet.net.tr>
Message-ID:

Volkan YAZICI writes:
> While trying to build a parse tree list using CREATE-SCANNER, after
> the first 2-3 attempts, the code below exhausts the whole system
> memory and halts the system.

FWIW, the code runs fine and compiles in 0.03 seconds on LispWorks
5.0.2 on a Mac.

Cheers,
Chris Dean

From edi at agharta.de Thu Jul 12 19:13:04 2007
From: edi at agharta.de (Edi Weitz)
Date: Thu, 12 Jul 2007 21:13:04 +0200
Subject: [cl-ppcre-devel] Excessive Memory Usage by CREATE-SCANNER
In-Reply-To: <87644xswtk.fsf@ttnet.net.tr> (Volkan YAZICI's message of "Fri, 06 Jul 2007 12:31:03 +0300")
References: <87644xswtk.fsf@ttnet.net.tr>
Message-ID:

On Fri, 06 Jul 2007 12:31:03 +0300, Volkan YAZICI wrote:

> While trying to build a parse tree list using CREATE-SCANNER, after
> the first 2-3 attempts, the code below exhausts the whole system
> memory and halts the system.
>
> (defparameter *markup-transformations*
> (loop for (element syntax) on
> (list
> ;; Link formatting.
> :link-internal "\\[\\["
> :link-external "\\[http://"
> :link-external "\\[https://"
> :link-external "\\[ftp://"
> ;; Text formatting.
> :text-italic-bold "'''''"
> :text-italic "'''"
> :text-bold "''"
> :text-underline "\\_\\_"
> :text-monospace "`"
> :text-superscript "\\^"
> :text-subscript ",,"
> ;; Formattings requiring a fresh line.
> :header-3 "\\n=== "
> :header-2 "\\n== "
> :header-1 "\\n= "
> :code-start "\\n{{{"
> :code-end "\\n}}}"
> :blockquote "\\n ")
> by #'cddr collect
> (list element (cl-ppcre:create-scanner
> (string-append "^" syntax)))))
>
> After 15 minutes, still no OOM reaction. (SBCL 1.0.6) So I needed to
> restart the machine. (And therefore, I cannot narrow down the
> problematic part of the code by bisecting.) Do you have any ideas
> about the erroneous line?

Works fine for me with SBCL 1.0.5 on Linux. I put the above code in a
file, defined STRING-APPEND in the obvious way using FORMAT, and
compiled and loaded the file a couple of times without problems. If
you can reproduce the problem, then maybe you should try to print
something (and force output) for each iteration, so you can see when
printing stops.

(Which version of CL-PPCRE did you use, BTW?)

HTH,
Edi.

From yazicivo at ttnet.net.tr Thu Jul 12 21:44:45 2007
From: yazicivo at ttnet.net.tr (Volkan YAZICI)
Date: Fri, 13 Jul 2007 00:44:45 +0300
Subject: [cl-ppcre-devel] Re: Excessive Memory Usage by CREATE-SCANNER
In-Reply-To: (Edi Weitz's message of "Thu\, 12 Jul 2007 21\:13\:04 +0200")
References: <87644xswtk.fsf@ttnet.net.tr>
Message-ID: <87wsx5uwj6.fsf_-_@ttnet.net.tr>

Edi Weitz writes:
> Works fine for me with SBCL 1.0.5 on Linux. I put the above code in a
> file, defined STRING-APPEND in the obvious way using FORMAT, and
> compiled and loaded the file a couple of times without problems. If
> you can reproduce the problem, then maybe you should try to print
> something (and force output) for each iteration, so you can see when
> printing stops.

Excuse me for not informing the list about the latest state of the
problem. After Zach Beane suggested that I turn *USE-BMH-MATCHERS*
off, I realized that I need to read the documentation more carefully
next time. Turning *USE-BMH-MATCHERS* off solved the problem.

Regards.
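
For readers who arrive at this thread with the same symptom, the
workaround might be sketched as follows. According to the CL-PPCRE
documentation, *USE-BMH-MATCHERS* controls whether scanners use
Boyer-Moore-Horspool matchers for constant strings; binding it to NIL
while CREATE-SCANNER runs should make the new scanner avoid them. The
thread itself does not say why this reduced memory use in this
particular setup. MAKE-ANCHORED-SCANNER and *LINK-INTERNAL-SCANNER*
are hypothetical names, and CONCATENATE stands in here for the
STRING-APPEND helper, whose definition was not posted.

```lisp
;; Sketch only; assumes CL-PPCRE is already loaded.
;; MAKE-ANCHORED-SCANNER is a hypothetical helper, not part of CL-PPCRE.
(defun make-anchored-scanner (syntax)
  "Return a scanner for the regex string SYNTAX anchored with ^,
created while *USE-BMH-MATCHERS* is bound to NIL."
  (let ((cl-ppcre:*use-bmh-matchers* nil))
    ;; CONCATENATE replaces the STRING-APPEND helper from the thread.
    (cl-ppcre:create-scanner (concatenate 'string "^" syntax))))

;; One entry from the table in the thread, built with the helper.
(defparameter *link-internal-scanner*
  (make-anchored-scanner "\\[\\["))

;; Example use at the REPL:
;; (cl-ppcre:scan *link-internal-scanner* "[[Main]]")
```

Binding the variable inside the helper keeps the setting local to
scanner creation, so other code that relies on the global value is
not affected.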