[cl-ppcre-devel] Re: Weird :START Behaviour in CL-PPCRE:SCAN

Volkan YAZICI yazicivo at ttnet.net.tr
Fri Jul 6 07:55:53 UTC 2007


Chris Dean <ctdean at sokitomi.com> writes:
>>   (cl-ppcre:scan
>>    "^\\[([^ ]{1,})+[ ]*(.{1,})?\\]"
>>    "foo [[Main]] [http://baz]''bold'''''''bar''''"
>>    :start 13)
>
> This is a regular expression that does lots of backtracking when it
> fails.  If you change that you'll most likely see a large performance
> improvement.
>
> A small change is to simplify the first grouping:
>
>    "^\\[([^ ]{1,})[ ]*(.{1,})?\\]" 
>
> The reason that having :start is so much slower is that, without
> :start, the regex is matched against a different string, one that
> needs far less backtracking.

Next time, how can I tell in advance when a regex will need that much
backtracking? I'd really appreciate it if you could explain the pattern
a little bit more.

By the way, what I'm trying to do is parse string patterns like
`[href]' and `[href text]'. As you can tell from the :START 13
keyword, I have already determined that there is a `[' at position 13
of the string. Would you suggest any other method to parse such
strings more efficiently?
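
For what it's worth, one alternative I can think of is to skip the
regex for this step and scan for the delimiters directly. Below is
only a rough sketch in plain Common Lisp: PARSE-BRACKET-LINK is a
made-up name (it is not part of CL-PPCRE), and it assumes that the
first `]' after the `[' closes the link and that the first space
inside the brackets separates the href from the text. That is not
exactly what the regex above accepts, so treat it as an illustration
rather than a drop-in replacement.

```lisp
;; Hypothetical helper, not part of CL-PPCRE.  Parses "[href]" or
;; "[href text]" at index START, where a `[' is already known to be.
;; Assumptions: the first `]' after START closes the link, and the
;; first space inside the brackets separates HREF from TEXT.
(defun parse-bracket-link (string start)
  "Return three values: HREF, TEXT (or NIL), and the index just past
the closing `]'.  Return NIL if there is no well-formed link at START."
  (when (and (< start (length string))
             (char= (char string start) #\[))
    (let ((close (position #\] string :start (1+ start))))
      (when close
        (let* ((space (position #\Space string
                                :start (1+ start) :end close))
               (href-end (or space close))
               ;; Skip the run of spaces between HREF and TEXT.
               (text-start (and space
                                (position-if-not
                                 (lambda (c) (char= c #\Space))
                                 string :start space :end close))))
          ;; HREF must contain at least one character.
          (when (> href-end (1+ start))
            (values (subseq string (1+ start) href-end)
                    (and text-start (subseq string text-start close))
                    (1+ close))))))))

;; Usage with the string from this thread:
(parse-bracket-link "foo [[Main]] [http://baz]''bold'''''''bar''''" 13)
;; => "http://baz", NIL, 25
```

Each call to POSITION walks the string at most once, so there is
nothing to backtrack over, whatever follows the closing `]'.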


Regards.


