From edi at agharta.de  Fri Aug 24 08:26:00 2007
From: edi at agharta.de (Edi Weitz)
Date: Fri, 24 Aug 2007 10:26:00 +0200
Subject: [cl-ppcre-devel] New release 1.3.1
Message-ID:

ChangeLog:

Version 1.3.1
2007-08-24
Second return value for REGEX-REPLACE (patch by Matthew Sachs)

Download:

http://weitz.de/files/cl-ppcre.tar.gz

From seb-cl-mailist at matchix.com  Mon Aug 27 17:12:48 2007
From: seb-cl-mailist at matchix.com (Sébastien Saint-Sevin)
Date: Mon, 27 Aug 2007 19:12:48 +0200
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
Message-ID: <46D30610.4070709@matchix.com>

Hi Edi & list,

While using cl-ppcre:split recently, I discovered that when the regex
matches at position 0, the function returns an empty string as the first
element. I think it should not, as I do not consider the empty string
to be a substring of the original one.

I don't know the Perl behaviour in this particular case, but I hope it
is not one of the "peculiar" behaviours the doc mentions :-)

Ex: (cl-ppcre:split "\\s+" " foo bar baz ")
==> ("" "foo" "bar" "baz")

I would prefer to get ("foo" "bar" "baz").

What do you think?

Thanks,
Sebastien.

From seb-cl-mailist at matchix.com  Mon Aug 27 17:27:04 2007
From: seb-cl-mailist at matchix.com (Sébastien Saint-Sevin)
Date: Mon, 27 Aug 2007 19:27:04 +0200
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
In-Reply-To: <46D30610.4070709@matchix.com>
References: <46D30610.4070709@matchix.com>
Message-ID: <46D30968.6030607@matchix.com>

Hi again,

> Hi Edi & list,
>
> While using cl-ppcre:split recently, I discovered that when the regex
> matches at position 0, the function returns an empty string as the first
> element. I think it should not, as I do not consider the empty string
> to be a substring of the original one.

I should have said "the empty string at position 0" (I'm OK with empty
strings in the middle of the string when two consecutive matches occur
with no characters in between).
The same can be said for an empty string at the end (but you can't see
that one, because empty strings at the end are removed).

Hope this clarifies my thoughts a bit...

From msachs at itasoftware.com  Mon Aug 27 17:39:58 2007
From: msachs at itasoftware.com (Matthew Sachs)
Date: Mon, 27 Aug 2007 13:39:58 -0400
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
In-Reply-To: <46D30968.6030607@matchix.com>
References: <46D30610.4070709@matchix.com> <46D30968.6030607@matchix.com>
Message-ID: <46D30C6E.9080300@itasoftware.com>

Sébastien Saint-Sevin wrote:
>> I don't know the Perl behaviour in this particular case, but I hope it
>> is not one of the "peculiar" behaviours the doc mentions :-)
>>
>> Ex: (cl-ppcre:split "\\s+" " foo bar baz ")
>> ==> ("" "foo" "bar" "baz")
>>
>> I would prefer to get ("foo" "bar" "baz").

bash$ perl -e 'print join(" ", map { "\"$_\"" } split(/\s+/, " foo bar baz ")), "\n"'
"" "foo" "bar" "baz"

CL-PPCRE is matching Perl's behavior here.
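The rule that Matthew's Perl one-liner above demonstrates is roughly "take the substrings between the matches, then drop empty strings at the end", which is why a match at position 0 yields a leading "". The following is a minimal, hypothetical illustration of that rule in plain Common Lisp, with no CL-PPCRE dependency. The names `perl-style-whitespace-split` and `blank-char-p` are invented here, and the blank character set only approximates Perl's `\s`.

```lisp
;; Hypothetical illustration, not CL-PPCRE code: split STRING on runs of
;; blank characters (an approximation of Perl's \s+), following the rule
;; "substrings between matches, with trailing empty strings removed".
(defun blank-char-p (char)
  (member char '(#\Space #\Tab #\Newline #\Return #\Page)))

(defun perl-style-whitespace-split (string)
  "Return the substrings of STRING between runs of blanks.
A leading empty string is kept; trailing empty strings are dropped."
  (let ((fields '())
        (start 0)
        (len (length string)))
    (loop
      (let ((run-start (position-if #'blank-char-p string :start start)))
        (when (null run-start)
          ;; No more delimiters: the rest of STRING is the final field.
          (push (subseq string start) fields)
          (return))
        ;; The field before this run. It is "" when the run begins at
        ;; START, e.g. when STRING itself begins with a blank.
        (push (subseq string start run-start) fields)
        ;; Skip the whole run of blanks, as \s+ would.
        (setf start (or (position-if-not #'blank-char-p string :start run-start)
                        len))))
    ;; FIELDS is in reverse order, so trailing empty strings sit at its front.
    (loop while (and fields (string= (first fields) ""))
          do (pop fields))
    (nreverse fields)))

(perl-style-whitespace-split " foo bar baz ")
;; => ("" "foo" "bar" "baz"), the same result as the Perl one-liner
```

Keeping the leading "" while silently dropping trailing ones is exactly the asymmetry Sébastien noticed.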
From seb-cl-mailist at matchix.com  Mon Aug 27 17:55:52 2007
From: seb-cl-mailist at matchix.com (Sébastien Saint-Sevin)
Date: Mon, 27 Aug 2007 19:55:52 +0200
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
In-Reply-To: <46D30C6E.9080300@itasoftware.com>
References: <46D30610.4070709@matchix.com> <46D30968.6030607@matchix.com> <46D30C6E.9080300@itasoftware.com>
Message-ID: <46D31028.4010305@matchix.com>

Matthew Sachs wrote:
> Sébastien Saint-Sevin wrote:
>>> I don't know the Perl behaviour in this particular case, but I hope
>>> it is not one of the "peculiar" behaviours the doc mentions :-)
>>>
>>> Ex: (cl-ppcre:split "\\s+" " foo bar baz ")
>>> ==> ("" "foo" "bar" "baz")
>>>
>>> I would prefer to get ("foo" "bar" "baz").
>
> bash$ perl -e 'print join(" ", map { "\"$_\"" } split(/\s+/, " foo bar baz ")), "\n"'
> "" "foo" "bar" "baz"
>
> CL-PPCRE is matching Perl's behavior here.

I'm not that surprised that Perl does it this way...
Thanks for the Perl test, Matthew.

Cheers,
Sebastien.

> _______________________________________________
> cl-ppcre-devel site list
> cl-ppcre-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/cl-ppcre-devel

From ctdean at sokitomi.com  Mon Aug 27 19:14:36 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Mon, 27 Aug 2007 12:14:36 -0700
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
In-Reply-To: <46D30610.4070709@matchix.com> (Sébastien Saint-Sevin's message of "Mon, 27 Aug 2007 19:12:48 +0200")
References: <46D30610.4070709@matchix.com>
Message-ID:

Sébastien Saint-Sevin writes:
> While using cl-ppcre:split recently, I discovered that when the regex
> matches at position 0, the function returns an empty string as the first
> element. I think it should not, as I do not consider the empty string
> to be a substring of the original one.
>
> Ex: (cl-ppcre:split "\\s+" " foo bar baz ") ==> ("" "foo" "bar" "baz")

It is an interesting question, but I believe that the current split
behavior of returning the leading empty string is the rational
behavior. In my mind it comes down to the definition of split:
"returns a list of the substrings between the matches".

Having said that, I often have real-world needs to *not* have the
leading empty string around. I wish there were explicit keyword args to
omit any leading and trailing empty strings. If I get motivated, I
might even make a patch! Perl's version of split doesn't have keyword
args, so it tries to fit several behavior changes into its arguments.

Here's some more practical advice: if you know your problem domain
well, you can try the inverse match trick. Instead of calling SPLIT,
call ALL-MATCHES-AS-STRINGS with the inverse regex. In this case:

  (all-matches-as-strings "\\S+" " foo bar baz ") => ("foo" "bar" "baz")

(This will skip internal empty strings in the general case, but that
doesn't matter for your example case.)

It's also easy to write your own split that does what you want.
An untested version is below.

Cheers,
Chris Dean


(defun simple-split (regex target-string)
  "A simple version of split that doesn't handle registers in any
special way and discards leading and trailing empty strings.
Untested!"
  (let ((res nil)       ; The result
        (last-end 0))   ; The end position of the last match
    (cl-ppcre:do-matches (mstart mend regex target-string)
      ;; A match at position 0 would produce a leading empty string: skip it.
      (unless (zerop mstart)
        (push (subseq target-string last-end mstart) res))
      (setf last-end mend))
    ;; Keep the text after the last match only if it is non-empty.
    (when (< last-end (length target-string))
      (push (subseq target-string last-end) res))
    (nreverse res)))

From seb-cl-mailist at matchix.com  Tue Aug 28 08:38:04 2007
From: seb-cl-mailist at matchix.com (Sébastien Saint-Sevin)
Date: Tue, 28 Aug 2007 10:38:04 +0200
Subject: [cl-ppcre-devel] CL-PPCRE Split behaviour
In-Reply-To:
References: <46D30610.4070709@matchix.com>
Message-ID: <46D3DEEC.9040204@matchix.com>

Thanks a lot Chris,

Very interesting feedback.

Cheers,
sebastien.
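Chris's wish for explicit keyword arguments to omit leading and trailing empty strings could look roughly like the sketch below. It is hypothetical and not part of CL-PPCRE. To stay runnable in plain Common Lisp it splits on a single delimiter character instead of a regex, and the names `split-on-char`, `:omit-leading` and `:omit-trailing` are invented for illustration. Unlike the inverse-match trick with ALL-MATCHES-AS-STRINGS, it keeps internal empty strings.

```lisp
;; Hypothetical sketch, not CL-PPCRE's API: split on a single delimiter
;; character, with keyword arguments to drop an empty first/last field.
(defun split-on-char (delimiter string &key omit-leading omit-trailing)
  "Return the substrings of STRING between occurrences of DELIMITER.
Internal empty strings are always kept. If OMIT-LEADING is true, an empty
first field is dropped; if OMIT-TRAILING is true, an empty last field is
dropped."
  (let ((fields '())
        (start 0))
    ;; Collect every field, empty ones included, in reverse order.
    (loop for pos = (position delimiter string :start start)
          do (push (subseq string start (or pos (length string))) fields)
             (if pos
                 (setf start (1+ pos))
                 (loop-finish)))
    (setf fields (nreverse fields))
    (when (and omit-leading fields (string= (first fields) ""))
      (pop fields))
    (when (and omit-trailing fields (string= (car (last fields)) ""))
      (setf fields (butlast fields)))
    fields))

;; (split-on-char #\, ",a,,b,")
;;   => ("" "a" "" "b" "")
;; (split-on-char #\, ",a,,b," :omit-leading t :omit-trailing t)
;;   => ("a" "" "b")
```

Dropping at most one empty field at each end mirrors what SIMPLE-SPLIT above does with a delimiter like ","; Perl, by contrast, strips all trailing empty fields.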