From mario.maio at libero.it Sat Apr 9 15:41:51 2011
From: mario.maio at libero.it (Mario Maio)
Date: Sat, 09 Apr 2011 17:41:51 +0200
Subject: [cl-ppcre-devel] Fwd: string length limit ?
Message-ID: <4DA07E3F.2020708@libero.it>

Sorry if this is a trivial issue, I'm a Common Lisp newbie.

If I apply the following very simple command (replacing one or more
consecutive CR chars with one LF char)

(cl-ppcre:regex-replace-all (concatenate 'string (string #\return) "+")
                            mystring (string #\linefeed))

to my string of 455079 characters (loaded from a UTF-8 file), some of
the last #\return characters are not substituted (even though they
should be: if I apply the command again to the resulting string, they
ARE substituted).

It looks like there is some sort of length limit in the search, or maybe
a string length mistake connected to the representation of multi-byte
characters?

Cheers.

Mario

From edi at weitz.de Sat Apr 9 16:13:09 2011
From: edi at weitz.de (Edi Weitz)
Date: Sat, 9 Apr 2011 18:13:09 +0200
Subject: [cl-ppcre-devel] Fwd: string length limit ?
In-Reply-To: <4DA07E3F.2020708@libero.it>
References: <4DA07E3F.2020708@libero.it>
Message-ID:

This is an issue of the "This should not happen" variety. Certainly,
there is no such limit in CL-PPCRE. If you could provide us (i.e. the
mailing list) with a self-contained test case that demonstrates the
problem in a reproducible way, I'll look into it. Please also make
sure to let us know which Lisp on which OS you are using and which
version of CL-PPCRE.

Thanks,
Edi.

On Sat, Apr 9, 2011 at 5:41 PM, Mario Maio wrote:
> Sorry if this is a trivial issue, I'm a Common Lisp newbie.
>
> If I apply the following very simple command (replacing one or more
> consecutive CR chars with one LF char)
>
> (cl-ppcre:regex-replace-all (concatenate 'string (string #\return) "+")
>                             mystring (string #\linefeed))
>
> to my string of 455079 characters (loaded from a UTF-8 file), some of
> the last #\return characters are not substituted (even though they
> should be: if I apply the command again to the resulting string, they
> ARE substituted).
>
> It looks like there is some sort of length limit in the search, or maybe
> a string length mistake connected to the representation of multi-byte
> characters?
>
> Cheers.
>
> Mario

From mario.maio at libero.it Thu Apr 14 11:12:47 2011
From: mario.maio at libero.it (Mario Maio)
Date: Thu, 14 Apr 2011 13:12:47 +0200
Subject: [cl-ppcre-devel] Fwd: string length limit ?
In-Reply-To:
References: <4DA07E3F.2020708@libero.it>
Message-ID: <4DA6D6AF.7000901@libero.it>

Well, I reinstalled my clisp/emacs/slime bundle, switching to Lisp
Cabinet, and I was not able to replicate the problem, so everything's
fine in that regard.

But I have another question: how do I enter Unicode chars in the regexp?
For example, I need to replace "whatever" with “whatever”. I tried to
replace

"([^"\r\n]*)"

with

\u201c\1\u201d

but it didn't work.

I know I could generate and concatenate Unicode chars with Lisp, e.g.
(code-char #x201c), but it'd be cleaner to do it directly inside the
regexp.

Thanks.

Mario

On 09/04/2011 18:13, Edi Weitz wrote:
> This is an issue of the "This should not happen" variety. Certainly,
> there is no such limit in CL-PPCRE. If you could provide us (i.e. the
> mailing list) with a self-contained test case that demonstrates the
> problem in a reproducible way, I'll look into it.
> Please also make sure to let us know which Lisp on which OS you are
> using and which version of CL-PPCRE.
>
> Thanks,
> Edi.

From edi at weitz.de Thu Apr 14 12:52:43 2011
From: edi at weitz.de (Edi Weitz)
Date: Thu, 14 Apr 2011 14:52:43 +0200
Subject: [cl-ppcre-devel] Fwd: string length limit ?
In-Reply-To: <4DA6D6AF.7000901@libero.it>
References: <4DA07E3F.2020708@libero.it> <4DA6D6AF.7000901@libero.it>
Message-ID:

On Thu, Apr 14, 2011 at 1:12 PM, Mario Maio wrote:
> But I have another question: how do I enter Unicode chars in the regexp?
> For example, I need to replace "whatever" with “whatever”. I tried to
> replace
>
> "([^"\r\n]*)"
>
> with
>
> \u201c\1\u201d
>
> but it didn't work.
>
> I know I could generate and concatenate Unicode chars with Lisp, e.g.
> (code-char #x201c), but it'd be cleaner to do it directly inside the
> regexp.

For a portable solution, you could give this a try:

  http://weitz.de/cl-interpol/

Edi.
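
For comparison, here is a minimal sketch of the plain-Lisp route Mario
mentions: building the replacement string with code-char instead of
writing a \u201c escape. It does not use CL-INTERPOL. The function name
curly-quote is hypothetical, and the sketch assumes that Quicklisp is
installed and can load CL-PPCRE. The \1 in the replacement string is
CL-PPCRE's reference to the first register.

```lisp
;; Assumption: Quicklisp is available and provides CL-PPCRE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre"))

;; CURLY-QUOTE is a hypothetical helper name, not part of CL-PPCRE.
(defun curly-quote (text)
  "Replace straight double-quoted spans in TEXT with U+201C ... U+201D."
  (cl-ppcre:regex-replace-all
   ;; The regex "([^"\r\n]*)" written as a Lisp string literal.
   "\"([^\"\\r\\n]*)\""
   text
   ;; Replacement: U+201C, then \1 (the first register), then U+201D.
   (format nil "~C\\1~C" (code-char #x201c) (code-char #x201d))))

(curly-quote "say \"whatever\" now")
```

With these assumptions the call at the end should return the input with
the straight quotes around whatever replaced by the curly pair, and a
string without double quotes should come back unchanged.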

From mario.maio at libero.it Thu Apr 14 16:51:25 2011
From: mario.maio at libero.it (Mario Maio)
Date: Thu, 14 Apr 2011 18:51:25 +0200
Subject: [cl-ppcre-devel] missing replacement
Message-ID: <4DA7260D.2070704@libero.it>

If I try this regexp

(cl-ppcre:regex-replace-all " d {1-3}'" "la presa d 'aria" " d'"
                            :preserve-case t)

I don't get the expected replacement (which removes the space before the
apostrophe)

"la presa d'aria"

whereas Regex Coach works as expected.

Thanks.

Mario

From hans.huebner at gmail.com Thu Apr 14 17:07:59 2011
From: hans.huebner at gmail.com (Hans Hübner)
Date: Thu, 14 Apr 2011 19:07:59 +0200
Subject: [cl-ppcre-devel] missing replacement
In-Reply-To: <4DA7260D.2070704@libero.it>
References: <4DA7260D.2070704@libero.it>
Message-ID:

On Thu, Apr 14, 2011 at 6:51 PM, Mario Maio wrote:
> If I try this regexp
>
> (cl-ppcre:regex-replace-all " d {1-3}'" "la presa d 'aria" " d'"
>                             :preserve-case t)
>
> I don't get the expected replacement (which removes the space before the
> apostrophe)

The correct syntax is {1,3}, not {1-3}.

-Hans
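
To make the correction concrete, here is a short sketch of the same
call with the {1,3} quantifier. In Perl-style syntax a brace group that
is not a valid quantifier is usually matched as literal text, which
would explain why the {1-3} version found nothing to replace; that
explanation is an inference and is not stated in the thread. The
function name fix-elision is hypothetical, and the sketch assumes that
Quicklisp is installed and can load CL-PPCRE.

```lisp
;; Assumption: Quicklisp is available and provides CL-PPCRE.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload "cl-ppcre"))

;; FIX-ELISION is a hypothetical wrapper name used only for this example.
(defun fix-elision (text)
  "Turn \" d\", one to three spaces and an apostrophe into \" d'\"."
  ;; {1,3} is the bounded quantifier: between one and three spaces.
  (cl-ppcre:regex-replace-all " d {1,3}'" text " d'" :preserve-case t))

(fix-elision "la presa d 'aria")   ; → "la presa d'aria"
```

The same function should also handle two or three spaces before the
apostrophe, since the quantifier accepts up to three.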