From edi at agharta.de Thu Jul 3 08:47:12 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 03 Jul 2008 10:47:12 +0200
Subject: [cl-ppcre-devel] New release 1.4.0

ChangeLog:

Version 1.4.0
2008-07-03
Replaced hash tables with charsets (by Nikodemus Siivola)
Get rid of duplicates in REGEX-APROPOS(-LIST)

Download:

http://weitz.de/cl-ppcre.tar.gz

Edi.

From edi at agharta.de Thu Jul 3 09:38:52 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 03 Jul 2008 11:38:52 +0200
Subject: [cl-ppcre-devel] New release 1.4.1

ChangeLog:

Version 1.4.1
2008-07-03
Skip non-characters in CREATE-RANGES-FROM-SET

Download:

http://weitz.de/files/cl-ppcre.tar.gz

Edi.

From edi at agharta.de Thu Jul 3 09:41:30 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 03 Jul 2008 11:41:30 +0200
Subject: [cl-ppcre-devel] New release 1.4.0
In-Reply-To: (Edi Weitz's message of "Thu, 03 Jul 2008 10:47:12 +0200")

On Thu, 03 Jul 2008 10:47:12 +0200, Edi Weitz wrote:

> Download:
>
> http://weitz.de/cl-ppcre.tar.gz

Woops, should have been:

http://weitz.de/files/cl-ppcre.tar.gz

Sorry for the noise today...

Edi.

From edi at agharta.de Thu Jul 3 13:24:08 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 03 Jul 2008 15:24:08 +0200
Subject: [cl-ppcre-devel] 1.4.0/1 performance

Hi,

The recent 1.4.x release which replaced hash tables with a "charset" implementation by Nikodemus was meant to make scanner creation cheaper while not sacrificing matching performance (ideally increasing it).

I have to admit I haven't tested this a lot before releasing it (except for correctness), but there seems to be some evidence that scanners are significantly slower now for some Lisps. I'd be interested in your experiences on different platforms.

This change should only affect regular expressions with character classes and it should only make a difference for large values of *REGEX-CHAR-CODE-LIMIT*.
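One rough way to report such numbers is to time scanner creation and matching separately, with and without a smaller char-code limit. The sketch below assumes CL-PPCRE is already loaded and that `cl-ppcre:*regex-char-code-limit*` is the exported special variable mentioned above. `SECONDS-FOR` and `SCANNER-COSTS` are hypothetical helpers, not part of CL-PPCRE.

```lisp
;; Sketch only: assumes CL-PPCRE is already loaded, e.g. via
;; (asdf:load-system "cl-ppcre") or (ql:quickload "cl-ppcre").
;; SECONDS-FOR and SCANNER-COSTS are illustrative helpers, not CL-PPCRE API.

(defun seconds-for (thunk repetitions)
  "Call THUNK REPETITIONS times and return the elapsed wall-clock time
in seconds, as a float."
  (let ((start (get-internal-real-time)))
    (loop repeat repetitions do (funcall thunk))
    (float (/ (- (get-internal-real-time) start)
              internal-time-units-per-second))))

(defun scanner-costs (regex target
                      &key (repetitions 1000)
                           (char-code-limit cl-ppcre:*regex-char-code-limit*))
  "Return two values: the seconds spent creating REPETITIONS scanners
for REGEX, and the seconds spent running one precompiled scanner
REPETITIONS times against TARGET.  CL-PPCRE:*REGEX-CHAR-CODE-LIMIT* is
dynamically bound to CHAR-CODE-LIMIT for the whole measurement."
  (let* ((cl-ppcre:*regex-char-code-limit* char-code-limit)
         (scanner (cl-ppcre:create-scanner regex)))
    (values
     ;; Scanner creation: compile REGEX from scratch every time.
     (seconds-for (lambda () (cl-ppcre:create-scanner regex)) repetitions)
     ;; Matching only: reuse the scanner built once above.
     (seconds-for (lambda () (cl-ppcre:scan scanner target)) repetitions))))
```

For example, you could run `(scanner-costs "[^a-z]+" "some target string")` once with the default limit and once with `:char-code-limit 256`. If mainly the first value moves, the cost probably lies in building the character class rather than in matching against it.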
If you see significant changes in that area, good or bad, please let me know, including your Lisp and OS. If the majority sees a degradation, I might just revoke this change.

Thanks,
Edi.

From dave.pawson at gmail.com Mon Jul 7 09:17:12 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Mon, 7 Jul 2008 10:17:12 +0100
Subject: [cl-ppcre-devel] Bug?
Message-ID: <711a73df0807070217j1906108dmb063b20f412a0224@mail.gmail.com>

Input text

* Representatives from <Sun> & <Novell> versions) & <Symphony>,
<KOffice>, <Google> Docs & Spreadsheets and Microsoft Office.

code

(defun msg(message)
  (format t "~%~A~%" message)
  )

(defun cleanup (line)
  "Replace < and & with entities"
  (msg (cl-ppcre:regex-replace-all "&" line "&amp;"))
  (msg (cl-ppcre:regex-replace-all "<" line "&lt;"))
  )

The first message replaces &
The second shows that &amp; has been changed back to &

* Representatives from <Sun> &amp; <Novell> versions) &amp; <Symphony>,
<KOffice>, <Google> Docs &amp; Spreadsheets and Microsoft Office.

* Representatives from &lt;Sun> & &lt;Novell> versions) & &lt;Symphony>,
&lt;KOffice>, &lt;Google> Docs & Spreadsheets and Microsoft Office.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From edi at agharta.de Mon Jul 7 09:25:51 2008
From: edi at agharta.de (Edi Weitz)
Date: Mon, 07 Jul 2008 11:25:51 +0200
Subject: [cl-ppcre-devel] Bug?
In-Reply-To: <711a73df0807070217j1906108dmb063b20f412a0224@mail.gmail.com> (Dave Pawson's message of "Mon, 7 Jul 2008 10:17:12 +0100")
References: <711a73df0807070217j1906108dmb063b20f412a0224@mail.gmail.com>

On Mon, 7 Jul 2008 10:17:12 +0100, "Dave Pawson" wrote:

> Input text
>
> * Representatives from <Sun> & <Novell> versions) & <Symphony>,
> <KOffice>, <Google> Docs & Spreadsheets and Microsoft Office.
>
> code
>
> (defun msg(message)
>   (format t "~%~A~%" message)
>   )
>
> (defun cleanup (line)
>   "Replace < and & with entities"
>   (msg (cl-ppcre:regex-replace-all "&" line "&amp;"))
>   (msg (cl-ppcre:regex-replace-all "<" line "&lt;"))
>   )
>
>
> The first message replaces &
> The second shows that &amp; has been changed back to &

You're applying REGEX-REPLACE-ALL to LINE in both cases.
You probably want to apply it to the result of the first invocation in the second case instead.

From dave.pawson at gmail.com Mon Jul 7 09:32:21 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Mon, 7 Jul 2008 10:32:21 +0100
Subject: [cl-ppcre-devel] Bug?
References: <711a73df0807070217j1906108dmb063b20f412a0224@mail.gmail.com>
Message-ID: <711a73df0807070232uc282e32odf57fe13718a923f@mail.gmail.com>

2008/7/7 Edi Weitz:
>> (defun cleanup (line)
>>   "Replace < and & with entities"
>>   (msg (cl-ppcre:regex-replace-all "&" line "&amp;"))
>>   (msg (cl-ppcre:regex-replace-all "<" line "&lt;"))
>>   )
>>
> You're applying REGEX-REPLACE-ALL to LINE in both cases. You probably
> want to apply it to the result of the first invocation in the second
> case instead.

Oh dear.
Sorry for wasting bandwidth.

Thanks Edi.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From dave.pawson at gmail.com Mon Jul 7 11:22:40 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Mon, 7 Jul 2008 12:22:40 +0100
Subject: [cl-ppcre-devel] Using scan
Message-ID: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com>

file:///files/lisp/email/cl-ppcre/doc/index.html#scan

On success returns four values and 2 arrays.
But not in a list?

How to access these values please?

TIA

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From gwking at metabang.com Mon Jul 7 11:39:32 2008
From: gwking at metabang.com (Gary King)
Date: Mon, 7 Jul 2008 07:39:32 -0400
Subject: [cl-ppcre-devel] Using scan
In-Reply-To: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com>
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com>

Hi Dave,

If I understand your question correctly, you want to look at multiple-value-bind. This lets you bind multiple return values (hence the name :->). E.g., (completely untested code)

(multiple-value-bind (a b c d array-1 array-2)
    (cl-ppcre:scan ...)
  ;; now a, b, c, d, array-1 and array-2 are bound to the values returned from scan
  ...)

HTH

On Jul 7, 2008, at 7:22 AM, Dave Pawson wrote:

> file:///files/lisp/email/cl-ppcre/doc/index.html#scan
>
> On success returns four values and 2 arrays.
> But not in a list?
>
> How to access these values please?
>
>
> TIA

--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

From dave.pawson at gmail.com Mon Jul 7 12:14:31 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Mon, 7 Jul 2008 13:14:31 +0100
Subject: [cl-ppcre-devel] Using scan
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com>
Message-ID: <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com>

2008/7/7 Gary King:
> Hi Dave,
>
> If I understand your question correctly, you want to look at
> multiple-value-bind. This lets you bind multiple return values (hence the
> name :->). E.g., (completely untested code)
>
> (multiple-value-bind (a b c d array-1 array-2)
>     (cl-ppcre:scan ...)
>   ;; now a, b, c, d, array-1 and array-2 are bound to the values returned
> from scan

Simpler question then.

How to test for a simple, full match?

regex="ABC" testString="ABC"

scan or scan-to-strings seems to be the correct choice?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From gwking at metabang.com Mon Jul 7 13:45:11 2008
From: gwking at metabang.com (Gary King)
Date: Mon, 7 Jul 2008 09:45:11 -0400
Subject: [cl-ppcre-devel] Using scan
In-Reply-To: <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com>
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com> <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com>
Message-ID: <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com>

Try something like this:

(defun full-match-p (regex string)
  (multiple-value-bind (start end array-1 array-2)
      (cl-ppcre:scan regex string)
    (declare (ignore array-1 array-2))
    (and (= start 0)
         (= end (length string)))))

On Jul 7, 2008, at 8:14 AM, Dave Pawson wrote:

> 2008/7/7 Gary King:
>> Hi Dave,
>>
>> If I understand your question correctly, you want to look at
>> multiple-value-bind. This lets you bind multiple return values
>> (hence the name :->). E.g., (completely untested code)
>>
>> (multiple-value-bind (a b c d array-1 array-2)
>>     (cl-ppcre:scan ...)
>>   ;; now a, b, c, d, array-1 and array-2 are bound to the values
>> returned from scan
>
> Simpler question then.
>
> How to test for a simple, full match?
>
> regex="ABC" testString="ABC"
>
> scan or scan-to-strings seems to be the correct choice?
>
> regards

--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

From edi at agharta.de Mon Jul 7 13:48:32 2008
From: edi at agharta.de (Edi Weitz)
Date: Mon, 07 Jul 2008 15:48:32 +0200
Subject: [cl-ppcre-devel] Using scan
In-Reply-To: <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com> (Gary King's message of "Mon, 7 Jul 2008 09:45:11 -0400")
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com> <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com> <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com>

On Mon, 7 Jul 2008 09:45:11 -0400, Gary King wrote:

> Try something like this:
>
> (defun full-match-p (regex string)
>   (multiple-value-bind (start end array-1 array-2)
>       (cl-ppcre:scan regex string)
>     (declare (ignore array-1 array-2))
>     (and (= start 0)
>          (= end (length string)))))

Simpler:

  (scan "^ABC$" "ABC")
  (scan "^ABC$" "ABCD")

The first return value of SCAN serves as a generalized boolean.

From edi at agharta.de Mon Jul 7 13:49:42 2008
From: edi at agharta.de (Edi Weitz)
Date: Mon, 07 Jul 2008 15:49:42 +0200
Subject: [cl-ppcre-devel] Using scan
In-Reply-To: <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com> (Gary King's message of "Mon, 7 Jul 2008 09:45:11 -0400")
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com> <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com> <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com>

Oh, and BTW...

On Mon, 7 Jul 2008 09:45:11 -0400, Gary King wrote:

>   (multiple-value-bind (start end array-1 array-2)
>       (cl-ppcre:scan regex string)
>     (declare (ignore array-1 array-2))
>     ...)

That's equivalent to

  (multiple-value-bind (start end)
      (cl-ppcre:scan regex string)
    ...)

Edi.
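Putting this thread together: SCAN returns four values. These are the match start, the match end, and two vectors holding the start and end positions of each register. When nothing matches, the first value is NIL. Two consequences follow for FULL-MATCH-P as posted above. First, it calls `=` on START even when SCAN returned NIL, so a non-matching string signals a type error. Second, it only looks at the leftmost match, so an alternation such as "A|ABC" against "ABC" is reported as a non-match. Anchoring the pattern, as Edi suggests, avoids the second problem.

Below is a minimal sketch, assuming CL-PPCRE is loaded. The function names are hypothetical. The anchored version uses `\A`/`\z` instead of `^`/`$` because `$` also accepts a trailing newline.

```lisp
;; Sketch only: assumes CL-PPCRE is already loaded.  FULL-MATCH-P,
;; FULL-MATCH-ANCHORED-P and MATCH-POSITIONS are illustrative names.

(defun full-match-p (regex string)
  "Non-NIL if the leftmost match of REGEX covers all of STRING."
  (multiple-value-bind (start end) (cl-ppcre:scan regex string)
    ;; SCAN returns NIL as its first value when there is no match,
    ;; so check START before comparing it with =.
    (and start
         (= start 0)
         (= end (length string)))))

(defun full-match-anchored-p (regex string)
  "T if REGEX can match the whole of STRING.  \\A and \\z anchor at the
absolute start and end of STRING, so a trailing newline is not accepted."
  (and (cl-ppcre:scan (format nil "\\A(?:~A)\\z" regex) string) t))

(defun match-positions (regex string)
  "Return SCAN's four values as a list, to show what each one holds."
  (multiple-value-bind (start end reg-starts reg-ends)
      (cl-ppcre:scan regex string)
    (list start end reg-starts reg-ends)))
```

For instance, `(match-positions "(\\d+) through (\\d+)" "messages 1 through 99")` gives the whole match at positions 9 to 21 and the registers at #(9 19) / #(10 21). If you want the matched text rather than positions, `cl-ppcre:scan-to-strings` returns the matched substring and a vector of register strings instead.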
From gwking at metabang.com Mon Jul 7 14:01:07 2008
From: gwking at metabang.com (Gary King)
Date: Mon, 7 Jul 2008 10:01:07 -0400
Subject: [cl-ppcre-devel] Using scan
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com> <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com> <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com>
Message-ID: <7FF14E22-32D7-4021-B6ED-9530E63566DF@metabang.com>

Thanks Edi,

(I can never remember whether mvb is nice about ignoring "unclaimed" values). The use of ^ and $ is, of course, also much preferred.

On Jul 7, 2008, at 9:49 AM, Edi Weitz wrote:

> Oh, and BTW...
>
> On Mon, 7 Jul 2008 09:45:11 -0400, Gary King
> wrote:
>
>>   (multiple-value-bind (start end array-1 array-2)
>>       (cl-ppcre:scan regex string)
>>     (declare (ignore array-1 array-2))
>>     ...)
>
> That's equivalent to
>
>   (multiple-value-bind (start end)
>       (cl-ppcre:scan regex string)
>     ...)
>
> Edi.

--
Gary Warren King, metabang.com
Cell: (413) 559 8738
Fax: (206) 338-4052
gwkkwg on Skype * garethsan on AIM

From dave.pawson at gmail.com Mon Jul 7 15:04:47 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Mon, 7 Jul 2008 16:04:47 +0100
Subject: [cl-ppcre-devel] Using scan
In-Reply-To: <7FF14E22-32D7-4021-B6ED-9530E63566DF@metabang.com>
References: <711a73df0807070422h390a6dd6g8bebdb98efaf65ee@mail.gmail.com> <711a73df0807070514g7a4eced5jeeae57f758a3ae6f@mail.gmail.com> <9F56BA9D-ED4D-4851-A6FA-E04F7E6344D8@metabang.com> <7FF14E22-32D7-4021-B6ED-9530E63566DF@metabang.com>
Message-ID: <711a73df0807070804p19a2bac9ub4d20236be05b5de@mail.gmail.com>

Getting there.
Thanks for the hints

(setq res
      (cl-ppcre:register-groups-bind (first second)
          ("^Topics messages ([0-9]+) through ([0-9]+)"
           "Topics messages 1 through 99"
           :sharedp t)
        (list first second)))

(if res
    (msg (concatenate 'string ""))
    )

That gives me what I wanted.
Now returns the marked up parse of the string or nil.
I can work with that!

Much appreciated.

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From ckonstanski at pippiandcarlos.com Thu Jul 17 14:55:23 2008
From: ckonstanski at pippiandcarlos.com (Carlos Konstanski)
Date: Thu, 17 Jul 2008 08:55:23 -0600
Subject: [cl-ppcre-devel] slow matching
Message-ID: <18559.23899.597167.255731@sphinktop.pippiandcarlos.com>

I was not on the list at the time that Edi posted the message "1.4.0/1 performance" which is why I am posting a new one instead of replying.

I am using threaded SBCL 1.0.18 on gentoo amd64. On my platform, matching has slowed to a crawl. It takes 1.5 to 2 seconds to perform the following match:

(cl-ppcre:all-matches-as-strings "^\\d\\d\\d\\d-\\d\\d-\\d\\d\\d\\d:\\d\\d:\\d\\d\.0$" )

I'm sure I could write this regex better. Nevertheless it runs hundreds of times slower than it did previously.

I'll test on the same setup with x86 and give a report.

Carlos Konstanski

From edi at agharta.de Thu Jul 17 20:01:40 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 17 Jul 2008 22:01:40 +0200
Subject: [cl-ppcre-devel] slow matching
In-Reply-To: <18559.23899.597167.255731@sphinktop.pippiandcarlos.com> (Carlos Konstanski's message of "Thu, 17 Jul 2008 08:55:23 -0600")
References: <18559.23899.597167.255731@sphinktop.pippiandcarlos.com>

On Thu, 17 Jul 2008 08:55:23 -0600, Carlos Konstanski wrote:

> I was not on the list at the time that Edi posted the message
> "1.4.0/1 performance" which is why I am posting a new one instead of
> replying.
>
> I am using threaded SBCL 1.0.18 on gentoo amd64. On my platform,
> matching has slowed to a crawl.
> It takes 1.5 to 2 seconds to
> perform the following match:
>
> (cl-ppcre:all-matches-as-strings "^\\d\\d\\d\\d-\\d\\d-\\d\\d\\d\\d:\\d\\d:\\d\\d\.0$" )
>
> I'm sure I could write this regex better. Nevertheless it runs
> hundreds of times slower than it did previously.
>
> I'll test on the same setup with x86 and give a report.

No need to do that, I just tested on x86 and it looks just as bad there. The good news is that I have a completely re-factored CL-PPCRE on my hard disk which will be released in a couple of days and which won't have this problem anymore, so stay tuned.

Thanks for the report,
Edi.

From ckonstanski at pippiandcarlos.com Thu Jul 17 20:09:21 2008
From: ckonstanski at pippiandcarlos.com (Carlos Konstanski)
Date: Thu, 17 Jul 2008 14:09:21 -0600 (MDT)
Subject: [cl-ppcre-devel] slow matching
References: <18559.23899.597167.255731@sphinktop.pippiandcarlos.com>

On Thu, 17 Jul 2008, Edi Weitz wrote:

> On Thu, 17 Jul 2008 08:55:23 -0600, Carlos Konstanski wrote:
>
>> I was not on the list at the time that Edi posted the message
>> "1.4.0/1 performance" which is why I am posting a new one instead of
>> replying.
>>
>> I am using threaded SBCL 1.0.18 on gentoo amd64. On my platform,
>> matching has slowed to a crawl. It takes 1.5 to 2 seconds to
>> perform the following match:
>>
>> (cl-ppcre:all-matches-as-strings "^\\d\\d\\d\\d-\\d\\d-\\d\\d\\d\\d:\\d\\d:\\d\\d\.0$" )
>>
>> I'm sure I could write this regex better. Nevertheless it runs
>> hundreds of times slower than it did previously.
>>
>> I'll test on the same setup with x86 and give a report.
>
> No need to do that, I just tested on x86 and it looks just as bad
> there. The good news is that I have a completely re-factored CL-PPCRE
> on my hard disk which will be released in a couple of days and which
> won't have this problem anymore, so stay tuned.
>
> Thanks for the report,
> Edi.

Thanks for the quick reply. I'll forward this to my supervisor so he can see how much cooler it is to use open source software than .

Carlos Konstanski

From ckonstanski at pippiandcarlos.com Wed Jul 23 22:52:46 2008
From: ckonstanski at pippiandcarlos.com (Carlos Konstanski)
Date: Wed, 23 Jul 2008 16:52:46 -0600 (MDT)
Subject: [cl-ppcre-devel] slow matching
References: <18559.23899.597167.255731@sphinktop.pippiandcarlos.com>

On Thu, 17 Jul 2008, Edi Weitz wrote:

> On Thu, 17 Jul 2008 08:55:23 -0600, Carlos Konstanski wrote:
>
>> I was not on the list at the time that Edi posted the message
>> "1.4.0/1 performance" which is why I am posting a new one instead of
>> replying.
>>
>> I am using threaded SBCL 1.0.18 on gentoo amd64. On my platform,
>> matching has slowed to a crawl. It takes 1.5 to 2 seconds to
>> perform the following match:
>>
>> (cl-ppcre:all-matches-as-strings "^\\d\\d\\d\\d-\\d\\d-\\d\\d\\d\\d:\\d\\d:\\d\\d\.0$" )
>>
>> I'm sure I could write this regex better. Nevertheless it runs
>> hundreds of times slower than it did previously.
>>
>> I'll test on the same setup with x86 and give a report.
>
> No need to do that, I just tested on x86 and it looks just as bad
> there. The good news is that I have a completely re-factored CL-PPCRE
> on my hard disk which will be released in a couple of days and which
> won't have this problem anymore, so stay tuned.
>
> Thanks for the report,
> Edi.

I just noticed the new 2.0 version, and it works like a champ! Thanks so much for getting that released.
I will bug the gentoo lisp herd to get it updated in their overlay.

Your English is quite amazing, by the way. I was tempted to write my posts in German, but you can probably read my English more easily.

Carlos Konstanski

From edi at agharta.de Wed Jul 23 22:53:39 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 24 Jul 2008 00:53:39 +0200
Subject: [cl-ppcre-devel] CL-PPCRE 2.0.0, CL-INTERPOL 0.2.0, CL-UNICODE 0.1.0

This is a joint announcement for two significant updates and one brand-new library which are all somehow related. See the changelogs for all the details.

http://weitz.de/cl-ppcre/
http://weitz.de/cl-unicode/
http://weitz.de/cl-interpol/

Edi.

From edi at agharta.de Thu Jul 24 15:01:32 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 24 Jul 2008 17:01:32 +0200
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1

ChangeLog:

Version 0.1.1
2008-07-24
Make ADD-HANGUL-NAMES faster for ClozureCL

Download:

http://weitz.de/files/cl-unicode.tar.gz

Edi.

From dave.pawson at gmail.com Thu Jul 24 16:18:21 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Thu, 24 Jul 2008 17:18:21 +0100
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
Message-ID: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com>

So when will utf-8 be the natural encoding of ppcre?
(Yes, I know I'm greedy)

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From edi at agharta.de Thu Jul 24 16:30:40 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 24 Jul 2008 18:30:40 +0200
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> (Dave Pawson's message of "Thu, 24 Jul 2008 17:18:21 +0100")
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com>

On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" wrote:

> So when will utf-8 be the natural encoding of ppcre?

I don't understand the question. CL-PPCRE deals with strings and not with arrays of octets.

From dave.pawson at gmail.com Thu Jul 24 16:39:40 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Thu, 24 Jul 2008 17:39:40 +0100
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com>
Message-ID: <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com>

2008/7/24 Edi Weitz:
> On Thu, 24 Jul 2008 17:18:21 +0100, "Dave Pawson" wrote:
>
>> So when will utf-8 be the natural encoding of ppcre?

xml has long dealt with 'strings of characters' encoded in utf-8.
That way I can include an umlaut, an arabic glyph or a chinese symbol

Any reason lisp should not enjoy that level of internationalisation?

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From edi at agharta.de Thu Jul 24 16:59:17 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 24 Jul 2008 18:59:17 +0200
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com> (Dave Pawson's message of "Thu, 24 Jul 2008 17:39:40 +0100")
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com>

On Thu, 24 Jul 2008 17:39:40 +0100, "Dave Pawson" wrote:

> xml has long dealt with 'strings of characters' encoded in utf-8.

I think you are confused.
In Lisp, characters and strings are really characters and strings.

  CL-USER 4 > #\ä
  #\ä

  CL-USER 5 > (type-of *)
  CHARACTER

  CL-USER 6 > (char-name **)
  "Latin-Small-Letter-A-With-Diaeresis"

If you want to convert between octets and characters (that's where encodings like UTF-8 make sense), most CL implementations have facilities for this out of the box. For portable solutions see for example here:

  http://weitz.de/flexi-streams/
  http://common-lisp.net/project/babel/

> That way I can include an umlaut, an arabic glyph or a chinese
> symbol

See above.

> Any reason lisp should not enjoy that level of internationalisation?

It does already.

HTH,
Edi.

From dave.pawson at gmail.com Thu Jul 24 17:09:51 2008
From: dave.pawson at gmail.com (Dave Pawson)
Date: Thu, 24 Jul 2008 18:09:51 +0100
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com>
Message-ID: <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>

2008/7/24 Edi Weitz:

> I think you are confused. In Lisp, characters and strings are really
> characters and strings.

> CL-USER 6 > (char-name **)
> "Latin-Small-Letter-A-With-Diaeresis"

Sorry ** doesn't look like u00e4

>
> If you want to convert between octets and characters (that's where
> encodings like UTF-8 make sense), most CL implementations have
> facilities for this out of the box. For portable solutions see for
> example here:
>
> http://weitz.de/flexi-streams/
> http://common-lisp.net/project/babel/

I don't want to convert, I want to read utf-8 from a file,
work in 'characters', build them into strings
and write them back to file, in utf-8

>> Any reason lisp should not enjoy that level of internationalisation?
>
> It does already.

seems we have a different definition of 'working'.

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

From danielgackle at gmail.com Thu Jul 24 17:19:43 2008
From: danielgackle at gmail.com (Daniel Gackle)
Date: Thu, 24 Jul 2008 11:19:43 -0600
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com> <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>
Message-ID: <57952f8b0807241019w147250bdu2c1de41b11908c60@mail.gmail.com>

< Sorry ** doesn't look like u00e4 >

http://www.supelec.fr/docs/cltl/clm/node181.html

Daniel

From hans at huebner.org Thu Jul 24 19:25:03 2008
From: hans at huebner.org (Hans Hübner)
Date: Thu, 24 Jul 2008 21:25:03 +0200
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com> <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>

Dave,

sorry to be harsh, but the problem here is that you don't understand external formats and how they relate to characters.

Most modern Lisps use Unicode as their character set, and most of them represent characters as 16- or 32-bit integers internally. UTF-8, in contrast, is an external encoding scheme for Unicode characters, and again, most Lisps support reading and writing characters in UTF-8 encoding. The external format of files read and written is usually specified using the :external-format keyword argument to functions like OPEN, WITH-OPEN-FILE etc. Also, there are portability libraries like BABEL that can be helpful to convert Lisp strings to arbitrary external formats, for example when calling foreign functions or reading and writing binary files.

CL-PPCRE uses Lisp characters and strings and works with Unicode characters just fine. The CL-UNICODE library is a portability library for working with Unicode directly, but most users never really need to do that.

Please read up on external formats in your Lisp implementation's manual.

-Hans

From ctdean at sokitomi.com Thu Jul 24 19:41:14 2008
From: ctdean at sokitomi.com (Chris Dean)
Date: Thu, 24 Jul 2008 12:41:14 -0700
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com> (Dave Pawson's message of "Thu, 24 Jul 2008 18:09:51 +0100")
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com> <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>

"Dave Pawson" writes:

> I don't want to convert, I want to read utf-8 from a file, work in
> 'characters', build them into strings and write them back to file,
> in utf-8

This just works. You probably need to use external-format with OPEN (or more likely WITH-OPEN-FILE) to indicate the encoding you are using.
This will read one line of a file in LispWorks:

  (with-open-file (in file-name :external-format :utf-8
                                :element-type 'character)
    (read-line in))

> seems we have a different definition of 'working'.

Please explain what doesn't work. Maybe a code sample would help.

Cheers,
Chris Dean

From edi at agharta.de Thu Jul 24 19:44:33 2008
From: edi at agharta.de (Edi Weitz)
Date: Thu, 24 Jul 2008 21:44:33 +0200
Subject: [cl-ppcre-devel] New CL-UNICODE release 0.1.1
In-Reply-To: <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com> (Dave Pawson's message of "Thu, 24 Jul 2008 18:09:51 +0100")
References: <711a73df0807240918p56a824bbg261dcdcbc7520520@mail.gmail.com> <711a73df0807240939q8876874ib25a20fb18321ed6@mail.gmail.com> <711a73df0807241009sc031169l4c2168fd7121d43@mail.gmail.com>

On Thu, 24 Jul 2008 18:09:51 +0100, "Dave Pawson" wrote:

> 2008/7/24 Edi Weitz:
>
>> I think you are confused. In Lisp, characters and strings are really
>> characters and strings.
>
>> CL-USER 6 > (char-name **)
>> "Latin-Small-Letter-A-With-Diaeresis"
>
> Sorry ** doesn't look like u00e4

Get a good book about Common Lisp and come back once you've understood the basic issues.

  http://www.lispworks.com/documentation/HyperSpec/Body/v__stst_.htm

> I don't want to convert, I want to read utf-8 from a file, work in
> 'characters', build them into strings and write them back to file,
> in utf-8

Sigh...

> seems we have a different definition of 'working'.

Humor me - please give me a short description of what I need to change to make UTF-8 "the natural encoding" of CL-PPCRE. I'm really looking forward to that.
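For the workflow Dave describes (read UTF-8 from a file, work on characters, write UTF-8 back), the encoding only matters at the stream boundary, as Hans and Chris explain. CL-PPCRE in between just sees ordinary Lisp strings.

Below is a minimal sketch. It assumes CL-PPCRE is loaded and that your implementation accepts :UTF-8 as an external-format designator, as SBCL, Clozure CL and LispWorks do. The helper names are hypothetical. ESCAPE-MARKUP also applies the fix from the "Bug?" thread: the second REGEX-REPLACE-ALL runs on the result of the first, and & is replaced before <.

```lisp
;; Sketch only: assumes CL-PPCRE is loaded and that :UTF-8 is a valid
;; external-format designator in your Lisp (true for SBCL, CCL, LispWorks).
;; READ-UTF-8-FILE, WRITE-UTF-8-FILE and ESCAPE-MARKUP are illustrative names.

(defun read-utf-8-file (pathname)
  "Return the contents of PATHNAME, decoded from UTF-8, as a Lisp string."
  (with-open-file (in pathname :external-format :utf-8)
    ;; FILE-LENGTH is assumed to be an upper bound on the number of
    ;; characters (SBCL reports octets here), so trim the buffer to
    ;; what READ-SEQUENCE actually filled.
    (let* ((buffer (make-string (file-length in)))
           (end (read-sequence buffer in)))
      (subseq buffer 0 end))))

(defun write-utf-8-file (pathname string)
  "Write STRING to PATHNAME, encoded as UTF-8, replacing any existing file."
  (with-open-file (out pathname :direction :output
                                :if-exists :supersede
                                :external-format :utf-8)
    (write-string string out)))

(defun escape-markup (string)
  "Replace & and < with entities.  & must be handled first, and the
second call works on the result of the first, not on STRING."
  (cl-ppcre:regex-replace-all
   "<" (cl-ppcre:regex-replace-all "&" string "&amp;") "&lt;"))
```

A round trip would then be `(write-utf-8-file "out.xml" (escape-markup (read-utf-8-file "in.txt")))`. Characters such as ä pass through unchanged because the regex functions only ever see characters, never octets.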