From johan at riise-data.no  Sun Aug 27 21:29:17 2006
From: johan at riise-data.no (Johan Ur Riise)
Date: Sun, 27 Aug 2006 23:29:17 +0200
Subject: [pg-cvs] pg.lisp UTF8 SBCL
Message-ID: <20060827212917.GA5083@riise-data.no>

Hello, this is my first bug report on pg.lisp

Hmm, seems I registered on mailing list pg-cvs rather than pg-devel,
http://common-lisp.net/mailman/listinfo/pg-devel lets me register on pg-cvs..

Context: I want to use unicode for the Postgresql database.

The Postgresql version is 8.1, from Ubuntu.
pg.lisp is from cvs
The common lisp is SBCL 0.9.8

I have, with psql \l-command:

autotest=> \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 autotest  | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8
(4 rows)

I put "UTF8" in *pg-client-encoding*
(setf *pg-client-encoding* "UTF8")

With
(pg-connect "autotest" "autotest" :password "autotest")
I get this:

Illegal :UTF-8 character starting at byte position 0.
   [Condition of type SB-IMPL::INVALID-UTF8-CONTINUATION-BYTE]

Restarts:
  0: [USE-VALUE] Supply a replacement string designator.
  1: [ABORT-REQUEST] Abort handling SLIME request.
  2: [TERMINATE-THREAD] Terminate this thread (#)

Backtrace:
  0: (SB-IMPL::DECODING-ERROR #(231 48 140 144) 0 1 :UTF-8 SB-IMPL::INVALID-UTF8-CONTINUATION-BYTE 1)
      Locals:
        SB-DEBUG::ARG-0 = 6
        SB-DEBUG::ARG-1 = #(231 48 140 144)
        SB-DEBUG::ARG-2 = 0
        SB-DEBUG::ARG-3 = 1
        SB-DEBUG::ARG-4 = :UTF-8
        SB-DEBUG::ARG-5 = SB-IMPL::INVALID-UTF8-CONTINUATION-BYTE
        SB-DEBUG::ARG-6 = 1
  1: (SB-IMPL::BYTES-PER-UTF8-CHARACTER-AREF # # #)
      Locals:
        SB-DEBUG::ARG-0 = :
        SB-DEBUG::ARG-1 = :
        SB-DEBUG::ARG-2 = :
  2: (SB-IMPL::UTF8->STRING-AREF #(231 48 140 144) 0 4)
      Locals:
        SB-DEBUG::ARG-0 = 3
        SB-DEBUG::ARG-1 = #(231 48 140 144)
        SB-DEBUG::ARG-2 = 0
        SB-DEBUG::ARG-3 = 4
  3: ((SB-PCL::FAST-METHOD POSTGRESQL::READ-STRING-FROM-PACKET (POSTGRESQL::PG-PACKET INTEGER)) # # # 4)
      Locals:
        SB-DEBUG::ARG-0 = :
        SB-DEBUG::ARG-1 = :
        SB-DEBUG::ARG-2 = #
        SB-DEBUG::ARG-3 = 4
  4: (POSTGRESQL::PG-CONNECT/V3 "autotest" "autotest" :HOST "localhost" :PORT 5432 :PASSWORD "autotest")
  5: (PG-CONNECT # # :HOST # :PORT # :PASSWORD #)
  6: (SB-INT:EVAL-IN-LEXENV (PG-CONNECT "autotest" "autotest" :PASSWORD "autotest") #)
  7: (SWANK::EVAL-REGION "(pg-connect \"autotest\" \"autotest\" :password \"autotest\")

I fooled around a little, and found that this is an R-packet
handled in #'pg-connect/v3 in the #\R and (5) case, where
"salt" is read with read-string-from-packet.

So I fixed the read-string-from-packet method to use conversion
from latin1 in case this was an #\R packet. Probably not the
right place to do it, but it seems to work.
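The octets #(231 48 140 144) in the backtrace are the 4-byte salt, which is
binary data rather than text. A minimal sketch in portable Common Lisp of why
a strict UTF-8 decoder rejects them while a Latin-1 reading cannot fail.
Both helpers are hypothetical and not part of pg.lisp; the Latin-1 one assumes
an implementation where code-char covers codes 0-255 (as in Unicode-enabled
SBCL and CLISP).

```lisp
;; Hypothetical helpers for illustration only; not pg.lisp functions.

(defun latin1-octets-to-string (octets)
  "Decode OCTETS as ISO-8859-1: every octet maps to the character with
the same code, so no octet sequence is ever invalid."
  (map 'string #'code-char octets))

(defun valid-utf8-p (octets)
  "Return true if OCTETS is structurally well-formed UTF-8.
Simplified: overlong forms and surrogates are not rejected."
  (let ((i 0)
        (n (length octets)))
    (loop while (< i n)
          do (let* ((b (aref octets i))
                    ;; Number of continuation octets implied by the lead octet.
                    (extra (cond ((< b #x80) 0)
                                 ((= (logand b #xE0) #xC0) 1)
                                 ((= (logand b #xF0) #xE0) 2)
                                 ((= (logand b #xF8) #xF0) 3)
                                 (t (return-from valid-utf8-p nil)))))
               ;; Sequence truncated at the end of the buffer.
               (when (> (+ i extra) (1- n))
                 (return-from valid-utf8-p nil))
               ;; Every continuation octet must look like 10xxxxxx.
               (loop for j from 1 to extra
                     unless (= (logand (aref octets (+ i j)) #xC0) #x80)
                       do (return-from valid-utf8-p nil))
               (incf i (1+ extra))))
    t))

;; 231 (#xE7) announces a 3-octet sequence, but 48 (#x30) is not a
;; continuation octet, which matches the condition SBCL signalled.
(valid-utf8-p #(231 48 140 144))                                    ; => NIL
(map 'list #'char-code (latin1-octets-to-string #(231 48 140 144))) ; => (231 48 140 144)
```

Since Latin-1 maps each octet to exactly one character code, the original
octets can be recovered from the decoded string, which is why it is a workable
carrier for binary fields such as the salt.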
jur at lark:/usr/local/lib/common-lisp/systems/pg-cvs$ diff -u v3-protocol.lisp.orig v3-protocol.lisp
--- v3-protocol.lisp.orig       2006-08-27 20:27:51.000000000 +0200
+++ v3-protocol.lisp    2006-08-27 22:05:13.000000000 +0200
@@ -276,8 +276,12 @@
                                 length))
   (let* ((octects (read-octets-from-packet packet length))
-         (string (convert-string-from-bytes octects)))
-    string)))
+         (string (convert-string-from-bytes octects
+                   (if (eql #\R (pg-packet-type packet))
+                       (implementation-name-for-encoding "LATIN1")
+                       (implementation-name-for-encoding *pg-client-encoding*)))))
+    string)
+  ))

 (defmethod read-octets-from-packet ((packet pg-packet) (length integer))
   (let ((result (make-array length :element-type '(unsigned-byte 8))))

That is, it works for SBCL; with CLISP I get

PG[5]> (setf *pg-client-encoding* "UTF8")
"UTF8"
PG[6]> (pg-connect "autotest" "autotest" :password "autotest")
*** - STRING=: argument # should be a string, a symbol or a character

which I have no answer for.

-- 
Hilsen
Johan Ur Riise

From erik.enge at gmail.com  Mon Aug 28 13:39:39 2006
From: erik.enge at gmail.com (Erik Enge)
Date: Mon, 28 Aug 2006 09:39:39 -0400
Subject: [pg-devel] Re: [pg-cvs] pg.lisp UTF8 SBCL
In-Reply-To: <20060827212917.GA5083@riise-data.no>
References: <20060827212917.GA5083@riise-data.no>
Message-ID: <58f839b70608280639k25ea0c2ey2ced121b16de1223@mail.gmail.com>

On 8/27/06, Johan Ur Riise wrote:
> Hmm, seems I registered on mailing list pg-cvs rather than pg-devel,
> http://common-lisp.net/mailman/listinfo/pg-devel lets me register on pg-cvs..

Sorry, lingering problem for some lists from an old problem.  I fixed
it and it shouldn't happen for this list again.

Erik.

From eric.marsden at free.fr  Mon Aug 28 21:51:44 2006
From: eric.marsden at free.fr (Eric Marsden)
Date: Mon, 28 Aug 2006 23:51:44 +0200
Subject: [pg-devel] Re: [pg-cvs] pg.lisp UTF8 SBCL
In-Reply-To: <20060827212917.GA5083@riise-data.no> (Johan Ur Riise's message of "Sun, 27 Aug 2006 23:29:17 +0200")
References: <20060827212917.GA5083@riise-data.no>
Message-ID: <878xl84jgv.fsf@free.fr>

>>>>> "jur" == Johan Ur Riise writes:

  jur> Context: I want to use unicode for the Postgresql database.
  jur> I put "UTF8" in *pg-client-encoding*
  jur> (setf *pg-client-encoding* "UTF8")
  jur>
  jur> With
  jur> (pg-connect "autotest" "autotest" :password "autotest")
  jur> I get this:
  jur>
  jur> Illegal :UTF-8 character starting at byte position 0.
  jur> [Condition of type SB-IMPL::INVALID-UTF8-CONTINUATION-BYTE]

indeed, the multibyte support in pg-dot-lisp was broken.

  jur> I fooled around a little, and found that this is an R-packet
  jur> handled in #'pg-connect/v3 in the #\R and (5) case, where
  jur> "salt" is read with read-string-from-packet.
  jur>
  jur> So I fixed the read-string-from-packet method to use conversion
  jur> from latin1 in case this was an #\R packet. Probably not the
  jur> right place to do it, but it seems to work.

I have updated the CVS repository with a change similar to yours, that
also uses LATIN1 for a #\E packet and fixes the problem with CLISP (you
were causing implementation-name-for-encoding to be called twice). I
have tested it lightly in UTF8 mode with unicode-enabled SBCL and with
CLISP, but I am far from confident that all aspects of the protocol
parsing really know when they should be doing multibyte decoding, and
when they should be assuming a latin-1 encoding.

Thanks for the report!
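
The idea of choosing the decoding per packet type can be sketched as follows.
This is an illustration under stated assumptions, not the committed code:
*client-encoding* stands in for *pg-client-encoding*, and
encoding-name-for-packet-type is a hypothetical helper. It returns the
PostgreSQL-style encoding name only, so that a caller would translate it to an
implementation-specific name exactly once; whether the actual change in CVS is
structured this way is an assumption.

```lisp
;; Hypothetical stand-in for *pg-client-encoding*.
(defvar *client-encoding* "UTF8")

(defun encoding-name-for-packet-type (packet-type)
  "Return the PostgreSQL-style encoding name to decode a string with,
given the one-character packet type.  #\R and #\E packets are read as
LATIN1, as described in the thread; everything else uses the client
encoding."
  (if (member packet-type '(#\R #\E))
      "LATIN1"
      *client-encoding*))

(encoding-name-for-packet-type #\R) ; => "LATIN1"
(encoding-name-for-packet-type #\D) ; => "UTF8"
```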

-- 
Eric Marsden

From johan at riise-data.no  Mon Aug 28 23:46:20 2006
From: johan at riise-data.no (Johan Ur Riise)
Date: Tue, 29 Aug 2006 01:46:20 +0200
Subject: [pg-devel] Ref to anonymous cvs on http://common-lisp.net/project/pg/
Message-ID: <20060828234620.GA22573@riise-data.no>

The reference "here" in "You can browse our CVS repository
or download the current development tree via anonymous cvs,
as described here" points to the page itself, that is
http://common-lisp.net/project/pg/.

-- 
Hilsen
Johan Ur Riise

From eric.marsden at free.fr  Tue Aug 29 21:14:49 2006
From: eric.marsden at free.fr (Eric Marsden)
Date: Tue, 29 Aug 2006 23:14:49 +0200
Subject: [pg-devel] Ref to anonymous cvs on http://common-lisp.net/project/pg/
In-Reply-To: <20060828234620.GA22573@riise-data.no> (Johan Ur Riise's message of "Tue, 29 Aug 2006 01:46:20 +0200")
References: <20060828234620.GA22573@riise-data.no>
Message-ID: <87bqq31bxy.fsf@free.fr>

>>>>> "jur" == Johan Ur Riise writes:

  jur> The reference "here" in "You can browse our CVS repository
  jur> or download the current development tree via anonymous cvs,
  jur> as described here" points to the page itself, that is
  jur> http://common-lisp.net/project/pg/.

thanks, fixed.

-- 
Eric Marsden

From johan at riise-data.no  Thu Aug 31 23:32:45 2006
From: johan at riise-data.no (Johan Ur Riise)
Date: Fri, 01 Sep 2006 01:32:45 +0200
Subject: [pg-devel] Non-unicode characters are present in pg.lisp
Message-ID: <20060831233245.GA3653@riise-data.no>

While I am at it...

When I use my CLISP with no special configuration, it assumes
files are in utf-8 encoding. Since the name of Johannes Grødem
has #\CEDILLA and #\LATIN_CAPITAL_LETTER_A_WITH_TILDE in some
iso-8859 variant (#xc3 #xb8), CLISP will not load it.

I think that if you change these characters to utf-8, it will still
be readable in a single-octet-per-letter encoding.

By the way, I think he uses #\LATIN_SMALL_LETTER_O_WITH_STROKE to write
his name.
At least in my iso-8859-1 terminal, the characters do not look
right anyway. I think it should be written (hex) 47 72 f8 64 65 6d in
that encoding.

-- 
Hilsen
Johan Ur Riise
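
To make the byte-level difference concrete, here is a small sketch in portable
Common Lisp. code-point-to-utf8 is a hypothetical helper (not part of pg.lisp)
that encodes one Unicode code point as UTF-8 octets. The name is given as code
points so the example does not depend on how this text itself is encoded.

```lisp
;; Hypothetical helper for illustration only.
(defun code-point-to-utf8 (code)
  "Return a fresh list of the UTF-8 octets encoding the code point CODE."
  (cond ((< code #x80)
         (list code))
        ((< code #x800)
         (list (logior #xC0 (ash code -6))
               (logior #x80 (logand code #x3F))))
        ((< code #x10000)
         (list (logior #xE0 (ash code -12))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))
        (t
         (list (logior #xF0 (ash code -18))
               (logior #x80 (logand (ash code -12) #x3F))
               (logior #x80 (logand (ash code -6) #x3F))
               (logior #x80 (logand code #x3F))))))

;; The six letters of the surname as code points;
;; #xF8 is LATIN SMALL LETTER O WITH STROKE.
(defparameter *name-code-points* '(#x47 #x72 #xF8 #x64 #x65 #x6D))

;; In ISO-8859-1 every code point below 256 is one octet with the same
;; value, so the octets are 47 72 f8 64 65 6d, as stated in the message.
;; In UTF-8 the #xF8 becomes two octets:
(format t "~{~(~2,'0x~)~^ ~}~%"
        (mapcan #'code-point-to-utf8 *name-code-points*))
;; prints: 47 72 c3 b8 64 65 6d
```

Read as ISO-8859-1, the octets c3 b8 are LATIN CAPITAL LETTER A WITH TILDE
(U+00C3) and CEDILLA (U+00B8), the two character names mentioned in the
message.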