From edi at agharta.de Tue Jan 9 23:45:57 2007
From: edi at agharta.de (Edi Weitz)
Date: Wed, 10 Jan 2007 00:45:57 +0100
Subject: [drakma-devel] New Chunga release 0.2.2
Message-ID:

ChangeLog:

  Version 0.2.2
  2007-01-10
  Faster version of READ-LINE* (provided by Gábor Melis)

Download:

  http://weitz.de/files/chunga.tar.gz

Cheers,
Edi.

From edi at agharta.de Wed Jan 17 00:52:59 2007
From: edi at agharta.de (Edi Weitz)
Date: Wed, 17 Jan 2007 01:52:59 +0100
Subject: [drakma-devel] New Chunga release 0.2.3
Message-ID:

Changelog:

  Version 0.2.3
  2007-01-17
  Guard against stray semicolons when reading name/value pairs (thanks to Bülent Murtezaoglu)

Download:

  http://weitz.de/files/chunga.tar.gz

Cheers,
Edi.

From ctdean at sokitomi.com Mon Jan 29 23:23:58 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Mon, 29 Jan 2007 15:23:58 -0800
Subject: [drakma-devel] drakma vs. http://popurls.com
Message-ID:

I have what will probably end up being an obvious and foolish
question.

When I run (http-request "http://popurls.com/") I get an error from
flexi-streams that says:

  Unexpected value #x20 in UTF-8 sequence.
     [Condition of type FLEXI-STREAM-ENCODING-ERROR]

What's going on? Do I need to set external-format-in ?

An abbreviated stack trace is below.

Cheers,
Chris Dean

1: (METHOD STREAM:STREAM-READ-CHAR (FLEXI-STREAMS::FLEXI-UTF-8-INPUT-STREAM)) (#)
   Locals:
     STREAM = #
     CLOS::.ISL. = #(#(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ...) #(FLEXI-STREAMS::LAST-CHAR-CODE FLEXI-STREAMS::LAST-OCTET) 126 0)
     CLOS::.PV. = #(5 6)
     FLEXI-STREAMS::FIRST-OCTET-SEEN = T
     OCTET = 194
     FLEXI-STREAMS::START = 2
     COUNT = 1
     DBG::EXTRA-VALS = :DONT-KNOW
     FLEXI-STREAMS::RESULT = 2
     DBG::|repeat-counter-| = 0
     OCTET = 32

2: (METHOD TRIVIAL-GRAY-STREAMS:STREAM-READ-SEQUENCE (FLEXI-INPUT-STREAM T T T)) (# ...

3: CLOS::GENERIC-FUNCTION-DISCRIMINATOR NIL

4: DRAKMA::READ-BODY (# ((:DATE . "Mon, 29 Jan 2007 23:02:31 GMT")
                          (:SERVER . "Apache")
                          (:EXPIRES . "Mon, 26 Jul 1997 05:00:00 GMT")
                          (:CACHE-CONTROL . "no-store, no-cache, must-revalidate,post-check=0, pre-check=0")
                          (:PRAGMA . "no-cache")
                          (:CONNECTION . "close")
                          (:TRANSFER-ENCODING . "chunked")
                          (:CONTENT-TYPE . "text/html; charset=UTF-8")) T #)
   Locals:
     STREAM = #
     DRAKMA::HEADERS = ((:DATE . "Mon, 29 Jan 2007 23:02:31 GMT")
                        (:SERVER . "Apache")
                        (:EXPIRES . "Mon, 26 Jul 1997 05:00:00 GMT")
                        (:CACHE-CONTROL . "no-store, no-cache, must-revalidate,post-check=0, pre-check=0")
                        (:PRAGMA . "no-cache")
                        (:CONNECTION . "close")
                        (:TRANSFER-ENCODING . "chunked")
                        (:CONTENT-TYPE . "text/html; charset=UTF-8"))
     DRAKMA::MUST-CLOSE = T
     DRAKMA::TEXTP = #
     DRAKMA::CONTENT-LENGTH = NIL
     DRAKMA::ELEMENT-TYPE = LISPWORKS:SIMPLE-CHAR
     DRAKMA::CHUNKEDP = T
     DRAKMA::BUFFER = ...
     DRAKMA::RESULT = ...
     DRAKMA::INDEX = 49152
     DRAKMA::POS = 8192

From edi at agharta.de Mon Jan 29 23:40:01 2007
From: edi at agharta.de (Edi Weitz)
Date: Tue, 30 Jan 2007 00:40:01 +0100
Subject: [drakma-devel] drakma vs. http://popurls.com
In-Reply-To: (Chris Dean's message of "Mon, 29 Jan 2007 15:23:58 -0800")
References:
Message-ID:

On Mon, 29 Jan 2007 15:23:58 -0800, Chris Dean wrote:

> I have what will probably end up being an obvious and foolish
> question.

No, that's not a foolish question. It's just that the website you're
trying to visit has errors - see below.

> When I run (http-request "http://popurls.com/") I get an error from
> flexi-streams that says:
>
>   Unexpected value #x20 in UTF-8 sequence.
>      [Condition of type FLEXI-STREAM-ENCODING-ERROR]
>
> What's going on? Do I need to set external-format-in ?

According to

  http://validator.w3.org/check?uri=http%3A%2F%2Fpopurls.com%2F

the website claims to be encoded as UTF-8 but contains octet sequences
that are illegal in UTF-8. And that's why you get errors - Drakma
looks at the headers sent by the server, believes what the server
says, and tries to decode the body accordingly.

You can work around this by using the FORCE-BINARY keyword argument,
but then you end up with a bunch of octets...
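
The stack trace earlier in this thread shows the decoder reading octet
194 (#xC2) and then octet 32 (#x20). A minimal sketch of why that pair
is rejected, in plain Common Lisp with no libraries loaded; the
function name is hypothetical and is not part of Drakma or
FLEXI-STREAMS:

```lisp
;; Hypothetical helper for illustration only; not part of Drakma or
;; FLEXI-STREAMS.
(defun utf-8-continuation-octet-p (octet)
  "Return true if OCTET has the bit pattern 10xxxxxx, which UTF-8
requires of every octet after the first in a multi-octet sequence."
  (= (logand octet #b11000000) #b10000000))

;; #xC2 (194) has the pattern 110xxxxx: it is the lead octet of a
;; two-octet sequence, so one continuation octet must follow it.
(utf-8-continuation-octet-p #x20)   ; => NIL, a plain space cannot follow #xC2
(utf-8-continuation-octet-p #xA0)   ; => T, #xC2 #xA0 is valid (U+00A0)
```

A strict decoder has to signal an error at the #x20; a more permissive
one would need to substitute some replacement character instead.
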
You should probably ask the operators of popurls.com to fix their
site.

Cheers,
Edi.

From ctdean at sokitomi.com Tue Jan 30 02:20:17 2007
From: ctdean at sokitomi.com (Chris Dean)
Date: Mon, 29 Jan 2007 18:20:17 -0800
Subject: [drakma-devel] drakma vs. http://popurls.com
In-Reply-To: (Edi Weitz's message of "Tue, 30 Jan 2007 00:40:01 +0100")
References:
Message-ID:

Edi Weitz writes:

> On Mon, 29 Jan 2007 15:23:58 -0800, Chris Dean wrote:
> According to
>
>   http://validator.w3.org/check?uri=http%3A%2F%2Fpopurls.com%2F
>
> the website claims to be encoded as UTF-8 but contains octet sequences
> that are illegal in UTF-8. And that's why you get errors -

That makes sense, and I'm glad to know that the error is on their end.

> You should probably ask the operators of popurls.com to fix their
> site.

I certainly will do that, but I now have a larger problem.

The problem is that I regularly download web pages and many of them
are poorly formed. I'd like my software to be permissive and return
something reasonable.

Drakma is nicely designed and I'd like to keep using it. If I were
to add this "feature" of less-strict UTF-8 where should I do that?

I could modify (define-char-reader (stream flexi-utf-8-input-stream)
...) in some clever way I suppose.

Cheers,
Chris Dean

From edi at agharta.de Tue Jan 30 07:52:45 2007
From: edi at agharta.de (Edi Weitz)
Date: Tue, 30 Jan 2007 08:52:45 +0100
Subject: [drakma-devel] drakma vs. http://popurls.com
In-Reply-To: (Chris Dean's message of "Mon, 29 Jan 2007 18:20:17 -0800")
References:
Message-ID:

On Mon, 29 Jan 2007 18:20:17 -0800, Chris Dean wrote:

> The problem is that I regularly download web pages and many of them
> are poorly formed. I'd like my software to be permissive and return
> something reasonable.

Sure, I agree.

> Drakma is nicely designed and I'd like to keep using it. If I were
> to add this "feature" of less-strict UTF-8 where should I do that?
>
> I could modify (define-char-reader (stream flexi-utf-8-input-stream)
> ...) in some clever way I suppose.

My hope is that FLEXI-STREAMS is already "flexible" enough to deal
with this:

  CL-USER 22 > (drakma:http-request "http://zappa.agharta.de/test.html")

  Error: Unexpected value #xF6 in UTF-8 sequence.
    1 (abort) Return to level 0.
    2 Return to top loop level 0.

  Type :b for backtrace, :c