[cxml-devel] Corrupted UTF-8 input

Raymond Wiker rwiker at gmail.com
Wed Oct 16 15:35:13 UTC 2013


On Oct 16, 2013, at 17:12, Patrick May <patrick.may at mac.com> wrote:
> Hi,
> 
> 	I'm using chtml for a simple experimental web crawler.  I'm occasionally getting this error (Slime output):
> 
> 0: (RUNES-ENCODING::XERROR "Corrupted UTF-8 input (initial byte was #b~8,'0B)" 255)
> 1: (#<STANDARD-METHOD RUNES-ENCODING:DECODE-SEQUENCE ((EQL :UTF-8) T T T T T ...)> :UTF-8 #(255 216 255 0 0 0 ...) 0 3 #(65535 0 0 0 0 0 ...) 0 8191 NIL)
> 2: (NIL #<Unknown Arguments>)
> 3: (#<STANDARD-METHOD RUNES::XSTREAM-UNDERFLOW (RUNES:XSTREAM)> #<RUNES:XSTREAM NIL>)
> 4: (SGML::READ-TOKEN #<RUNES:XSTREAM NIL> #<SGML::DTD (:PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN")>)
> 5: (SGML::READ-TOKEN* #<RUNES:XSTREAM NIL> #<SGML::DTD (:PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN")>)
> 6: (SGML:SGML-PARSE #<SGML::DTD (:PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN")> #<RUNES:XSTREAM NIL>)
> 7: (CLOSURE-HTML::PARSE-XSTREAM #<RUNES:XSTREAM NIL> #<CLOSURE-HTML:LHTML-BUILDER #x3020032CBF3D>)
> 
> Choosing the restart continuation seems to get past it, but I'd like to understand what's going on and how to automatically detect and work around it.
> 
> 	Any input appreciated.


How sure are you that the input is actually in UTF-8 format? What does the "restart" do?
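
One thing stands out in frame 1 of the backtrace: the input buffer starts with the bytes 255 216 255 (#xFF #xD8 #xFF). That is the signature JPEG files begin with, and #xFF can never occur in valid UTF-8, so the crawler may be handing an image to the HTML parser. Below is a minimal sketch of a pre-check in plain Common Lisp. The function names are hypothetical and are not part of chtml or cxml, and the sketch assumes the crawler can look at the first few octets of a response before parsing it.

```lisp
;; Hypothetical helper (not part of chtml or cxml): true if OCTETS
;; starts with the three-byte JPEG signature FF D8 FF.
(defun jpeg-signature-p (octets)
  (and (>= (length octets) 3)
       (= (aref octets 0) #xFF)
       (= (aref octets 1) #xD8)
       (= (aref octets 2) #xFF)))

;; Hypothetical helper: true if BYTE can never be the first byte of a
;; well-formed UTF-8 sequence. Valid lead bytes are 00-7F and C2-F4.
(defun invalid-utf-8-lead-byte-p (byte)
  (or (<= #x80 byte #xBF)   ; continuation bytes cannot start a sequence
      (<= #xC0 byte #xC1)   ; would only encode overlong forms
      (>= byte #xF5)))      ; beyond U+10FFFF, includes #xFE and #xFF
```

With the buffer from the backtrace, (jpeg-signature-p #(255 216 255 0 0 0)) is true, and so is (invalid-utf-8-lead-byte-p 255), which matches the initial byte 255 reported in frame 0. Checking the Content-Type header of the response before parsing would probably be the more robust filter; the byte check only catches this particular case.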