[hunchentoot-devel] Charset assumptions, in particular in POST bodies

Hans Hübner hans.huebner at gmail.com
Sat Jun 5 22:28:16 UTC 2010


Red,

I'd suggest that you make yourself thoroughly familiar with the
relevant RFCs and supply a patch once you are sure that Hunchentoot is
buggy.  I know that there are some places in Hunchentoot that assume
Latin 1 encoding, but I also faintly remember that I have checked RFC
conformance in some of these cases years ago.  Additionally, before
changing Hunchentoot, it'd be very nice to have a case that exposes
non-conformant behavior.  I'm not saying that Hunchentoot is bug-free,
but clients are generally buggy as well and we don't want to cater for
buggy clients in general.

-Hans

On Sat, Jun 5, 2010 at 20:57, Red Daly <reddaly at gmail.com> wrote:
> About 6 months ago I got some strange encoding errors with a
> Hunchentoot web server.  There are a few places in Hunchentoot
> where the +latin-1+ character encoding is used as the external format
> regardless of headers received from the client:
>
> - GET-POST-DATA returns a +latin-1+ externally encoded stream no
> matter what when the WANT-STREAM parameter is true.
> - PARSE-MULTIPART-FORM-DATA creates a +latin-1+ stream from the
> CONTENT-STREAM of the request.  (relevant RFC: 2388)
> - MAYBE-READ-POST-PARAMETERS uses +latin-1+ to process
> "application/x-www-form-urlencoded" content-type POST bodies
>
> In addition, RECOMPUTE-REQUEST-PARAMETERS seems to interpret both the
> message body and the query string according to a charset in the
> request header.  I thought that Content-Type was only supposed to
> affect the message body, not the headers (which are assumed to be in
> ASCII).  Then shouldn't the URL and query string always be read as
> ASCII?  RFC2047 discusses non-ASCII headers for MIME, but I don't know
> if that is relevant except for parsing multipart forms.
>
> I'm not thoroughly versed in the HTTP protocol, but it seems that
> these are bugs in Hunchentoot.  I have a half-completed patch but I
> want to get some more opinions before I go any further.  There may
> also be other lurking encoding issues in Hunchentoot, or I may be
> entirely mistaken.
>
> Proposed solution:
> - GET-POST-DATA, PARSE-MULTIPART-FORM-DATA, and
> MAYBE-READ-POST-PARAMETERS should respect the Content-Type header in
> the request and use that to define the external-format of the stream
> used to parse the body
> - RECOMPUTE-REQUEST-PARAMETERS should only use the Content-Type
> external format to parse the post parameters
> - PARSE-MULTIPART-FORM-DATA may need additional review to be in
> accordance with RFC2047 and RFC2388
>
> Feedback, please.
>
> Thanks,
> Red
>
>
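
The behavior proposed in the quoted message (take the charset from the
request's Content-Type instead of always assuming Latin-1) can be sketched
roughly as follows.  This is a language-neutral illustration in Python, not
Hunchentoot code: the function names are hypothetical, the Latin-1 fallback
mirrors the default described above, and it assumes the client actually
sends a charset parameter, which many clients do not.

```python
from email.message import Message
from urllib.parse import parse_qsl


def charset_from_content_type(content_type, default="latin-1"):
    """Return the charset parameter of a Content-Type value, or `default`.

    Hypothetical helper: the fallback to Latin-1 is an assumption that
    mirrors the default discussed in the thread.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_form_body(body, content_type):
    """Decode an application/x-www-form-urlencoded body (bytes).

    The raw body is plain ASCII; the charset only governs how the
    percent-escaped octets are turned back into characters.
    """
    charset = charset_from_content_type(content_type)
    return parse_qsl(body.decode("ascii"),
                     keep_blank_values=True,
                     encoding=charset)
```

With such a helper the same escaped octets decode differently depending on
the declared charset, which is the kind of test case that would expose the
behavior in question.  An unknown charset name would raise an error here;
a real server would have to decide how to handle that.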



