[flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error

Anton Vodonosov avodonosov at yandex.ru
Thu Feb 9 22:19:43 UTC 2012


To make these two aspects - length calculation and error recovery - consistent,
the following approach may be good:

Length calculation never signals an encoding error. Instead, it takes into
account that invalid byte sequences may be replaced by a character
provided via *substitution-char* or the use-value restart, i.e. every
invalid byte sequence is counted as one character.
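
A minimal sketch of such a non-signalling length calculation, in plain
Common Lisp. The name lenient-utf8-length is hypothetical and not part of
flexi-streams. It assumes the length of each sequence is derived from its
lead octet alone, that an octet which cannot start a sequence counts as a
one-octet invalid sequence, and that a sequence cut off by the end of the
input still counts as one character:

```lisp
;; Hypothetical helper for illustration, not the flexi-streams code.
(defun lenient-utf8-length (octets)
  "Count the characters OCTETS would decode to, never signalling an error."
  (let ((end (length octets)))
    (loop with i = 0
          while (< i end)
          ;; Every sequence, valid or not, contributes exactly one character.
          count t
          do (let ((octet (aref octets i)))
               ;; Advance by the length announced by the lead octet.
               ;; Assumption: octets that cannot be a lead octet
               ;; (e.g. stray continuation octets) are skipped one at a time.
               (incf i (cond ((< octet #x80) 1)
                             ((= (logand octet #xE0) #xC0) 2)
                             ((= (logand octet #xF0) #xE0) 3)
                             ((= (logand octet #xF8) #xF0) 4)
                             (t 1)))))))

(lenient-utf8-length #(#xC1 #xC2 #xC3 #xC4))      ; => 2
(lenient-utf8-length #(#xC0 #xC1 #xC2 #xC3 #xC4)) ; => 3
```

With this rule the second input from the quoted report yields 3 instead
of an error: the trailing #xC4 announces two octets, only one is present,
and the truncated sequence is counted as a single character.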

In the decoding process that follows the length calculation, two cases
are possible:
1. some error is not recovered (no *substitution-char* provided and the
    use-value restart not invoked): it doesn't matter what length was calculated
2.



10.02.2012, 01:21, "Edi Weitz" <edi at agharta.de>:
> Sorry for the delay.  I think this is more or less "on purpose."
> (It's been a while since I wrote that stuff...)
>
> The recover-from-encoding-error helper function is used when during
> decoding we encounter something which "looks like" a character (so to
> say) but isn't one - in which case we can e.g. replace it with the
> substitution character.
>
> I think the error you mention happens earlier - when the length is checked.
>
> Of course, one could argue that one could just as well use the same
> restart here.  Maybe you can just submit a patch (including
> documentation if needed and ideally with new tests) and convince Hans
> to make a new release?
>
> Thanks,
> Edi.
>
> On Sat, Jan 21, 2012 at 1:06 PM, Dmitriy Ivanov <divanov11 at gmail.com> wrote:
>
>>  Hello folks,
>>
>>  I have bumped into the following error while playing with Hunchentoot.
>>  (It originates from url-decoding GET parameters with
>>   *hunchentoot-default-external-format*.)
>>
>>  (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
>>  => "??"
>>
>>  (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC0 #xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
>>  -> signals: This sequence can't be decoded using UTF-8 as it is too short.
>>     1 octet missing at then end.
>>
>>  The reason is rather "simple": the decoder invokes the following chain of calls:
>>   compute-number-of-chars -> check-end -> signal-encoding-error
>>
>>  This contrasts with most of the decoder code, which directly calls
>>    recover-from-encoding-error
>>  instead of
>>   signal-encoding-error.
>>  --
>>  Sincerely,
>>  Dmitriy Ivanov
>>  lisp.ystok.ru
>>
>>  _______________________________________________
>>  flexi-streams-devel mailing list
>>  flexi-streams-devel at common-lisp.net
>>  http://lists.common-lisp.net/cgi-bin/mailman/listinfo/flexi-streams-devel



