[flexi-streams-devel] *substitution-char* does not suppress external-format-encoding-error

Anton Vodonosov avodonosov at yandex.ru
Thu Feb 9 22:25:17 UTC 2012


[Sorry, I accidentally hit Enter and sent an unfinished message.
 So, once again: ]

To make these two aspects - length calculation and error recovery - consistent,
the following approach may be good:

Length calculation never signals an encoding error. Instead, it takes into
account that a wrong byte sequence may be replaced by a character,
provided via *substitution-char* or the use-value restart. I.e. every wrong
byte sequence is counted as one character.

In the decoding process that follows the length calculation, two cases
are possible:
1. Some error is not recovered (neither *substitution-char* is provided
   nor use-value is invoked). The decoding fails completely and it
   doesn't matter what length was calculated.
2. All the wrong sequences were substituted. In this case
   the length in which every wrong sequence is counted as
   one character exactly matches what the decoding process needs.
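
A minimal sketch of such a non-signalling length calculation follows. It
does not use FLEXI-STREAMS internals: both function names are hypothetical,
and it assumes the decoder consumes as many octets as the lead octet
announces (which the "??" result in Dmitriy's report below suggests, but
which I have not verified against the decoder source).

```lisp
;; Sketch only: these names are hypothetical, not part of FLEXI-STREAMS.
(defun utf-8-sequence-length (lead-octet)
  "Number of octets a sequence starting with LEAD-OCTET claims to occupy.
Octets that cannot start a sequence are treated as one-octet sequences."
  (cond ((< lead-octet #x80) 1)   ; ASCII
        ((< lead-octet #xC0) 1)   ; stray continuation octet
        ((< lead-octet #xE0) 2)
        ((< lead-octet #xF0) 3)
        ((< lead-octet #xF8) 4)
        (t 1)))                   ; #xF8..#xFF never start a sequence

(defun lenient-utf-8-length (octets)
  "Count the characters in OCTETS without signalling an error.
Every sequence, well-formed or not, counts as one character;
a truncated sequence at the end also counts as one."
  (let ((end (length octets)))
    (loop with i = 0
          while (< i end)
          do (incf i (utf-8-sequence-length (aref octets i)))
          count t)))
```

With this sketch, (lenient-utf-8-length #(#xC0 #xC1 #xC2 #xC3 #xC4))
returns 3 instead of signalling, so a decoder that substitutes every
wrong sequence would have room for exactly three characters.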

Unfortunately, I cannot work on a patch for this now or in the near future.

Best regards,
- Anton

10.02.2012, 01:21, "Edi Weitz" <edi at agharta.de>:
> Sorry for the delay.  I think this is more or less "on purpose."
> (It's been a while since I wrote that stuff...)
>
> The recover-from-encoding-error helper function is used when during
> decoding we encounter something which "looks like" a character (so to
> say) but isn't one - in which case we can e.g. replace it with the
> substitution character.
>
> I think the error you mention happens earlier - when the length is checked.
>
> Of course, one could argue that one could just as well use the same
> restart here.  Maybe you can just submit a patch (including
> documentation if needed and ideally with new tests) and convince Hans
> to make a new release?
>
> Thanks,
> Edi.
>
> On Sat, Jan 21, 2012 at 1:06 PM, Dmitriy Ivanov <divanov11 at gmail.com> wrote:
>
>>  Hello folks,
>>
>>  I have bumped into the following error while playing with Hunchentoot.
>>  (It originates from url-decoding GET parameters with
>>   *hunchentoot-default-external-format*.)
>>
>>  (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC1 #xC2 #xC3 #xC4) :external-format :utf-8))
>>  => "??"
>>
>>  (let ((flex:*substitution-char* #\?))
>>   (flex:octets-to-string #(#xC0 #xC1 #xC2 #xC3 #xC4)
>>                          :external-format :utf-8))
>>  -> signals: This sequence can't be decoded using UTF-8 as it is too short.
>>     1 octet missing at then end.
>>
>>  The reason is rather "simple": the decoder invokes the following chain of calls:
>>   compute-number-of-chars -> check-end -> signal-encoding-error
>>
>>  This contrasts with most of the decoder code, which directly calls
>>    recover-from-encoding-error
>>  instead of
>>   signal-encoding-error.
>>  --
>>  Sincerely,
>>  Dmitriy Ivanov
>>  lisp.ystok.ru
>>
>>  _______________________________________________
>>  flexi-streams-devel mailing list
>>  flexi-streams-devel at common-lisp.net
>>  http://lists.common-lisp.net/cgi-bin/mailman/listinfo/flexi-streams-devel
