[Ecls-list] Unicode 16-bits

Matthew Mondor mm_lists at pulsar-zone.net
Sun Feb 20 07:59:11 UTC 2011


On Sat, 19 Feb 2011 23:43:33 +0000
Juan Jose Garcia-Ripoll <juanjose.garciaripoll at googlemail.com> wrote:

> Would you find it useful to have an ECL that only supports character codes
> 0 - 65535? That would probably make it easier to embed the part of the
> Unicode database associated with it (< 65535 bytes) and have a standalone
> executable. Executables would also be a bit faster and use less memory
> (16 bits vs 32 bits per character).
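
For concreteness, the per-character storage difference amounts to roughly
the following (the type names here are hypothetical, not ECL's actual
internals):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical character cell types, not ECL's real ones. */
typedef uint16_t bmp_char;   /* codes 0 - 65535 only */
typedef uint32_t full_char;  /* codes 0 - 0x10FFFF (all of Unicode) */

int
main(void)
{
	/* A one-million-character string buffer under each representation. */
	printf("16-bit cells: %zu bytes\n", (size_t)1000000 * sizeof(bmp_char));
	printf("32-bit cells: %zu bytes\n", (size_t)1000000 * sizeof(full_char));
	return (0);
}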

Would this be an option, or would ECL internally use a 16-bit character
representation all the time when Unicode support is enabled for the
build?

Also, I understand that the representation would take less memory, but
would it really be faster on 32-bit+ processors?  I know that some
processors (including older x86) have faster access times for 32-bit
values than for 16-bit or 8-bit ones (e.g. some time back I had to adapt
an arcfour implementation to use 32-bit words rather than 8-bit ones for
the internal state, despite it only ever holding values between 0 and
255, to improve its performance).
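
To illustrate the kind of change I mean, here is a rough sketch (not the
actual code I adapted): the RC4 state logically holds byte values, but
declaring it as 32-bit words avoids sub-word loads and stores on such
processors:

#include <stdint.h>
#include <stddef.h>

/*
 * RC4 with the 256-entry state held in 32-bit words, even though every
 * entry fits in a byte; on some CPUs word-sized accesses are cheaper.
 */
struct rc4_ctx {
	uint32_t s[256];	/* values 0-255, one per machine word */
	uint32_t i, j;
};

void
rc4_init(struct rc4_ctx *ctx, const uint8_t *key, size_t keylen)
{
	uint32_t i, j, t;

	for (i = 0; i < 256; i++)
		ctx->s[i] = i;
	for (i = j = 0; i < 256; i++) {
		j = (j + ctx->s[i] + key[i % keylen]) & 0xff;
		t = ctx->s[i];
		ctx->s[i] = ctx->s[j];
		ctx->s[j] = t;
	}
	ctx->i = ctx->j = 0;
}

uint8_t
rc4_byte(struct rc4_ctx *ctx)
{
	uint32_t t;

	ctx->i = (ctx->i + 1) & 0xff;
	ctx->j = (ctx->j + ctx->s[ctx->i]) & 0xff;
	t = ctx->s[ctx->i];
	ctx->s[ctx->i] = ctx->s[ctx->j];
	ctx->s[ctx->j] = t;
	return (uint8_t)ctx->s[(ctx->s[ctx->i] + ctx->s[ctx->j]) & 0xff];
}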

I also admit that I have some code assuming a 32-bit representation,
but it's ECL-specific and could be adapted easily; I don't think that
I make use of any character above 65535 myself.  That said, I have no
idea what input I might have to deal with eventually; it's
unpredictable.
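
For instance, checking whether decoded input actually contains anything
above the BMP would be as simple as the following sketch (assuming the
codepoints are already decoded into a 32-bit buffer; the helper name is
hypothetical):

#include <stdint.h>
#include <stddef.h>

/*
 * Return nonzero if any codepoint in the buffer lies outside the Basic
 * Multilingual Plane, i.e. would not fit in 16 bits.
 */
int
has_non_bmp(const uint32_t *codepoints, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (codepoints[i] > 0xffff)
			return (1);
	return (0);
}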

As for the 65535-byte output file limitation, is that more difficult
to fix?  Is it a toolchain-dependent issue over which ECL has no
control?

Thanks,
-- 
Matt



