[cffi-devel] Road to 0.9.3 (and encoding issues)

Luis Oliveira luismbo at gmail.com
Tue Apr 17 00:10:33 UTC 2007


Hello,

The cffi-newtypes branch[1] is getting huge (~30 patches), and since it
seems nobody has complained much about the new type system, I'd like to
push it into the main branch along with the encoding support and other
minor features and bugfixes that have accumulated over the last ~2 months.
Any objections?

Then, after some remaining issues are fixed, I'd like to release 0.9.3.

So here's my TODO list for 0.9.3:

  - <http://article.gmane.org/gmane.lisp.cffi.devel/1033> Figure out
    whether any of those issues are critical.  The only change I've made
    to the typesystem since I wrote that message was a new :class option
    to DEFSTRUCT;

  - iron out some issues with the new encoding support (see below, this
    might take a while);

  - finish the documentation of the new features;

  - integrate cffi-grovel into the CFFI tree and document it.


There are some issues with the current encoding support though:

  - James' original code had a CFFI-SYS:DEFAULT-ENCODING function which
    would use some implementation-specific way of determining what the
    default encoding should be.  Every implementation had a different
    way of determining that, so I thought that simply picking one
    ourselves (say, :utf-8) would be better.

    Also, I changed that to be a special variable instead.  That way
    *DEFAULT-FOREIGN-ENCODING* can be bound to something else and affect
    the behaviour of the :string type and other string operations that
    don't explicitly specify an encoding.  (E.g., the behaviour of
    :STRING would be influenced at run-time by the value of *D-F-E*
    whereas (:STRING :ENCODING :UTF-8) would not.)

  - Allegro's %lisp-string-into-foreign overflows.  Allegro's
    EXCL:STRING-TO-NATIVE doesn't take a bufsize argument.  Another
    problem is that Allegro's %lisp-string-octet-length isn't very
    effective; otherwise it'd be easier to check for an overflow.
    Anyway, this is not the worst issue.

  - Corman, SCL and ECL support is broken, not necessarily because of
    the new string stuff.  Also, a recent 1.1 snapshot of OpenMCL with
    unicode support is necessary;  is this a problem?
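
To make the special-variable point from the first item concrete, here is
a minimal sketch in portable CL.  It is not CFFI's actual code:
EFFECTIVE-ENCODING is a hypothetical helper, and the variable defined
here is only a stand-in with the same name as the one discussed above.
It just shows that a bare :STRING follows the dynamic binding while an
explicit :ENCODING does not.

```lisp
;; Stand-in for the variable discussed above (an assumption, not CFFI's
;; definition).  :UTF-8 is the default suggested in the message.
(defvar *default-foreign-encoding* :utf-8
  "Encoding used when a string type does not name one explicitly.")

;; Hypothetical helper: TYPE is either :STRING or a list such as
;; (:STRING :ENCODING :LATIN-1).
(defun effective-encoding (type)
  "Return the encoding TYPE would use at this point in the program."
  (if (consp type)
      (getf (rest type) :encoding *default-foreign-encoding*)
      *default-foreign-encoding*))

;; Rebinding the special variable changes only the unqualified type.
(defun demo ()
  (let ((*default-foreign-encoding* :latin-1))
    (list (effective-encoding :string)
          (effective-encoding '(:string :encoding :utf-8)))))
```

Under these assumptions (DEMO) returns (:LATIN-1 :UTF-8), and outside
the LET a bare :STRING goes back to :UTF-8.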


Last but definitely not least, there's a huge problem with error
semantics.  Everything works fine until you try to, e.g. convert a #\λ
into iso-8859-1.  Or #\ç into ascii.

Some Lisps treat ascii as a synonym for iso-8859-1.  Some silently
substitute inconvertible characters with #\? (or #\Sub) while others
will raise an exception.  Of those that do raise exceptions, some
provide a use-value substitution restart, others don't.  To sum it up,
it's horribly inconsistent.
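
One way to picture a single, consistent policy is the sketch below, in
portable CL.  Every name in it (UNENCODABLE-CHARACTER, ENCODE-CHAR,
ENCODE-STRING) is hypothetical and none of it is CFFI or flexi-streams
code.  It also assumes a Unicode-aware Lisp where CHAR-CODE returns the
code point.  The encoder always signals an error and always offers a
USE-VALUE restart, so "substitute #\?" becomes a choice made by the
caller rather than by the implementation.

```lisp
(define-condition unencodable-character (error)
  ((bad-char :initarg :bad-char :reader unencodable-char)
   (encoding :initarg :encoding :reader unencodable-encoding))
  (:report (lambda (c stream)
             (format stream "Cannot encode ~S in ~A."
                     (unencodable-char c) (unencodable-encoding c)))))

(defun encode-char (char limit encoding)
  "Return CHAR's code if it is below LIMIT.  Otherwise signal
UNENCODABLE-CHARACTER with a USE-VALUE restart that accepts a
replacement character; the replacement is checked again."
  (loop
    (when (< (char-code char) limit)
      (return (char-code char)))
    (setf char
          (restart-case
              (error 'unencodable-character
                     :bad-char char :encoding encoding)
            (use-value (replacement)
              :report "Supply a replacement character."
              replacement)))))

(defun encode-string (string &key (encoding :latin-1))
  "Encode STRING as a vector of octets in :ASCII or :LATIN-1."
  (let ((limit (ecase encoding (:ascii 128) (:latin-1 256))))
    (map '(vector (unsigned-byte 8))
         (lambda (c) (encode-char c limit encoding))
         string)))

;; A caller that wants silent substitution asks for it explicitly.
(defun encode-with-question-marks (string encoding)
  (handler-bind ((unencodable-character
                   (lambda (c)
                     (declare (ignore c))
                     (use-value #\?))))
    (encode-string string :encoding encoding)))
```

With no handler installed, encoding a string that contains
(code-char 955), i.e. #\λ, as :LATIN-1 signals the error; through
ENCODE-WITH-QUESTION-MARKS the same character comes out as octet 63.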

Before forward porting James's cffi-encodings branch, we discussed the
possibility of doing the encoding conversion in portable CL, like what
flexi-streams does for instance.  I'm beginning to reconsider this idea.
On the one hand, it would mean introducing a dependency for CFFI; on the
other hand, it would simplify CFFI-SYS and provide consistent semantics
across the various Lisps we support (including useful semantics for
Lisps that don't support Unicode).  Any thoughts?


[1] http://common-lisp.net/~loliveira/darcs/cffi-newtypes/

-- 
Luís Oliveira
http://student.dei.uc.pt/~lmoliv/
