[elephant-devel] 64 bit issues

Fri Aug 25 19:58:28 UTC 2006

I've attached at the end of this e-mail the approach taken by Rucksack.
The idea is to reduce all objects to a byte sequence entirely within
Lisp and only call out to C to write bytes into memory vectors.
It might be a little slower, but it solves all these problems.  Adopting
it is not backward compatible, but philosophically it strikes me
as the right way to address all these issues.

Implementing option #1 means testing in Lisp whether the
cl:most-positive-fixnum constant is greater than 2^32 and, if so, using
the bignum tag to implement a bignum-style store.  This maintains
backwards compatibility but upgrades 64-bit fixnums to bignums during
serialization.  We can address a better serialization philosophy in
0.7.0.  In fact, I want a totally new serializer in 0.7.0 to clean up
these annoying issues and improve performance on non-persistent objects.
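
As a rough illustration of that check, here is a minimal sketch. The names
*wide-fixnums-p* and integer-store-kind are hypothetical and are not part of
Elephant; the keywords stand in for whatever tags the real serializer uses.

```lisp
;; Hypothetical sketch, not Elephant code.
(defparameter *wide-fixnums-p* (> most-positive-fixnum (expt 2 32))
  "True on Lisps whose fixnums can exceed 32 bits.")

(defun integer-store-kind (n)
  "Return :FIXNUM-32 when N can use the 32-bit fixnum format, else :BIGNUM."
  (cond ((not (typep n 'fixnum)) :bignum)
        ;; On a 64-bit Lisp, a fixnum too wide for 32 bits is
        ;; upgraded to the bignum format during serialization.
        ((and *wide-fixnums-p* (not (typep n '(signed-byte 32)))) :bignum)
        (t :fixnum-32)))
```

With this, (integer-store-kind (expt 2 40)) selects the bignum format on both
32-bit and 64-bit Lisps, so the stored data stays readable on either.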

On the 64-bit Lisp, is buffer-write-uint writing 64 bits to the byte
stream or 32?  To address the %bignum-ref problem, you could go ahead
and use the consing ldb operation used for Allegro and other Lisps.
That's easy.  The other way to go is to detect when you get 64-bit
values back from %bignum-ref and then write two uints in a row so 32-bit
Lisps can read the same stream as 64-bit Lisps.  When deserializing,
just know that all uints pulled from the stream are 32 bits, provided
the buffer-write-uint function is in fact serializing 4 bytes and not 8.
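
A minimal sketch of both ideas, using only portable ldb and ash. All three
function names are hypothetical, and the actual writes through
buffer-write-uint are left out because its behavior is the open question
above.

```lisp
;; Hypothetical sketch, not Elephant code.
(defun split-word-64 (word)
  "Split a 64-bit WORD into two 32-bit halves: low half first, then high."
  (values (ldb (byte 32 0) word)
          (ldb (byte 32 32) word)))

(defun integer-to-uint32-list (n)
  "Portable (consing) route: 32-bit chunks of non-negative N, low chunk first."
  (loop for pos from 0 below (max 1 (ceiling (integer-length n) 32))
        collect (ldb (byte 32 (* 32 pos)) n)))

(defun uint32-list-to-integer (chunks)
  "Rebuild the integer from 32-bit CHUNKS, low chunk first."
  (loop for chunk in chunks
        for shift from 0 by 32
        sum (ash chunk shift)))
```

Either route yields a stream of 32-bit values regardless of the word size of
the Lisp that wrote it.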

Cheers,
Ian

From serialize.lisp in Rucksack

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Integers
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defun serialize-byte-16 (integer stream)
  (serialize-byte (ldb (byte 8 0) integer) stream)
  (serialize-byte (ldb (byte 8 8) integer) stream))

(defun serialize-byte-24 (integer stream)
  (serialize-byte (ldb (byte 8 0) integer) stream)
  (serialize-byte (ldb (byte 8 8) integer) stream)
  (serialize-byte (ldb (byte 8 16) integer) stream))

(defun serialize-byte-32 (integer stream)
  (serialize-byte (ldb (byte 8 0) integer) stream)
  (serialize-byte (ldb (byte 8 8) integer) stream)
  (serialize-byte (ldb (byte 8 16) integer) stream)
  (serialize-byte (ldb (byte 8 24) integer) stream))

(defun serialize-byte-48 (integer stream)
  (multiple-value-bind (most-significant least-significant)
      (floor integer #x1000000)
    (serialize-byte-24 least-significant stream)
    (serialize-byte-24 most-significant stream)))

(defun serialize-byte-64 (integer stream)
  (multiple-value-bind (most-significant least-significant)
      (floor integer #x100000000)
    (serialize-byte-32 least-significant stream)
    (serialize-byte-32 most-significant stream)))

(defun deserialize-byte-16 (stream)
  (+ (deserialize-byte stream)
     (* (deserialize-byte stream) 256)))

(defun deserialize-byte-24 (stream)
  (+ (deserialize-byte stream)
     (* (deserialize-byte stream) #x100)
     (* (deserialize-byte stream) #x10000)))

(defun deserialize-byte-32 (stream)
  (+ (deserialize-byte stream)
     (* (deserialize-byte stream) #x100)
     (* (deserialize-byte stream) #x10000)
     (* (deserialize-byte stream) #x1000000)))

(defun deserialize-byte-48 (stream)
  (+ (deserialize-byte-24 stream)
     (* (deserialize-byte-24 stream) #x1000000)))

(defun deserialize-byte-64 (stream)
  (+ (deserialize-byte-32 stream)
     (* (deserialize-byte-32 stream) #x100000000)))

(defmethod serialize ((obj integer) stream)
  ;; Serialize integers with least-significant bytes first.
  (cond ((zerop obj) (serialize-marker +zero+ stream))
        ((= obj 1) (serialize-marker +one+ stream))
        ((= obj -1) (serialize-marker +minus-one+ stream))
        ((= obj 2) (serialize-marker +two+ stream))
        (t (let* ((positive-p (>= obj 0))
                  (unsigned (abs obj))
                  (nr-octets (nr-octets unsigned)))
             (serialize-integer positive-p unsigned nr-octets stream)))))

(defun serialize-integer (positive-p unsigned nr-octets stream)
  (case nr-octets
    (1 (serialize-marker (if positive-p +positive-byte-8+ +negative-byte-8+)
                         stream)
       (serialize-byte unsigned stream))
    (2 (serialize-marker (if positive-p +positive-byte-16+ +negative-byte-16+)
                         stream)
       (serialize-byte-16 unsigned stream))
    (3 (serialize-marker (if positive-p +positive-byte-24+ +negative-byte-24+)
                         stream)
       (serialize-byte-24 unsigned stream))
    (4 (serialize-marker (if positive-p +positive-byte-32+ +negative-byte-32+)
                         stream)
       (serialize-byte-32 unsigned stream))
    ((5 6)
     (serialize-marker (if positive-p +positive-byte-48+ +negative-byte-48+)
                       stream)
     (serialize-byte-48 unsigned stream))
    ((7 8)
     (serialize-marker (if positive-p +positive-byte-64+ +negative-byte-64+)
                       stream)
     (serialize-byte-64 unsigned stream))
    (otherwise
     (let ((nr-bits (* 8 nr-octets)))
       (serialize-marker (if positive-p +positive-integer+ +negative-integer+)
                         stream)
       (serialize nr-octets stream)
       (loop for position from (- nr-bits 8) downto 0 by 8
             do (serialize-byte (ldb (byte 8 position) unsigned) stream))))))

(defun nr-octets (n)
  (ceiling (integer-length n) 8))

(defmethod deserialize-contents ((marker (eql +minus-one+)) stream)
  (declare (ignore stream))
  -1)

(defmethod deserialize-contents ((marker (eql +zero+)) stream)
  (declare (ignore stream))
  0)

(defmethod deserialize-contents ((marker (eql +one+)) stream)
  (declare (ignore stream))
  1)

(defmethod deserialize-contents ((marker (eql +two+)) stream)
  (declare (ignore stream))
  2)

(defmethod deserialize-contents ((marker (eql +positive-byte-8+)) stream)
  (deserialize-byte stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-8+)) stream)
  (- (deserialize-byte stream)))

(defmethod deserialize-contents ((marker (eql +positive-byte-16+)) stream)
  (deserialize-byte-16 stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-16+)) stream)
  (- (deserialize-byte-16 stream)))

(defmethod deserialize-contents ((marker (eql +positive-byte-24+)) stream)
  (deserialize-byte-24 stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-24+)) stream)
  (- (deserialize-byte-24 stream)))

(defmethod deserialize-contents ((marker (eql +positive-byte-32+)) stream)
  (deserialize-byte-32 stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-32+)) stream)
  (- (deserialize-byte-32 stream)))

(defmethod deserialize-contents ((marker (eql +positive-byte-48+)) stream)
  (deserialize-byte-48 stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-48+)) stream)
  (- (deserialize-byte-48 stream)))

(defmethod deserialize-contents ((marker (eql +positive-byte-64+)) stream)
  (deserialize-byte-64 stream))

(defmethod deserialize-contents ((marker (eql +negative-byte-64+)) stream)
  (- (deserialize-byte-64 stream)))

(defmethod deserialize-contents ((marker (eql +positive-integer+)) stream)
  (let ((nr-bytes (deserialize stream)))
    (assert (integerp nr-bytes))
    (let ((result 0))
      (loop for i below nr-bytes
            do (setf result (+ (ash result 8) (deserialize-byte stream))))
      result)))

(defmethod deserialize-contents ((marker (eql +negative-integer+)) stream)
  (- (deserialize-contents +positive-integer+ stream)))
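
The listing above relies on serialize-byte, deserialize-byte and the marker
constants, which are defined elsewhere in Rucksack.  To see the byte layout in
isolation, here is a self-contained sketch that uses an adjustable octet vector
as a stand-in for the stream.  The helpers make-octet-buffer, write-le and
read-le are hypothetical and only mirror the fixed-width, least-significant-
byte-first functions.

```lisp
;; Hypothetical stand-ins, not Rucksack code.
(defun make-octet-buffer ()
  "Return an empty, growable vector of octets."
  (make-array 0 :element-type '(unsigned-byte 8)
                :adjustable t :fill-pointer 0))

(defun write-le (integer nr-octets buffer)
  "Append NR-OCTETS octets of INTEGER to BUFFER, least-significant first."
  (dotimes (i nr-octets buffer)
    (vector-push-extend (ldb (byte 8 (* 8 i)) integer) buffer)))

(defun read-le (buffer start nr-octets)
  "Read NR-OCTETS octets from BUFFER at START, least-significant first."
  (loop for i below nr-octets
        sum (ash (aref buffer (+ start i)) (* 8 i))))
```

For example, (write-le #x12345678 4 (make-octet-buffer)) stores the octets
#x78 #x56 #x34 #x12 in that order, and read-le turns them back into
#x12345678 on any Lisp, whatever its native word size.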

Robert L. Read wrote:
> On Thu, 2006-08-24 at 12:32 +0200, Petter Egesund wrote:
>> Some thoughts;
>>
>> I am getting a little closer, passing all 110 tests except 6. Storing of
>> strings seems to work now. What does not work is storing of integers, and
>> thereby rationals, bignums, and so on.
>>     
> Thank you!
>> This will not work without some larger rewriting. As far as I can see
>> there are two options:
>>
>> 1. Force serializing of integers to 32-bit, not dependent on the
>> underlying OS. This seems to be the easy way out. What must be rewritten
>> is:
>>
>> - storing of fixnums (which can be larger than 32 bits on a 64-bit OS).
>> Fixnums must be stored the same way as integers are stored today.
>> - storing of bignums, as reading directly from the bignum using
>> %bignum-ref can return 32-bit or 64-bit values on different platforms. A
>> rewrite of bignum storage can be done in two ways, either by testing the
>> platform, or by doing the work ourselves with some bitwise operations
>> (something like shifting bytes and doing a bitwise AND, then repeating
>> until 0). I would vote for the last option, as this makes the codebase
>> less platform-dependent (there might be a small speed penalty, though; I
>> don't know).
>>
>> 2. We can start storing integers as 64 bits on 64-bit platforms.
>> This probably means a lot of rewriting? It will probably give the fastest
>> benchmark results?
>>
>> I would personally go for suggestion 1; I do not think the difference in
>> speed will be visible at all. Any opinions?
>>     
> IMHO, your solution #1 is an OK patch as a stopgap.  However, we definitely
> eventually want to take full advantage of a 64-bit architecture --- this
> will be the wave of the future, and there is no serious reason why we can't
> rewrite it.  Having said that, I'm in no position to do it for a while, so
> unless you or someone else wants to rewrite it, I propose the following:
>
> Let's produce a patch that works as much as possible on the 64-bit
> architecture, and document the deficiency that on that architecture, for
> example, fixnums don't work.  As long as this continues to pass all tests on
> 32-bit architectures, we can release it as 0.6.1, for example.
>
> When Ian and I work on 0.7.0, if you or someone else will help us test on a
> 64-bit architecture, then we will try to make sure that we fully support
> things in that release.
>
> I'll write up the documentation based on what we decide.
>
>> Cheers,
>>
>> Petter Egesund
>>