From kom at narihara-lab.jp Mon Aug 3 20:14:28 2009 From: kom at narihara-lab.jp (Hiroyuki Komatsu) Date: Tue, 04 Aug 2009 05:14:28 +0900 (JST) Subject: [elephant-devel] UTF seriazer/desiriali patch Message-ID: <20090804.051428.71108054.kom@narihara-lab.jp> Sorry, my English is not very good. The BDB btree stores UTF-16/UTF-32 strings in an incorrect sort order. There are two problems: (1) the UTF string serializers write big-endian byte order; (2) the UTF-32 comparator in libberkeley-db.c does not work correctly. The attached patch fixes these problems. -------------- next part -------------- diff -rN -u old-elephant/src/db-bdb/libberkeley-db.c new-elephant/src/db-bdb/libberkeley-db.c --- old-elephant/src/db-bdb/libberkeley-db.c 2009-08-04 04:34:01.000000000 +0900 +++ new-elephant/src/db-bdb/libberkeley-db.c 2009-08-04 04:34:01.000000000 +0900 @@ -1122,7 +1122,7 @@ /***** printf("Doing a 32-bit compare\n"); *****/ - return wcs_cmp((wchar_t*)ad+5+offset, read_int32(ad+offset, 1), (wchar_t*)bd+5+offset, read_int32(bd+offset, 1)); + return wcs_cmp((wchar_t*)(ad+5+offset), read_int32(ad+offset, 1), (wchar_t*)(bd+5+offset), read_int32(bd+offset, 1)); default: /***** printf("Doing a lex compare\n"); @@ -1313,7 +1313,7 @@ int min, sizediff, diff; sizediff = length1 - length2; min = sizediff > 0 ?
length2 : length1; - diff = wcsncmp(a, b, min /4); + diff = wcsncmp(a, b, min); if (diff == 0) return sizediff; return diff; } diff -rN -u old-elephant/src/elephant/unicode.lisp new-elephant/src/elephant/unicode.lisp --- old-elephant/src/elephant/unicode.lisp 2009-08-04 04:34:01.000000000 +0900 +++ new-elephant/src/elephant/unicode.lisp 2009-08-04 04:34:01.000000000 +0900 @@ -145,10 +145,10 @@ (loop for i fixnum from 0 below characters do (let ((code (char-code (funcall char string i)))) (when (> code #xFFFF) (fail)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size)) + (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size 1)) ;; (coerce (ldb (byte 8 8) code) '(signed 8))) (ldb (byte 8 8) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size 1)) + (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size 0)) ;; (coerce (ldb (byte 8 0) code) '(signed 8)))))) (ldb (byte 8 0) code)))) (incf size (* characters 2)) @@ -174,13 +174,13 @@ (loop for i fixnum from 0 below characters do (let ((code (char-code (funcall char string i)))) (when (> code #x10FFFF) (error "Invalid unicode code type")) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 0)) + (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 3)) (ldb (byte 8 24) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 1)) - (ldb (byte 8 16) code)) (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 2)) + (ldb (byte 8 16) code)) + (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 1)) (ldb (byte 8 8) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 3)) + (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 0)) (ldb (byte 8 0) code))))) (incf size (* characters 4)) t))) @@ -274,8 +274,8 @@ (assert (subtypep (type-of string) 'simple-string)) (assert (compatible-unicode-support-p 
:utf16le)) (loop for i fixnum from 0 below length do - (setf code (dpb (next-byte 0) (byte 8 8) 0)) - (setf code (dpb (next-byte 1) (byte 8 0) code)) + (setf code (dpb (next-byte 1) (byte 8 8) 0)) + (setf code (dpb (next-byte 0) (byte 8 0) code)) (setf (schar string i) (code-char code))) (incf (elephant-memutil::buffer-stream-position bstream) (* length 2))) @@ -294,10 +294,10 @@ (assert (subtypep (type-of string) 'simple-string)) (assert (compatible-unicode-support-p :utf32le)) (loop for i fixnum from 0 below length do - (setf code (dpb (next-byte 0) (byte 8 24) 0)) - (setf code (dpb (next-byte 1) (byte 8 16) code)) - (setf code (dpb (next-byte 2) (byte 8 8) code)) - (setf code (dpb (next-byte 3) (byte 8 0) code)) + (setf code (dpb (next-byte 3) (byte 8 24) 0)) + (setf code (dpb (next-byte 2) (byte 8 16) code)) + (setf code (dpb (next-byte 1) (byte 8 8) code)) + (setf code (dpb (next-byte 0) (byte 8 0) code)) (setf (char string i) (code-char code))) (incf (elephant-memutil::buffer-stream-position bstream) (* length 4)) From sky at viridian-project.de Tue Aug 4 06:58:03 2009 From: sky at viridian-project.de (Leslie P. Polzer) Date: Tue, 4 Aug 2009 08:58:03 +0200 (CEST) Subject: [elephant-devel] UTF seriazer/desiriali patch In-Reply-To: <20090804.051428.71108054.kom@narihara-lab.jp> References: <20090804.051428.71108054.kom@narihara-lab.jp> Message-ID: Hiroyuki Komatsu wrote: > Sorry, my English is not very good. > > The BDB btree stores UTF-16/UTF-32 strings in an incorrect sort order. > > There are two problems: > (1) the UTF string serializers write big-endian byte order; > (2) the UTF-32 comparator in libberkeley-db.c does not work correctly. > > The attached patch fixes these problems. Thank you! Two questions: Could you also add new tests that show the problem? Is the change compatible with existing incorrectly sorted databases? Leslie -- http://www.linkedin.com/in/polzer From sky at viridian-project.de Tue Aug 4 18:28:45 2009 From: sky at viridian-project.de (Leslie P.
Polzer) Date: Tue, 4 Aug 2009 20:28:45 +0200 (CEST) Subject: [elephant-devel] Deferred schema sync Message-ID: (open-store *testbdb-spec*) (defpclass foobar () ()) (make-instance 'foobar) (close-store) (defpclass foobar () ((slot :accessor slot :initform nil)) (:index t)) (open-store *testbdb-spec*) (describe (car (get-instances-by-class 'foobar))) # [standard-object] Slots with :DATABASE allocation: SLOT = # Slots with :INSTANCE allocation: OID = 2 SPEC = (:BDB "/home/sky/mystic/packages/elephant-1.0/tests/testdb/") (defpclass foobar () ((slot :accessor slot :initform nil)) (:index t)) Synchronizing FOOBAR in ... # Bottom line: class schemas changed while a store is closed won't sync when that store is opened later. There are several ways to approach this. We could just sync all db classes when a store is opened or take note of which redefined classes have synced to which stores. Opinions? Leslie -- http://www.linkedin.com/in/polzer From kom at narihara-lab.jp Wed Aug 5 03:21:06 2009 From: kom at narihara-lab.jp (Hiroyuki Komatsu) Date: Wed, 05 Aug 2009 12:21:06 +0900 (JST) Subject: [elephant-devel] UTF seriazer/desiriali patch In-Reply-To: References: <20090804.051428.71108054.kom@narihara-lab.jp> Message-ID: <20090805.122106.104051337.kom@narihara-lab.jp> From: "Leslie P. Polzer" Date: Tue, 4 Aug 2009 08:58:03 +0200 (CEST) > Could you also add new tests that show the problem? The listing below is a test; it uses the attached UTF-8 encoded file. The file consists of six lines in the following format:
<char>:UTF-(8|16|32) char-code:<char-code in hex> -------------------- >8 -- >8 -------------------- (require :elephant) (use-package :elephant) (defpclass c () ((l :initarg :l :accessor l :index t))) (defun test (path) (with-open-store (`(:bdb ,(ensure-directories-exist "/var/tmp/test-db/"))) (print 'x) (with-open-file (f path :external-format :utf-8) (print 'y) (loop for line = (read-line f nil nil) while line do (print (make-instance 'c :l line))) (let* ((un-sorted (mapcar #'l (get-instances-by-range 'c 'l nil nil))) (sorted (sort (copy-list un-sorted) #'string<))) (if (equal un-sorted sorted) (print "pass") (print "error")))))) (test #p"path-to-attached-file") -------------------- >8 -- >8 -------------------- > Is the change compatible with existing incorrectly sorted > databases? In my experience, GET-INSTANCE-BY-VALUE and GET-INSTANCES-BY-VALUE work correctly. GET-INSTANCES-BY-RANGE does not work correctly with incorrectly sorted data. -------------- next part -------------- a:UTF-8 char-code:61 x:UTF-8 char-code:78 あ:UTF-16 char-code:3042 漢:UTF-16 char-code:6F22 ?:UTF-32 char-code:2A38C ?:UTF-32 char-code:2A437 From kom at narihara-lab.jp Wed Aug 5 22:34:20 2009 From: kom at narihara-lab.jp (Hiroyuki Komatsu) Date: Thu, 06 Aug 2009 07:34:20 +0900 (JST) Subject: [elephant-devel] UTF seriazer/desiriali patch In-Reply-To: <20090805.122106.104051337.kom@narihara-lab.jp> References: <20090804.051428.71108054.kom@narihara-lab.jp> <20090805.122106.104051337.kom@narihara-lab.jp> Message-ID: <20090806.073420.208935360.kom@narihara-lab.jp> From: Hiroyuki Komatsu Date: Wed, 05 Aug 2009 12:21:06 +0900 (JST) > In my experience, > GET-INSTANCE-BY-VALUE and GET-INSTANCES-BY-VALUE > work correctly. > > GET-INSTANCES-BY-RANGE does not work correctly with > incorrectly sorted data. I was mistaken about this. I have re-tested an old database with my patch: every UTF-16/UTF-32 string in the incorrectly sorted data appears to be broken.
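On the comparator change in the first patch, which rewrites (wchar_t*)ad+5+offset as (wchar_t*)(ad+5+offset): a cast binds more tightly than +, so the original expression counted the offset in wchar_t elements rather than bytes. The minimal C sketch below is not Elephant code. Its helper names are hypothetical, and it assumes the byte buffer is aligned for wchar_t so that both pointer forms are valid.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Distance in bytes from base to p. */
static ptrdiff_t byte_distance(const void *p, const void *base) {
    return (const unsigned char *)p - (const unsigned char *)base;
}

/* Original form: the cast applies to ad alone, so "+ 5 + offset" moves
   (5 + offset) wchar_t elements, i.e. (5 + offset) * sizeof(wchar_t) bytes. */
static ptrdiff_t cast_then_add(unsigned char *ad, int offset) {
    return byte_distance((wchar_t *)ad + 5 + offset, ad);
}

/* Patched form: the byte offset is added first, then the address is
   reinterpreted as a wchar_t pointer. */
static ptrdiff_t add_then_cast(unsigned char *ad, int offset) {
    return byte_distance((wchar_t *)(ad + 5 + offset), ad);
}
```

For example, with wchar_t storage[16] viewed through unsigned char *ad and an offset of 3, cast_then_add returns 8 * sizeof(wchar_t) bytes (32 where wchar_t is 4 bytes wide). add_then_cast returns 8.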
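This is why byte order affects the sort order. The original serializer wrote each code unit most significant byte first. According to the report above, the C comparators read units in the host's native byte order, so on a little-endian host each unit comes back byte-swapped. The minimal C sketch below models this with plain unsigned integers instead of calling the comparators. The function names are hypothetical, and the code points are taken from the attached test file.

```c
#include <assert.h>
#include <stdint.h>

/* Write the low `width` bytes of `code` most significant byte first,
   as the original serializer did. */
static void put_big_endian(uint32_t code, int width, uint8_t *out) {
    for (int i = 0; i < width; i++)
        out[i] = (uint8_t)(code >> (8 * (width - 1 - i)));
}

/* Read `width` bytes least significant byte first, the way a
   little-endian CPU loads a native 16- or 32-bit unit. */
static uint32_t load_little_endian(const uint8_t *in, int width) {
    uint32_t v = 0;
    for (int i = 0; i < width; i++)
        v |= (uint32_t)in[i] << (8 * i);
    return v;
}

/* Value a little-endian reader sees for a big-endian serialized unit
   (width is 2 for UTF-16 or 4 for UTF-32). */
static uint32_t seen_on_little_endian_host(uint32_t code, int width) {
    uint8_t bytes[4];
    put_big_endian(code, width, bytes);
    return load_little_endian(bytes, width);
}
```

For instance, U+3042 and U+6F22 (16-bit units) come back as 0x4230 and 0x226F, and U+2A38C and U+2A437 (32-bit units) come back as 0x8CA30200 and 0x37A40200. In each pair the smaller code point yields the larger value. Comparing these values as unsigned integers therefore reverses code-point order, which is the kind of mismatch the string< check in the test detects.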
From kom at narihara-lab.jp Sat Aug 8 02:15:46 2009 From: kom at narihara-lab.jp (Hiroyuki Komatsu) Date: Sat, 08 Aug 2009 11:15:46 +0900 (JST) Subject: [elephant-devel] revised UTF seriazer/desirializer patch Message-ID: <20090808.111546.193724442.kom@narihara-lab.jp> This patch does the following: o Big-endian machines are probably not affected by this patch, but I do not have a big-endian machine to test on. o On little-endian machines: + UTF strings are serialized as UTF-16LE or UTF-32LE with a BOM. + The deserializers test for the BOM and decode as big-endian or little-endian accordingly. + The comparators in libberkeley-db also test for the BOM and create a temporary byte-swapped buffer when a string was serialized big-endian. o Existing store images are preserved. o The sort order is corrected when an old store is migrated to a new store. I have not tested any other backing store. -------------- next part -------------- diff -rN -u old-elephant/src/db-bdb/libberkeley-db.c new-elephant/src/db-bdb/libberkeley-db.c --- old-elephant/src/db-bdb/libberkeley-db.c 2009-08-08 10:51:25.000000000 +0900 +++ new-elephant/src/db-bdb/libberkeley-db.c 2009-08-08 10:51:25.000000000 +0900 @@ -25,6 +25,7 @@ #include #include #include +#include /* Some utility stuff used to be here but has been placed in libmemutil.c */ @@ -920,7 +921,7 @@ case S1_UCS4_SYMBOL: case S1_UCS4_STRING: case S1_UCS4_PATHNAME: - return wcs_cmp((wchar_t*)ad+9, read_int(ad, 5), (wchar_t*)bd+9, read_int(bd, 5)); + return wcs_cmp((wchar_t*)(ad+9), read_int(ad, 5), (wchar_t*)(bd+9), read_int(bd, 5)); default: return lex_cmp(ad+5, (a->size)-5, bd+5, (b->size)-5); } @@ -1130,7 +1131,7 @@ /***** printf("Doing a 32-bit compare\n"); *****/ - return wcs_cmp((wchar_t*)ad+5+offset, read_int32(ad+offset, 1), (wchar_t*)bd+5+offset, read_int32(bd+offset, 1)); + return wcs_cmp((wchar_t*)(ad+5+offset), read_int32(ad+offset, 1), (wchar_t*)(bd+5+offset), read_int32(bd+offset, 1)); default: /***** printf("Doing a lex compare\n"); @@ -1306,6 +1307,18 @@ #define strncasecmp
_strnicmp typedef unsigned short uint16_t; #endif +#define ENDIAN_BIG 0 +#define ENDIAN_LITTLE 1 + +int machine_endian() +{ + uint32_t x = 0x01020304; + uint8_t *xp = (uint8_t *)&x; + if (*xp == 0x01) + return ENDIAN_BIG; + else + return ENDIAN_LITTLE; +} int case_cmp(const unsigned char *a, int32_t length1, const unsigned char *b, int32_t length2) { int min, sizediff, diff; @@ -1316,12 +1329,72 @@ return diff; } +wchar_t utf32_char(const wchar_t *c) +{ + uint8_t *cp = (uint8_t *)c; + return (cp[3] << 24) | (cp[2] << 16) | (cp[1] << 8) | cp[0]; +} + +wchar_t *swap32_string(const wchar_t *str, int32_t length) +{ + int i; + wchar_t *swap_buff = malloc(4 * length); + for (i = 0; swap_buff && i < length; ++i) { + const uint8_t *sp = (const uint8_t *)&str[i]; + uint8_t *dp = (uint8_t *)&swap_buff[i]; + dp[0] = sp[3]; + dp[1] = sp[2]; + dp[2] = sp[1]; + dp[3] = sp[0]; + } + return swap_buff; +} + +#if 0 +void dump_string(int size, uint8_t *str, int32_t length, char *prefix) +{ + int i; + printf("%s: ", prefix); + for (i = 0; i < length * size; i += 2) + printf("%02x%02x ", str[i], str[i + 1]); + printf("\n"); +} +#endif + int wcs_cmp(const wchar_t *a, int32_t length1, const wchar_t *b, int32_t length2) { int min, sizediff, diff; + wchar_t *swap_a = NULL, *swap_b = NULL; + +#if 0 + dump_string(4, a, length1, "A"); + dump_string(4, b, length2, "B"); +#endif + if (machine_endian() == ENDIAN_LITTLE) { + if (utf32_char(a) != 0xfffe) {/* BIG-ENDIAN */ + swap_a = swap32_string(a, length1); + if (swap_a) + a = swap_a; + } else { /* LITTLE-ENDIAN */ + ++a; + --length1; + } + if (utf32_char(b) != 0xfffe) {/* BIG-ENDIAN */ + swap_b = swap32_string(b, length2); + if (swap_b) + b = swap_b; + } else { /* LITTLE-ENDIAN */ + ++b; + --length2; + } + } sizediff = length1 - length2; min = sizediff > 0 ?
length2 : length1; - diff = wcsncmp(a, b, min /4); + diff = wcsncmp(a, b, min); + if (swap_a) + free(swap_a); + if (swap_b) + free(swap_b); if (diff == 0) return sizediff; return diff; } @@ -1351,6 +1424,22 @@ #define UTF_IS_LEAD(c) (((c)&0xfffffc00)==0xd800) #define UTF_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00) +uint16_t utf16_char(const uint8_t *str) +{ + return (str[1] << 8) | str[0]; +} + +uint8_t *swap16_string(const uint8_t *src, int32_t length) +{ + int i; + uint8_t *swap_buff = malloc(2 * length); + for (i = 0; swap_buff && i < length * 2; i += 2) { + swap_buff[i + 0] = src[i + 1]; + swap_buff[i + 1] = src[i + 0]; + } + return swap_buff; +} + /* compare UTF-16 strings */ /* memcmp/UnicodeString style, both length-specified */ /* don't assume byte-aligned! */ @@ -1359,7 +1448,29 @@ const unsigned char *start1, *start2, *limit1, *limit2; UChar c1, c2; int32_t lengthResult; - + uint8_t *swap_s1 = NULL, *swap_s2 = NULL; +#if 0 + dump_string(2, s1, length1, "S1"); + dump_string(2, s2, length2, "S2"); +#endif + if (machine_endian() == ENDIAN_LITTLE) { + if (utf16_char(s1) != 0xfffe) {/* BIG-ENDIAN */ + swap_s1 = swap16_string(s1, length1); + if (swap_s1) + s1 = swap_s1; + } else { /* LITTLE-ENDIAN */ + s1 += 2; + length1 -= 1; + } + if (utf16_char(s2) != 0xfffe) {/* BIG-ENDIAN */ + swap_s2 = swap16_string(s2, length2); + if (swap_s2) + s2 = swap_s2; + } else { /* LITTLE-ENDIAN */ + s2 += 2; + length2 -= 1; + } + } if(length1 (char-code (char string 0)) #xFFFF)) - (serialize-to-utf32le string bstream)) + (serialize-to-utf32 string bstream)) ;; Accelerate the common case where a character set is not Latin-1 ((and (not (equal "" string)) (> (char-code (char string 0)) #xFF)) - (or (serialize-to-utf16le string bstream) - (serialize-to-utf32le string bstream))) + (or (serialize-to-utf16 string bstream) + (serialize-to-utf32 string bstream))) ;; Actually code pages > 0 are rare; so we can pay an extra cost (t (or (serialize-to-utf8 string bstream) - (serialize-to-utf16le string
bstream) - (serialize-to-utf32le string bstream))))) + (serialize-to-utf16 string bstream) + (serialize-to-utf32 string bstream))))) (defun serialize-to-utf8 (string bstream) "Standard serialization" (declare (type buffer-stream bstream) - (type string string)) + (type simple-string string)) (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) (size buffer-stream-size) (allocated buffer-stream-length)) @@ -117,73 +117,105 @@ (setf (buffer-stream-size bstream) needed) (succeed)))))) -(defun serialize-to-utf16le (string bstream) - "Serialize to utf16le compliant format unless contains code pages > 0" +(defvar *machine-endian* + (let* ((bstream (make-buffer-stream)) + (buffer (buffer-stream-buffer bstream)) + (size (buffer-stream-size bstream))) + (buffer-write-int32 #x01020304 bstream) + (let ((byte-image + (loop for i from 0 to 3 + collect (uffi:deref-array buffer '(:array :unsigned-char) + (the fixnum (+ size i)))))) + (cond ((equal byte-image '(4 3 2 1)) 'endian-little) + ((equal byte-image '(1 2 3 4)) 'endian-big) + (t 'unknown))))) + +(defun machine-endian () + *machine-endian*) + +(defun write-utf-char-to-buffer (char char-index char-size buffer base endian) + (declare (type (signed-byte 31) char-index) + (type (integer 1 4) char-size)) + (loop for i from 0 below char-size do + (setf (uffi:deref-array buffer '(:array :unsigned-char) + (+ (* char-index char-size) base + (the (integer 0 3) + (if (eq endian 'endian-little) + i + (- char-size 1 i))))) + (ldb (byte 8 (* 8 i)) char)))) + +(defun serialize-to-utf16 (string bstream) + "Serialize to utf16 compliant format unless contains code pages > 0" (declare (type buffer-stream bstream) (type string string)) + (progn + (format *debug-io* "LSIP-ENTER: ") + (loop for i from 0 below (length string) + do (format *debug-io* "~4,'0X " (char-code (char string i)))) + (format *debug-io* "~%")) (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) (size buffer-stream-size) (allocated 
buffer-stream-length)) bstream (let* ((saved-size (buffer-stream-size bstream)) (saved-pos (elephant-memutil::buffer-stream-position bstream)) - (characters (length string))) + (characters (length string)) + (endian (machine-endian)) + (bom-length (if (eq endian 'endian-big) 0 1))) (labels ((fail () (setf (buffer-stream-size bstream) saved-size) (setf (elephant-memutil::buffer-stream-position bstream) saved-pos) - (return-from serialize-to-utf16le nil)) + (return-from serialize-to-utf16 nil)) (succeed () - (return-from serialize-to-utf16le t))) + (return-from serialize-to-utf16 t))) (buffer-write-byte +utf16-string+ bstream) - (buffer-write-int32 characters bstream) - (let ((needed (+ size (* characters 2))) - (char (etypecase string + (buffer-write-int32 (+ characters bom-length) bstream) + (let ((needed (+ size (* (+ characters bom-length) 2))) + (char (etypecase string (simple-string #'schar) (string #'char)))) (when (> needed allocated) (resize-buffer-stream bstream needed)) - (loop for i fixnum from 0 below characters do - (let ((code (char-code (funcall char string i)))) - (when (> code #xFFFF) (fail)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size)) - ;; (coerce (ldb (byte 8 8) code) '(signed 8))) - (ldb (byte 8 8) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 2) size 1)) - ;; (coerce (ldb (byte 8 0) code) '(signed 8)))))) - (ldb (byte 8 0) code)))) + (when (eq endian 'endian-little) + (write-utf-char-to-buffer #xfffe 0 2 buffer size endian) + (incf size 2)) + (loop for i fixnum from 0 below characters + do (let ((code (char-code (funcall char string i)))) + (when (> code #xFFFF) (fail)) + (write-utf-char-to-buffer code i 2 buffer size endian))) (incf size (* characters 2)) (succeed)))))) -(defun serialize-to-utf32le (string bstream) +(defun serialize-to-utf32 (string bstream) "Serialize to utf32 compliant format unless contains code pages > 0" - (declare (type buffer-stream bstream) - (type string 
string)) + (declare (type buffer-stream bstream) + (type string string)) (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) (size buffer-stream-size) (allocated buffer-stream-length)) bstream - (let* ((characters (length string))) - (buffer-write-byte +utf32-string+ bstream) - (buffer-write-int32 characters bstream) - (let ((needed (+ size (* 4 characters))) - (char (etypecase string - (simple-string #'schar) - (string #'char)))) - (when (> needed allocated) - (resize-buffer-stream bstream needed)) - (loop for i fixnum from 0 below characters do - (let ((code (char-code (funcall char string i)))) - (when (> code #x10FFFF) (error "Invalid unicode code type")) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 0)) - (ldb (byte 8 24) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 1)) - (ldb (byte 8 16) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 2)) - (ldb (byte 8 8) code)) - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* i 4) size 3)) - (ldb (byte 8 0) code))))) + (let* ((characters (length string)) + (endian (machine-endian)) + (bom-length (if (eq endian 'endian-big) 0 1))) + (buffer-write-byte +utf32-string+ bstream) + (buffer-write-int32 (+ characters bom-length) bstream) + (let ((needed (+ size (* 4 (+ characters bom-length)))) + (char (etypecase string + (simple-string #'schar) + (string #'char)))) + (when (> needed allocated) + (resize-buffer-stream bstream needed)) + (when (eq endian 'endian-little) + (write-utf-char-to-buffer #xfffe 0 4 buffer size endian) + (incf size 4)) + (loop for i fixnum from 0 below characters + do (let ((code (char-code (funcall char string i)))) + (when (> code #x10FFFF) + (error "Invalid unicode code type")) + (write-utf-char-to-buffer code i 4 buffer size endian))) (incf size (* characters 4)) - t))) + t)))) ;; ;; Deserialization of Strings @@ -260,50 +292,67 @@ (+ pos i))))))) string)))) +(defun 
read-utf-char-from-buffer (char-index char-size buffer position endian) + (declare (type (integer 1 4) char-size) + (type (signed-byte 31) char-index) + (type fixnum position)) + (let ((code 0)) + (macrolet ((next-byte (offset) + `(uffi:deref-array buffer + '(:array :unsigned-byte) + (+ (* char-index char-size) position ,offset)))) + (loop for i from 0 below char-size + do (setf code (dpb (next-byte (if (eq endian 'endian-little) + i (- char-size i 1))) + (byte 8 (* i 8)) code))) + code))) + (defmethod deserialize-string ((type (eql :utf16le)) bstream &optional temp-string) "All returned strings are simple-strings for, uh, simplicity" (declare (type buffer-stream bstream)) (let* ((length (buffer-read-int32 bstream)) (string (or temp-string (make-string length :element-type 'character))) (pos (elephant-memutil::buffer-stream-position bstream)) - (code 0)) - (macrolet ((next-byte (offset) - `(uffi:deref-array (buffer-stream-buffer bstream) '(:array :unsigned-byte) (+ (* i 2) pos ,offset)))) - (declare (type simple-string string) - (type fixnum length pos code)) - (assert (subtypep (type-of string) 'simple-string)) - (assert (compatible-unicode-support-p :utf16le)) - (loop for i fixnum from 0 below length do - (setf code (dpb (next-byte 0) (byte 8 8) 0)) - (setf code (dpb (next-byte 1) (byte 8 0) code)) - (setf (schar string i) (code-char code))) - (incf (elephant-memutil::buffer-stream-position bstream) - (* length 2))) - (the simple-string string))) + (code 0) (endian 'endian-big)) + (declare (type simple-string string) + (type fixnum length pos code)) + (assert (subtypep (type-of string) 'simple-string)) + (assert (compatible-unicode-support-p :utf16le)) + (when (= (read-utf-char-from-buffer 0 2 (buffer-stream-buffer bstream) + pos (machine-endian)) #xfffe) + (setf endian 'endian-little) + (decf length) + (incf pos 2) + (incf (elephant-memutil::buffer-stream-position bstream) 2)) + (loop for i fixnum from 0 below length + do (setf code + (read-utf-char-from-buffer i 2
(buffer-stream-buffer bstream) + pos endian)) + (setf (schar string i) (code-char code))) + (incf (elephant-memutil::buffer-stream-position bstream) + (* length 2)) + (the simple-string (subseq string 0 length)))) (defmethod deserialize-string ((type (eql :utf32le)) bstream &optional temp-string) (declare (type buffer-stream bstream)) - (macrolet ((next-byte (offset) - `(uffi:deref-array (buffer-stream-buffer bstream) '(:array :unsigned-byte) (+ (* i 4) pos ,offset)))) (let* ((length (buffer-read-int32 bstream)) (string (or temp-string (make-string length :element-type 'character))) (pos (elephant-memutil::buffer-stream-position bstream)) - (code 0)) + (code 0) (endian 'endian-big)) (declare (type string string) (type fixnum length pos code)) (assert (subtypep (type-of string) 'simple-string)) (assert (compatible-unicode-support-p :utf32le)) + (when (= (read-utf-char-from-buffer 0 4 (buffer-stream-buffer bstream) + pos (machine-endian)) #xfffe) + (setf endian 'endian-little) + (decf length) + (incf pos 4) + (incf (elephant-memutil::buffer-stream-position bstream) 4)) (loop for i fixnum from 0 below length do - (setf code (dpb (next-byte 0) (byte 8 24) 0)) - (setf code (dpb (next-byte 1) (byte 8 16) code)) - (setf code (dpb (next-byte 2) (byte 8 8) code)) - (setf code (dpb (next-byte 3) (byte 8 0) code)) - (setf (char string i) (code-char code))) + (setf code (read-utf-char-from-buffer i 4 (buffer-stream-buffer bstream) + pos endian)) + (setf (char string i) (code-char code))) (incf (elephant-memutil::buffer-stream-position bstream) (* length 4)) - (the simple-string string)))) - - - - - + (the simple-string (subseq string 0 length)))) From reddaly at gmail.com Thu Aug 13 19:07:59 2009 From: reddaly at gmail.com (Red Daly) Date: Thu, 13 Aug 2009 12:07:59 -0700 Subject: [elephant-devel] Upgrading from version .9ish Message-ID: Hi, I have an elephant database from either the 0.8 or 0.9 era of elephant and I was wondering what I can do to properly upgrade. 
Using the latest version and upgrading bdb from 4.5 seems to have wiped my data (which is backed up). Any suggestions? Thanks, Red From henrik at evahjelte.com Fri Aug 14 11:04:09 2009 From: henrik at evahjelte.com (Henrik Hjelte) Date: Fri, 14 Aug 2009 13:04:09 +0200 Subject: [elephant-devel] Upgrading from version .9ish In-Reply-To: References: Message-ID: <50e8e4f60908140404u48324ca6re09dffaff1d3fd82@mail.gmail.com> On Thu, Aug 13, 2009 at 9:07 PM, Red Daly wrote: > Hi, > > I have an elephant database from either the 0.8 or 0.9 era of elephant and I > was wondering what I can do to properly upgrade. Using the latest version > and upgrading bdb from 4.5 seems to have wiped my data (which is backed > up). Any suggestions? You can probably use gp-export, which is intended to dump and restore elephant databases. See this discussion: http://www.mail-archive.com/elephant-devel at common-lisp.net/msg02179.html darcs get http://common-lisp.net/project/grand-prix/darcs/gp-export/ /Henrik From smanek at gmail.com Sat Aug 15 04:54:29 2009 From: smanek at gmail.com (Shaneal Manek) Date: Sat, 15 Aug 2009 00:54:29 -0400 Subject: [elephant-devel] How to create a derived index? Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, I was wondering what the preferred way is to add a derived index to a persistent class. The elephant:add-class-derived-index function doesn't seem to exist in new versions of elephant.
Thanks, Shaneal -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkqGP04ACgkQhkBc25UmCW7+SwCfacOSkBz3FBNTVf6Mafj5LKeE mrYAnRAAyPQfeZK7NqTapTSYhqT+ToMm =s1NT -----END PGP SIGNATURE----- From eslick at media.mit.edu Sun Aug 16 03:50:46 2009 From: eslick at media.mit.edu (Ian Eslick) Date: Sat, 15 Aug 2009 20:50:46 -0700 Subject: [elephant-devel] revised UTF seriazer/desirializer patch In-Reply-To: <20090808.111546.193724442.kom@narihara-lab.jp> References: <20090808.111546.193724442.kom@narihara-lab.jp> Message-ID: <4F56E7E7-220B-4693-93D8-DFA83C9A9DFB@media.mit.edu> Thank you, this looks great. I'll review it and promote it in the next week or so unless Leslie beats me to it. Thank you, Ian On Aug 7, 2009, at 7:15 PM, Hiroyuki Komatsu wrote: > This patch does the following: > > o Big-endian machines are probably not affected by this > patch, but I do not have a big-endian machine to test on. > > o On little-endian machines: > + UTF strings are serialized as UTF-16LE or UTF-32LE with a BOM. > + The deserializers test for the BOM and decode as big-endian > or little-endian accordingly. > + The comparators in libberkeley-db also test for the BOM and > create a temporary byte-swapped buffer when a string was > serialized big-endian. > > o Existing store images are preserved. > o The sort order is corrected when an old store is migrated to a new store. > > I have not tested any other backing store.
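The BOM test described in the revised patch can be sketched in C as shown below. This is an illustration under stated assumptions, not the patch itself. It handles only UTF-32 and treats serialized data as raw bytes. It assumes that new data begins with the unit 0xFFFE stored little-endian, which is the marker the revised Lisp serializer writes on little-endian hosts, and that data without the marker is legacy big-endian. Unlike the patch, it does not copy the string into a byte-swapped temporary buffer as swap32_string does; it decodes each unit in place. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First unit written by the revised serializer on little-endian hosts. */
#define ORDER_MARK 0xFFFEu

/* Decode 4 bytes least significant byte first. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode 4 bytes most significant byte first (legacy layout). */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* A serialized string after inspecting its first unit. */
typedef struct {
    const uint8_t *data; /* first code unit after any marker */
    size_t len;          /* number of code points */
    int little;          /* 1: little-endian units, 0: legacy big-endian */
} utf32_view;

/* A little-endian ORDER_MARK in the first unit means new-format data. */
static utf32_view make_view(const uint8_t *bytes, size_t units) {
    utf32_view v = { bytes, units, 0 };
    if (units > 0 && read_le32(bytes) == ORDER_MARK) {
        v.data = bytes + 4;
        v.len = units - 1;
        v.little = 1;
    }
    return v;
}

/* Code point i of the view, decoded in the view's byte order. */
static uint32_t unit_at(utf32_view v, size_t i) {
    const uint8_t *p = v.data + 4 * i;
    return v.little ? read_le32(p) : read_be32(p);
}

/* Compare by code point. If one string is a prefix of the other,
   the shorter one sorts first. Returns -1, 0 or 1. */
static int compare_utf32(const uint8_t *a, size_t na,
                         const uint8_t *b, size_t nb) {
    utf32_view va = make_view(a, na), vb = make_view(b, nb);
    size_t min = va.len < vb.len ? va.len : vb.len;
    for (size_t i = 0; i < min; i++) {
        uint32_t ca = unit_at(va, i), cb = unit_at(vb, i);
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    return (va.len > vb.len) - (va.len < vb.len);
}
```

Because every unit is decoded from explicit bytes, the result does not depend on the host's byte order or on the width and signedness of wchar_t. Legacy data cannot be mistaken for the marker: a big-endian first unit with the bytes FE FF 00 00 would be the code point 0xFEFF0000, which lies outside the Unicode range.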
> > diff -rN -u old-elephant/src/db-bdb/libberkeley-db.c new-elephant/ > src/db-bdb/libberkeley-db.c > --- old-elephant/src/db-bdb/libberkeley-db.c 2009-08-08 > 10:51:25.000000000 +0900 > +++ new-elephant/src/db-bdb/libberkeley-db.c 2009-08-08 > 10:51:25.000000000 +0900 > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > > /* Some utility stuff used to be here but has been placed in > libmemutil.c */ > @@ -920,7 +921,7 @@ > case S1_UCS4_SYMBOL: > case S1_UCS4_STRING: > case S1_UCS4_PATHNAME: > - return wcs_cmp((wchar_t*)ad+9, read_int(ad, 5), (wchar_t*)bd+9, > read_int(bd, 5)); > + return wcs_cmp((wchar_t*)(ad+9), read_int(ad, 5), (wchar_t*)(bd > +9), read_int(bd, 5)); > default: > return lex_cmp(ad+5, (a->size)-5, bd+5, (b->size)-5); > } > @@ -1130,7 +1131,7 @@ > /***** > printf("Doing a 32-bit compare\n"); > *****/ > - return wcs_cmp((wchar_t*)ad+5+offset, read_int32(ad+offset, 1), > (wchar_t*)bd+5+offset, read_int32(bd+offset, 1)); > + return wcs_cmp((wchar_t*)(ad+5+offset), read_int32(ad+offset, > 1), (wchar_t*)(bd+5+offset), read_int32(bd+offset, 1)); > default: > /***** > printf("Doing a lex compare\n"); > @@ -1306,6 +1307,18 @@ > #define strncasecmp _strnicmp > typedef unsigned short uint16_t; > #endif > +#define ENDIAN_BIG 0 > +#define ENDIAN_LITTLE 1 > + > +int machine_endian() > +{ > + uint32_t x = 0x01020304; > + uint8_t *xp = (uint8_t *)&x; > + if (*xp == 0x01) > + return ENDIAN_BIG; > + else > + return ENDIAN_LITTLE; > +} > > int case_cmp(const unsigned char *a, int32_t length1, const unsigned > char *b, int32_t length2) { > int min, sizediff, diff; > @@ -1316,12 +1329,72 @@ > return diff; > } > > +wchar_t utf32_char(const wchar_t *c) > +{ > + uint8_t *cp = (uint8_t *)c; > + return (cp[3] << 24) | (cp[2] << 16) | (cp[1] << 8) | cp[0]; > +} > + > +wchar_t *swap32_string(const wchar_t *str, int32_t length) > +{ > + int i; > + wchar_t *swap_buff = malloc(4 * length); > + for (i = 0; i < length; ++i) { > + uint8_t *sp = (uint8_t *)&str[i], > 
+ *dp = (uint8_t *)&swap_buff[i]; > + sp[0] = dp[3]; > + sp[1] = dp[2]; > + sp[2] = dp[1]; > + sp[3] = dp[0]; > + } > + return swap_buff; > +} > + > +#if 0 > +void dump_string(int size, uint8_t *str, int32_t length, char > *prefix) > +{ > + int i; > + printf("%s: ", prefix); > + for (i = 0; i < length * size; i += 2) > + printf("%02x%02x ", str[i], str[i + 1]); > + printf("\n"); > +} > +#endif > + > int wcs_cmp(const wchar_t *a, int32_t length1, > const wchar_t *b, int32_t length2) { > int min, sizediff, diff; > + wchar_t *swap_a = NULL, *swap_b = NULL; > + > +#if 0 > + dump_string(4, a, length1, "A"); > + dump_string(4, b, length2, "B"); > +#endif > + if (machine_endian() == ENDIAN_LITTLE) { > + if (utf32_char(a) != 0xfffe) {/* BIG-ENDIAN */ > + swap_a = swap32_string(a, length1); > + if (swap_a) > + a = swap_a; > + } else { /* LITTLE-ENDIAN */ > + ++a; > + --length1; > + } > + if (utf32_char(b) != 0xfffe) {/* BIG-ENDIAN */ > + swap_b = swap32_string(b, length2); > + if (swap_b) > + b = swap_b; > + } else { /* LITTLE-ENDIAN */ > + ++b; > + --length2; > + } > + } > sizediff = length1 - length2; > min = sizediff > 0 ? length2 : length1; > - diff = wcsncmp(a, b, min /4); > + diff = wcsncmp(a, b, min); > + if (swap_a) > + free(swap_a); > + if (swap_b) > + free(swap_b); > if (diff == 0) return sizediff; > return diff; > } > @@ -1351,6 +1424,22 @@ > #define UTF_IS_LEAD(c) (((c)&0xfffffc00)==0xd800) > #define UTF_IS_TRAIL(c) (((c)&0xfffffc00)==0xdc00) > > +uint16_t utf16_char(const uint8_t *str) > +{ > + return (str[1] << 8) | str[0]; > +} > + > +uint8_t *swap16_string(const uint8_t *src, int32_t length) > +{ > + int i; > + uint8_t *swap_buff = malloc(2 * length); > + for (i = 0; i < length * 2; i += 2) { > + swap_buff[i + 0] = src[i + 1]; > + swap_buff[i + 1] = src[i + 1]; > + } > + return swap_buff; > +} > + > /* compare UTF-16 strings */ > /* memcmp/UnicodeString style, both length-specified */ > /* don't assume byte-aligned! 
*/ > @@ -1359,7 +1448,29 @@ > const unsigned char *start1, *start2, *limit1, *limit2; > UChar c1, c2; > int32_t lengthResult; > - > + uint8_t *swap_s1 = NULL, *swap_s2 = NULL; > +#if 0 > + dump_string(2, s1, length1, "S1"); > + dump_string(2, s2, length2, "S2"); > +#endif > + if (machine_endian() == ENDIAN_LITTLE) { > + if (utf16_char(s1) != 0xfffe) {/* BIG-ENDIAN */ > + swap_s1 = swap16_string(s1, length1); > + if (swap_s1) > + s1 = swap_s1; > + } else { /* LITTLE-ENDIAN */ > + s1 += 2; > + length1 -= 1; > + } > + if (utf16_char(s2) != 0xfffe) {/* BIG-ENDIAN */ > + swap_s2 = swap16_string(s2, length2); > + if (swap_s2) > + s2 = swap_s2; > + } else { /* LITTLE-ENDIAN */ > + s2 += 2; > + length2 -= 1; > + } > + } > if(length1 lengthResult=-1; > limit1=s1+2*length1; > @@ -1415,6 +1526,10 @@ > }*/ > } > > + if (swap_s1) > + free(swap_s1); > + if (swap_s2) > + free(swap_s2); > return (int32_t)c1-(int32_t)c2; > } > > diff -rN -u old-elephant/src/elephant/unicode.lisp new-elephant/src/ > elephant/unicode.lisp > --- old-elephant/src/elephant/unicode.lisp 2009-08-08 > 10:51:25.000000000 +0900 > +++ new-elephant/src/elephant/unicode.lisp 2009-08-08 > 10:51:25.000000000 +0900 > @@ -41,7 +41,7 @@ > > ;; #+allegro > ;; (defun serialize-string (string bstream) > -;; (elephant-memutil::with-struct-slots ((buffer buffer-stream- > buffer) > +;; (e(lephant-memutil::with-struct-slots ((buffer buffer-stream- > buffer) > ;; (size buffer-stream-size) > ;; (allocated buffer-stream-length)) > ;; bstream > @@ -59,20 +59,20 @@ > (declare (type buffer-stream bstream) > (type string string)) > (cond ((and (not (equal "" string)) (> (char-code (char string 0)) > #xFFFF)) > - (serialize-to-utf32le string bstream)) > + (serialize-to-utf32 string bstream)) > ;; Accelerate the common case where a character set is not Latin-1 > ((and (not (equal "" string)) (> (char-code (char string 0)) #xFF)) > - (or (serialize-to-utf16le string bstream) > - (serialize-to-utf32le string bstream))) > + (or 
(serialize-to-utf16 string bstream) > + (serialize-to-utf32 string bstream))) > ;; Actually code pages > 0 are rare; so we can pay an extra cost > (t (or (serialize-to-utf8 string bstream) > - (serialize-to-utf16le string bstream) > - (serialize-to-utf32le string bstream))))) > + (serialize-to-utf16 string bstream) > + (serialize-to-utf32 string bstream))))) > > (defun serialize-to-utf8 (string bstream) > "Standard serialization" > (declare (type buffer-stream bstream) > - (type string string)) > + (type simple-string string)) > (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) > (size buffer-stream-size) > (allocated buffer-stream-length)) > @@ -117,73 +117,105 @@ > (setf (buffer-stream-size bstream) needed) > (succeed)))))) > > -(defun serialize-to-utf16le (string bstream) > - "Serialize to utf16le compliant format unless contains code pages > > 0" > +(defvar *machine-endian* > + (let* ((bstream (make-buffer-stream)) > + (buffer (buffer-stream-buffer bstream)) > + (size (buffer-stream-size bstream))) > + (buffer-write-int32 #x01020304 bstream) > + (let ((byte-image > + (loop for i from 0 to 3 > + collect (uffi:deref-array buffer '(:array :unsigned-char) > + (the fixnum (+ size i)))))) > + (cond ((equal byte-image '(4 3 2 1)) 'endian-little) > + ((equal byte-image '(1 2 3 4)) 'endian-big) > + (t 'unknown))))) > + > +(defun machine-endian () > + *machine-endian*) > + > +(defun write-utf-char-to-buffer (char char-index char-size buffer > base endian) > + (declare (type (signed-byte 31) char-index) > + (type (integer 1 4) char-size)) > + (loop for i from 0 below char-size do > + (setf (uffi:deref-array buffer '(:array :unsigned-char) > + (+ (* char-index char-size) base > + (the (integer 0 3) > + (if (eq endian 'endian-little) > + i > + (- char-size 1 i))))) > + (ldb (byte 8 (* 8 i)) char)))) > + > +(defun serialize-to-utf16 (string bstream) > + "Serialize to utf16 compliant format unless contains code pages > > 0" > (declare (type buffer-stream 
bstream) > (type string string)) > + (progn > + (format *debug-io* "LSIP-ENTER: ") > + (loop for i from 0 below (length string) > + do (format *debug-io* "~4,'0X " (char-code (char string i)))) > + (format *debug-io* "~%")) > (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) > (size buffer-stream-size) > (allocated buffer-stream-length)) > bstream > (let* ((saved-size (buffer-stream-size bstream)) > (saved-pos (elephant-memutil::buffer-stream-position bstream)) > - (characters (length string))) > + (characters (length string)) > + (endian (machine-endian)) > + (bom-length (if (eq endian 'endian-big) 0 1))) > (labels ((fail () > (setf (buffer-stream-size bstream) saved-size) > (setf (elephant-memutil::buffer-stream-position bstream) saved- > pos) > - (return-from serialize-to-utf16le nil)) > + (return-from serialize-to-utf16 nil)) > (succeed () > - (return-from serialize-to-utf16le t))) > + (return-from serialize-to-utf16 t))) > (buffer-write-byte +utf16-string+ bstream) > - (buffer-write-int32 characters bstream) > - (let ((needed (+ size (* characters 2))) > - (char (etypecase string > + (buffer-write-int32 (+ characters bom-length) bstream) > + (let ((needed (+ size (* (+ characters bom-length) 2))) > + (char (etypecase string > (simple-string #'schar) > (string #'char)))) > (when (> needed allocated) > (resize-buffer-stream bstream needed)) > - (loop for i fixnum from 0 below characters do > - (let ((code (char-code (funcall char string i)))) > - (when (> code #xFFFF) (fail)) > - (setf (uffi:deref-array buffer > '(:array :unsigned-char) (+ (* i 2) size)) > - ;; (coerce (ldb (byte 8 8) code) > '(signed 8))) > - (ldb (byte 8 8) code)) > - (setf (uffi:deref-array buffer > '(:array :unsigned-char) (+ (* i 2) size 1)) > - ;; (coerce (ldb (byte 8 0) code) > '(signed 8)))))) > - (ldb (byte 8 0) code)))) > + (when (eq endian 'endian-little) > + (write-utf-char-to-buffer #xfffe 0 2 buffer size endian) > + (incf size 2)) > + (loop for i fixnum from 0 
below characters > + do (let ((code (char-code (funcall char string i)))) > + (when (> code #xFFFF) (fail)) > + (write-utf-char-to-buffer code i 2 buffer size endian))) > (incf size (* characters 2)) > (succeed)))))) > > -(defun serialize-to-utf32le (string bstream) > +(defun serialize-to-utf32 (string bstream) > "Serialize to utf32 compliant format unless contains code pages > 0" > - (declare (type buffer-stream bstream) > - (type string string)) > + (declare (type buffer-stream bstream) > + (type string string)) > (elephant-memutil::with-struct-slots ((buffer buffer-stream-buffer) > (size buffer-stream-size) > (allocated buffer-stream-length)) > bstream > - (let* ((characters (length string))) > - (buffer-write-byte +utf32-string+ bstream) > - (buffer-write-int32 characters bstream) > - (let ((needed (+ size (* 4 characters))) > - (char (etypecase string > - (simple-string #'schar) > - (string #'char)))) > - (when (> needed allocated) > - (resize-buffer-stream bstream needed)) > - (loop for i fixnum from 0 below characters do > - (let ((code (char-code (funcall char string i)))) > - (when (> code #x10FFFF) (error "Invalid unicode code type")) > - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* > i 4) size 0)) > - (ldb (byte 8 24) code)) > - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* > i 4) size 1)) > - (ldb (byte 8 16) code)) > - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* > i 4) size 2)) > - (ldb (byte 8 8) code)) > - (setf (uffi:deref-array buffer '(:array :unsigned-char) (+ (* > i 4) size 3)) > - (ldb (byte 8 0) code))))) > + (let* ((characters (length string)) > + (endian (machine-endian)) > + (bom-length (if (eq endian 'endian-big) 0 1))) > + (buffer-write-byte +utf32-string+ bstream) > + (buffer-write-int32 (+ characters bom-length) bstream) > + (let ((needed (+ size (* 4 (+ characters bom-length)))) > + (char (etypecase string > + (simple-string #'schar) > + (string #'char)))) > + (when (> needed 
allocated) > + (resize-buffer-stream bstream needed)) > + (when (eq endian 'endian-little) > + (write-utf-char-to-buffer #xfffe 0 4 buffer size endian) > + (incf size 4)) > + (loop for i fixnum from 0 below characters > + do (let ((code (char-code (funcall char string i)))) > + (when (> code #x10FFFF) > + (error "Invalid unicode code type")) > + (write-utf-char-to-buffer code i 4 buffer size endian))) > (incf size (* characters 4)) > - t))) > + t)))) > > ;; > ;; Deserialization of Strings > @@ -260,50 +292,67 @@ > (+ pos i))))))) > string)))) > > +(defun read-utf-char-from-buffer (char-index char-size buffer > position endian) > + (declare (type (integer 1 4) char-size) > + (type (signed-byte 31) char-index) > + (type fixnum position)) > + (let ((code 0)) > + (macrolet ((next-byte (offset) > + `(uffi:deref-array buffer > + '(:array :unsigned-byte) > + (+ (* char-index 2) position ,offset)))) > + (loop for i from 0 below char-size > + do (setf code (dpb (next-byte (if (eq endian 'endian-little) > + i (- char-size i 1))) > + (byte 8 (* i 8)) code))) > + code))) > + > (defmethod deserialize-string ((type (eql :utf16le)) bstream > &optional temp-string) > "All returned strings are simple-strings for, uh, simplicity" > (declare (type buffer-stream bstream)) > (let* ((length (buffer-read-int32 bstream)) > (string (or temp-string (make-string length :element-type > 'character))) > (pos (elephant-memutil::buffer-stream-position bstream)) > - (code 0)) > - (macrolet ((next-byte (offset) > - `(uffi:deref-array (buffer-stream-buffer bstream) > '(:array :unsigned-byte) (+ (* i 2) pos ,offset)))) > - (declare (type simple-string string) > - (type fixnum length pos code)) > - (assert (subtypep (type-of string) 'simple-string)) > - (assert (compatible-unicode-support-p :utf16le)) > - (loop for i fixnum from 0 below length do > - (setf code (dpb (next-byte 0) (byte 8 8) 0)) > - (setf code (dpb (next-byte 1) (byte 8 0) code)) > - (setf (schar string i) (code-char code))) > - (incf 
(elephant-memutil::buffer-stream-position bstream) > - (* length 2))) > - (the simple-string string))) > + (code 0) (endian 'endian-big)) > + (declare (type simple-string string) > + (type fixnum length pos code)) > + (assert (subtypep (type-of string) 'simple-string)) > + (assert (compatible-unicode-support-p :utf16le)) > + (when (= (read-utf-char-from-buffer 0 2 (buffer-stream-buffer > bstream) > + pos (machine-endian)) #xfffe) > + (setf endian 'endian-little) > + (decf length) > + (incf pos 2) > + (incf (elephant-memutil::buffer-stream-position bstream) 2)) > + (loop for i fixnum from 0 below length > + do (setf code > + (read-utf-char-from-buffer i 2 (buffer-stream-buffer bstream) > + pos endian)) > + (setf (schar string i) (code-char code))) > + (incf (elephant-memutil::buffer-stream-position bstream) > + (* length 2)) > + (the simple-string (subseq string 0 length)))) > > (defmethod deserialize-string ((type (eql :utf32le)) bstream > &optional temp-string) > (declare (type buffer-stream bstream)) > - (macrolet ((next-byte (offset) > - `(uffi:deref-array (buffer-stream-buffer bstream) > '(:array :unsigned-byte) (+ (* i 4) pos ,offset)))) > (let* ((length (buffer-read-int32 bstream)) > (string (or temp-string (make-string length :element-type > 'character))) > (pos (elephant-memutil::buffer-stream-position bstream)) > - (code 0)) > + (code 0) (endian 'endian-big)) > (declare (type string string) > (type fixnum length pos code)) > (assert (subtypep (type-of string) 'simple-string)) > (assert (compatible-unicode-support-p :utf32le)) > + (when (= (read-utf-char-from-buffer 0 4 (buffer-stream-buffer > bstream) > + pos (machine-endian)) #xfffe) > + (setf endian 'endian-little) > + (decf length) > + (incf pos 4) > + (incf (elephant-memutil::buffer-stream-position bstream) 4)) > (loop for i fixnum from 0 below length do > - (setf code (dpb (next-byte 0) (byte 8 24) 0)) > - (setf code (dpb (next-byte 1) (byte 8 16) code)) > - (setf code (dpb (next-byte 2) (byte 8 8) 
code)) > - (setf code (dpb (next-byte 3) (byte 8 0) code)) > - (setf (char string i) (code-char code))) > + (setf code (read-utf-char-from-buffer i 4 (buffer-stream- > buffer bstream) > + pos endian)) > + (setf (char string i) (code-char code))) > (incf (elephant-memutil::buffer-stream-position bstream) > (* length 4)) > - (the simple-string string)))) > - > - > - > - > - > + (the simple-string (subseq string 0 length)))) > > _______________________________________________ > elephant-devel site list > elephant-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel From smanek at gmail.com Sun Aug 16 09:48:53 2009 From: smanek at gmail.com (Shaneal Manek) Date: Sun, 16 Aug 2009 05:48:53 -0400 Subject: [elephant-devel] How to create a derived index? - Addendum Message-ID: Hello, my last message was a little sparse, so I thought I would provide some more detail. In addition to looking into the the elephant:add-class-derived-index function, I also tried using the :derived-fn syntax (see: http://paste.lisp.org/display/85431 for an example of what I've attempted). I want to be able to use ele:get-instances-by-range on the derived slot ... Thanks, Shaneal From sky at viridian-project.de Mon Aug 17 07:53:41 2009 From: sky at viridian-project.de (Leslie P. Polzer) Date: Mon, 17 Aug 2009 09:53:41 +0200 (CEST) Subject: [elephant-devel] revised UTF seriazer/desirializer patch In-Reply-To: <4F56E7E7-220B-4693-93D8-DFA83C9A9DFB@media.mit.edu> References: <20090808.111546.193724442.kom@narihara-lab.jp> <4F56E7E7-220B-4693-93D8-DFA83C9A9DFB@media.mit.edu> Message-ID: Ian Eslick wrote: > Thank you, this looks great. I'll review it and promote it in the > next week or so unless Leslie beats me to it. Xiangjun Plato Wu has kindly offered his help with this patch. He's added a test case and provided an analysis of the patch. Plato, could you post your results to the list? 
Leslie -- http://www.linkedin.com/in/polzer From netawater at gmail.com Mon Aug 17 12:56:32 2009 From: netawater at gmail.com (Xiangjun Wu) Date: Mon, 17 Aug 2009 20:56:32 +0800 Subject: [elephant-devel] Fwd: An Elephant task In-Reply-To: <6f8d23640908160718j6a3406cdwa6db43aa3b65124a@mail.gmail.com> References: <010054dab30c9ff281e12a2fecf11ff2.squirrel@mail.stardawn.org> <6f8d23640908050058m169b70a8vecb75ea51f40a744@mail.gmail.com> <6f8d23640908060218q1638db0fjf4e75be44f3149ef@mail.gmail.com> <74286f3e1ab6d78916460d8cf100b742.squirrel@mail.stardawn.org> <6f8d23640908060643l6e30b3d1l106cf0912769f70e@mail.gmail.com> <6f8d23640908080642r57a76f01o924221176e4d3394@mail.gmail.com> <8d8e0eab7f435ffc8fa6763c6c54be84.squirrel@mail.stardawn.org> <6f8d23640908160718j6a3406cdwa6db43aa3b65124a@mail.gmail.com> Message-ID: <6f8d23640908170556r5401cae8u6f71a82b2d006fb5@mail.gmail.com> ---------- Forwarded message ---------- From: Xiangjun Wu Date: Sun, Aug 16, 2009 at 10:18 PM Subject: Re: An Elephant task To: leslie.polzer at gmx.net I have added the test case along with his patch; please note that the test data file is a UTF-8 file. In my test, GET-INSTANCES-BY-RANGE works, but GET-INSTANCE-BY-VALUE and GET-INSTANCES-BY-VALUE fail when the new code runs against an old store. Please check it. On Thu, Aug 13, 2009 at 4:44 AM, Leslie P. Polzer wrote: > > Xiangjun Wu wrote: > > I have studied his patch and here is my conclusion: > > > > libberkeley-db.c > > - return wcs_cmp((wchar_t*)ad+5+offset, read_int32(ad+offset, 1), > > (wchar_t*)bd+5+offset, read_int32(bd+offset, 1)); > > > > + return wcs_cmp((wchar_t*)(ad+5+offset), read_int32(ad+offset, 1), > > (wchar_t*)(bd+5+offset), read_int32(bd+offset, 1)); > >>>>> It wants to typecast, but I believe the compiler does that anyway. > > I think gcc does, but the compiler is not required to upgrade/cast > the type of the other arguments. The second form seems more concise, so I guess it's better.
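The two one-line changes to libberkeley-db.c do more than tidy the code, and a small C sketch shows why. A cast binds more tightly than `+`. So `(wchar_t*)ad+5+offset` converts `ad` first and then advances `5+offset` whole `wchar_t` units (4 bytes each with glibc), while `(wchar_t*)(ad+5+offset)` advances `5+offset` bytes. Likewise, `wcsncmp` counts `wchar_t` characters, not bytes, and the stored length is a character count. Dividing it by 4 (`min /4`) therefore compared too few characters, and compared none at all for strings shorter than four characters. The helper names below are hypothetical. The test uses a `wchar_t`-aligned buffer so that the pointer conversions stay well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Bytes skipped by "(wchar_t *)p + n": the cast applies to p alone,
   so pointer arithmetic scales n by sizeof(wchar_t). */
size_t skip_cast_then_add(wchar_t *buf, size_t n) {
    unsigned char *p = (unsigned char *)buf;
    return (size_t)((unsigned char *)((wchar_t *)p + n) - p);
}

/* Bytes skipped by "(wchar_t *)(p + n)": n bytes first, then the cast.
   n is kept a multiple of sizeof(wchar_t) so the result stays aligned. */
size_t skip_add_then_cast(wchar_t *buf, size_t n) {
    unsigned char *p = (unsigned char *)buf;
    assert(n % sizeof(wchar_t) == 0);
    return (size_t)((unsigned char *)(wchar_t *)(p + n) - p);
}

/* Length-aware compare in the style of the patched wcs_cmp: compare the
   first min(len1, len2) characters, then fall back to the length difference. */
int cmp_chars(const wchar_t *a, int len1, const wchar_t *b, int len2) {
    int sizediff = len1 - len2;
    int min = sizediff > 0 ? len2 : len1;
    int diff = wcsncmp(a, b, (size_t)min);
    return diff != 0 ? diff : sizediff;
}

/* The same compare with the old "min / 4" count, which treats a character
   count as if it were a byte count. */
int cmp_div4(const wchar_t *a, int len1, const wchar_t *b, int len2) {
    int sizediff = len1 - len2;
    int min = sizediff > 0 ? len2 : len1;
    int diff = wcsncmp(a, b, (size_t)(min / 4));
    return diff != 0 ? diff : sizediff;
}
```

With a 4-byte `wchar_t`, `skip_cast_then_add(buf, 4)` is 16 while `skip_add_then_cast(buf, 4)` is 4. Also, `cmp_div4(L"a", 1, L"b", 1)` reports the two strings as equal.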
> > > > unicode.lisp > > He modified serialize-to-utf16le, serialize-to-utf32le, and the deserialize-string methods for :utf16le and :utf32le > > to change how UTF-16 and UTF-32 strings are serialized and deserialized. I'm > > not sure which is big-endian and which is little-endian; however, he only reverses the byte order, so it is OK. > > What about backwards compatibility? Say I've got a store with > the old sorting order and load the code that uses the new one. > > Will the data sorted in the old way be read correctly? > > > > His test file passes; it checks the correctness of string< for UTF-16 and > > UTF-32 strings. > > > > BerkeleyDB-tests passes after I apply his patch to a clean repository. > > That's great! However, we should also add the test cases he provided > to the test suite. Can you do that? > > Thanks a lot! :) > > Leslie > > -- > http://www.linkedin.com/in/polzer > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: unicode.patch Type: application/octet-stream Size: 3596 bytes Desc: not available URL: From eslick at media.mit.edu Thu Aug 20 19:28:32 2009 From: eslick at media.mit.edu (Ian Eslick) Date: Thu, 20 Aug 2009 12:28:32 -0700 Subject: [elephant-devel] How to create a derived index? - Addendum In-Reply-To: References: Message-ID: <41C51169-8C3D-4371-9387-654B11A4885A@media.mit.edu> I'm sorry for not replying earlier. I noticed a problem with that when I did an experiment while I was composing a reply to you, and I haven't had a chance to look into it yet. Shamefully, I don't have good regression coverage of that feature yet, although I know some folks use it extensively. Leslie, can you take a quick look? Ian On Aug 16, 2009, at 2:48 AM, Shaneal Manek wrote: > Hello, my last message was a little sparse, so I thought I would > provide some more detail.
> > In addition to looking into the elephant:add-class-derived-index > function, I also tried using the :derived-fn syntax (see: > http://paste.lisp.org/display/85431 for an example of what I've > attempted). I want to be able to use ele:get-instances-by-range on the > derived slot ... > > Thanks, > Shaneal > > _______________________________________________ > elephant-devel site list > elephant-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel From eslick at media.mit.edu Thu Aug 20 19:37:24 2009 From: eslick at media.mit.edu (Ian Eslick) Date: Thu, 20 Aug 2009 12:37:24 -0700 Subject: [elephant-devel] Upgrading from version .9ish In-Reply-To: References: Message-ID: <0A26E47E-693E-416D-A0B6-C3BD5B3D7840@media.mit.edu> There should be an upgrade path through previous versions. If you use the upgrade procedure in the 0.9 release on a 0.8 database, you should then be able to open that 0.9 DB with the latest version. I can't recall if we need another upgrade to go from 0.9 to 1.0, but I don't believe so. If you use gp-export and it works for you, it would be interesting to know that result. Thank you, Ian On Aug 13, 2009, at 12:07 PM, Red Daly wrote: > Hi, > > I have an elephant database from either the 0.8 or 0.9 era of > elephant and I was wondering what I can do to properly upgrade. > Using the latest version and upgrading bdb from 4.5 seems to have > wiped my data (which is backed up). Any suggestions? > > Thanks, > Red > _______________________________________________ > elephant-devel site list > elephant-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel From j.k.cunningham at comcast.net Fri Aug 21 05:05:25 2009 From: j.k.cunningham at comcast.net (Jeff Cunningham) Date: Thu, 20 Aug 2009 22:05:25 -0700 Subject: [elephant-devel] libmemutil.so isn't being built on my system Message-ID: <4A8E2B15.5090106@comcast.net> Hi; New here.
I'm trying to get elephant installed on a 64-bit SBCL system and ran into this problem right off the bat: Output file /usr/src/clbuild/source/elephant/libmemutil.so not found in elephant root [Condition of type SIMPLE-ERROR] Restarts: 0: [RETRY] Retry performing # on #. 1: [ACCEPT] Continue, treating # on # as having been successful. 2: [ABORT] Return to SLIME's top level. 3: [TERMINATE-THREAD] Terminate this thread (#) Backtrace: 0: ((SB-PCL::FAST-METHOD ASDF:PERFORM (ASDF:LOAD-OP ELEPHANT-SYSTEM:ELEPHANT-C-SOURCE)) ..) 1: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0. SB-PCL::.ARG1.)) ..) 2: ((LAMBDA ())) 3: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK)) 4: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-RECURSIVE-LOCK]291)) 5: (SB-THREAD::CALL-WITH-RECURSIVE-LOCK ..) 6: (SB-C::%WITH-COMPILATION-UNIT #)[:EXTERNAL] 7: (ASDF:OPERATE ASDF:LOAD-OP :ELEPHANT)[:EXTERNAL] 8: (SB-FASL::LOAD-FASL-GROUP #) 9: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK)) 10: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-RECURSIVE-LOCK]291)) 11: I figured it would be a noob situation and was surprised not to find anything I thought helpful in your archives (I googled them). I installed elephant through clbuild, and when I look in /usr/src/clbuild/source/elephant/src/ there is libmemutil.c but no corresponding library that probably should have been built. But I don't see any Makefile to build it with either. uname: Linux golum 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009 i686 SBCL is 1.0.29.54.rc5 BerkeleyDB is 4.7 How should I proceed? Thanks for any assistance.
Jeff Cunningham From elliottslaughter at gmail.com Fri Aug 21 05:25:34 2009 From: elliottslaughter at gmail.com (Elliott Slaughter) Date: Thu, 20 Aug 2009 22:25:34 -0700 Subject: [elephant-devel] libmemutil.so isn't being built on my system In-Reply-To: <4A8E2B15.5090106@comcast.net> References: <4A8E2B15.5090106@comcast.net> Message-ID: <42c0ab790908202225n6e577d8aqccf2921273fa429f@mail.gmail.com> What do you have in your my-config.sexp? Do you have (:prebuilt-libraries . t)? Otherwise it will expect to find the binary pre-built and won't try to build it. Hope this helps. On Thu, Aug 20, 2009 at 10:05 PM, Jeff Cunningham < j.k.cunningham at comcast.net> wrote: > Hi; > > New here. I'm trying to get elephant installed on a 64-bit SBCL system > and ran into this problem right off the bat: > > Output file /usr/src/clbuild/source/elephant/libmemutil.so not found in > elephant root > [Condition of type SIMPLE-ERROR] > > Restarts: > 0: [RETRY] Retry performing # on > #. > 1: [ACCEPT] Continue, treating # on > # as having > been successful. > 2: [ABORT] Return to SLIME's top level. > 3: [TERMINATE-THREAD] Terminate this thread (# {D673369}>) > > Backtrace: > 0: ((SB-PCL::FAST-METHOD ASDF:PERFORM (ASDF:LOAD-OP > ELEPHANT-SYSTEM:ELEPHANT-C-SOURCE)) ..) > 1: ((LAMBDA (SB-PCL::.PV. SB-PCL::.NEXT-METHOD-CALL. SB-PCL::.ARG0. > SB-PCL::.ARG1.)) ..) > 2: ((LAMBDA ())) > 3: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK)) > 4: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-RECURSIVE-LOCK]291)) > 5: (SB-THREAD::CALL-WITH-RECURSIVE-LOCK ..) > 6: (SB-C::%WITH-COMPILATION-UNIT # {D8CDD3D}>)[:EXTERNAL] > 7: (ASDF:OPERATE ASDF:LOAD-OP :ELEPHANT)[:EXTERNAL] > 8: (SB-FASL::LOAD-FASL-GROUP # /tmp/file7W7Xqn.fasl" {D680301}>) > 9: ((FLET SB-THREAD::WITH-RECURSIVE-LOCK-THUNK)) > 10: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-RECURSIVE-LOCK]291)) > 11: > > I figured it would be a noob situation and was surprised not to find > anything I thought helpful in your archives (I googled them).
I > installed elephant through clbuild, and when I look in > /usr/src/clbuild/source/elephant/src/ there is libmemutil.c but no > corresponding library that probably should have been built. But I don't > see any Makefile to build it with either. > > uname: Linux golum 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 > UTC 2009 i686 > SBCL is 1.0.29.54.rc5 > BerkeleyDB is 4.7 > > How should I proceed? > > Thanks for any assistance. > > Jeff Cunningham > > _______________________________________________ > elephant-devel site list > elephant-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel > -- Elliott Slaughter "Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay -------------- next part -------------- An HTML attachment was scrubbed... URL: From sky at viridian-project.de Fri Aug 21 06:48:35 2009 From: sky at viridian-project.de (Leslie P. Polzer) Date: Fri, 21 Aug 2009 08:48:35 +0200 (CEST) Subject: [elephant-devel] How to create a derived index? - Addendum In-Reply-To: <41C51169-8C3D-4371-9387-654B11A4885A@media.mit.edu> References: <41C51169-8C3D-4371-9387-654B11A4885A@media.mit.edu> Message-ID: Ian Eslick wrote: > I'm sorry for not replying earlier. I noticed a problem with that > when I did an experiment while I was composing a reply to you and I > haven't had a chance to look into it yet. Shamefully, I don't have > good regression coverage of that feature yet, although I know some > folks use it extensively. > > Leslie, can you take a quick look? I already did but the conversation went private. The bottom line was that derived indices seem to be broken right now. The OP agrees with me that it should be fixed and has volunteered to try it with my help. Otherwise I'm going to fix them with whatever spare time I can find. 
Leslie -- http://www.linkedin.com/in/polzer From henrik at evahjelte.com Fri Aug 21 07:37:37 2009 From: henrik at evahjelte.com (Henrik Hjelte) Date: Fri, 21 Aug 2009 09:37:37 +0200 Subject: [elephant-devel] libmemutil.so isn't being built on my system In-Reply-To: <4A8E2B15.5090106@comcast.net> References: <4A8E2B15.5090106@comcast.net> Message-ID: <50e8e4f60908210037q37e0c5cds69d6c22b03a40e0f@mail.gmail.com> On Fri, Aug 21, 2009 at 7:05 AM, Jeff Cunningham wrote: > Hi; > > New here. I'm trying to get elephant installed on a 64-bit SBCL system > and ran into this problem right off the bat: A common cause of this error is that cffi is installed. Someone made an evil patch that renamed uffi-compat.asd to uffi.asd, so sometimes the cffi UFFI compatibility layer is picked up by accident when compiling elephant. I would look for a cffi installation, find the uffi.asd file there, and rename it back to cffi-uffi-compat.asd. I am not sure this is your problem, but it is the first thing I would check. /Henrik From j.k.cunningham at comcast.net Fri Aug 21 14:19:20 2009 From: j.k.cunningham at comcast.net (Jeff Cunningham) Date: Fri, 21 Aug 2009 07:19:20 -0700 Subject: [elephant-devel] libmemutil.so isn't being built on my system In-Reply-To: <50e8e4f60908210037q37e0c5cds69d6c22b03a40e0f@mail.gmail.com> References: <4A8E2B15.5090106@comcast.net> <50e8e4f60908210037q37e0c5cds69d6c22b03a40e0f@mail.gmail.com> Message-ID: <4A8EACE8.9080602@comcast.net> Henrik Hjelte wrote: > > > A common cause of this error is that cffi is installed. Someone made an > evil patch that renamed uffi-compat.asd to uffi.asd, so sometimes the > cffi UFFI compatibility layer is picked up by accident when > compiling elephant. I would look for a cffi installation, find the > uffi.asd file there, and rename it back to cffi-uffi-compat.asd. I am > not sure this is your problem, but it is the first thing I would > check. > > /Henrik > > > Thanks, Henrik. That was exactly my problem.
So, before I contact the CFFI developers and ask them to fix this annoying case of identity theft, do you know if anyone has already done so? --Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From henrik at evahjelte.com Sat Aug 22 07:28:08 2009 From: henrik at evahjelte.com (Henrik Hjelte) Date: Sat, 22 Aug 2009 09:28:08 +0200 Subject: [elephant-devel] libmemutil.so isn't being built on my system In-Reply-To: <4A8EACE8.9080602@comcast.net> References: <4A8E2B15.5090106@comcast.net> <50e8e4f60908210037q37e0c5cds69d6c22b03a40e0f@mail.gmail.com> <4A8EACE8.9080602@comcast.net> Message-ID: <50e8e4f60908220028h3f38f49cqbe1ab9cf1051585e@mail.gmail.com> On Fri, Aug 21, 2009 at 4:19 PM, Jeff Cunningham wrote: > Thanks, Henrik. That was exactly my problem. > > So, before I contact the CFFI developers and ask them to fix this annoying > case of identity theft, do you know if anyone has already done so? I think that someone contacted the clbuild maintainers, but I don't think someone has complained to the cffi developers which I think would be the right thing to do. It would be one thing if cffi-compat could compile elephant, but until that time this problem steals a lot of time for people, you are not the first victim. /Henrik From j.k.cunningham at comcast.net Sat Aug 22 14:38:47 2009 From: j.k.cunningham at comcast.net (Jeff Cunningham) Date: Sat, 22 Aug 2009 07:38:47 -0700 Subject: [elephant-devel] libmemutil.so isn't being built on my system In-Reply-To: <50e8e4f60908220028h3f38f49cqbe1ab9cf1051585e@mail.gmail.com> References: <4A8E2B15.5090106@comcast.net> <50e8e4f60908210037q37e0c5cds69d6c22b03a40e0f@mail.gmail.com> <4A8EACE8.9080602@comcast.net> <50e8e4f60908220028h3f38f49cqbe1ab9cf1051585e@mail.gmail.com> Message-ID: <4A9002F7.3060405@comcast.net> Henrik Hjelte wrote: > On Fri, Aug 21, 2009 at 4:19 PM, Jeff > Cunningham wrote: > >> Thanks, Henrik. That was exactly my problem. 
>> >> So, before I contact the CFFI developers and ask them to fix this annoying >> case of identity theft, do you know if anyone has already done so? >> > > I think that someone contacted the clbuild maintainers, but I don't > think someone has complained to the cffi developers which I think > would be the right thing to do. It would be one thing if cffi-compat > could compile elephant, but until that time this problem steals a lot > of time for people, you are not the first victim. > > /Henrik > > _______________________________________________ > elephant-devel site list > elephant-devel at common-lisp.net > http://common-lisp.net/mailman/listinfo/elephant-devel > > I'm working on them, but I could use some help. Hans Hubner weighed in on my side, but mostly they want to go this route: Elephant breaks UFFI's abstractions and assumes SB-ALIEN is being used behind the scenes. That naturally breaks with cffi-uffi-compat. They want you to either fix this or, better yet, convert over entirely to CFFI. It seems clear to me that, whether they are right or wrong, you ought to be able to expect that another package will not break your package via the ASDF system when yours has no dependency on theirs. This is a basic Lisp ecosystem issue that has dogged the Lisp community for years. Somehow we have got to get past this stuff or we'll never get anywhere. It's like the rule of law: people will never venture very much if they can't count on a system supporting them to protect what they venture. --Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: