[elephant-devel] Sorting

Ian Eslick eslick at csail.mit.edu
Mon Oct 22 17:18:36 UTC 2007


There have been a number of discussions about sorting issues and  
guarantees among the various data store implementations.  This is  
difficult due to different backend serialized representations.  This  
is my understanding of the current guarantees.  Let's commit to this  
as part of locking down 0.9.1.

Any ordered data structure, such as btree or indexed-btrees, has to  
make some guarantees about the function that defines the ordering.   
Currently, the BDB data store and Elephant core implements the  
following policy (I believe this is also true for CL-SQL stores as  
well):

General policies:

In a BTree with multiple types, each type is sorted as an independent  
subsequence.  Types are guaranteed to be separated, but the order of  
the type sub-sequences is not guaranteed.

In an Index with multiple values, there is no guarantee about the  
ordering of duplicate values, the the index is treated as a sorted  
sequence of unordered equivalence classes.  Tests and users should  
not depend on this.

Specific types:

Numeric types (fixnum, bignum, float, double, rational and complex):
Sorted properly over the continuous real numbers, smallest to biggest

Characters:
Currently considered and sorted in with all the above numeric types  
(this is probably wrong)

Strings:
Sorted alphabetically, case-INSENSITIVE

Symbols/Pathnames:
Sorted as strings above, but symbols and paths are sorted  
independently as type classes.

Persistent objects are sorted according to OID, but the OID ordering  
may change at any time, so long as object equivalency is maintained  
(for example, if we implement GC).  Users should not depend on the  
OID for anything other than uniqueness unless they are implemented a  
new low-level persistent-collection data structure like (N:1 and N:N)  
associations that is GC aware.  Otherwise the order can only be  
guaranteed within a given transaction.

All other types have no user-visible ordering guarantees.  They have  
an arbitrary ordering within the data store, but there is no lisp  
ordering that makes sense for hash tables, standard objects, etc.   
(You can create derived indices of an indexed-btree that compute a  
number, string, etc that can serve as an ordering for arbitrary  
objects).

Open issues:
- Characters as independent class, or as numeric codes?
- If/how to map over a type subset in an index or btree


Anything I missed?

Ian





More information about the elephant-devel mailing list