[elephant-devel] Sorting

Robert L. Read read at robertlread.net
Mon Oct 22 19:55:42 UTC 2007


I think this is the best policy that we can generate. 

I personally don't like it, as I've mentioned before.  I don't think 
we should have an arbitrary order. If you don't have a specified order,
then you don't really have a btree in meaningful sense.  You can't do a
range query over something that is arbitrary, and if you don't have a
range query, why call it a btree? You should just call it a dictionary,
(which happens to be implemented with btrees) equip it with an
enumeration, and have something just as efficient and simpler than a
btree, which furthermore decreases the chances of someone accidentally
depending on the order, since you never provide a way to utilize it
(although of course humans, being human, will accidentally depend on the
ordering of objects returned by the enumeration, so the problem never
goes away from a software engineering point of view.)

However, I don't have time to reorganize things to present a
"dictionary" in addition to a "btree" abstraction; and even if someone
volunteered to do this work, it would be worthy of a new release, rather
than a discussion of the current release.  And since Ian has done much
more work than I have in the last year, I will defer to his judgment.

Is there a reason that string sorting is case insensitive?  This seems
like asking for trouble --- that's something that people are likely to 
depend on and be surprised when they switch repositories and find it is
different.


On Mon, 2007-10-22 at 13:18 -0400, Ian Eslick wrote:
> There have been a number of discussions about sorting issues and  
> guarantees among the various data store implementations.  This is  
> difficult due to different backend serialized representations.  This  
> is my understanding of the current guarantees.  Let's commit to this  
> as part of locking down 0.9.1.
> 
> Any ordered data structure, such as btree or indexed-btrees, has to  
> make some guarantees about the function that defines the ordering.   
> Currently, the BDB data store and Elephant core implements the  
> following policy (I believe this is also true for CL-SQL stores as  
> well):
> 
> General policies:
> 
> In a BTree with multiple types, each type is sorted as an independent  
> subsequence.  Types are guaranteed to be separated, but the order of  
> the type sub-sequences is not guaranteed.
> 
> In an Index with multiple values, there is no guarantee about the  
> ordering of duplicate values, the the index is treated as a sorted  
> sequence of unordered equivalence classes.  Tests and users should  
> not depend on this.
> 
> Specific types:
> 
> Numeric types (fixnum, bignum, float, double, rational and complex):
> Sorted properly over the continuous real numbers, smallest to biggest
> 
> Characters:
> Currently considered and sorted in with all the above numeric types  
> (this is probably wrong)
> 
> Strings:
> Sorted alphabetically, case-INSENSITIVE
> 
> Symbols/Pathnames:
> Sorted as strings above, but symbols and paths are sorted  
> independently as type classes.
> 
> Persistent objects are sorted according to OID, but the OID ordering  
> may change at any time, so long as object equivalency is maintained  
> (for example, if we implement GC).  Users should not depend on the  
> OID for anything other than uniqueness unless they are implemented a  
> new low-level persistent-collection data structure like (N:1 and N:N)  
> associations that is GC aware.  Otherwise the order can only be  
> guaranteed within a given transaction.
> 
> All other types have no user-visible ordering guarantees.  They have  
> an arbitrary ordering within the data store, but there is no lisp  
> ordering that makes sense for hash tables, standard objects, etc.   
> (You can create derived indices of an indexed-btree that compute a  
> number, string, etc that can serve as an ordering for arbitrary  
> objects).
> 
> Open issues:
> - Characters as independent class, or as numeric codes?
> - If/how to map over a type subset in an index or btree
> 
> 
> Anything I missed?
> 
> Ian
> 
> 
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel




More information about the elephant-devel mailing list