[elephant-devel] BTREE Sorting on Symbol Strings?

Ian Eslick eslick at csail.mit.edu
Sun Jan 21 22:21:02 UTC 2007


How often do users of Elephant rely on BTrees where the symbols are  
ordered according to the symbol's name?

Would it be terribly inconvenient to you if you had to convert symbol  
keys to strings to get an alphabetical ordering, but were still  
assured of contiguity of identical symbol keys in secondary btrees?

The argument is that we serialize symbols to strings all the time  
(slot access, etc) and this engenders a great deal of overhead.  Most  
symbol serialization is highly redundant and can be factored out by  
assigning a persistent ID to each symbol as we do with persistent  
objects.  This results in significantly less disk space (a constant  
vs. N*char_width bits for every slot value), reduced IO bandwidth  
(and increased locality), and less serialization/deserialization time.

The downside is that the C function passed to BDB which is used to  
compare two strings so that the BTree is ordered does not have access  
to the persistent table.  Thus we can't order strings according to  
their characters, but only according to their ID which means a random  
order.  Of course symbols will be identical to themselves and so will  
be grouped together in duplicate indices.

Sorting according to the characters of the symbol may be possible,  
but there are a number of implications that require some thinking  
about and I want to put this off for now.

As this is a user configurable option (my-config.sexp) and will be  
enabled by default I don't think there is any harm in promoting this.

Comments?

Thank you,
Ian



More information about the elephant-devel mailing list