[elephant-devel] Querying Advice [w/code example]

Daniel Salama lists at infoway.net
Mon Nov 13 05:19:40 UTC 2006


Wow! I wasn't expecting to hear back from you until Monday :)

I haven't been able to run your code yet. I downloaded SBCL for PPC  
but have not been able to successfully compile Elephant with it. I  
will continue trying. I used to have it working before, but stopped  
using SBCL a while ago for OpenMCL.

Anyway, some comments before I can run your code and provide more  
feedback:

100% agree with all this. However, one thing that might have been  
oversimplified is when it comes down to complex relational queries.  
There are two main issues to address here:

1) The nature of dynamic queries as presented in my original email:  
your explanation and code seems to address this in a subtle way and,  
although I believe you when you say it works, I'd say that it could  
be enhanced in a way to make it generic enough. I remember the how  
Peter Siebel presents such a nice and simple interface to querying a  
CD/MP3 database in Chapter 3 of his Practical Common Lisp book. In a  
way, he kind of developed a small domain language for querying and I  
think something like that can be implemented with the sample code you  
provided below.

2) What happens when you have multiple relational tables... or more  
appropriate in our case, when you want to query an object model based  
on different object and object relational hierarchies. Say, for  
example, you have class person which has references to an object  
called demographics and an object postal-code, which, in turns, has a  
reference to an object state. How would you apply your proposal to a  
query such as "give me all 'males' under the age of 30 who live in  
'NY' sorted by age in descending order and then by last, first  
names". It obviously is very simple to do this in SQL (if the proper  
relations exist between the tables), but may not be (or at least  
doesn't seem to be initially) as trivial in Elephant or applying your  
code below. Your dynamically constructed list of query-able columns  
approach may be a bit limited. Also, class indices may not  
necessarily be appropriate either in these situation.

Maybe I haven't delved deeply enough into your proposal and/or  
haven't given it sufficient thought (I apologize for that if I  
belittled your proposal).

I definitely understand the potential overhead of using secondary  
indices and agree that their use should be determined by how often  
you really need to search by those slots.

I will continue trying to make SBCL work again in my PPC and then run  
your code. Maybe play with it and try to simulate different scenarios  
such as I mentioned above. Will then send more comments.

Thanks again,
Daniel

P.S. I don't mean to distract anyone from the main thread of this  
email. But I noticed how you make use of format to append values to  
strings (e.g. random-password). Is that like the only way to do so,  
or maybe the more efficient way of doing so? Thanks again

On Nov 12, 2006, at 6:45 PM, Robert L. Read wrote:

> Dear Daniel and Team,
>
>     I think the code below, which I have tested on SBCL,  
> illustrated a typical problem that Daniel Salama introduces.  To  
> paraphrase, you have a datatype (perhaps compound) which has a lot  
> of slots; you have a GUI, perhaps web-based, that you use to both  
> select or filter the large database, and to decide how to present  
> sort the results.  I've written the below example as if you  
> operating directly on the slots.  The fact that there are often  
> intervening functions does not fundamentally change the problem.   
> (An example of this is storing a timestamp as an integer, but  
> presenting it in a human-readable format.)
>     SQL supports a powerful querying ability based on both  
> selection and sorting.  One might think that this is an advantage  
> of SQL; it is conventional reason that this is actually an  
> advantage of using a relational database.  However, since LISP  
> treats functions as first-class citizens that can be constructed  
> dynamically, you actually have a full Turing-complete capabilities  
> in doing queries that SQL cannot match.  This same ability applies  
> to sorting; you can sort on any lexical order that you can program.
>     In practice, however, one doesn't always need this power.  More  
> typically, a user will select fields that they want to use to  
> filter the results (that is, construct a query from), and perhaps  
> how they would like the results to be sorted.  I assume that you  
> know how to interpret an HTTP query or a McClim user interface or  
> something to associate the GUI with underlying functions.  (My  
> personal framework has a way to do this, and UCW is probably the  
> most common or famous way to do it now.)
>     The code below generated 100 random "users".  The bare act of  
> defining this class defines accessor-functions that we can use in  
> dynamically constructed lists as below: (list 'username-of 'balance- 
> of).  I have written very small functions that use such lists  
> either to define define "lexicographic" sort orders based on the  
> order of the functions within the list.  That is, the primary sort  
> criteria is the first function in the list, but of that function is  
> equal for two values, the next is used and so on.  If you load the  
> below code and execute (show-off) several times I think you will  
> see what I mean.  You can then see how easily you can change the  
> list of functions that are either in the selector or the sort  
> criteria. If this is a web-based app, these list will be generated  
> from the http-query, which is generated by the user's clicks.
>     That is a "columnar" based approach; but it one can do  
> something similar but more powerful based on computed functions  
> that aren't based on individual columns, but on the entire data  
> element.  For example "find users whose username is equal to their  
> password" cannot be done in this way --- but can be done by just  
> using a function #(lambda (x) (equal (username-of x) (password-of  
> x)).  SQL can do this --- but LISP can could use any function  
> there, such as "find users who have both short usernames and  
> passwords that can be cracked by routine-x".
>     Instead of adding things to the root or the store-controller  
> directly, one would generally prefer
> to use consistent classes:
> http://common-lisp.net/project/elephant/doc/Persistent- 
> Classes.html#Persistent-Classes
>
> This doesn't change the nature of the problem.  If you like, you  
> can create an index on any slot in a very convenient way: http:// 
> common-lisp.net/project/elephant/doc/Class-Indices.html#Class-Indices
> This holds out the possibility of NOT having to iterate over the  
> entire data-set, but rather honing in directly upon
> matching values. (You can in fact create functional indexes based  
> on any function at all in Elephant, which is something that SQL  
> can't do conveniently, but the times you will need to do this are  
> rare.)
>
> It would take maybe 20 minutes more coding to uses Ian's "get- 
> instances-by-range" directly, and in a very efficient manner for  
> performing the query (if the GUI elements correspond to the class's  
> slots.)   This would be very efficient; but of course you should  
> not do this until you know that this is really the bottleneck in  
> your system.  By using cursors, you can avoid reading the entire  
> data into memory, and thus process huge datasets.
>     However, one should note that Ian's code makes creating indices  
> on slots zero-effort; but indexes always have overhead.  The real  
> question is: when will your queries actually utilize the index?   
> (That is, if you always select on one column/slot, then that one  
> should be indexed....but if your query pattern is more complicated,  
> it becomes fuzzy.)
>     Let me know if you find this useful;  after I get feedback from  
> you and Ian has made his post, perhaps we will put
> this in the documentation.
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mailman.common-lisp.net/pipermail/elephant-devel/attachments/20061113/bd29b38a/attachment.html>


More information about the elephant-devel mailing list