[elephant-devel] Querying Advice

Daniel Salama lists at infoway.net
Mon Nov 13 04:17:36 UTC 2006


On Nov 12, 2006, at 3:48 PM, Robert L. Read wrote:

> This requires a philosophical response.  In general, I think it  
> will be way easier than
> you imagine, once you have been pointed in the right direction.  Take  
> my advice with
> a grain of salt.

I certainly hope so. As I have been learning lisp, I have full  
confidence that with the proper knowledge and the right guidance,  
this is definitely a manageable task.

>
> First of all, ask yourself, what is the size of your dataset?  Can  
> you fit it all into memory?
> If so, you have the full power of lisp at your command in dealing  
> with the querying.  You
> will not have to write any macros to do this.  You might find the  
> DCM package, in the "contrib"
> directory, a useful package, although it does not address querying;  
> it is more of a cache handling
> issue.  (DCM has only been tested under SBCL, as far as I know.)

In general, I do think that the dataset fits in memory. However, we  
have not fully loaded all of our data into Elephant. Simply looking  
at the MySQL file storage for the database, it occupies 900MB of disk  
space, including indices. However, when we loaded some of the  
tables into Elephant, the Elephant data files were at least 5 times  
as large. I don't know why. I don't know if  
that's the nature of BDB when it stores "arbitrarily" any type of  
object in the k,v pair. I don't know if we had some circular  
references when we loaded our model and that just simply increased  
the data file by that much (although I wouldn't think so, since if  
that were the case, I would expect it to store only references to  
objects rather than duplicating them). Regardless, our dev server  
currently has 4GB of RAM. I think that once properly loaded, all the  
data should be able to fit in memory.

Now, for the target application, I prefer not to rely on the data  
fitting in memory. The reason is that the nature of the application  
requires the data to be available for several years. This 900MB is  
the result of only 1 year for one company. As we get more companies  
to use the application and keep the data online for several years,  
the assumption of the data fitting into memory will no longer be  
applicable.

I have been looking into the DCM package and I think that it  
certainly looks promising. We haven't used it yet, but I certainly  
hope that it will be made a permanent part of Elephant sooner rather  
than later. I also hope that it's not targeted mainly at the  
in-memory database type of application, but rather serves as an  
efficient caching mechanism for persistent data (regardless of where  
it's being permanently stored).

With regard to: "...If so, you have the full power of lisp at your  
command in dealing with the querying...", I agree with you. However,  
what I'm trying to get at is how "easy" it would be to generate  
these types of dynamic queries in a generic way. Of course, we could  
always hard code all the cases for each of our different searchable  
screens, but the thought of that just makes me vomit :)
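
As a rough illustration of what "generic" could mean for in-memory data, one option is to describe each search screen's criteria as plain data and compile that data into a single predicate. The sketch below is only an assumption about how this might look: records are modeled as plists, and the names make-criterion, compile-query, run-query and *customers* are hypothetical. None of them come from Elephant or DCM.

```lisp
;; Hypothetical sketch; not part of Elephant or DCM.
;; A criterion spec is a list (KEY TEST VALUE), for example (:balance #'> 100).

(defun make-criterion (key test value)
  "Return a predicate on a plist RECORD that applies TEST to the
field KEY of the record and VALUE."
  (lambda (record)
    (funcall test (getf record key) value)))

(defun compile-query (criteria)
  "Turn a list of (KEY TEST VALUE) specs into one predicate that is
true only when every criterion holds (a logical AND)."
  (let ((predicates (mapcar (lambda (spec) (apply #'make-criterion spec))
                            criteria)))
    (lambda (record)
      (every (lambda (predicate) (funcall predicate record)) predicates))))

(defun run-query (criteria records)
  "Return the RECORDS that satisfy all CRITERIA, in their original order."
  (remove-if-not (compile-query criteria) records))

;; Made-up sample data, for illustration only.
(defparameter *customers*
  (list (list :name "Acme"   :state "FL" :balance 1200)
        (list :name "Bolt"   :state "NY" :balance 300)
        (list :name "Cortex" :state "FL" :balance 50)))

;; Customers in FL with a balance above 100.
(run-query (list (list :state #'string= "FL")
                 (list :balance #'> 100))
           *customers*)
```

Because the criteria are ordinary lists, a GUI layer could build them from whatever fields the user filled in, and no macros are needed. An empty criteria list matches every record.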

>
> Under SBCL, when it comes to sorting you have "sort" and  
> "stable-sort"; I think these are built in.
> I'll eat a candle if you don't find them to be blazingly fast  
> (although the predicates that you pass them
> might take some time.)

I think my sorting question is exactly addressed by  
your comment. I suppose that once I have the resulting dataset, I  
could run it through "sort". The key would be how to make it arbitrarily  
sortable (in a similar way to the dynamic query).
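
One way to make the sort order arbitrary, sketched under the same assumption that results are plists held in memory, is to pass the sort fields as data and apply the standard stable-sort once per field, starting with the least significant one. The names sort-by-fields and *rows* are hypothetical and only serve the illustration.

```lisp
;; Hypothetical sketch; not part of Elephant or DCM.
;; Each sort spec is a list (KEY TEST), most significant field first.

(defun sort-by-fields (records specs)
  "Return a sorted copy of RECORDS (plists). Stable sorts are applied
from the least significant spec to the most significant, so earlier
specs take precedence and later specs break ties."
  (let ((result (copy-list records)))   ; stable-sort is destructive
    (dolist (spec (reverse specs) result)
      (destructuring-bind (key test) spec
        (setf result
              (stable-sort result test
                           :key (lambda (record) (getf record key))))))))

;; Made-up sample data, for illustration only.
(defparameter *rows*
  (list (list :name "Acme"   :state "FL" :balance 1200)
        (list :name "Bolt"   :state "NY" :balance 300)
        (list :name "Cortex" :state "FL" :balance 50)))

;; Order by state ascending, then by balance descending within a state.
(sort-by-fields *rows*
                (list (list :state #'string<)
                      (list :balance #'>)))
```

The input list is copied first because sort and stable-sort may destroy their argument. Each pass costs a full sort, which is usually acceptable for a handful of sort fields.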

>
> I think really the only good way to answer this question in a  
> deeper way is to provide some
> example code.  I do exactly what you are talking about in my  
> application (http://konsenti.com),
> (although I use DCM), so I ought to be able to produce an example  
> program relatively quickly.
> You'll have to figure out how to map the GUI into those requests  
> yourself, however.
>

I believe (and hope) that we should have no problem mapping the GUI  
to the requests. Just out of curiosity (and I don't mean to divert  
from the topic of this thread): if you're using DCM for your konsenti  
(BTW, nice concept) site, how do you protect your in-memory data? Do  
you just write an image to disk every once in a while for backups?  
How resilient is this to hardware failure, and to losing the data  
written since the last image (if that's your approach)?

> I'll try to post an example by Monday.
>

Thanks,
Daniel