[elephant-devel] Garbage collection problem

Ian Eslick eslick at csail.mit.edu
Thu Sep 27 15:57:38 UTC 2007


On Sep 27, 2007, at 11:46 AM, Chris Laux wrote:

> Thanks Ian, that is some good analysis.
>
>> - If you are doing a significant amount of deserialization with lots of
>> threads then you should know that each deserialization requires a call
>> to (with-lock ...) to ensure that the shared pool of buffer streams is
>> thread safe (a problem with elephant < 0.9).  This could conceivably
>> cause a lockup if there are lots of small deserializations happening
>> concurrently across threads mapping over the same BTrees.
>
> I had a vague suspicion of something like that, but only looked at
> transactions. I guess I would have to modify elephant to allow me to do
> the locking to solve such a problem.

By lockup I really meant bottleneck rather than deadlock.  Elephant  
really should be thread-safe now but it's always possible there is  
some weird case we haven't seen yet.
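To make the bottleneck concrete, here is a minimal sketch of the general pattern, not Elephant's actual code: a shared pool of reusable buffers where every borrow and every return goes through one mutex. It assumes SBCL (the implementation in use in this thread) and its sb-thread:make-mutex and sb-thread:with-mutex; the names *buffer-pool*, *buffer-pool-lock*, grab-buffer, release-buffer and with-pooled-buffer are hypothetical.

```lisp
;;; Hypothetical sketch (SBCL-specific), not Elephant's implementation:
;;; a shared pool of reusable byte buffers guarded by a single mutex.
;;; Every borrow and return takes the same lock, so many threads doing
;;; small deserializations queue up on it: a bottleneck, not a deadlock.

(defvar *buffer-pool* '()
  "Free list of reusable byte buffers.")

(defvar *buffer-pool-lock* (sb-thread:make-mutex :name "buffer pool")
  "The one lock every thread must take to touch *BUFFER-POOL*.")

(defun grab-buffer ()
  "Pop a buffer from the pool, or allocate a fresh one if the pool is empty."
  (sb-thread:with-mutex (*buffer-pool-lock*)
    (or (pop *buffer-pool*)
        (make-array 4096 :element-type '(unsigned-byte 8)))))

(defun release-buffer (buffer)
  "Return BUFFER to the pool so another caller can reuse it."
  (sb-thread:with-mutex (*buffer-pool-lock*)
    (push buffer *buffer-pool*)))

(defmacro with-pooled-buffer ((var) &body body)
  "Bind VAR to a pooled buffer for BODY, returning it to the pool afterwards,
even on a non-local exit."
  `(let ((,var (grab-buffer)))
     (unwind-protect (progn ,@body)
       (release-buffer ,var))))
```

With many threads each doing a short deserialization inside with-pooled-buffer, a large share of their time goes to waiting on *buffer-pool-lock*, so throughput drops even though nothing ever deadlocks.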

>> Are you sure it's GC that's eating all the time, or non-lisp CPU time
>> in general?
>
> Well, the 99% CPU is reported for the sbcl process. I only know that
> manually invoking a gc will trigger the problem.
>
>> Although it breaks the abstraction barrier, using IDs will be a definite
>> gain.  You'd just make that second BTree hold pairs of word-freq / obj-oid.
>> Then you use the OID and object type to grab the object directly from
>> elephant: (elephant::get-cached-instance oid classname)
>
> I have also been considering doing away with the second layer of BTrees,
> and using my own, more "linear" structures. Not sure what that could
> look like exactly though.

Updates are the real problem and you'd have to load the entire 2nd  
level data structure to do any processing on it.

>> You might be better off, performance-wise, doing this in a C full-text
>> indexing system and wrapping an interface to it.
>
> I hadn't thought of that yet. Can you recommend any?
>
> Anyway, I guess I was asking for trouble a bit with my setup. I'm not
> sure how I'll proceed yet, but if I stick to the two-level BTree setup
> and use IDs, I know what to look out for.

I'd suggest trying this and seeing whether it helps, as long as the
overhead isn't too insane.
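A rough sketch of the OID-based second level, using ordinary Common Lisp hash tables as a hypothetical in-memory stand-in for the two BTrees. The names *word-index*, *objects-by-oid*, index-word, resolve-oid and top-documents are illustrative only; in Elephant the resolving step would be the elephant::get-cached-instance call from earlier in this thread, whose exact arguments may differ between Elephant versions.

```lisp
;;; Hypothetical in-memory stand-in: hash tables play the role of the two
;;; BTree levels. The inner level stores (freq . oid) pairs instead of the
;;; objects themselves, so scanning it never materializes a full object;
;;; only the OIDs that are actually needed get resolved.

(defvar *word-index* (make-hash-table :test 'equal)
  "First level: word -> list of (freq . oid) pairs.")

(defvar *objects-by-oid* (make-hash-table)
  "Stand-in for the object store. In Elephant, resolving an OID would go
through ELEPHANT::GET-CACHED-INSTANCE instead.")

(defun index-word (word freq oid)
  "Record that the object with OID contains WORD with frequency FREQ."
  (push (cons freq oid) (gethash word *word-index*)))

(defun resolve-oid (oid)
  "Hypothetical stand-in for fetching the object behind OID."
  (gethash oid *objects-by-oid*))

(defun top-documents (word n)
  "Return up to N objects for WORD, highest frequency first, resolving only
those N OIDs."
  (let ((pairs (sort (copy-list (gethash word *word-index*)) #'> :key #'car)))
    (mapcar (lambda (pair) (resolve-oid (cdr pair)))
            (subseq pairs 0 (min n (length pairs))))))
```

The design point is that ranking and filtering touch only small (freq . oid) conses; full objects are materialized only for the OIDs that survive the cut.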

Ian


> Thanks again,
>
> Chris
>
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel



