[elephant-devel] Strange behavior with indexed-btree

Kevin Raison raison at chatsubo.net
Sun Jan 11 04:46:01 UTC 2009


Ian, luckily I am in the development stages of my app, so I can freely 
throw away my data whenever I like (it is just syslog data from some 
very chatty networking  hardware, so there is always more to be had).  I 
have experienced these issues on 32 and 64 bit systems, though my main 
system is 64 bit.  I just realized that while I have rebuilt elephant 
several times, I have not recompiled my own code in a while.  I will try 
that, and if it fails, I'll figure out some way to to get you access to 
a test case.  Thanks again for all of your help with this;  I hope I am 
contributing to the stabilization of 1.0, and not just chasing my own bugs!

-Kevin

Ian Eslick wrote:
> This is very odd behavior.  I have a live webapp that uses all this  
> stuff pretty heavily.
> 
> I did see this active cursor message earlier, but it was due to some  
> debugging churn in the BDB transaction handler a day or two ago.   
> Those issues should all be resolved now, but it's possible something  
> isn't working right.  Since I did make some changes to the main  
> transaction macros and that means that everything depending on those  
> macros needs to be compiled - asdf should do that automatically but  
> you may want to force a rebuild on the new elephant tree and your app.
> 
> Sometimes I find it useful to manually run a full, recover-fatal pass  
> manually just to be sure BDB itself is happy.  (db_recover -f)
> 
> Deserialization errors have become pretty rare.  I suppose it is  
> possible that you have a database that has a value that somehow got  
> corrupted.  Have you tried your app on a fresh database?  It might be  
> worth doing that to see if you can reproduce the problem.  (How  
> critical is the data you have in there now?)
> 
> By the way:
> - Do you have a 32 or 64 bit BDB/SBCL?
> - Have you double checked your my-config.sexp to make sure everything  
> is set right?
> 
> Can you produce a stand-alone test case I can try locally?  If there  
> aren't too many dependencies, I'd be happy to try reproduce this  
> locally or login to a machine and look at it remotely.
> 
> By the way, stack traces should be a bit more meaningful now if you  
> want to send me the latest full stack trace.
> 
> Another thing you can try is tracing 'get-value' and (setf get-value)  
> to see all the data you are reading/writing; perhaps it's one  
> particular value that is causing the serializer to choke?
> 
> Ian
> 
> On Jan 10, 2009, at 6:27 PM, Kevin Raison wrote:
> 
>> Unfortunately, things remain as before.  Here is some additional debug
>> output that I found in the *inferior-lisp* buffer of my emacs session:
>>
>> Deserialization error in map: returning nil for element
>> transaction has active cursors
>> PANIC: Invalid argument
>> PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
>> PANIC: fatal region error detected; run recovery
>>
>>
>>
>> The deserialization error happens in one of two spots:
>> - when trying to get-from-root a btree or indexed btree
>> - when trying to get a pset via a call to get-value on the btree
>>
>> Running recovery does not help;  the errors start as soon as I load  
>> and
>> execute my code post-recovery.
>>
>> Ian, can you enlighten me as to where in the code you believe these
>> errors to originate?  Also, can you think of anything that I might be
>> doing incorrectly in my code that would cause this sort of behavior?
>> Here is a brief outline of what I am doing:
>>
>> There is
>>  - a waitqueue (via sbcl's sb-thread:make-waitqueue)
>>  - a tcp listener thread that writes its input to an incoming work  
>> queue
>>  - 3 reader threads who use the waitqueue to get work items from the
>> work queue
>> The readers parse the incoming data and create a persistent object  
>> from
>> it.  They also index the individual words of one of the persistent
>> object's slots using a btree.  These actions are contained in a
>> with-transaction block.
>> There is also a hunchentoot web server thread running.  Users can  
>> search
>> the indexed text via this web interface.
>>
>> It is really rather simple.  The errors crop up randomly, sometimes  
>> when
>> I am running searches via the web interface while the listener and
>> reader threads are active, and sometimes while the listener and reader
>> threads are doing their thing without any web-based reads happening.
>>
>> Cheers,
>> Kevin
>>
>>
>> Kevin Raison wrote:
>>> Thanks, Ian.  All tests pass for me as well.  I am running my
>>> application with the new code and will let you know how it goes.
>>>
>>> Thanks again!
>>> Kevin
>>>
>>> Ian Eslick wrote:
>>>> I just checked in a few more fixes a few minutes ago.  I finally was
>>>> able to reproduce some of this locally.  Try cleaning out the test
>>>> database and re-running the tests.  Everything passes for me on a
>>>> fresh DB and my own application is running fine too.
>>>>
>>>> Ian
>>>>
>>>> On Jan 10, 2009, at 2:04 PM, Kevin Raison wrote:
>>>>
>>>>> After pulling the latest patches and rebuilding without
>>>>> optimizations as
>>>>> you suggest, I get the following errors, the second of which is
>>>>> slightly
>>>>> different than what I was receiving previously.
>>>>>
>>>>> First, a deserialization error:
>>>>> Condition ELEPHANT-TYPE-DESERIALIZATION-ERROR was signalled.
>>>>>   [Condition of type ELEPHANT-TYPE-DESERIALIZATION-ERROR]
>>>>>
>>>>> Restarts:
>>>>> 0: [RETRY] Retry SLIME REPL evaluation request.
>>>>> 1: [ABORT] Return to SLIME's top level.
>>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>>>> RUNNING {10034F31F1}>)
>>>>>
>>>>> Backtrace:
>>>>>  0: ((LAMBDA (SWANK-BACKEND::DEBUGGER-LOOP-FN)) #<FUNCTION  
>>>>> (LAMBDA #)
>>>>> {10040B2389}>)
>>>>>  1: (SWANK::DEBUG-IN-EMACS #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>>> {1002CB3531}>)
>>>>>  2: (SWANK:INVOKE-SLIME-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-
>>>>> ERROR
>>>>> {1002CB3531}>)
>>>>>  3: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {1002CB3AC9}>)
>>>>>  4: (INVOKE-DEBUGGER #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>>> {1002CB3531}>)
>>>>>  5: ((FLET #:FUN23) #<ELEPHANT-TYPE-DESERIALIZATION-ERROR
>>>>> {1002CB3531}>)
>>>>>  6: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) ..)[:EXTERNAL]
>>>>>  7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (REINDEX-LOG-ENTRIES) #<NULL-
>>>>> LEXENV>)
>>>>>  8: (SWANK::EVAL-REGION "(reindex-log-entries)\n")
>>>>>  9: ((LAMBDA ()))
>>>>> 10: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {1002C596F9}>)
>>>>> 11: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>>>> request." #<CLOSURE (LAMBDA #) {1002C59619}>)
>>>>> 12: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>>>> {1002C595E9}>)
>>>>> 13: (SWANK::REPL-EVAL "(reindex-log-entries)\n")
>>>>>
>>>>> And then this one with any subsequent access to the bdb:
>>>>>
>>>>> Bad type argument:
>>>>>
>>>>>
>>>>>  BDB-DB-ERROR
>>>>>   [Condition of type SIMPLE-TYPE-ERROR]
>>>>>
>>>>> Restarts:
>>>>> 0: [RETRY] Retry SLIME REPL evaluation request.
>>>>> 1: [ABORT] Return to SLIME's top level.
>>>>> 2: [TERMINATE-THREAD] Terminate this thread (#<THREAD "repl-thread"
>>>>> RUNNING {10034F31F1}>)
>>>>>
>>>>> Backtrace:
>>>>>  0: (MAKE-CONDITION BDB-DB-ERROR)[:EXTERNAL]
>>>>>  1: (ERROR BDB-DB-ERROR)[:EXTERNAL]
>>>>>  2: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unused argument> #<unused
>>>>> argument>
>>>>> #<BDB-STORE-CONTROLLER /srv/console/db/> #<CLOSURE (LAMBDA #)
>>>>> {10037C53D9$
>>>>>  3: (ADD-LOG-ENTRY)[:EXTERNAL]
>>>>>  4: (SB-INT:SIMPLE-EVAL-IN-LEXENV (ADD-LOG-ENTRY :SEVERITY 2
>>>>> :HOST-UUID "123" :TIMESTAMP ...) #<NULL-LEXENV>)
>>>>>  5: (SWANK::EVAL-REGION "(add-log-entry :severity 2 :host-uuid
>>>>> \"123\"
>>>>> :timestamp (get-universal-time)\n                       :program
>>>>> \"blah\" :text \"test entry\")\n")
>>>>>  6: ((LAMBDA ()))
>>>>>  7: (SWANK::TRACK-PACKAGE #<CLOSURE (LAMBDA #) {10037C1BF9}>)
>>>>>  8: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME REPL evaluation
>>>>> request." #<CLOSURE (LAMBDA #) {1003636499}>)
>>>>>  9: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA #)
>>>>> {1003636469}>)
>>>>> 10: (SWANK::REPL-EVAL "(add-log-entry :severity 2 :host-uuid  
>>>>> \"123\"
>>>>> :timestamp (get-universal-time)\n                 :program \"blah\"
>>>>> :text \"test entry\")\n")
>>>>> 11: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:LISTENER-EVAL "(add-log-
>>>>> entry
>>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>>>                :program \"blah\" :text \"test entry\")\n"$
>>>>> 12: (SWANK::EVAL-FOR-EMACS (SWANK:LISTENER-EVAL "(add-log-entry
>>>>> :severity 2 :host-uuid \"123\" :timestamp (get-universal-time)\n
>>>>>               :program \"blah\" :text \"test entry\")\n") "CONSO$
>>>>> 13: (SWANK::PROCESS-REQUESTS NIL)
>>>>> 14: ((LAMBDA ()))
>>>>> 15: ((LAMBDA (SWANK-BACKEND::HOOK SWANK-BACKEND::FUN)) #<FUNCTION
>>>>> SWANK:SWANK-DEBUGGER-HOOK> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>>> 16: (SWANK::CALL-WITH-REDIRECTED-IO #<SWANK::CONNECTION  
>>>>> {10032FB281}>
>>>>> #<CLOSURE (LAMBDA #) {10038AD289}>)
>>>>> 17: (SWANK::CALL-WITH-CONNECTION #<SWANK::CONNECTION {10032FB281}>
>>>>> #<CLOSURE (LAMBDA #) {10038AD269}>)
>>>>> 18: (SWANK::HANDLE-REQUESTS #<SWANK::CONNECTION {10032FB281}> NIL)
>>>>> 19: (SWANK::CALL-WITH-BINDINGS NIL #<CLOSURE (LAMBDA #)
>>>>> {10034F5189}>)
>>>>> 20: ((FLET SB-THREAD::WITH-MUTEX-THUNK))
>>>>> 21: ((FLET #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-MUTEX]477))
>>>>> 22: (SB-THREAD::CALL-WITH-MUTEX ..)
>>>>> 23: ((LAMBDA ()))
>>>>> 24: ("foreign function: call_into_lisp")
>>>>> 25: ("foreign function: new_thread_trampoline")
>>>>>
>>>>>
>>>>> Ian Eslick wrote:
>>>>>> Also try pushing :elephant-without-optimize onto *features*  
>>>>>> prior to
>>>>>> doing a force rebuild of elephant.  This should give you some more
>>>>>> information and rule out any optimization/declaration related  
>>>>>> bugs.
>>>>>>
>>>>>> Thank you,
>>>>>> Ian
>>>>>>
>>>>>>
>>>>>> On Jan 7, 2009, at 9:10 PM, Kevin Raison wrote:
>>>>>>
>>>>>>> I was finally able to recreate this in the repl (as opposed to
>>>>>>> seeing it
>>>>>>> in my error logs), so here is a trace:
>>>>>>>
>>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>>>> #<BDB-INDEXED-BTREE oid:2>.
>>>>>>>  [Condition of type UNBOUND-SLOT]
>>>>>>>
>>>>>>> Backtrace:
>>>>>>> 0: ((SB-PCL::FAST-METHOD SLOT-UNBOUND (T T T)) #<unavailable
>>>>>>> argument> #<unavailable argument> #<unavailable argument>
>>>>>>> #<BDB-INDEXED-BTREE oid:2> DB-BDB::INDICES-CACHE)
>>>>>>> 1: (SB-PCL::SLOT-UNBOUND-INTERNAL #<BDB-INDEXED-BTREE oid:2> 2)
>>>>>>> 2: ((SB-PCL::FAST-METHOD (SETF GET-VALUE) (T T
>>>>>>> DB-BDB::BDB-INDEXED-BTREE)) #<unavailable lambda list>)
>>>>>>> 3: ((LAMBDA (WORD)) "34766")
>>>>>>> 4: (SB-IMPL::MAP1 #<CLOSURE (LAMBDA #) {1003CA4ED9}> (("" "asa"
>>>>>>> "106007" "deny" "inbound" "udp" ...)) :LIST T)
>>>>>>> 5: (MAPCAR #<CLOSURE (LAMBDA #) {1003CA4ED9}> ("" "asa" "106007"
>>>>>>> "deny" "inbound" "udp" ...))[:EXTERNAL]
>>>>>>> 6: ((SB-PCL::FAST-METHOD INDEX-LOG-ENTRY (LOG-ENTRY))  
>>>>>>> #<unavailable
>>>>>>> argument> #<unavailable argument> #<LOG-ENTRY oid:6373>)
>>>>>>> 7: ((LAMBDA ()))
>>>>>>> 8: ((SB-PCL::FAST-METHOD ELEPHANT::EXECUTE-TRANSACTION
>>>>>>> (DB-BDB::BDB-STORE-CONTROLLER T)) #<unavailable argument>
>>>>>>> #<unavailable
>>>>>>> argument> #<unavailable argument> #<unavailable argument>)
>>>>>>> [:EXTERNAL]
>>>>>>> 9: (ADD-LOG-ENTRY ..)
>>>>>>> 10: (SB-INT:SIMPLE-EVAL-IN-LEXENV ..)
>>>>>>>
>>>>>>> And here is the code that sometimes causes the issue:
>>>>>>> (defmethod index-log-entry ((log-entry log-entry))
>>>>>>> (let ((*store-controller* *syslog-controller*))
>>>>>>>   (let ((btree (get-from-root "log-entry-index" :sc
>>>>>>> *syslog-controller*)))
>>>>>>>     (map-words #'(lambda (word)
>>>>>>>                    (unless (or (member word *superfluous-
>>>>>>> words* :test
>>>>>>> #'string-equal)
>>>>>>>                                (< (length word) 2))
>>>>>>>                      (unless (existsp word btree)
>>>>>>>                        (setf (get-value word btree) (make- 
>>>>>>> pset :sc
>>>>>>> *syslog-controller*)))
>>>>>>>                      (insert-item log-entry
>>>>>>>                                   (get-value word btree))))
>>>>>>>                (text log-entry)))))
>>>>>>>
>>>>>>>
>>>>>>> If I drop and recreate the btree, everything runs smoothly (no
>>>>>>> errors)
>>>>>>> for a few hours and then the above error message starts showing  
>>>>>>> up
>>>>>>> about
>>>>>>> half to three quarters of the time.
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Kevin
>>>>>>>
>>>>>>>
>>>>>>> Kevin Raison wrote:
>>>>>>>> I am seeing an intermittent error with 1.0 alpha when trying to
>>>>>>>> write to
>>>>>>>> an indexed btree (using BerkeleyDB 4.7 as provided by Ubuntu's
>>>>>>>> package
>>>>>>>> repositories):
>>>>>>>>
>>>>>>>> The slot DB-BDB::INDICES-CACHE is unbound in the object
>>>>>>>> #<BDB-INDEXED-BTREE oid:2>
>>>>>>>>
>>>>>>>> Within the same thread, sometimes this happens and sometimes I  
>>>>>>>> am
>>>>>>>> able
>>>>>>>> to read and write to the btree.  I am using sbcl 1.0.24 on 32  
>>>>>>>> bit
>>>>>>>> intel
>>>>>>>> linux.  Is there something obvious that might help alleviate  
>>>>>>>> this
>>>>>>>> or
>>>>>>>> should I provide more context?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Kevin
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> elephant-devel site list
>>>>>>>> elephant-devel at common-lisp.net
>>>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>>>> _______________________________________________
>>>>>>> elephant-devel site list
>>>>>>> elephant-devel at common-lisp.net
>>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>>> _______________________________________________
>>>>>> elephant-devel site list
>>>>>> elephant-devel at common-lisp.net
>>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>>> _______________________________________________
>>>>> elephant-devel site list
>>>>> elephant-devel at common-lisp.net
>>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>>> _______________________________________________
>>>> elephant-devel site list
>>>> elephant-devel at common-lisp.net
>>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel at common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
> 
> 
> _______________________________________________
> elephant-devel site list
> elephant-devel at common-lisp.net
> http://common-lisp.net/mailman/listinfo/elephant-devel




More information about the elephant-devel mailing list