[cells-devel] Handling not-to-be'd kids and how to do an input slot for kids

Sat Apr 12 22:16:23 UTC 2008

Peter Hildebrandt wrote:

[A great article! <g>]

> Ken,
> 
> thanks a lot for all the insight.  First things first, using c?n for
> the kids works like a charm.  setf on a c? cell still produces an
> error, suggesting to initialize the cell with c-in. 

OK, I thought that was only in debug mode, but I did not actually check.

> Anyway, that's
> settled now.
> 
> As to the not-to-be issue,...

Ohhhh, it's in the not-to-be. Understood.

Meanwhile over here working on OpenAIR I ended up trying to read dead 
instances. Have not figured out why, but I have a guess. Still, this 
comes up so often I think we might want to make this less painful for 
developers.

>...  everything can be reproduced using
> test-gtk.  I can share that in the cells cvs, just let me know whether
> it would be ok to restructure the cvs to match the old cells-gtk
> directory structure (I'd rather keep it the way it is in the old
> cells-gtk, so that I can commit it in there one day).

Sure, go for it.

> 
> I did some narrowing it down and going through the back trace. So let
> us look at an example:
> 
> root
>   node 1     ---- observer 1
>     node 1.1 ---- observer 1.1
>   node 2     ---- observer 2
> 
> The basic code for the observer is
> 
> (defmodel family-observer (family)
>   ;; we'll use the "value" slot for the observed
>   ((row :reader row :initarg :row)
>   (:default-initargs
>       :kids (kids-list?
>                   (bwhen (val (^value))
>                       (mapcar #'(lambda (src) (mk-observer self src))
> (kids val))))))))
>       :row
>        (c? (when-bind* ((parent (upper self)) (pos (position self
> (kids parent))))
>          (let ((new-row (tree-row-create (row parent) (id parent))))
>            (when (tree-row-valid new-row)
>               (tree-row-set-path new-row (row parent) pos)
>                  new-row)))))
> 
> Where mk-observer is a method specializing on both parameters, so that
> we can have different kinds of observers on the same type of targets.

May I just observe at this juncture that I wish you had chosen a name 
other than observer for this class given that Cells already has the 
observer name in play. :)

> 
> The row is some gui object to be kept in sync.
> 
> Now we remove node1:
> 
> (with-integrity (:change 'tv-del-node)
>   (setf (kids (upper node1)) (remove node1 (kids (upper node1)))))
> 
> Then first node2 and observer2 die, ok.  Then node1 dies, and so does
> observer1.
> 
> The interesting part:
> 
> not-to-be :before on observer 1.1 is called -- and at this point
> observer 1.1 itself is already :eternal-rest, in other words,
> not-to-be is called on a dead object.  Now not-to-be of the observer
> wishes to do something, so it accesses a (ruled) slot of the passed
> object:
> 
> (defmethod not-to-be :before ((self cells-tree-node))
>    (tree-row-destroy (row self))
> 
> The call to the accessor (row self) with the dead self triggers a
> bunch of cells calls:
> 
> Backtrace:
>   0: (CELLS::ENSURE-VALUE-IS-CURRENT NIL #<unused argument> #<unused argument>)
>   1: ((LABELS CELLS::CHECK-REVERSED) (NIL))
>   2: (CELLS::ENSURE-VALUE-IS-CURRENT (NIL .
> <vld>)=492/OPTIMIZED-AWAY/ROW/DEAD!NODE-TREE-NODE3440] #<unused
> argument> #<unused argument>)
>   3: ((LAMBDA (CELLS::OPCODE CELLS::DEFER-INFO)) #<unused argument>
> #<unused argument>)
>   4: (CELLS::CALL-WITH-INTEGRITY NIL NIL #<CLOSURE (LAMBDA
> (CELLS::OPCODE CELLS::DEFER-INFO)) {BBBD3DD}>)
>   5: (CELLS::CELL-READ (NIL .
> <vld>)=492/OPTIMIZED-AWAY/ROW/DEAD!NODE-TREE-NODE3440])
>   6: (CELLS::MD-SLOT-VALUE DEAD!NODE-TREE-NODE3440 CELLS-GTK::ROW)
> 
> The last call is in
> 
> (defun ensure-value-is-current (c debug-id ensurer)
>    (declare (ignorable debug-id ensurer))
>   (count-it :ensure-value-is-current)
>    (when (and (not (symbolp (c-model c)))(eq :eternal-rest (md-state
> (c-model c))))
>     (break "model ~a of cell ~a is dead" (c-model c) c))
> 
> .... in particular the form:
> 
> (c-model c)
> 
> which breaks with:
> 
>  The value NIL is not of type CELL.
>    [Condition of type TYPE-ERROR]
> 
> To sum up, I believe the problem is that at a change of the kids list
> - First the kids are declared dead
> - Then not-to-be is called recursively
> and thus not-to-be is passed a dead self.

Yeah, I ran into this once in an observer.

> 
> However, I wish to do some cleanup work when kids are kicked out, and
> for this I need to access a few slots.

Yep.

> 
> What I'd like to have is
> - not-to-be being called before the object is declared dead
> or
> - another method (last-will?) to be executed right before the kids die
> or
> - an interims state (:zombie?) in which cell slots are still
> accessible with their last cached value
> 
> Or is the solution to have an observer on the kids slot instead of a
> not-to-be-method, which does the cleanup work for the kids?

I used to do that with kids, but not-to-be is a good place for it.

I might have just made a mistake declaring instances dead before 
not-to-be. And I am not even sure why I am so unpleasant about allowing 
access to slot-values of dead stuff. More on this below (you guessed right).

> 
> 
>> I was thinking while doing the dishes. I can image the observer class being
>>interesting in its own right, and slots over there ending up dependent on
> 
>  > the same original kids-list in some way, as well of course as the value of
> 
>>the observer. Propagation would then try to update this slot and when it got
>>to the value of the observer find a dead instance. Any rule that got to the
> 
>  > value of the observer by accessing the list of observers (say something
> 
>>iterating over them) would not encounter such an observer/value, but rules
>>lower down that get at the value directly will.
> 
> 
> This is an interesting idea, but I doubt that this is my problem here.
>  It really seems to be the relation between ruled kids slots,
> declaring cells dead, and not-to-be.

Yeah, I did not know it was a not-to-be.

> 
>  >  Now normally this is not a problem because such lower down rules would tend
> 
>>not to depend also on the original list, and indeed the reason I have left
>>this unaddressed is that in most cases I have seen a simple way to rewrite
> 
>  > my rules that was even better and which did not end up with these widespread
> 
>>dependencies (if you have followed me so far on that, and if I hasten to add
>>all this guesswork is on the money). I like to hold out for Real Problems
> 
>  > before whacking away at the code, I think that is a slippery slope.
> 
> That is surely true, as this case proves.  I feel what we're looking
> for here is something like family-finalizers which are specified to be
> called everytime a kid dies.
> 
> 
>> btw, all that stuff in their that worries about dead instances is
>>preemptive safeguard stuff -- I think if you disable that most things will
> 
>  > just work. The rules that are failing now will run harmlessly and in a few
> 
>>cycles everything gets cleaned up anyway. Cells ran for /years/ with this
>>happening to no ill effect (until RoboCup, of all things).
> 
> 
> Yep, which is why it broke after introducing cells3 :-)
> 
> OTOH, I see why it is good to have these safe guards.

When this all happened I was days away from having to demo RoboCells at 
an ILC and I just did what I had to to deal with the many attempts to 
get to dead instances under Cells2. I think some consequent paranoia 
carried over to this code.

In 2001, Hal when challenged on his determination of a fault in some 
device suggests putting it back in and allowing it to fail. I suggest we 
do the same, but not use the opportunity to send an astronaut drifting 
off into space to eventually suffocate.

>  I hacked a
> solution today to the fm-other tree searches which were all over
> cells-gtk -- now we have with-widget and with-widget-value which do
> the right thing without kicking off tree searches (I introduced an
> automatically maintained hashtable of active instances hashing by
> md-name, like I did in cells-ode).

haha, almost every day now I think about implementing namespace search 
that way. :) It is kind of cool to have this automatic location-relative 
search that Just Works when we have repeating structures and lots of 
widgets with the same name, but so often a simple hash lookup would suffice.

Part of the magic of the existing scheme, btw, is that it does not 
matter that some widget might get awakened before the other widget it 
seeks has even been created. The navigation works by hitting kids slots, 
which get run JIT to spawn the thing being looked for.

When I saw that working without having been planned for I had a feeling 
I had stumbled onto something.

>> If you want to send me your whole project I will look to see how I would
>>rewrite the rules if that is even possible, and if not take a look at
>>solving this formally.
> 
> 
> As I said, I like to try things out in test-gtk first, so that I can
> isolate the error (and create a nice demo on the way).  I will work
> through cvs and commit it tomorrow, I hope.  I just don't want to
> force you to have to deal with my whole project -- and all the other
> issues it has.

Oh, no need for a reproducible. Now that I know it was in not-to-be it 
is an understood issue.

> 
> 
>> fyi, in the past I have done silly things like having Cells just return nil
> 
>  > on slot-value access to dead cells, but we may want to find something more
> 
>>elegant. :)
> 
> 
> I have such code in my project, too:
> 
> (defun deadp (cell)
>   (eql (slot-value cell 'cells::.md-state) :eternal-rest))
> 
> However, since I need the slot-value in my case, this does not help ;-)

Funny you should mention slot-value. :) That's what I used to get at the 
slots of dead instances in my not-to-be situation.

Well... I am hesitating. There /is/ an evil variant of corpse access to 
be blocked as an aid to the developer, a form of access wholly 
unjustifiable. I think we need to yell if these happen. As for deferring 
the death certificate, well, what if your not-to-be wants access to a 
sibling or parent or child who is also being interred? Perhaps they get 
not-to-be'd, declared dead, and now it is your turn and you try to read 
them? We ran but could not hide?

ie, I am not sure then where to put the certification of death, tho it 
occurs to be that the unfinished-business queue (if you have read that 
far) is exactly where this should go, ie we queue instances up for death 
certification to run after full propagation (including not-to-be 
processing) after which point it will be safe to say slot access is an 
absolute bug.

The neat thing is that we do have a distinct "awaken" stage in ufb 
processing, why not a distinct "ex-parrot" stage?

Another very quick fix would be simply to do something like 
with-integrity, which explicitly says "I know this may not run right 
away". We could wrap the not-to-be call in a with-post-mortem macro 
dynamically binding *dead-is-cool* to t and then any access anywhere to 
dead things would be allowed. This would avoid the runtime cost of 
queueing things up for annihilation (but then I do  not think that would 
be a big runtime burden at all).

Anyway, let me implement *dead-is-cool* and have that bound to t before 
calling not-to-be so there is no need to wrap not-to-be code at all and 
see what happens. I'll commit something soon, lemme know if it works. :)

kt