[elephant-devel] Schema evolution

Thu Oct 25 20:52:50 UTC 2007

So there are a number of cases where you might want to know you are  
being shot in the foot.

1) You re-evaluate a defclass which drops a persistent slot.
    a) If the class is indexed, the system checks for instances.  If
       there are none, the redefinition succeeds silently; otherwise
       do (b).
    b) If the class is not indexed, the system signals the user to
       invoke a restart:
       i) drop the data, leaking the space taken by the dropped data
       ii) walk the entire db with default MOP behavior (including any
           user-implemented overloading of the MOP
           update-instance-for-redefined-class, etc.)
       iii) invoke a user function to visit objects and apply MOP
            behavior (pruning the space of objects visited)

    There are user parameters to affect this default behavior, and
inhibiting macros that suppress the signals when a redefinition
happens inside code.  Typically the default would be to ask the
user if they're sure when doing a redefinition that drops or adds
slots.
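
A minimal sketch of how the signal and restarts in (1b) might look in
plain Common Lisp.  Every name here (the condition, the three restarts,
the inhibiting variable and macro, and handle-dropped-slots) is
hypothetical and is not part of Elephant's API; the database walk is
stubbed out as a function argument.

```lisp
;; Hypothetical names throughout; this is not Elephant's API.
(define-condition dropping-persistent-slots (error)
  ((redefined-class :initarg :redefined-class :reader redefined-class)
   (slots :initarg :slots :reader dropped-slots))
  (:report (lambda (c stream)
             (format stream "Redefining ~A drops persistent slots ~A"
                     (redefined-class c) (dropped-slots c)))))

(defvar *inhibit-schema-signals* nil
  "When true, redefinition proceeds without signalling and drops the data.")

(defmacro without-schema-signals (&body body)
  "Suppress the signal for redefinitions that happen inside BODY."
  `(let ((*inhibit-schema-signals* t)) ,@body))

(defun handle-dropped-slots (class-name slots
                             &key (walk-db (lambda () :walked)))
  "Return a value describing the action taken for the dropped SLOTS."
  (if *inhibit-schema-signals*
      :dropped
      (restart-case (error 'dropping-persistent-slots
                           :redefined-class class-name :slots slots)
        (drop-data ()
          :report "Drop the slot data and leak its storage."
          :dropped)
        (walk-database ()
          :report "Walk the entire DB with the default MOP behavior."
          (funcall walk-db))
        (visit-with (visitor)
          :report "Visit objects with a user-supplied function."
          :interactive (lambda ()
                         (format *query-io* "Visitor function: ")
                         (list (eval (read *query-io*))))
          (funcall visitor class-name slots)))))
```

Left unhandled, the error lands in the debugger, where the user picks
one of the three restarts; code that already knows what it wants can
wrap the call in handler-bind and invoke-restart, or in
without-schema-signals.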

2) You call change-class
    a) there is no class-specific MOP update function
       (update-instance-for-different-class)
    b) there is an update fn
       signal and provide restarts as in (1b) above
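
For reference, the standard generic function that change-class invokes
is update-instance-for-different-class.  The sketch below uses ordinary
(non-persistent) classes with made-up names (person, customer) to show
a class-specific method rescuing a value from a slot that the change is
about to drop.

```lisp
;; Illustrative classes only; nothing here is Elephant-specific.
(defclass person ()
  ((name :initarg :name :accessor name)
   (nickname :initarg :nickname :accessor nickname)))

(defclass customer ()
  ((name :initarg :name :accessor name)
   (account :accessor account)))

(defmethod update-instance-for-different-class :before
    ((previous person) (current customer) &key)
  ;; PREVIOUS is a copy that still holds the old slot values, so data
  ;; from the dropped NICKNAME slot can be carried over here.
  (setf (account current) (list :formerly (nickname previous))))
```

Calling (change-class p 'customer) on a person keeps the shared name
slot, discards nickname, and runs the method above first.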

If there is no disagreement, I can document this on Trac.

Ian

On Oct 25, 2007, at 7:58 AM, lists at infoway.net wrote:

> I kind of agree with Robert. It has taken me some time to realize  
> that Elephant is not a DBMS. As such, and as documented in the  
> manual, if someone changes the schema, s/he would be responsible  
> for writing such a function to walk down the entire DB and refresh  
> the data.
>
> If the other solutions are feasible, great. However, I would think  
> that solving the sorting issue, from the other thread, would be of  
> higher priority :)
>
> On Oct 24, 2007, at 11:16 AM, Robert L. Read wrote:
>
>> This is a very complex subject.
>>
>> In the greatest generality, one needs a function to go from one  
>> schema
>> to the next; for example, if you change the type or encoding of a  
>> slot,
>> one must provide a translation function for the slot.
>>
>> I personally, in the style in which I am working, would be most
>> comfortable with a function that I could invoke manually to walk the
>> entire DB, updating where necessary.
>>
>> The other solutions, although potentially more elegant, seem like  
>> a lot
>> more work.
>>
>>
>>
>> On Mon, 2007-10-22 at 15:48 -0400, Ian Eslick wrote:
>>> Another detail to iron down is the implications of change-class and
>>> redefining a class via defclass.
>>>
>>> change-class is pretty easy as it is an explicit call by the user to
>>> change a given instance.  I added a warning mechanism that
>>> signals if you are going to delete data from a store by dropping
>>> a persistent slot from the instance.  This is immediate.
>>>
>>> Redefining a class via defclass, thus initiating calls to
>>> update-instance-for-redefined-class, is harder because it is lazy
>>> in some (or all) lisps.  When a defclass causes a change in a
>>> standard class schema, the instances of that class are updated at
>>> latest when an object slot is next accessed.
>>> update-instance-for-redefined-class can be overloaded by the user
>>> for any given class.
>>>
>>> In standard lisp, there is a problem that if you redefine the class
>>> twice and haven't touched the object in the meantime, you will
>>> have a different transformed state for each object and only some
>>> of them will have had update-instance-for-redefined-class called
>>> on them.  At least this is empirically true under Allegro.
>>>
>>> However, if you do this sort of thing with persistent slots, then
>>> you have storage leaks in your DB due to slot values not being
>>> reclaimed on an intermediary change.
>>>
>>> i.e.
>>>
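>>> ;; slots need :initarg options for make-instance to accept them
>>> (defclass test ()
>>>   ((slot1 :initarg :slot1) (slot2 :initarg :slot2)))
>>> (make-instance 'test :slot1 1 :slot2 2)
>>> (defclass test () ((slot1 :initarg :slot1) (slot3 :initform 10)))
>>> (defclass test () ((slot1 :initarg :slot1) slot4))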
>>>
>>> An instance of this class with values in slot1 and slot2 that is
>>> loaded after the second definition will cause the value of slot2 to
>>> be lost.  Slot3 will never have been written and slot4 will be  
>>> empty.
>>>
>>>
>>> It gets worse.  If you disconnect from your db without touching all
>>> the objects in it, then when you restart, the system won't
>>> remember to change any instances of the redefined class when they
>>> are loaded, so you'll have objects with the old definition; any
>>> initforms for new class slots won't have been called (the slots
>>> will be unbound) and the storage associated with any dropped slots
>>> will be retained but inaccessible.
>>>
>>> So we can do a couple of things about this:
>>> 1) The "lisp way" here is to allow the users to shoot themselves in
>>> the foot by giving them the power to control this process via
>>> explicit touching of objects to properly update after a class change
>>>
>>> 2) Automatically walk INDEXED classes only, updating instances by
>>> pulling them into memory
>>>
>>> 3) Provide a function they can call, make-persistent-instances-
>>> obsolete, which invokes the update behavior on INDEXED classes only.
>>>
>>> 4) Do a deep walk of the entire DB to update classes (either
>>> automatically or via a function)
>>>
>>> Automatic behaviors can be put into defpclass or made available as
>>> functions.  Walking the entire DB can be VERY expensive, but I think
>>> it could be done in an online fashion as any instances read by other
>>> threads will automatically be updated in parallel.  We would have to
>>> catch any new changes to the class and inhibit them until the prior
>>> update was complete.  A similar strategy would work for indexed
>>> classes, but be much more efficient since all instances would be
>>> directly accessible via the class index.
>>>
>>>
>>> A persistently lazy method would be messier, but perhaps a better  
>>> all
>>> around solution.  In this case, for any persistent objects that are
>>> redefined causing slots to be added or deleted, we store a schema-
>>> change record in the DB and maintain a schema ID for each instance.
>>> Then, when we pull a persistent instance out of the db, we can walk
>>> the list of prior changes between its version and the most current
>>> version and properly update it.
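
A rough sketch of the schema-change-record idea above, assuming a
single class, instances modelled as plain property lists, and integer
schema versions.  All names (*schema-changes*, record-schema-change,
upgrade-instance) are hypothetical; a real implementation would keep
the records in the store rather than in a special variable.

```lisp
;; Hypothetical sketch; instances are plists and nothing is persisted.
(defvar *schema-changes* '()
  "Change records for one class, oldest first.
Each record is (from-version added dropped), where ADDED is a list of
(slot-name initial-value) pairs and DROPPED is a list of slot names.")

(defun record-schema-change (from-version &key added dropped)
  "Record the change taking the class from FROM-VERSION to the next one."
  (setf *schema-changes*
        (append *schema-changes*
                (list (list from-version added dropped)))))

(defun upgrade-instance (version slots)
  "Apply every change recorded at or after VERSION to the plist SLOTS.
Returns the upgraded plist and the resulting version as two values."
  (let ((slots (copy-list slots)))
    (dolist (change *schema-changes* (values slots version))
      (destructuring-bind (from added dropped) change
        (when (>= from version)
          ;; Remove dropped slots so their values are not retained.
          (dolist (name dropped)
            (remf slots name))
          ;; Give added slots their initial value unless already set.
          (loop for (name initial-value) in added
                do (when (eq (getf slots name '%unbound) '%unbound)
                     (setf (getf slots name) initial-value)))
          (setf version (1+ from)))))))
```

With the two redefinitions from the earlier example recorded as
versions 0 and 1, an instance stored at version 0 passes through both
changes when it is loaded, so slot2 is removed rather than leaked.
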
>>>
>>> There are still some problems with this.  If we update a class and
>>> are not connected to a DB, then the schema change will not be
>>> recorded.  Multiple stores containing instances of the same class
>>> will not necessarily be synchronized.
>>>
>>>
>>> I don't see a good way, other than the #1 above.  We inform the
>>> users, provide some utility functions and illustrate best practices
>>> (one data store per class, always update manually after class redef)
>>> to avoid getting shot in the foot.  However, I wanted to throw this
>>> out in case people had a better policy idea.
>>>
>>> Regards,
>>> Ian
>>>
>>> _______________________________________________
>>> elephant-devel site list
>>> elephant-devel at common-lisp.net
>>> http://common-lisp.net/mailman/listinfo/elephant-devel