[elephant-devel] Schema evolution

Ian Eslick eslick at csail.mit.edu
Mon Oct 22 19:48:05 UTC 2007


Another detail to iron down is the implications of change-class and  
redefining a class via defclass.

change-class is pretty easy as it is an explicit call by the user to  
change a given instance.  I added a warning mechanism that signals if  
you are going to delete data from a store by dropping a persist slot  
from the instance.  This is immediate.
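
To make the change-class case concrete, here is a minimal sketch in
plain CLOS.  It is not Elephant's actual warning mechanism; the class
names account-v1 and account-v2 and the warning text are made up for
illustration.  The standard hook is update-instance-for-different-class,
which still sees the old slot values through its first argument.

```lisp
;; Hypothetical classes: ACCOUNT-V2 drops the NOTES slot.
(defclass account-v1 ()
  ((owner :initarg :owner)
   (notes :initarg :notes)))

(defclass account-v2 ()
  ((owner :initarg :owner)))

;; PREVIOUS is a copy of the instance as it was before the class change,
;; so the value about to be dropped can still be inspected here.
(defmethod update-instance-for-different-class :before
    ((previous account-v1) (current account-v2) &rest initargs)
  (declare (ignore initargs))
  (when (slot-boundp previous 'notes)
    (warn "change-class drops slot NOTES (value ~S)"
          (slot-value previous 'notes))))

;; The warning is signalled immediately, during the CHANGE-CLASS call.
(let ((acct (make-instance 'account-v1 :owner "ian" :notes "keep me")))
  (change-class acct 'account-v2))
```

Because the check runs inside change-class itself, the user sees the
warning before any data is gone, which is what makes this case easy.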

Redefining a class via defclass, thus initiating calls to
update-instance-for-redefined-class, is harder because it is lazy in
some (or all) lisps.  When a defclass causes a change in a standard
class schema, the instances of that class are updated at the latest
when an object slot is next accessed.
update-instance-for-redefined-class can be specialized by the user
for any given class.
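
A small sketch of the lazy behavior in plain CLOS (no persistence
involved).  The class point and the variable *update-log* are
illustrative names only.  The method just records what the standard
update hook is told about added and discarded slots.

```lisp
(defclass point ()
  ((x :initarg :x)
   (y :initarg :y)))

(defvar *p* (make-instance 'point :x 3 :y 4))

(defvar *update-log* '()
  "Illustrative log of what each lazy update added and discarded.")

;; PROPERTY-LIST holds the names and values of discarded slots that
;; were bound, so this is the last chance to save them.
(defmethod update-instance-for-redefined-class :before
    ((instance point) added-slots discarded-slots property-list
     &rest initargs)
  (declare (ignore initargs))
  (push (list :added added-slots
              :discarded discarded-slots
              :old-values property-list)
        *update-log*))

;; Redefine the class: drop Y, add Z with an initform.
(defclass point ()
  ((x :initarg :x)
   (z :initform 0)))

;; The instance is updated no later than this slot access.
(slot-value *p* 'x)
```

Nothing in the standard says the update happens at the defclass; it
only has to happen by the time a slot of the instance is next read or
written.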

In standard lisp, there is a problem that if you redefine the class  
twice and haven't touched the object in the meantime, you will have a  
different transformed state for each object and only some of them  
will have had update-instance-for-redefined-class called on them.  At  
least this is empirically true under Allegro.

However, if you do this sort of thing with persistent slots, then  
you have storage leaks in your DB due to slot values not being  
reclaimed on an intermediate change.

i.e.

(defclass test () ((slot1 :initarg :slot1) (slot2 :initarg :slot2)))
(make-instance 'test :slot1 1 :slot2 2)
(defclass test () ((slot1 :initarg :slot1) (slot3 :initform 10)))
(defclass test () ((slot1 :initarg :slot1) slot4))

An instance of this class with values in slot1 and slot2 that is  
loaded after the second redefinition will lose the value of slot2.  
slot3 will never have been written and slot4 will be unbound.


It gets worse.  If you disconnect from your db without touching all  
the objects in it, then when you restart, the system won't remember to  
change any instances of the redefined class when they are loaded, so  
you'll have objects with the old definition; any initforms for new  
class slots won't have been evaluated (the slots will be unbound) and  
the storage associated with any dropped slots will be retained but  
inaccessible.

So we can do a couple of things about this:
1) The "lisp way" here is to allow the users to shoot themselves in  
the foot by giving them the power to control this process via  
explicit touching of objects to properly update after a class change

2) Automatically walk INDEXED classes only, updating instances by  
pulling them into memory

3) Provide a function they can call,  
make-persistent-instances-obsolete, which invokes the update behavior  
on INDEXED classes only.

4) Do a deep walk of the entire DB to update classes (either  
automatically or via a function)

Automatic behaviors can be put into defpclass or made available as  
functions.  Walking the entire DB can be VERY expensive, but I think  
it could be done in an online fashion as any instances read by other  
threads will automatically be updated in parallel.  We would have to  
catch any new changes to the class and inhibit them until the prior  
update was complete.  A similar strategy would work for indexed  
classes, but be much more efficient since all instances would be  
directly accessible via the class index.
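
As a rough sketch of the indexed-class walk, assuming nothing about
Elephant internals: below, *class-index*, indexed-object, user and
touch-indexed-instances are all hypothetical stand-ins, with a hash
table playing the role of the class index.  Touching one slot per
instance is enough to force any pending update.

```lisp
;; Hypothetical in-memory stand-in for a class index:
;; class name -> list of instances.
(defvar *class-index* (make-hash-table :test #'eq))

(defclass indexed-object () ())

;; Register every new instance under its class name.
(defmethod initialize-instance :after ((obj indexed-object) &key)
  (push obj (gethash (class-name (class-of obj)) *class-index*)))

(defun touch-indexed-instances (class-name slot-name)
  "Access SLOT-NAME on every indexed instance of CLASS-NAME so that any
pending lazy update runs now.  SLOT-NAME must exist in the current
definition.  Returns the number of instances touched."
  (let ((count 0))
    (dolist (obj (gethash class-name *class-index*) count)
      (slot-boundp obj slot-name)      ; a slot access forces the update
      (incf count))))

(defclass user (indexed-object)
  ((name :initarg :name)
   (email :initarg :email)))

(make-instance 'user :name "a" :email "a@example.org")
(make-instance 'user :name "b" :email "b@example.org")

;; Redefine: drop EMAIL, add ACTIVE.
(defclass user (indexed-object)
  ((name :initarg :name)
   (active :initform t)))

;; Walk the index right after the redefinition.
(touch-indexed-instances 'user 'name)
```

The cost is proportional to the number of instances of the one class,
not to the size of the whole store, which is the point of restricting
the walk to indexed classes.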


A persistently lazy method would be messier, but perhaps a better all  
around solution.  In this case, for any persistent class whose  
redefinition adds or deletes slots, we store a schema-change record  
in the DB and maintain a schema ID for each instance.  Then, when we  
pull a persistent instance out of the db, we can walk the list of  
prior changes between its version and the most current version and  
properly update it.
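
A toy model of that walk, to show the idea rather than a design: the
names *schema-changes*, *current-schema*, record-schema-change and
upgrade-record are invented here, stored instances are modelled as an
alist of slot values plus a schema ID, and :unbound is just a marker
for a slot with no initform.

```lisp
(defvar *schema-changes* '()
  "Hypothetical change log, oldest first.  Each entry is
(VERSION ADDED DISCARDED), where ADDED is an alist of (slot . initial
value) and DISCARDED is a list of slot names.")

(defvar *current-schema* 0)

(defun record-schema-change (added discarded)
  "Append one change record and return the new schema version."
  (incf *current-schema*)
  (setf *schema-changes*
        (append *schema-changes*
                (list (list *current-schema* added discarded))))
  *current-schema*)

(defun upgrade-record (schema-id slots)
  "Apply, in order, every change newer than SCHEMA-ID to the alist SLOTS.
Returns the new schema id and the upgraded alist as two values."
  (dolist (change *schema-changes* (values schema-id slots))
    (destructuring-bind (version added discarded) change
      (when (> version schema-id)
        ;; Dropped slots are reclaimed at each intermediate step.
        (setf slots (remove-if (lambda (cell)
                                 (member (car cell) discarded))
                               slots))
        (setf slots (append slots (copy-alist added)))
        (setf schema-id version)))))

;; The two redefinitions of TEST from the example above.
(record-schema-change '((slot3 . 10)) '(slot2))
(record-schema-change '((slot4 . :unbound)) '(slot3))

;; An instance stored under the original definition (schema 0).
(upgrade-record 0 '((slot1 . 1) (slot2 . 2)))
```

Since every intermediate change is replayed, slot2 is explicitly
dropped and slot3 is created and then dropped, so nothing is left
behind in the store without an owner.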

There are still some problems with this.  If we update a class and  
are not connected to a DB, then the schema change will not be  
recorded.  Multiple stores containing instances of the same class  
will not necessarily be synchronized.


I don't see a good way other than #1 above.  We inform the  
users, provide some utility functions and illustrate best practices  
(one data store per class, always update manually after class redef)  
to avoid getting shot in the foot.  However, I wanted to throw this  
out in case people had a better policy idea.

Regards,
Ian