[elephant-devel] Schema evolution

lists at infoway.net lists at infoway.net
Thu Oct 25 11:58:34 UTC 2007


I kind of agree with Robert. It has taken me some time to realize  
that Elephant is not a DBMS. As such, and as documented in the  
manual, if someone changes the schema, s/he would be responsible for  
writing such a function to walk down the entire DB and refresh the data.
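
For what it's worth, a manual walk of that kind could look roughly like the sketch below. It is plain CLOS, not Elephant code: the list of instances is only a stand-in for however one enumerates the objects in a store, and the names item, walk-and-translate and dollars->cents are made up for illustration. It assumes a slot whose encoding changed from float dollars to integer cents.

```lisp
;; Hypothetical class whose PRICE slot used to hold float dollars and
;; should now hold integer cents.
(defclass item ()
  ((price :initarg :price)))

(defun dollars->cents (dollars)
  "Translation function from the old encoding to the new one."
  (values (round (* 100 dollars))))

(defun walk-and-translate (instances slot-name old-type translate)
  "Apply TRANSLATE to SLOT-NAME of every object in INSTANCES whose value
is still of OLD-TYPE.  Returns the number of objects updated."
  (let ((updated 0))
    (dolist (obj instances updated)
      (when (and (slot-boundp obj slot-name)
                 (typep (slot-value obj slot-name) old-type))
        (setf (slot-value obj slot-name)
              (funcall translate (slot-value obj slot-name)))
        (incf updated)))))

;; Example: only the first item still carries the old float encoding.
(defparameter *items*
  (list (make-instance 'item :price 12.5)
        (make-instance 'item :price 300)))

(walk-and-translate *items* 'price 'float #'dollars->cents)
```

Checking the old type before translating means the walk only updates where necessary, so running it a second time changes nothing.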

If the other solutions are feasible, great. However, I would think  
that solving the sorting issue, from the other thread, would be of  
higher priority :)

On Oct 24, 2007, at 11:16 AM, Robert L. Read wrote:

> This is a very complex subject.
>
> In full generality, one needs a function to go from one schema to the
> next; for example, if you change the type or encoding of a slot, you
> must provide a translation function for that slot.
>
> I personally, in the style in which I am working, would be most
> comfortable with a function that I could invoke manually to walk the
> entire DB, updating where necessary.
>
> The other solutions, although potentially more elegant, seem like a lot
> more work.
>
>
>
> On Mon, 2007-10-22 at 15:48 -0400, Ian Eslick wrote:
>> Another detail to iron down is the implications of change-class and
>> redefining a class via defclass.
>>
>> change-class is pretty easy as it is an explicit call by the user to
>> change a given instance.  I added a warning mechanism that signals if
>> you are going to delete data from a store by dropping a persistent slot
>> from the instance.  This is immediate.
>>
>> Redefining a class via defclass, thus initiating calls to
>> update-instance-for-redefined-class, is harder because it is lazy in
>> some (or all) lisps.  When a defclass causes a change in a standard
>> class schema, the instances of that class are updated at the latest
>> when an object slot is next accessed.
>> update-instance-for-redefined-class can be specialized by the user for
>> any given class.
>>
>> In standard lisp, there is a problem that if you redefine the class
>> twice and haven't touched the object in the meantime, you will have a
>> different transformed state for each object and only some of them
>> will have had update-instance-for-redefined-class called on them.  At
>> least this is empirically true under Allegro.
>>
>> However, if you do this sort of thing with persistent slots, then
>> you have storage leaks in your DB due to slot values not being
>> reclaimed on an intermediary change.
>>
>> i.e.
>>
>> (defclass test () ((slot1 :initarg :slot1) (slot2 :initarg :slot2)))
>> (make-instance 'test :slot1 1 :slot2 2)
>> (defclass test () ((slot1 :initarg :slot1) (slot3 :initform 10)))
>> (defclass test () ((slot1 :initarg :slot1) slot4))
>>
>> An instance of this class with values in slot1 and slot2 that is
>> loaded after the second redefinition will lose the value of slot2.
>> Slot3 will never have been written and slot4 will be unbound.
>>
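The lazy update described above can be watched in plain CLOS, without Elephant. This is only a sketch of the standard protocol for a single redefinition: the *discarded-log* variable is a made-up name, and what happens when the class is redefined twice before the instance is touched is left out because it may differ between implementations.

```lisp
(defvar *discarded-log* '()
  "Hypothetical log: one plist of dropped slot names and values per update.")

(defclass test ()
  ((slot1 :initarg :slot1)
   (slot2 :initarg :slot2)))

(defvar *obj* (make-instance 'test :slot1 1 :slot2 2))

;; Runs when a stale instance is brought up to date.  PROPERTY-LIST holds
;; the names and values of discarded slots that were bound.
(defmethod update-instance-for-redefined-class :before
    ((instance test) added-slots discarded-slots property-list &key)
  (declare (ignore added-slots discarded-slots))
  (push property-list *discarded-log*))

;; Redefine the class: slot2 is dropped, slot3 is added with an initform.
(defclass test ()
  ((slot1 :initarg :slot1)
   (slot3 :initform 10)))

;; The update is only guaranteed to have run once a slot is accessed.
(slot-value *obj* 'slot1)
```

A persistent version would presumably use the same hook to reclaim the stored values of dropped slots, which is exactly the step that gets skipped when an instance is never touched between two redefinitions.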
>>
>> It gets worse.  If you disconnect from your db without touching all
>> the objects in it, then when you restart, the system won't remember to
>> update any instances of the redefined class when they are loaded, so
>> you'll have objects with the old definition; any initforms for new
>> class slots won't have been evaluated (the slots will be unbound) and
>> the storage associated with any dropped slots will be retained but
>> inaccessible.
>>
>> So we can do a couple of things about this:
>> 1) The "lisp way" here is to allow the users to shoot themselves in
>> the foot by giving them the power to control this process via
>> explicit touching of objects to properly update after a class change
>>
>> 2) Automatically walk INDEXED classes only, updating instances by
>> pulling them into memory
>>
>> 3) Provide a function they can call,
>> make-persistent-instances-obsolete, which invokes the update behavior
>> on INDEXED classes only.
>>
>> 4) Do a deep walk of the entire DB to update classes (either
>> automatically or via a function)
>>
>> Automatic behaviors can be put into defpclass or made available as
>> functions.  Walking the entire DB can be VERY expensive, but I think
>> it could be done in an online fashion as any instances read by other
>> threads will automatically be updated in parallel.  We would have to
>> catch any new changes to the class and inhibit them until the prior
>> update was complete.  A similar strategy would work for indexed
>> classes, but be much more efficient since all instances would be
>> directly accessible via the class index.
>>
>>
>> A persistently lazy method would be messier, but perhaps a better all
>> around solution.  In this case, for any persistent objects that are
>> redefined causing slots to be added or deleted, we store a schema-
>> change record in the DB and maintain a schema ID for each instance.
>> Then, when we pull a persistent instance out of the db, we can walk
>> the list of prior changes between its version and the most current
>> version and properly update it.
>>
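The persistently lazy idea above could be sketched along these lines. None of this is Elephant code: the change log lives in an ordinary hash table rather than in the DB, an instance is modelled as a version number plus a plist of slot values, and every name here (record-schema-change, upgrade-record, drop-slot) is made up for illustration.

```lisp
(defvar *schema-changes* (make-hash-table)
  "Hypothetical change log: version N maps to a function that upgrades
a plist of slot values from version N to version N+1.")

(defvar *current-version* 0)

(defun record-schema-change (upgrade-fn)
  "Store UPGRADE-FN as the step out of the current version and return
the new current version."
  (setf (gethash *current-version* *schema-changes*) upgrade-fn)
  (incf *current-version*))

(defun drop-slot (slots name)
  "Return a copy of the plist SLOTS without the entry for NAME."
  (loop for (key value) on slots by #'cddr
        unless (eq key name)
          append (list key value)))

(defun upgrade-record (version slots)
  "Replay every recorded change between VERSION and the current version.
Returns the new version and the upgraded slot plist."
  (loop for v from version below *current-version*
        do (setf slots (funcall (gethash v *schema-changes*) slots)))
  (values *current-version* slots))

;; First redefinition: drop slot2, add slot3 with initform 10.
(record-schema-change
 (lambda (slots) (list* :slot3 10 (drop-slot slots :slot2))))

;; Second redefinition: drop slot3, add slot4 (left unbound).
(record-schema-change
 (lambda (slots) (drop-slot slots :slot3)))

;; An instance written under version 0 and first loaded after both changes.
(upgrade-record 0 '(:slot1 1 :slot2 2))
```

Because each step runs in order, the value of slot2 is dropped explicitly at the first step rather than leaked, even if the instance is only loaded after several redefinitions.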
>> There are still some problems with this.  If we update a class and
>> are not connected to a DB, then the schema change will not be
>> recorded.  Multiple stores containing instances of the same class
>> will not necessarily be synchronized.
>>
>>
>> I don't see a good way other than #1 above.  We inform the
>> users, provide some utility functions and illustrate best practices
>> (one data store per class, always update manually after class redef)
>> to avoid getting shot in the foot.  However, I wanted to throw this
>> out in case people had a better policy idea.
>>
>> Regards,
>> Ian
>>
>> _______________________________________________
>> elephant-devel site list
>> elephant-devel at common-lisp.net
>> http://common-lisp.net/mailman/listinfo/elephant-devel
