[vial-devel] Re: Flexichain and Cursorchain

Tue Oct 24 21:21:25 UTC 2006

On 10/24/06, Brad Beveridge <brad.beveridge at gmail.com> wrote:
> On 23/10/06, Brad Beveridge <brad.beveridge at gmail.com> wrote:
> > On 23/10/06, Brad Beveridge <brad.beveridge at gmail.com> wrote:
> > > If it's not nailed down, steal it.  That's my motto.  Vial originally
> > > started life as a fun way to learn Common Lisp, which meant doing some
> > > stuff myself that maybe I shouldn't have.
> > > Particularly the buffer stuff.
> > > It might be worthwhile moving to Flexichain
> > > (http://common-lisp.net/project/flexichain/), which is the underlying
> > > implementation that Climacs uses for buffer management.
> > >
> > > I'm going to look at it harder tonight, and if I do decide to change
> > > then it will probably mean big changes for Vial's internal code - but
> > > probably changes that clean the code up quite a bit.
> > > If we do move to flexichain, the first step will be to abstract out
> > > all the code that currently plays with the internals of buffer or
> > > cursor objects, then re-implement those functions using Flexichain.
> > >
> > > Thoughts?
> > >
> > > Cheers
> > > Brad
> > >
> > I've looked at Flexichain tonight.  It is a good idea, and worth
> > moving to.  However I want to get something done, and changing out the
> > buffer, cursor and undo system is a good chunk of work.  So
> > Flexichains are not going to happen anytime soon.  However, we will
> > start tightening up abstraction.  Right now lots of modules directly
> > use the internal bits of various classes, that is going to stop.  The
> > next few changes I make will probably be with cleaning up intermodule
> > interfaces.
> >
> > Cheers
> > Brad
> >
> Flexichain is a buffer API used by Climacs and refactored into a
> library.  Flexichain provides CLOS classes and generic methods that
> abstract a sequence that has insert/delete/get functions.
> Cursorchain provides the concept of a point or mark into a Flexichain,
> with the semantics that you may insert or delete at the cursor.  It
> also moves cursors so that they maintain their position when text is
> inserted or deleted.
> Flexi* also provides an implementation for the protocol in the form of
> a circular gap buffer.  It is obvious that we could wrap our own List
> of lines method with Flexi* protocols, so I will just discuss the
> advantages/disadvantages of a gap buffer implementation and a list of
> lines implementation.
> Note - flexichains only provides for accessing single elements in an
> abstract way, but if you wish to write implementation specific code
> you can make performance improvements.
>
> The main slow operations in a gap buffer are
> 1) Moving the gap
> 2) Moving by lines as the protocol does not know about lines.
>
> I timed 1 & 2 on Clisp on a 2Ghz Pentium-M: a middle of the road CPU
> and the slowest Lisp.  I didn't actually use Flexichains, but did the
> kind of operations it would need to do.
> Moving the gap for a 1Mb buffer takes ~0.3 of a second.  Scanning a
> 1Mb buffer for the whole length of the buffer takes ~0.5 of a second.
> This I think is absolute worst case behaviour, on a compiled Lisp, on
> a decent machine, on a reglar sized file (even on a big 300K file)
> these operations will take no time at all.
> I have no idea how a LineList implementation would stack up, but I
> suspect it would be slightly worse for scanning for chars.
> LL = List of lines
> FC = Flexichain
>
> Random Access:
>  * FC lets you treat the whole buffer as an array, providing O(1) access.
>  * LL is worst case O(n), though average case can be better. (N = num
> lines in buffer)
>
> Sequential Access:
>  * Both are O(1).
>  * LL is O(n) for the first access (N = num lines in buffer)
>
> Moving by Line:
>  * LL is O(1)
>  * FC is O(n) (N = chars in line)
>    * Note, here N is small (< 100) and moving by char is VERY fast
>
> Moving to specific line:
>  * LL is O(n)   (N = line to move to)
>  * FC is O(n*m) (N = line to move to, M = average chars per line)
>    * Note, here M is small (< 100) and moving by char is VERY fast
>
> Converting buffer to a string (read only access):
>  * FC is O((n*m)/2), it needs to simply move the gap to the end of the
> buffer, a very fast and non-consing operation.  (N = line to move to,
> M = average chars per line)
>  * LL is O(n), but must make an entire copy of the buffer, consing
> horribly.  (N = number of lines in buffer)
>  ** FC is the CLEAR winner here **
>
> Converting part of a buffer to a read only string:
>  * Both are O(n), where N is chars in the destination string.
>  * FC moves the gap and won't need to cons.
>    * It may be better to cons up a new string rather than move the gap.
>    * You will only need to move the gap to just before or after the
> sequence you are taking out.
>    * The gap will need to move back when you continue editing.
>  * LL will need to cons for any string that spans lines.
>
> Converting part of a buffer to a new string:
>  * Both about O(n) + book keeping.  N is chars in the destination
> string.  Consing the new string is the biggest issue.
>
> Sequential inserting into buffer:
>  * Both are O(1), plus book keeping (move the gap, or find the line first)
>
> Marks:
>  * A clever mark implementation for LL will have no cost for marks
> when doing ops.
>  * FC will be O(n) every time the gap buffer is moved.  (N = number of
> marks to move)
>
> Use cases:
>  All in all, both methods are similar orders of magnitude in most
> operations, you could pick either and be satisifed with performance
> for most operations.
> The only clear difference in operations is converting to read only
> strings, and only then when the read-only string involves large
> sections of the buffer.
> So, at what times would we like to deal with large strings?
> Saving:
>  * LL, needs to loop over lines and use WRITE-LINE
>  * FC, move the gap and save the array in one call
>
> Searching the whole buffer:
>  * LL, needs to loop over lines
>  * FC, move the gap, one call to CL-PPCRE
>  * FC will be much easier for multiline search strings
>  * On text changing (active highlighting):
>   * much the same for both, need to rescan the text as a string
>   * LL, always keeps the line as a plain string, just pass it to CL-PPCRE
>   * FC, need to reconstruct the line: move gap to end of line, pass to
> CL-PPCRE, move gap back
>   * Neither matters that much, it is happening at human speed - so
> just whatever works.
>
> Coping from one buffer/register to another:
>  * FC, move the gap and do a single call to string copy
>  * LL, loop over lines, copying to new buffer
>  * FC is easier for partial lines - it has no concept of lines.
>  * LL needs to do fiddly work at the start/end of a region to copy
> partial lines.
>
> Partial lines:
>  * FC has no concept of lines, so there is no special handling of
> "lines" you just copy chars around.
>  * LL has "lines" ingrained in it.  Any time you want to access text,
> you need to
>   * consider you may only want to deal with a partial line
>   * account for moving over line chars and do something special
>
> Conclusion:
> I think that using FC makes for a simpler system, that may have
> performance gains and will certainly not have large performance
> losses.
> I now believe that switching to Flexichains is a prudent move.
> Unfortuantly, it will mean a re-write of perhaps 70% of Vial, but that
> 70% should shrink significantly.  About the only thing that can stay
> is the command handling and ncurses input layer.  I will write up a
> plan on how to move to FC soon.
>
> Thoughts?
>
> Cheers
> Brad
> _______________________________________________
> vial-devel mailing list
> vial-devel at common-lisp.net
> http://common-lisp.net/cgi-bin/mailman/listinfo/vial-devel
>

I tried out climacs a little, It's not pretty flash, but It could be
because of clim too, don't know... emacs seems a lot faster, at a
first view.
Anyway, i read that emacs uses one big gap buffer to represent the
entire text buffer, and gnu emacs performance seems enough (when it
comes only to text editing, without considering the long
initialization it needs (and they say that it's something wrong with
debian setups, I recently updated and now emacs has a really fast
startup, like a fraction of second, and not the old 8 seconds...
lol)).

So let's go for the simpler system!