[slime-devel] Re: Partial multiprocessing support on CMUCL

Wed Dec 17 19:18:07 UTC 2003

Hey, aren't you supposed to be writing your book? ;-)

> What do you guys mean by "race condition"? (In the context of SLIME,
> that is--I know what a race condition is in general.)

I guess you've seen my recent description of the push-down automaton
that keeps track of protocol state. This can be seen as a
representation in Emacs of relevant parts of the Lisp stack. For
example, when we push into the EVALUATING state in Emacs we ask Lisp
to call this (abbreviated) function:

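  (defslimefun eval-string (string buffer-package)
    (let (ok result)
      (unwind-protect
           ;; Protected form: OK becomes true only if EVAL returns normally.
           (progn
             (setq result (eval (read-form string)))
             (setq ok t))
        ;; Cleanup form: runs on normal return and on non-local exit.
        (send-to-emacs (if ok `(:ok ,result) '(:aborted))))))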

Before the corresponding Lisp stack frame returns it will send either
(:ok RESULT) or (:aborted) to Emacs. When Emacs receives either
message, it will pop its EVALUATING state off the stack. In this way
the two stacks stay synchronized, and Emacs knows whether it is in the
debugger, or waiting on an RPC result, etc.
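
To make the Emacs half of that concrete, here is a rough model of the
state stack, written in Common Lisp only so that it matches the snippet
above. The names *state-stack*, push-state and handle-event are made up
for illustration and are not SLIME's actual code:

```lisp
;; Hypothetical model of the Emacs-side state stack (names are assumptions).
(defvar *state-stack* (list :idle)
  "Protocol states, innermost first.")

(defun push-state (state)
  "Enter STATE, e.g. just before asking Lisp to evaluate something."
  (push state *state-stack*))

(defun handle-event (event)
  "Pop the :evaluating state when Lisp reports the end of an evaluation."
  (ecase (first event)
    ((:ok :aborted)
     ;; The assertion is what catches a stack that has gone out of sync.
     (assert (eq (first *state-stack*) :evaluating))
     (pop *state-stack*)
     event)))

;; Usage: one request/reply round trip.
(push-state :evaluating)        ; Emacs sends the eval request
(handle-event '(:ok 42))        ; Lisp's frame returns
*state-stack*                   ; => (:IDLE)
```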

The protocol is free of race conditions provided that at any given
time only one of Lisp and Emacs is able to cause a state change (or
otherwise perform a state-dependent operation). If both are able to,
they could act at the same time, and each side would then push/pop its
stack in a different order and the two would lose synchronization.

That's the sort of race condition we mean. When the stacks go out of
sync, chaos ensues (or would if not caught by assertions).
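
A toy illustration of the failure, assuming a made-up scenario in which
Emacs starts an evaluation at the same moment Lisp enters the debugger.
Each side applies its own state change first and the other side's
second. The event format and apply-events are hypothetical, not part of
the real protocol:

```lisp
;; Hypothetical event format: (:push STATE) or (:pop STATE).
(defun apply-events (events)
  "Return the state stack that results from applying EVENTS in order."
  (let ((stack (list :idle)))
    (dolist (event events stack)
      (destructuring-bind (op state) event
        (ecase op
          (:push (push state stack))
          (:pop (assert (eq (first stack) state))
                (pop stack)))))))

;; Both sides see the same two events, but in opposite orders.
(defparameter *emacs-view* '((:push :evaluating) (:push :debugging)))
(defparameter *lisp-view*  '((:push :debugging) (:push :evaluating)))

(apply-events *emacs-view*)   ; => (:DEBUGGING :EVALUATING :IDLE)
(apply-events *lisp-view*)    ; => (:EVALUATING :DEBUGGING :IDLE)
```

When Lisp later finishes the evaluation and reports it, the Emacs-side
model would find :debugging on top rather than :evaluating, which is
exactly the kind of mismatch the assertions catch.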

We've considered three ways to cope with this:

  Ensure that only one of Emacs and Lisp is allowed to talk at a
  time. This worked well in the beginning, but it's now breaking down.

  Remove enough state from Emacs so that races can be tolerated. This
  is an appealing ideal, but no specifics have been discussed.

  Add a mechanism to the protocol to detect out-of-order events and
  resolve them in some deterministic way. This has been the subject of
  recent mails.

The thing that seems hard about removing state from Emacs is that
some operations are state-dependent. For example:

  If Lisp is "busy" evaluating an RPC, we won't bother with things
  like fetching arglists. That would just create a backlog of requests
  that probably aren't interesting by the time they're done. (Though
  Helmut's idea of sending TCP-OOB requests to be served by a signal
  handler sounds like fun :-)

  Our debugger wants to know if Lisp is sitting in the debugger
  loop. If it has started doing something else then the backtrace we
  present in our debug buffer is wrong.
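
Roughly, both checks boil down to looking at the top of the state
stack. A minimal sketch, assuming the stack is a list with the innermost
state first; the function names, state keywords and the (:arglist ...)
message shape are all invented for the example:

```lisp
;; Hypothetical helpers; state names and message shape are assumptions.
(defun busy-p (state-stack)
  "True when the innermost state says Lisp is evaluating an RPC."
  (eq (first state-stack) :evaluating))

(defun maybe-arglist-request (state-stack symbol-name)
  "Return an arglist request, or NIL when Lisp is busy."
  (unless (busy-p state-stack)
    (list :arglist symbol-name)))

(defun backtrace-current-p (state-stack)
  "True while Lisp is believed to be sitting in the debugger loop."
  (eq (first state-stack) :debugging))

(maybe-arglist-request '(:idle) "mapcar")              ; => (:ARGLIST "mapcar")
(maybe-arglist-request '(:evaluating :idle) "mapcar")  ; => NIL
```

Both predicates are only as good as the stack they inspect, which is why
removing that state from Emacs is not straightforward.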

Possibly these could be solved in some simple and clever way.

I'm not sure right now whether all of our race conditions span short
time frames (as in network latency). There aren't any documented cases
of them occurring in the wild as far as I know. Still, we must have a
correct protocol (non-robust ones are well known to piss people off
royally), and people are already running SLIME with Emacs and Lisp on
separate machines, so latency isn't necessarily on the order of a
millisecond.

-Luke