[Ecls-list] Revisiting locks and signals

Fri Oct 29 21:56:33 UTC 2010

Mathew, let me try to explain again what the problem is and propose a
possible operation model.

First of all, a reminder of how I name things (which may not be compatible
with usual standards :-). Compiled programs may receive "interrupts" or
"signals", which may originate from the operating system or from other
sources. I see only these scenarios for interrupts:

1* Inter-process communication.
2* Inter-thread communication
3* Serious computation errors: SIGSEGV, SIGBUS
4* Not so serious errors: SIGFPE
5* Interruption of code as with Ctrl-C. I see these reasons:
  6- Interrupt erroneous code with infinite loops
  7- Interrupt deadlocked code
  8- I/O operations that take too long
  9- Interrupt as a way to debug / inspect a thread
(Feel free to add more to the list)

The problem begins when deciding how the program reacts to these
interrupts. C libraries typically allow two ways of working:

i) the interrupt is delivered at any time, user code is stopped and an
appropriate handler is executed. Since the interrupt may happen at almost
any point in the code, the signal handler can only perform simple tasks that
do not conflict with whatever was being done. In particular many resources
(locks, files, etc) may be in an inconsistent state when the signal arrives.
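
As a minimal sketch of what "simple tasks" means in (i), assuming POSIX
sigaction(): the handler below only records that SIGINT arrived, and ordinary
code inspects the record later. The names pending_sigint,
install_trivial_handler and interrupt_pending are made up for illustration;
they are not ECL functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flag: the only state the handler is allowed to touch. */
static volatile sig_atomic_t pending_sigint = 0;

/* Handler: no locks, no allocation, no I/O. It just records the event. */
static void on_sigint(int signo)
{
    (void)signo;
    pending_sigint = 1;
}

/* Install the handler for SIGINT. Returns 0 on success, -1 on failure. */
int install_trivial_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Called from normal code, where inspecting and reacting is safe. */
int interrupt_pending(void)
{
    return pending_sigint != 0;
}
```

Once install_trivial_handler() has succeeded, a Ctrl-C only flips the flag and
the interrupted code resumes exactly where it was.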

ii) the program has a thread that waits for those interrupts. In this case
it is like reading a list of events from a file. Handling is safe, but not
all interrupts can be waited for (see 3, 4 or the group 5).
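
A rough sketch of (ii), assuming POSIX threads and sigwait(). SIGUSR1 stands
in for whatever signals would be routed this way, and event_waiter,
start_event_waiter and last_event_seen are hypothetical names. A real waiter
would loop over events; this one reads a single event to keep the sketch
short.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* Hypothetical slot where the waiter thread records the event it read. */
static int last_event = 0;

/* Body of the dedicated thread: read one signal as if it were an event. */
static void *event_waiter(void *arg)
{
    const sigset_t *events = arg;
    int signo = 0;

    if (sigwait(events, &signo) == 0)
        last_event = signo;   /* normal thread context, not a handler */
    return NULL;
}

/* Block SIGUSR1 in the calling thread (threads created later inherit the
 * mask), then start the waiter. Returns 0 on success, -1 on failure.
 * The sigset_t must stay alive for as long as the waiter runs. */
int start_event_waiter(pthread_t *tid, sigset_t *events)
{
    sigemptyset(events);
    sigaddset(events, SIGUSR1);
    if (pthread_sigmask(SIG_BLOCK, events, NULL) != 0)
        return -1;
    return pthread_create(tid, NULL, event_waiter, events) == 0 ? 0 : -1;
}

/* Read the recorded event; only meaningful after joining the waiter. */
int last_event_seen(void)
{
    return last_event;
}
```

Because the signal is blocked everywhere and consumed with sigwait(), no
handler ever runs asynchronously; the cost is that SIGSEGV or SIGFPE raised by
the faulting thread itself cannot be handled this way.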

Let us, as an exercise, assume that ECL runs with most interrupts disabled.
In other words, the signal handlers in an ECL thread can only perform
trivial tasks and we have an optional thread implementing what point (ii)
above says.

The first two situations (1,2, typically implemented via INTERRUPT-PROCESS)
can be eliminated or "enforced out". There are better ways to do
inter-process communication than signals and most kinds of such signals can
be automatically translated into other communication means (SIGPIPE ->
errno, user signal -> pipe message or socket...)
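
The "user signal -> pipe message" translation can be sketched with the usual
self-pipe technique. This is an illustration under assumptions, not what ECL
does: signal_pipe, install_signal_pipe and next_signal_message are invented
names, and the signal number is assumed to fit in one byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical pipe carrying one byte per delivered signal. */
static int signal_pipe[2] = { -1, -1 };

/* Handler: forward the signal number as a one-byte message. */
static void forward_to_pipe(int signo)
{
    int saved_errno = errno;          /* write() may clobber errno */
    unsigned char byte = (unsigned char)signo;
    ssize_t n = write(signal_pipe[1], &byte, 1);

    (void)n;   /* if the pipe is full, a wake-up is already queued */
    errno = saved_errno;
}

/* Create the pipe and route signo into it. Returns 0 or -1. */
int install_signal_pipe(int signo)
{
    struct sigaction sa;
    int flags;

    if (pipe(signal_pipe) != 0)
        return -1;
    /* Non-blocking write end, so the handler can never stall. */
    flags = fcntl(signal_pipe[1], F_GETFL);
    if (flags < 0 || fcntl(signal_pipe[1], F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_to_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Block until a message arrives; returns the signal number or -1. */
int next_signal_message(void)
{
    unsigned char byte;
    ssize_t n;

    do {
        n = read(signal_pipe[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? (int)byte : -1;
}
```

The read end can then be polled like any other file descriptor, so the signal
becomes ordinary input instead of an asynchronous interruption.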

Situation 3 is serious and should be handled accordingly, for the
affected thread may not continue to execute normally. Possible responses are
a) suspending the thread and opening a new thread with a debugger
b) jumping to an outer point of code (unsafe)
c) killing the thread
Of these, b) and c) are deemed unsafe, but we are already on muddy ground
when a SIGSEGV is delivered.

Case 4 can be handled similarly to case 3, but we can add one more option,
"d) ignore floating point signals and continue", which is safe and ok.

Case 8 is ok. Getting an interrupt delivered during I/O operations is safe.
We may force I/O operations to abort on receiving an interrupt even
without using signal handlers. Calls to READ or PRINT will recognize that
the I/O operation failed, look at the list of pending interrupts and invoke
the appropriate error handlers.
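
One way to picture case 8 at the C level, as a sketch only: if a signal is
caught without SA_RESTART, a blocking read() fails with EINTR and the caller
can then look at what is pending. Unlike the variant without handlers
mentioned above, this sketch does use a trivial flag-setting handler.
pending_interrupt, install_io_interrupt and interruptible_read are
hypothetical names, not part of ECL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical one-slot "list" of pending interrupts. */
static volatile sig_atomic_t pending_interrupt = 0;

static void note_interrupt(int signo)
{
    (void)signo;
    pending_interrupt = 1;
}

/* Install the handler WITHOUT SA_RESTART, so that blocking system calls
 * return -1 with errno == EINTR instead of being restarted. */
int install_io_interrupt(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_interrupt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Like read(), but sets *interrupted to 1 when the call was cut short by
 * a recorded interrupt, so the caller can invoke its error handling. */
ssize_t interruptible_read(int fd, void *buf, size_t len, int *interrupted)
{
    ssize_t n = read(fd, buf, len);

    *interrupted = (n < 0 && errno == EINTR && pending_interrupt);
    if (*interrupted)
        pending_interrupt = 0;   /* the caller now owns the interrupt */
    return n;
}
```

A READ-like function built on top of this would check the interrupted flag and
signal a Lisp condition from normal code, never from inside the handler.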

Case 9 is also simple. One may reserve a signal to indicate thread
suspension. In that case the signal handler is simple and just waits for a
"resume" signal, allowing another thread to inspect its environment and
gather information. This can be done in a POSIX-compatible way if the
debugger does not want to "inject" or execute additional code in the
suspended thread.
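
A sketch of such a suspend/resume protocol, assuming SIGUSR1 is reserved for
"suspend" and SIGUSR2 for "resume" (both choices, like every function name
below, are illustrative assumptions). For brevity the flags are per process; a
real implementation would need one per thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t resume_requested = 0;
static volatile sig_atomic_t suspensions_done = 0;

/* "Resume" handler: just wakes up the sigsuspend() below. */
static void on_resume(int signo)
{
    (void)signo;
    resume_requested = 1;
}

/* "Suspend" handler: park the thread until a resume arrives. */
static void on_suspend(int signo)
{
    int saved_errno = errno;
    sigset_t wait_mask;

    (void)signo;
    resume_requested = 0;             /* discard any stale resume */
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR2);   /* only "resume" may wake us up */
    while (!resume_requested)
        sigsuspend(&wait_mask);
    suspensions_done++;
    errno = saved_errno;
}

/* Install both handlers. Returns 0 on success, -1 on failure. */
int install_suspend_protocol(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_resume;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    /* Keep SIGUSR2 blocked while on_suspend runs, so that a resume can
     * only be delivered inside sigsuspend() and is never lost between
     * the flag check and the wait. */
    sigaddset(&sa.sa_mask, SIGUSR2);
    sa.sa_handler = on_suspend;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Number of suspend/resume cycles completed so far. */
int completed_suspensions(void)
{
    return suspensions_done;
}
```

While a thread sits inside on_suspend, another thread can inspect it; sending
SIGUSR2 to the parked thread lets it continue where it stopped.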

Cases 6 and 7 are more complicated. The problem with infinite loops and
deadlocks (mutexes that wait forever) is that we would like to be able to
break the offending code (as with Ctrl-C) without quitting the lisp image.
This means we need a way to stop a thread, typically forcing it to jump to
an outer point. There are various ways to implement such a SIGINT handler:
a* The SIGINT handler always jumps to an outer point in the lisp code.
b* Similar to "a" but only when the function is marked interruptible.
c* Similar to "a" but the thread is paused and a debugger is started in a
separate thread, from which we can decide whether to jump to an outer
point.
d* The SIGINT handler queues the interrupt until it is explicitly checked
for.
Only the last alternative is POSIX-compliant, but it is very costly, because
it forces us to add interrupt checks every now and then, as in GOTOs, and
does not solve the problem of deadlocks.
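
To make alternatives "a" and "b" concrete, here is a sketch using
sigsetjmp()/siglongjmp(), with a flag playing the role of the "marked
interruptible" property. All names are hypothetical, and runaway_loop is only
a stand-in for erroneous code. As said above, jumping out like this skips any
cleanup the interrupted code had pending, so it is not a safe technique in
general.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf outer_point;
/* Nonzero only while code "marked interruptible" is running. */
static volatile sig_atomic_t interruptible = 0;

/* SIGINT handler: abandon the computation and jump to the outer point. */
static void on_sigint(int signo)
{
    (void)signo;
    if (interruptible)
        siglongjmp(outer_point, 1);
}

/* Install the handler for SIGINT. Returns 0 on success, -1 on failure. */
int install_break_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Run body(); returns 0 if it finished, 1 if SIGINT broke out of it.
 * Saving the signal mask (second argument 1) means SIGINT is unblocked
 * again after the jump. */
int run_interruptible(void (*body)(void))
{
    if (sigsetjmp(outer_point, 1) != 0) {
        interruptible = 0;
        return 1;
    }
    interruptible = 1;
    body();
    interruptible = 0;
    return 0;
}

/* Demo body: stands in for code stuck in an infinite loop. */
void runaway_loop(void)
{
    raise(SIGINT);   /* simulate the user pressing Ctrl-C */
    for (;;)
        pause();
}
```

With the handler installed, run_interruptible(runaway_loop) comes back to its
caller instead of hanging, which is the behaviour we want from Ctrl-C at the
toplevel.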

So it seems it would be possible to execute ECL threads that run with
interrupts mostly disabled. Signal handlers would do very little, and only
in the undesirable situations would they allow jumping to outer parts of the
code or canceling the thread (unwinding any possible operations), but that
would be done placing the burden of possible side-effects on the user.

This would have a couple of positive side effects. One would be that it
would make coding a lot simpler. Most of ECL right now is not
async-signal-safe and it will probably never be. Lisp code also can't be
async-safe. Instead of revisiting all the code, filling it with
ecl_disable_interrupt() calls, which are costly, we would be able to get a
cleaner Lisp where everything is assumed to run properly, except in weird
situations.

It might also help us think of simpler ways to integrate ECL with foreign
signal handlers, especially when embedding. Since ECL threads would not
expect signals, or only serious ones with a specific protocol (unwind or
exit), it would make embedders' lives easier.

Juanjo

--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain)
http://juanjose.garciaripoll.googlepages.com