[LITMUS^RT] Supporting Job Aborts

Fri Sep 7 00:34:57 CEST 2012

Hello All,

I am working on adding support to Litmus to allow jobs to be aborted.  A simple use case: Abort a job on budget exhaustion.

Suppose we have a task T with consecutive jobs J_1 and J_2.  Further suppose J_1 exhausts its budget before completing.  Budget exhaustion is handled by Litmus currently in either of two ways: (1) NO_ENFORCEMENT - J_1 is allowed to continue execution.  (2) QUANTUM/PRECISE_ENFORCEMENT - J_1 is preempted and does not resume until its budget is replenished.  In either case, all of the work of J_1 must be completed before the work for J_2 can start.  This stinks in applications where J_1 has no utility if J_1 cannot be completed until after the budget has been replenished.  This also can put pressure on later jobs to complete within their effectively reduced budgets as well.

I would like to have a third option: When J_1 exhausts its budget, J_1 is discarded/terminated and J_2 is ready to run as soon as the task's budget has been replenished.  Let us presume that the OS does not strictly enforce budgets, but it does expect tasks that have been informed of budget exhaustion to complete soon.  (If strict budgets are still needed, we could change the budget exhaustion notifications to an early warning ("You're about to run out of budget!") and then still strictly enforce budgets.)

There are several issues that must be resolved to allow this to work: (1) We need a mechanism to clean up any temporary state of J_1.  (2) We need a way to trigger this cleanup.  (3) We need a way to reset the task's stack & program counter to the start of the job-code so J_2 can begin fresh.

(POLLING) An obvious solution to this problem is to write budget expiration flags to a control page from within Litmus.  J_1 would periodically poll these flags in the control page to see if it should terminate.  Upon recognition of a termination request, J_1 would perform any necessary clean up and then call sleep_next_period() to notify the kernel that J_1 has "completed" (albeit before J_1 could complete all of the work it had wished).

Example Polling Code:
void job()
{
	do_a();
	if(exhausted()) { cleanup_a(); goto finish;}
	do_b();
	if(exhausted()) { cleanup_b(); cleanup_a(); goto finish;}
	do_c();
	if(exhausted()) { cleanup_c(); cleanup_b(); cleanup_a(); goto finish;}
	…
finish:
	sleep_next_period();
}

Pros:
	- Simple mechanisms.
Cons:
	- Polling is dirty.
	- Delays between polls.

In some cases, it might be nice to model budget exhaustion as an exception.  Example:

void job()
{
	try {
		job_body();
	}
	catch (BudgetExhausted& e) {
		// budget exhausted! terminate now!
		// ... clean up ...
	}
	sleep_next_period();
}

Pros:
	- No polling!
	- Exceptions can be re-thrown, so clean-up can be done incrementally as the call stack unwinds.

Cons:
	- The OS has to somehow throw an exception to the real-time task.  This appears to be difficult.

Triggering the exception is ugly.  I've explored several ways of implementing this and I'd like to get some opinions from others.  So far, I have not been able to find a single clear solution.  In general, all of my approaches do the following:
	(1) Add a new signal to Linux, SIG_BUDGET.
	(2) Add new budget policies "QUANTUM_SIGNALS" and "PRECISE_SIGNALS".
	(3) Send SIG_BUDGET to a task when it has exhausted its budget (assuming the task has a *_SIGNALS budget policy).
	(4) Set up a SIG_BUDGET signal handler in liblitmus.  In C, setjmp()/longjmp() can be used to implement the try-catch structure.*  In C++, the signal handler can throw an exception, provided that the signal handler has been compiled with "-fnon-call-exceptions".  (I wonder if the C++ method relies on setjmp()/longjmp() under the hood, but I cannot say for certain.)

Items #2 and #3 are straightforward.  Items #1 and #4 are tricky.
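The C variant of item (4) can be sketched roughly as follows.  This is a minimal illustration under stated assumptions, not the liblitmus implementation: SIGUSR1 stands in for SIG_BUDGET (which does not exist in stock Linux), budget exhaustion is simulated with raise(), and all function names are made up for the example.  It uses sigsetjmp()/siglongjmp() rather than plain setjmp()/longjmp() so that the signal mask in effect at the "try" is restored after the jump.  Jumping out of a handler is only safe if the interrupted code was not in the middle of a function that is not async-signal-safe (malloc(), for instance).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for SIG_BUDGET, which stock Linux lacks. */
#define SIG_BUDGET_STANDIN SIGUSR1

static sigjmp_buf job_env;                 /* jump target for the "catch" */
static volatile sig_atomic_t job_active;   /* only jump while a job runs */
volatile sig_atomic_t phases_done;         /* progress marker for the demo */

static void budget_handler(int sig)
{
	(void)sig;
	if (job_active)
		siglongjmp(job_env, 1);    /* the "throw" */
}

int install_budget_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = budget_handler;    /* SA_ONSTACK deliberately not set */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIG_BUDGET_STANDIN, &sa, NULL);
}

/* Returns 0 if body() ran to completion, 1 if it was aborted. */
int run_job(void (*body)(void))
{
	if (sigsetjmp(job_env, 1) != 0) {  /* "catch"; 1 = save the signal mask */
		job_active = 0;
		/* ... clean up ... */
		return 1;
	}
	job_active = 1;                    /* "try" */
	body();
	job_active = 0;
	return 0;                          /* sleep_next_period() would follow */
}

/* Demo body: simulates budget exhaustion after its first phase. */
void overrunning_body(void)
{
	phases_done = 1;
	raise(SIG_BUDGET_STANDIN);         /* simulated exhaustion */
	phases_done = 2;                   /* never reached */
}
```

After install_budget_handler(), calling run_job(overrunning_body) returns 1 and leaves phases_done at 1.  Because the mask is restored by siglongjmp(), the same call can be repeated for the next job.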

#1: Adding a new signal to Linux
While this seems straightforward, we have some options.  The standard signals in Linux range from 1 to 31.  Signals 32 through 64 are "real-time signals" (SIGRT).  User applications can send real-time signals with values [32, 64] amongst each other.  It appears that Linux requires that all signals have a value <= 64.  We need to carve out a space for Litmus signals, SIG_BUDGET specifically.

(Disjoint Range for Litmus Signals)
The range of SIGRT values is parameterized by the constants SIGRTMIN to SIGRTMAX.  libc queries these values at runtime and exposes the loaded values to user applications.  This means that we should be able to change the range of [SIGRTMIN, SIGRTMAX] to allocate a space for Litmus signals.  I've done this.  It is easy to allocate new signals for Litmus with an approach similar to how we add new system calls to Linux.
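For reference, the range an application actually sees can be inspected with a few lines of C.  The helper names below are made up for this sketch.  On glibc the two macros expand to function calls, and the library typically keeps the lowest few real-time signals for its own threading implementation, so the SIGRTMIN reported here is usually somewhat higher than the kernel's.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Number of real-time signals libc makes available to applications. */
int rt_signal_count(void)
{
	return SIGRTMAX - SIGRTMIN + 1;
}

/* Print the range as seen by this process at run time. */
void print_rt_range(void)
{
	printf("SIGRTMIN=%d SIGRTMAX=%d (%d signals)\n",
	       SIGRTMIN, SIGRTMAX, rt_signal_count());
}
```

Calling print_rt_range() on a patched and an unpatched system would be a quick way to confirm that a shifted range is really visible to user space.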

Pros:
	- A disjoint range of signals is clean.
Cons:
	- Minor modifications to Linux to treat Litmus signals like SIGRT signals.  SIGRT signals are queued and are not masked/merged with other pending signals of the same type.  Seems nice to have for Litmus.
	- libc doesn't know anything about Litmus Signals!!!  (See #4)
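The queuing difference between real-time and standard signals mentioned above can be observed from user space.  The sketch below (the function name count_delivered is made up, and it assumes a single-threaded process) blocks a signal, sends it to itself several times, and then counts how many instances were actually kept pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Send sig to ourselves n times while it is blocked, then count how many
 * instances can be dequeued.  Returns -1 on error. */
int count_delivered(int sig, int n)
{
	sigset_t set, old;
	siginfo_t info;
	union sigval val;
	struct timespec zero = {0, 0};   /* do not wait: only drain the queue */
	int i, count = 0;

	sigemptyset(&set);
	sigaddset(&set, sig);
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return -1;

	for (i = 0; i < n; i++) {
		val.sival_int = i;
		if (sigqueue(getpid(), sig, val) != 0) {
			count = -1;
			break;
		}
	}
	/* Dequeue synchronously; no handler is involved. */
	while (count >= 0 && sigtimedwait(&set, &info, &zero) == sig)
		count++;

	sigprocmask(SIG_SETMASK, &old, NULL);
	return count;
}
```

On Linux, count_delivered(SIGUSR1, 3) yields 1 because the standard signal is merged, whereas count_delivered(SIGRTMIN + 1, 3) yields 3.  Litmus signals handled like SIGRT signals would behave like the second case.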

(Overlapping Range for Litmus Signals)
Instead of carving out a range of signals for Litmus, we could just allow Litmus signals to take values from the range [SIGRTMIN, SIGRTMAX].  These are normally only sent by user space, but I don't believe that this will cause major integration issues.  Basically, Litmus signals would piggy-back on SIGRT handling.  (Note that I have not yet implemented this method, but plan to do so soon.)

Pros:
	- Little or no code changes.
Cons:
	- libc (hopefully) knows about signals in the [SIGRTMIN, SIGRTMAX] range and allows their use.
	- Potential conflicts with code that uses real-time signals.  This could cause one program to interpret a SIG_BUDGET as some other application-level signal.  However, I know of no code that uses these signals (I only learned of them recently), so perhaps the danger of this is small.
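If the overlapping range were used, one possible way to reduce the risk of confusing SIG_BUDGET with an application-level signal would be to look at where the signal came from.  The sketch below is an illustration only.  SIG_BUDGET_RT (SIGRTMIN + 2) is an arbitrary, hypothetical choice, the function names are made up, and it is an assumption that a kernel-originated Litmus signal would carry an si_code other than SI_USER or SI_QUEUE.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* Hypothetical: the real-time signal an overlapping design might reserve. */
#define SIG_BUDGET_RT (SIGRTMIN + 2)

/* Returns 1 if the signal was sent from user space with kill() or
 * sigqueue(), 0 otherwise. */
int sent_from_user_space(const siginfo_t *info)
{
	return info->si_code == SI_USER || info->si_code == SI_QUEUE;
}

/* Queue SIG_BUDGET_RT to ourselves and fetch the siginfo the receiver sees.
 * Returns 0 on success, -1 on error. */
int self_send_and_receive(siginfo_t *info)
{
	sigset_t set, old;
	union sigval val;
	int rc = -1;

	val.sival_int = 0;
	sigemptyset(&set);
	sigaddset(&set, SIG_BUDGET_RT);
	if (sigprocmask(SIG_BLOCK, &set, &old) != 0)
		return -1;
	if (sigqueue(getpid(), SIG_BUDGET_RT, val) == 0 &&
	    sigwaitinfo(&set, info) == SIG_BUDGET_RT)
		rc = 0;
	sigprocmask(SIG_SETMASK, &old, NULL);
	return rc;
}
```

A signal queued this way arrives with si_code set to SI_QUEUE and si_pid set to the sender, so a handler installed with SA_SIGINFO could ignore such look-alikes and act only on signals that did not come from an application.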

#4: Handling SIG_BUDGET in liblitmus
Setting up signal handling is easy using libc interfaces.  However, in the case of (Disjoint Range for Litmus Signals), parameter validation in libc prevents us from using signals outside of the known range of valid signals.  Thus, any attempts to simply use the libc interfaces fail with EINVAL.  I have found two workarounds (both are very ugly).
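The failure mode looks like this from the application's side.  The sketch uses a made-up helper, try_register(), and signal numbers that are out of range on an unmodified system.  On a stock kernel such numbers are invalid for the kernel as well, so this only illustrates the error the caller sees, not which layer rejected the request.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

static void noop_handler(int sig)
{
	(void)sig;
}

/* Try to register a handler for sig through the normal libc interface.
 * Returns 0 on success or the errno value on failure. */
int try_register(int sig)
{
	struct sigaction sa, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = noop_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, &old) != 0)
		return errno;
	sigaction(sig, &old, NULL);      /* restore the previous disposition */
	return 0;
}
```

Here try_register(SIGUSR1) succeeds, while try_register(SIGRTMAX + 1) reports EINVAL.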

(Direct System Calls)
We can bypass libc completely and interface directly with Linux through standard system calls.  This requires exporting a few more files from Linux to liblitmus, but this is manageable.  However, I ran into a problem on x86-64.  On x86-64, Linux requires that the caller of sys_sigaction(), the system call used to register a signal handler, specify a post-signal-handler routine (http://lxr.linux.no/linux+v3.5.3/arch/x86/kernel/signal.c#L442).  This is not a part of the POSIX spec, so libc attaches its own assembly-based routine transparently.  Our options seem to be: (a) Create a Litmus patch of libc (nonstarter); (b) Export code from libc to liblitmus (another nonstarter); (c) copy this routine out of libc into liblitmus (messy); or (d) modify Linux to not require the post-signal-handler routine, at least in the case of SIG_BUDGET.

I tried out (d) and everything seems to work quite nicely.  One must then wonder what the post-signal-handler routine is used for.  I haven't found definitive information yet, but it appears that the routine is used to help gdb understand signal handler stack frames.  Perhaps this drawback is acceptable.

Pros (of (d)):
	- It works!
Cons:
	- Do we really want to muck with low-level things like this? (Although the hack is currently limited to x86-64.)

(extern __libc_sigaction)
Another option is to still perform our signal handling through libc, but bypass its parameter validation.  This is possible by calling private functions in libc directly.  For example, with "extern int __libc_sigaction(…)" we can bypass the parameter validation of libc.  __libc_sigaction() sets up the post-signal-handler routine and then invokes sys_sigaction().

Pros:
	- Does not bend x86-64 constraints.
Cons:
	- Use of private libc functions.  Interfaces can change.
	- (BIGGIE) __libc_sigaction() is only visible in libc.a.  It is hidden in libc.so.  There might be limitations on how we can use libraries.  It also hurts portability.  For example, not every Linux distro uses the same libc! (glibc v. eglibc v. bionic).

Does anyone have any comments they would like to share?  What approach is most appealing to you?  Have I missed any pitfalls or alternatives?  Any thoughts on additional signals that might be useful?  I thought that there might be use for a SIG_CRITICALITY_CHANGE signal for multi-criticality systems.  A SIG_DEADLINE_MISS might also be useful (there may be times where we want to allocate a budget larger than the relative deadline, but this would be another new feature).

Note that I need to further explore allowing Litmus signals and SIGRT signals to share an overlapping range of values.  However, there may be some issues here besides application-level conflicts: (1) Linux probably expects all SIGRT signals to originate from applications, not from the kernel itself.  (2) Not all SIGRT handling methods may work for us.

* I have read that it is not advisable to call longjmp() from a signal handler.  However, many seem to do this anyhow.  I know Mac has run into this issue with his user-space scheduling, so perhaps he can suggest an alternative.  However, I believe calling longjmp() from a signal handler should be safe for the following reasons: (1) SIG_BUDGET is sent directly to the offending real-time thread (innocent threads are not involved). (2) As long as SA_ONSTACK is not set in sigaction() (and we can enforce this for SIG_BUDGET), the signal handler executes on the same stack as the offending thread.  (3) GCC must be doing *something* to allow exceptions to be thrown from signal handlers, so hopefully some sort of similar method (if not setjmp()/longjmp()) has been codified.

Thanks,
Glenn