[LITMUS^RT] Litmus Bug: newly admitted, but blocked, task added to ready queue

Björn Brandenburg bbb at mpi-sws.org
Tue Jan 22 07:42:35 CET 2013


On Jan 22, 2013, at 1:29 AM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
> Lately I've been hitting a lot of bugs in Litmus because I am forcing threads to become real-time by external calls.  That is, threads do not make themselves real-time; other threads do this on their behalf.  Litmus has the flexibility to allow this, but we've never been able to test it before.  I think I may have a new bug, and I'd like to discuss the best way to fix it.

This is indeed a known bug in the current LITMUS^RT. John Gamboa (TUKL) found a similar bug in the P-EDF plugin (not patched yet).

In the early versions, we supported this, but it must have been lost during some edits long ago and never tested again. Making a suspended task a real-time task should really be a test case for all plugins in the liblitmus test suite.

> Bug: *_task_new() calls *_job_arrival() for all new real-time tasks, even if a task is blocked, i.e., suspended, on I/O, a semaphore, etc.  This causes havoc when that thread is released; it is woken up prematurely.
> 
> Take a look at gsnedf_task_new() (https://github.com/LITMUS-RT/litmus-rt/blob/master/litmus/sched_gsn_edf.c#L535).  It does the following:
> 1) Sets up an initial job to be released at the current time.
> 2) If the function parameter 'running' was set by Linux (see kernel/sched.c), then the corresponding cpu_entry_t is updated to reflect that the task is scheduled.  (The code is slightly more complicated if release_master is used).
> 3) gsnedf_job_arrival(t) is called.  Since the initial job is to be released immediately, the initial job is added to ready queue.
> 
> Clearly, we don't want to add a blocked job to the ready queue.
> 
> Should we move the call to gsnedf_job_arrival() to the end of the "if(running){…}" condition?  Should we wrap the call to gsnedf_job_arrival() in its own condition?  If so, can we rely on the 'running' parameter passed to us by Linux, or should we use the is_running() macro?  Will 'running' and is_running() always be the same?
> 
> In my observations, it appears that "running == is_running(t)" always holds.  I think we should probably move the *job_arrival() call up to the end of the "if(running)" block.  However, the call to job_arrival() may have been intentionally placed outside of the "if(running)" block for reasons of which I am unaware.

I'm not aware of any good reason for job_arrival() to be outside the "if (running)" check.
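
To make the proposed change concrete, here is a minimal user-space sketch of the two variants. It is not the kernel code: all mock_* names, the fixed-size queue, and the 'scheduled'/'queued' flags are hypothetical stand-ins for the plugin's real data structures, and it assumes that the plugin's wake-up path enqueues a blocked task once it resumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_READY 8

/* Hypothetical stand-in for a real-time task; not the kernel's task_struct. */
struct mock_task {
	int  pid;
	bool scheduled; /* set if the task was running when admitted */
	bool queued;    /* set once the task is in the mock ready queue */
};

/* Hypothetical stand-in for the plugin's ready queue. */
struct mock_ready_queue {
	struct mock_task *tasks[MAX_READY];
	size_t len;
};

/* Stand-in for *_job_arrival(): add the initial job to the ready queue. */
void mock_job_arrival(struct mock_ready_queue *rq, struct mock_task *t)
{
	assert(rq->len < MAX_READY);
	rq->tasks[rq->len++] = t;
	t->queued = true;
}

/* Behavior as described in the report: job arrival is unconditional,
 * so a blocked (suspended) task also lands in the ready queue. */
void mock_task_new_buggy(struct mock_ready_queue *rq,
			 struct mock_task *t, bool running)
{
	if (running)
		t->scheduled = true;
	mock_job_arrival(rq, t);
}

/* Proposed change: only a running task has its job arrive now.
 * Assumption: a blocked task is enqueued later by the wake-up path. */
void mock_task_new_fixed(struct mock_ready_queue *rq,
			 struct mock_task *t, bool running)
{
	if (running) {
		t->scheduled = true;
		mock_job_arrival(rq, t);
	}
}
```

Calling mock_task_new_buggy() with running == false leaves the blocked task in the ready queue, whereas mock_task_new_fixed() leaves the queue empty until the task actually runs. The real plugin would additionally have to handle the release_master case mentioned above.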

I don't have time to look at this in a pre-ECRTS timeframe. If you need it fixed, can you prepare a patch to fix this, ideally with a corresponding test case in liblitmus? Otherwise, I'll just put it on my todo list…

Btw, if you ever start using SIGSTOP, I'm sure there are many interesting bugs to be found as well.

Thanks,
Björn




