[LITMUS^RT] prop/completion-fix

Jonathan Herman hermanjl at cs.unc.edu
Fri Apr 12 21:56:48 CEST 2013


The current use of the completion flag in Gedf and Cedf allows a race
condition to occur during synchronous releases. Currently, the flag works
like so:
1. The completion flag is set on task A on CPU 1.
2. CPU 1 calls _schedule().
3. CPU 1 calls job_completion() on task A.
4. The completion flag is set on task A again (which is redundant).
5. CPU 1 calls unlink(), and the completion flag is cleared in
link_task_to_cpu() (instead of in job_completion(), where it would make
sense to clear it).

Semantically, the completion flag means 'the next time CPU 1 calls
_schedule(), it will call job_completion() on task A.'

The issue: between steps 1 and 2, other CPUs can requeue task A. unlink()
handles this situation by removing the task from the ready queue if
is_queued() is true for it. However, this breaks if task A was requeued on
the release queue and not the ready queue.
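
To make the faulty assumption concrete, here is a minimal user-space sketch.
It is not LITMUS^RT code: toy_task, queue_id, toy_is_queued and toy_unlink
are hypothetical stand-ins invented for illustration. It only models the
point above, namely that a single "is it queued?" answer cannot tell the
ready queue from the release queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of where a task is queued; not a LITMUS^RT type. */
enum queue_id { Q_NONE, Q_READY, Q_RELEASE };

struct toy_task {
    enum queue_id queued_on; /* queue that currently holds the task */
};

/* Models the single yes/no answer: queued somewhere, or not at all. */
static bool toy_is_queued(const struct toy_task *t)
{
    assert(t != NULL);
    return t->queued_on != Q_NONE;
}

/*
 * Models the assumption described above: a queued task is taken to be in
 * the ready queue. Returns true if the removal was valid, and false if it
 * would have removed the task from a queue it is not actually in.
 */
static bool toy_unlink(struct toy_task *t)
{
    if (!toy_is_queued(t))
        return true;          /* nothing to remove */
    if (t->queued_on != Q_READY)
        return false;         /* task is in the release queue: the bad case */
    t->queued_on = Q_NONE;    /* valid removal from the ready queue */
    return true;
}
```

In this model, a task with queued_on == Q_RELEASE makes toy_unlink() report
failure; in the real plugin the corresponding removal is attempted anyway.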

A synchronous release is:
I. Task A calls do_wait_for_ts_release on CPU 1, which consists of:
  1. task_block() called on task A
  2. task_wake_up() called on task A
      before Glenn's patch: if is_tardy() is true, release_at(task, now) is
called
      now: the task is released only if is_tardy() AND is_sporadic() are
both true
  3. release_at is called for the synchronous release
  4. complete_job is called and the CPU is rescheduled.
..... (see above)
  5. unlink() is called in job completion
  6. check_for_preemptions is called

The nasty race condition:
1. Tasks A, B are not sporadic and are running on CPUs 1 and 2, resp.
2. CPU 1 completes steps 1-4 of a synchronous release.
3. CPU 2 completes steps 1-5 and executes step 6.
4. CPU 2 preempts task A with something, but sees that task A is releasing
far in the future, and adds it to the release queue.
5. CPU 1 executes step 5. is_queued() is true, so unlink() attempts to
remove task A from the ready queue, when task A is actually in the release
queue. The system crashes.

For reasons I can't quite wrap my head around, Glenn's patch makes step 4
of the race condition much more likely.

The fix:
Change the semantics of the completion flag for Gedf and Cedf to 'the next
time CPU 1 calls _schedule(), it will call job_completion() on task A, and
no one else will requeue task A in the meantime.' The flag is set in
complete_job(), cleared in job_completion(), and requeue_preempted_job()
returns False if completed is 1.
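
The new semantics can be sketched in user space as follows. This is an
illustrative model, not the patch itself: fix_task and the toy_* functions
are hypothetical names that only mirror the roles of complete_job(),
requeue_preempted_job() and job_completion() described above, and all
locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model; not a LITMUS^RT type. */
struct fix_task {
    bool completed; /* models the completion flag */
    bool queued;    /* true once some CPU has requeued the task */
};

/* The flag is set when the job signals completion. */
static void toy_complete_job(struct fix_task *t)
{
    assert(t != NULL);
    t->completed = true;
}

/* Other CPUs refuse to requeue a task whose completion is still pending. */
static bool toy_requeue_preempted_job(struct fix_task *t)
{
    assert(t != NULL);
    if (t->completed)
        return false;
    t->queued = true;
    return true;
}

/* Runs on the owning CPU's schedule path; the only place the flag is cleared. */
static void toy_job_completion(struct fix_task *t)
{
    assert(t != NULL);
    assert(!t->queued); /* invariant of the model: nobody requeued in between */
    t->completed = false;
}
```

With the flag set, a requeue attempt from another CPU is rejected, so the
task cannot end up in the release queue before the owning CPU processes the
completion.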

The changes are in prop/completion-fix.
-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill