[LITMUS^RT] prop/completion-fix
Jonathan Herman
hermanjl at cs.unc.edu
Fri Apr 12 21:56:48 CEST 2013
The current use of the completion flag in Gedf and Cedf allows a race
condition to occur during synchronous releases. Currently, the flag works
as follows:
1. The completion flag is set on task A on CPU 1.
2. CPU 1 calls _schedule()
3. CPU 1 calls job_completion on task A.
4. The completion flag is set on task A again (which is redundant).
5. CPU 1 calls unlink(), and the completion flag is cleared in
link_task_to_cpu (instead of in job_completion, where it would make more
sense to clear it).
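To make the lifecycle above concrete, here is a minimal user-space model of the current scheme. It is a sketch under stated assumptions, not LITMUS^RT code: struct model_task, struct flag_trace, model_job_completion(), model_link_task_to_cpu() and run_current_scheme() are hypothetical names, and the model tracks only the flag, not real queues or CPUs. It shows that the flag stays set from step 1 until step 5, which is exactly the window in which other CPUs can act.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a task's completion flag; not LITMUS^RT's struct. */
struct model_task {
    bool completed;
};

/* Flag value observed after the steps of the current scheme. */
struct flag_trace {
    bool after_step1;   /* flag set on task A */
    bool after_step4;   /* job_completion() sets it again */
    bool after_step5;   /* unlink() -> link_task_to_cpu() clears it */
};

static void model_job_completion(struct model_task *t)
{
    assert(t->completed);   /* already set in step 1 ... */
    t->completed = true;    /* ... so setting it again is redundant */
}

static void model_link_task_to_cpu(struct model_task *t)
{
    t->completed = false;   /* current scheme: cleared here, not in job_completion() */
}

static struct flag_trace run_current_scheme(struct model_task *t)
{
    struct flag_trace tr;

    t->completed = true;                /* step 1 */
    tr.after_step1 = t->completed;
    /* step 2: CPU 1 calls _schedule(); other CPUs may act here (not modeled) */
    model_job_completion(t);            /* steps 3-4 */
    tr.after_step4 = t->completed;
    model_link_task_to_cpu(t);          /* step 5, reached via unlink() */
    tr.after_step5 = t->completed;
    return tr;
}
```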
Semantically, the completion flag means 'the next time CPU 1 calls
_schedule, it will call job_completion on task A.'
The issue: between steps 1 and 2, other CPUs can requeue task A. unlink()
handles this situation by removing the task from the ready queue if
is_queued() is true for it. However, this breaks if task A was requeued on
the release queue rather than the ready queue.
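A minimal sketch of why is_queued() alone is not enough, assuming a simplified model in which a task sits in at most one of two queues. enum model_queue, model_is_queued(), add_to_ready_queue(), add_to_release_queue() and model_unlink() are hypothetical stand-ins; the real ready and release queues are more complex. model_unlink() returns false in the case where the real unlink() would try to remove task A from the wrong queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which queue, if any, a task currently sits in. */
enum model_queue { Q_NONE, Q_READY, Q_RELEASE };

struct model_task {
    enum model_queue where;
};

/* Like is_queued(): reports that the task is in *some* queue, not which one. */
static bool model_is_queued(const struct model_task *t)
{
    return t->where != Q_NONE;
}

/* Enqueue helpers; a task may sit in at most one queue at a time. */
static void add_to_ready_queue(struct model_task *t)
{
    assert(t->where == Q_NONE);
    t->where = Q_READY;
}

static void add_to_release_queue(struct model_task *t)
{
    assert(t->where == Q_NONE);
    t->where = Q_RELEASE;
}

/* Simplified unlink(): if the task is queued, assume it is in the ready queue.
 * Returns false where the real code would remove it from the wrong queue. */
static bool model_unlink(struct model_task *t)
{
    if (!model_is_queued(t))
        return true;            /* nothing to remove */
    if (t->where != Q_READY)
        return false;           /* task is in the release queue: ready queue corrupted */
    t->where = Q_NONE;
    return true;
}
```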
A synchronous release is:
I. Task A calls do_wait_for_ts_release on CPU 1, which consists of:
1. task_block() is called on task A.
2. task_wake_up() is called on task A.
   Before Glenn's patch: if is_tardy() is true, release_at(task, now) is
   called.
   Now: release_at(task, now) is called only if is_tardy() AND
   is_sporadic() are both true.
3. release_at is called for the synchronous release.
4. complete_job is called and the CPU is rescheduled.
   ..... (the completion-flag sequence described above)
5. unlink() is called in job_completion.
6. check_for_preemptions is called.
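The wake-up decision in step 2 can be sketched as follows. This is a simplified model, not the actual plugin code: it assumes implicit deadlines (deadline = release + period) and a simplified is_tardy() that only checks whether the current deadline has passed. model_task_wake_up() and its after_patch switch are hypothetical and exist only to contrast the two rules described above.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long lt_t;   /* stand-in for a nanosecond timestamp */

/* Hypothetical, simplified per-task state; not LITMUS^RT's definitions. */
struct model_task {
    bool sporadic;
    lt_t period;
    lt_t release;
    lt_t deadline;
};

/* Simplified release_at(): start a new job at `start` (implicit deadline). */
static void model_release_at(struct model_task *t, lt_t start)
{
    assert(t->period > 0);
    t->release = start;
    t->deadline = start + t->period;
}

/* Simplified is_tardy(): the current job's deadline has already passed. */
static bool model_is_tardy(const struct model_task *t, lt_t now)
{
    return t->deadline <= now;
}

/* Step 2 of the synchronous release: the wake-up decision. */
static void model_task_wake_up(struct model_task *t, lt_t now, bool after_patch)
{
    bool release_now = model_is_tardy(t, now);

    if (after_patch)
        release_now = release_now && t->sporadic;  /* only tardy sporadic tasks */
    if (release_now)
        model_release_at(t, now);
}
```

With the newer rule, a tardy non-sporadic task like task A is no longer released at `now` on wake-up, so its only new release is the synchronous one from step 3.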
The nasty race condition:
1. Tasks A and B are not sporadic and are running on CPUs 1 and 2,
respectively.
2. CPU 1 completes steps 1-4 of a synchronous release.
3. CPU 2 completes steps 1-5 and executes step 6.
4. CPU 2 preempts task A with another task, but sees that task A's release
is far in the future, and adds it to the release queue.
5. CPU 1 executes step 5. is_queued() is true, so unlink() attempts to
remove task A from the ready queue when task A is actually in the release
queue. The system crashes.
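The interleaving above can be replayed deterministically in a small single-threaded model. This is a sketch under assumptions, not the real scheduler: requeueing is modeled as "release queue if the release time is in the future, otherwise ready queue", the crash is modeled as model_unlink() returning false, and all names (replay_race(), requeue_preempted_current(), model_requeue(), model_unlink()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long lt_t;   /* stand-in for a nanosecond timestamp */

/* Hypothetical model; not LITMUS^RT's actual definitions. */
enum model_queue { Q_NONE, Q_READY, Q_RELEASE };

struct model_task {
    bool completed;
    lt_t release;
    enum model_queue where;
};

/* Simplified requeue(): future releases go to the release queue. */
static void model_requeue(struct model_task *t, lt_t now)
{
    assert(t->where == Q_NONE);   /* a task may sit in at most one queue */
    t->where = (t->release > now) ? Q_RELEASE : Q_READY;
}

/* Current behavior: requeue the preempted task regardless of its flag. */
static bool requeue_preempted_current(struct model_task *t, lt_t now)
{
    model_requeue(t, now);
    return true;
}

/* Simplified unlink(): assumes a queued task is in the ready queue.
 * Returns false where the real kernel would corrupt the ready queue. */
static bool model_unlink(struct model_task *t)
{
    if (t->where == Q_NONE)
        return true;
    if (t->where != Q_READY)
        return false;
    t->where = Q_NONE;
    return true;
}

/* Replays the race; returns true only if CPU 1's unlink() is safe. */
static bool replay_race(lt_t now, lt_t sync_release)
{
    struct model_task a = { .completed = false, .release = 0, .where = Q_NONE };

    /* CPU 1, steps 1-4: synchronous release set up, completion flag set. */
    a.release = sync_release;
    a.completed = true;

    /* CPU 2, step 6: check_for_preemptions() preempts and requeues task A. */
    requeue_preempted_current(&a, now);

    /* CPU 1, step 5: unlink() in job_completion. */
    return model_unlink(&a);
}
```

A release far in the future reproduces the failure; a release that is not in the future sends task A to the ready queue, where unlink() happens to find it.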
For reasons I can't quite wrap my head around, Glenn's patch makes step 4
of the race condition much more likely.
The fix:
Change the semantics of the completion flag for Gedf and Cedf to 'the next
time CPU 1 calls _schedule, it will call job_completion on task A, and no
one else will requeue task A in the meantime.' The flag is set in
complete_job(), cleared in job_completion(), and requeue_preempted_job()
returns false if completed is 1.
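A minimal sketch of the new semantics, using a simplified single-threaded model rather than the actual patch in prop/completion-fix: the flag is set only in model_complete_job(), cleared in model_job_completion() after unlink(), and model_requeue_preempted_job() returns false while the flag is set. All names are hypothetical stand-ins for the functions named above, and queue handling is reduced to a single enum.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long lt_t;   /* stand-in for a nanosecond timestamp */

/* Hypothetical model; not LITMUS^RT's actual definitions. */
enum model_queue { Q_NONE, Q_READY, Q_RELEASE };

struct model_task {
    bool completed;
    lt_t release;
    enum model_queue where;
};

/* Fix: the flag is set in complete_job() ... */
static void model_complete_job(struct model_task *t)
{
    t->completed = true;
}

/* ... requeue_preempted_job() refuses to requeue a completing task ... */
static bool model_requeue_preempted_job(struct model_task *t, lt_t now)
{
    if (t->completed)
        return false;   /* the owning CPU's next _schedule() handles this task */
    assert(t->where == Q_NONE);
    t->where = (t->release > now) ? Q_RELEASE : Q_READY;
    return true;
}

/* Simplified unlink(): assumes a queued task is in the ready queue. */
static bool model_unlink(struct model_task *t)
{
    if (t->where == Q_NONE)
        return true;
    if (t->where != Q_READY)
        return false;   /* would remove the task from the wrong queue */
    t->where = Q_NONE;
    return true;
}

/* ... and the flag is cleared in job_completion(), after unlink(). */
static bool model_job_completion(struct model_task *t)
{
    bool ok;

    assert(t->completed);
    ok = model_unlink(t);
    t->completed = false;
    return ok;
}

/* Replays the race with the fix applied; true if nothing went wrong. */
static bool replay_with_fix(lt_t now, lt_t sync_release)
{
    struct model_task a = { .completed = false, .release = sync_release, .where = Q_NONE };
    bool requeued;

    model_complete_job(&a);                           /* CPU 1, steps 1-4 */
    requeued = model_requeue_preempted_job(&a, now);  /* CPU 2, step 6 */
    return !requeued && model_job_completion(&a) && !a.completed;  /* CPU 1, step 5 */
}
```

Because CPU 2 backs off while the flag is set, task A never lands in the release queue behind unlink()'s back, and clearing the flag in job_completion() ties its lifetime to the single _schedule() call that consumes it.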
The changes are in prop/completion-fix.
--
Jonathan Herman
Department of Computer Science at UNC Chapel Hill
More information about the litmus-dev mailing list