[LITMUS^RT] prop/completion-fix

Jonathan Herman hermanjl at cs.unc.edu
Fri Apr 12 23:34:15 CEST 2013


I should add that this seems related to the issue seen in 'non-rt sync
releases', but the previous fix did not resolve it for me.


On Fri, Apr 12, 2013 at 5:25 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:

>
>
> The current use of the completion flag in Gedf and Cedf allows a race
> condition to occur during synchronous releases. Currently, the flag works
> like so:
> 1. The completion flag is set on task A on CPU 1.
> 2. CPU 1 calls _schedule()
> 3. CPU 1 calls job_completion on task A.
> 4. The completion flag is set on task A again (which is redundant).
> 5. CPU 1 calls unlink() and the completion flag is cleared in
> link_task_to_cpu (instead of job_completion, where it would make sense to
> clear it).
>
> Semantically, the completion flag means 'the next time CPU 1 calls
> _schedule, it will call job_completion() on task A.'
>
> The issue: between 1 and 2, other CPUs can requeue task A. Unlink()
> handles this situation by removing a task from the ready queue if the task
> is_queued. However, this breaks if task A was requeued on the release queue
> and not the ready queue.
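
A minimal C sketch of why the is_queued() check is not enough, assuming a toy model: the enum queue_id, struct task, remove_from_ready_queue() and unlink_task() below are hypothetical stand-ins, not the LITMUS^RT data structures. unlink_task() plays the role of unlink(): it only asks whether the task is in *some* queue, then assumes that queue is the ready queue.

```c
#include <stdbool.h>

/* Hypothetical toy model; not the real LITMUS^RT types. */
enum queue_id { Q_NONE, Q_READY, Q_RELEASE };

struct task {
    enum queue_id where; /* queue that actually holds the task */
};

/* Stand-in for is_queued(): true if the task is in any queue. */
static bool is_queued(const struct task *t)
{
    return t->where != Q_NONE;
}

/* Toy ready-queue removal: returns -1 if the task is not really in the
 * ready queue (in the kernel this would corrupt the queue and crash). */
static int remove_from_ready_queue(struct task *t)
{
    if (t->where != Q_READY)
        return -1;
    t->where = Q_NONE;
    return 0;
}

/* Stand-in for unlink(): is_queued() is taken to mean "in the ready queue". */
static int unlink_task(struct task *t)
{
    if (is_queued(t))
        return remove_from_ready_queue(t);
    return 0;
}
```

A task parked in the release queue makes unlink_task() take the ready-queue path and fail, which is the situation described above.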
>
> A synchronous release is:
> I. Task A calls do_wait_for_ts_release on CPU 1, which consists of:
>   1. task_block() called on task A
>   2. task_wake_up() called on task A
>       before Glenn's patch: if is_tardy() is true, release_at(task, now) is
> called
>       now: only if is_tardy() AND is_sporadic() is the task released.
>   3. release_at is called for the synchronous release
>   4. complete_job is called and the CPU is rescheduled.
> ..... (see above)
>   5. unlink() is called in job completion
>   6. check_for_preemptions is called
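
The change to the wake-up condition in step 2 can be sketched as two predicates. This is a hypothetical reduction: struct job_state and both functions are illustrative, with is_tardy() and is_sporadic() collapsed into plain booleans, and "released" meaning release_at(task, now) is called at wake-up.

```c
#include <stdbool.h>

/* Hypothetical toy state; fields stand in for is_tardy() and is_sporadic(). */
struct job_state {
    bool tardy;
    bool sporadic;
};

/* Before Glenn's patch: any tardy task is released at wake-up. */
static bool release_on_wakeup_old(const struct job_state *s)
{
    return s->tardy;
}

/* After the patch: only tardy sporadic tasks are released at wake-up. */
static bool release_on_wakeup_new(const struct job_state *s)
{
    return s->tardy && s->sporadic;
}
```

For the non-sporadic tasks in the race below, the new predicate is always false, so the release time set by release_at() for the synchronous release is what the rest of the sequence sees.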
>
> The nasty race condition:
> 1. Tasks A, B are not sporadic and are running on CPUs 1 and 2, resp.
> 2. CPU 1 completes steps 1-4 of a synchronous release.
> 3. CPU 2 completes steps 1-5 and executes step 6.
> 4. CPU 2 preempts task A with something, but sees that task A is releasing
> far in the future, and adds it to the release queue
> 5. CPU 1 executes step 5. is_queued() is true, so unlink() attempts to
> remove task A from the ready queue, when task A is actually in the release
> queue. The system crashes.
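
The interleaving can be replayed deterministically in a single-threaded C sketch, assuming a toy model: struct task, requeue(), unlink_ok() and replay_race() are hypothetical, and the two CPUs are represented only by the order of the statements. Under the current semantics requeue() ignores the completion flag.

```c
#include <stdbool.h>

/* Hypothetical toy model of the race; not the actual LITMUS^RT code. */
enum queue_id { Q_NONE, Q_READY, Q_RELEASE };

struct task {
    bool completed;      /* the completion flag */
    long release;        /* next release time */
    enum queue_id where; /* queue that currently holds the task */
};

/* Current semantics: requeue ignores the flag and picks a queue based on
 * whether the job has been released yet. */
static void requeue(struct task *t, long now)
{
    t->where = (t->release > now) ? Q_RELEASE : Q_READY;
}

/* unlink() as described: a queued task is assumed to be in the ready
 * queue. Returns false if that assumption is wrong (the crash). */
static bool unlink_ok(struct task *t)
{
    if (t->where == Q_NONE)
        return true;
    if (t->where != Q_READY)
        return false;
    t->where = Q_NONE;
    return true;
}

/* Replays race steps 2-5 for task A; returns true if CPU 1's unlink()
 * in step 5 hits the wrong queue. */
static bool replay_race(void)
{
    long now = 100;
    struct task a = { .completed = false, .release = 0, .where = Q_NONE };

    /* Step 2 (CPU 1): release_at() sets a release far in the future,
     * then the job is completed, but _schedule() has not run yet. */
    a.release = now + 1000;
    a.completed = true;

    /* Steps 3-4 (CPU 2): check_for_preemptions() preempts A and requeues
     * it; the future release sends it to the release queue. */
    requeue(&a, now);

    /* Step 5 (CPU 1): job_completion() calls unlink(). */
    return !unlink_ok(&a);
}
```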
>
> For reasons I can't quite wrap my head around, Glenn's patch makes step 4
> of the race condition much more likely.
>
> The fix:
> Change the semantics of the completion flag for Gedf and Cedf to 'the next
> time CPU 1 calls _schedule, it will call job_completion() on task A, and no
> one else will requeue task A in the meantime.' The flag is set in
> complete_job(), cleared in job_completion(), and requeue_preempted_job
> returns False if completed is 1.
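
The proposed flag lifecycle can be sketched with the same toy model. The function names follow the message, but their signatures and bodies here are illustrative assumptions, not the code in prop/completion-fix; requeue_preempted_job() is modelled as a predicate that the preempting CPU consults before requeueing, and replay_with_fix() is a hypothetical replay of the race.

```c
#include <stdbool.h>

/* Hypothetical toy model of the proposed semantics. */
enum queue_id { Q_NONE, Q_READY, Q_RELEASE };

struct task {
    bool completed;
    long release;
    enum queue_id where;
};

/* The flag is set only here... */
static void complete_job(struct task *t)
{
    t->completed = true;
}

/* ...and cleared only here, when _schedule() on the owning CPU handles
 * the completion (the real code would also set up the next job). */
static void job_completion(struct task *t)
{
    t->completed = false;
}

/* Other CPUs must not requeue a task whose completion is pending. */
static bool requeue_preempted_job(const struct task *t)
{
    return !t->completed;
}

/* Replays the race with the fix; returns true if task A is never placed
 * in a queue behind CPU 1's back and the flag ends up cleared. */
static bool replay_with_fix(void)
{
    long now = 100;
    struct task a = { .completed = false, .release = 0, .where = Q_NONE };

    a.release = now + 1000; /* release_at() for the synchronous release */
    complete_job(&a);       /* CPU 1 has not yet run _schedule() */

    /* CPU 2 preempts A but leaves it alone: completion is pending. */
    if (requeue_preempted_job(&a))
        a.where = (a.release > now) ? Q_RELEASE : Q_READY;

    /* CPU 1: _schedule() -> job_completion(); A was never requeued. */
    bool untouched = (a.where == Q_NONE);
    job_completion(&a);
    return untouched && !a.completed;
}
```

Keeping the set and clear points in complete_job() and job_completion() means the flag covers exactly the window between steps 1 and 2 of the original sequence, which is where the other CPU used to requeue the task.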
>
> The changes are in prop/completion-fix.
> --
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
>



-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill


More information about the litmus-dev mailing list