<div dir="ltr">I should add that this seems related to the issue seen in 'non-rt sync releases', but the previous fix did not fix it for me.</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Apr 12, 2013 at 5:25 PM, Jonathan Herman <span dir="ltr">&lt;<a href="mailto:hermanjl@cs.unc.edu" target="_blank">hermanjl@cs.unc.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><div dir="ltr"><br><div class="gmail_quote"><br><div dir="ltr">The current use of the completion flag in Gedf and Cedf allows a race condition to occur during synchronous releases. Currently, the flag works like so:<div>
1. The completion flag is set on task A on CPU 1.</div>
<div>2. CPU 1 calls _schedule()</div><div>3. CPU 1 calls job_completion on task A.</div><div>4. The completion flag is set on task A again (which is redundant).</div><div>5. CPU 1 calls unlink(), and the completion flag is cleared in link_task_to_cpu (instead of in job_completion, where it would make sense to clear it).</div>
<div><br></div><div>Semantically, the completion flag means 'the next time CPU 1 calls _schedule, it will call job_completion on task A.'</div><div><br></div><div>The issue: between steps 1 and 2, other CPUs can requeue task A. unlink() handles this situation by removing the task from the ready queue if the task is_queued. However, this breaks if task A was requeued on the release queue rather than the ready queue.<br>
</div><div><br></div><div>A synchronous release is:</div><div>I. Task A calls do_wait_for_ts_release on CPU 1, which consists of:<br></div><div> 1. task_block() called on task A</div><div> 2. task_wake_up() called on task A</div>
<div> Before Glenn's patch: if is_tardy() is true, release_at(task, now) is called.</div><div> Now: the task is released only if is_tardy() AND is_sporadic().</div><div> 3. release_at is called for the synchronous release</div>
<div> 4. complete_job is called and the CPU is rescheduled.</div><div>..... (see above)</div><div> 5. unlink() is called in job_completion</div><div> 6. check_for_preemptions is called</div><div>
<br></div><div>The nasty race condition:</div><div>1. Tasks A and B are not sporadic and are running on CPUs 1 and 2, respectively.</div><div>2. CPU 1 completes steps 1-4 of a synchronous release.</div><div>3. CPU 2 completes steps 1-5 and executes step 6.</div>
<div>4. CPU 2 preempts task A with something, but sees that task A is releasing far in the future, and adds it to the release queue.</div><div>5. CPU 1 executes step 5 of the synchronous release. is_queued() is true, so unlink() attempts to remove task A from the ready queue, when task A is actually in the release queue. The system crashes.</div>
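<div><br></div><div>The interleaving above can be replayed in a small single-threaded toy model. This is only a sketch and not the actual plugin code: the struct, the enum, and the names toy_unlink and replay_race are hypothetical, and a boolean return value stands in for the crash.</div>
<pre>
```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: which queue, if any, currently holds the task. */
enum queue_id { NOT_QUEUED, READY_QUEUE, RELEASE_QUEUE };

struct toy_task {
    enum queue_id queued_on;
};

/* Coarse check as described above: true for either queue. */
static bool is_queued(const struct toy_task *t)
{
    return t->queued_on != NOT_QUEUED;
}

/* Models unlink(): assumes a queued task must be on the ready queue.
 * Returns false where the real system would corrupt a queue. */
static bool toy_unlink(struct toy_task *t)
{
    if (!is_queued(t))
        return true;              /* nothing to remove */
    if (t->queued_on != READY_QUEUE)
        return false;             /* wrong queue: the crash case */
    t->queued_on = NOT_QUEUED;
    return true;
}

/* Replays the race in one thread; returns whether unlink succeeded. */
bool replay_race(void)
{
    struct toy_task a = { NOT_QUEUED };

    /* CPU 1: steps 1-4 are done, unlink() is still pending. */
    assert(!is_queued(&a));

    /* CPU 2: step 6 sees a release far in the future and requeues A. */
    a.queued_on = RELEASE_QUEUE;

    /* CPU 1: step 5, unlink() runs against the wrong queue. */
    return toy_unlink(&a);
}
```
</pre>
<div>In this model replay_race() returns false, because is_queued() cannot tell the two queues apart. A task sitting on the ready queue is unlinked without trouble.</div>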
<div><br></div><div>For reasons I can't quite wrap my head around, Glenn's patch makes step 4 of the race condition much more likely.</div><div><br></div><div>The fix:</div><div>Change the semantics of the completion flag for Gedf and Cedf to 'the next time CPU 1 calls _schedule, it will call job_completion on task A, and no one else will requeue task A in the meantime.' The flag is set in complete_job() and cleared in job_completion(), and requeue_preempted_job returns false while the flag is set.</div>
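<div><br></div><div>The proposed flag semantics can be sketched in the same toy style. This is a minimal single-threaded model under assumed names, not the code in prop/completion-fix: the struct fields, remote_preempt, and replay_with_fix are hypothetical, and locking is left out entirely.</div>
<pre>
```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
    bool completed;        /* the completion flag */
    bool on_release_queue; /* set if another CPU requeued the task */
};

/* The flag is set when the job signals completion. */
static void complete_job(struct toy_task *t)
{
    t->completed = true;
}

/* Other CPUs ask before requeueing; a completed task is off limits. */
static bool requeue_preempted_job(const struct toy_task *t)
{
    return !t->completed;
}

/* Remote CPU path (hypothetical helper): requeue only if permitted. */
static void remote_preempt(struct toy_task *t)
{
    if (requeue_preempted_job(t))
        t->on_release_queue = true;
}

/* Owning CPU path: the flag is cleared here and nowhere else. */
static void job_completion(struct toy_task *t)
{
    assert(t->completed);
    assert(!t->on_release_queue); /* nobody requeued it in between */
    t->completed = false;
}

/* Replays the same interleaving with the new semantics. */
bool replay_with_fix(void)
{
    struct toy_task a = { false, false };
    bool untouched;

    complete_job(&a);    /* CPU 1 sets the flag */
    remote_preempt(&a);  /* CPU 2 is refused */
    untouched = !a.on_release_queue;
    job_completion(&a);  /* CPU 1 clears the flag */

    return untouched && !a.completed;
}
```
</pre>
<div>Here replay_with_fix() returns true: the remote requeue is refused while the flag is set, so job_completion never finds task A on a queue it does not expect.</div>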
<div><br></div><div>The changes are in prop/completion-fix.</div><span><font color="#888888"><div>-- <br>Jonathan Herman<br>Department of Computer Science at UNC Chapel Hill
</div></font></span></div>
</div>
</div>
</div></div></blockquote></div>
</div>