[LITMUS^RT] Budget Consumed By Jobs In Ready Queue
Jonathan Herman
hermanjl at cs.unc.edu
Tue Sep 4 19:06:20 CEST 2012
This is caused by preempted jobs that have exhausted their budget
being re-added to the ready queues. Specifically, the culprit is this
condition in GSN-EDF (which Jeremy's work is based on):
    /* Any task that is preemptable and either exhausts its execution
     * budget or wants to sleep completes. We may have to reschedule after
     * this. Don't do a job completion if we block (can't have timers running
     * for blocked jobs). Preemption go first for the same reason.
     */
    if (!np && (out_of_time || sleep) && !blocks && !preempt)
        job_completion(entry->scheduled, !sleep);
Because the out_of_time job was preempted, preempt is set, the !preempt
term is false, and job_completion is never called. Changing the
condition to the slightly more complicated:

    if (!np && !blocks && ((sleep && !preempt) || out_of_time))

fixes the issue. The call to job_completion will call unlink(), which
will remove the task from the ready queue. I have done something
similar in all my scheduling plugins without realizing it. Does anyone
know why we weren't calling job_completion on preempted out_of_time
jobs? If there isn't a good reason, this is a bug.
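
To see exactly where the two conditions differ, here is a minimal
standalone C sketch (not kernel code). The function names old_cond,
new_cond, and count_divergent are made up for illustration, and the
parameters are plain booleans standing in for the flags used in the
conditions above:

```c
#include <stdbool.h>

/* Original GSN-EDF condition, copied from the snippet above. */
bool old_cond(bool np, bool out_of_time, bool sleep,
              bool blocks, bool preempt)
{
    return !np && (out_of_time || sleep) && !blocks && !preempt;
}

/* Proposed condition: an exhausted budget completes the job even if
 * the job is being preempted; sleeping still requires !preempt. */
bool new_cond(bool np, bool out_of_time, bool sleep,
              bool blocks, bool preempt)
{
    return !np && !blocks && ((sleep && !preempt) || out_of_time);
}

/* Hypothetical helper: count the flag combinations (out of 32) on
 * which the two conditions disagree. */
int count_divergent(void)
{
    int n = 0;

    for (int bits = 0; bits < 32; bits++) {
        bool np          = (bits >> 0) & 1;
        bool out_of_time = (bits >> 1) & 1;
        bool sleep       = (bits >> 2) & 1;
        bool blocks      = (bits >> 3) & 1;
        bool preempt     = (bits >> 4) & 1;

        if (old_cond(np, out_of_time, sleep, blocks, preempt) !=
            new_cond(np, out_of_time, sleep, blocks, preempt))
            n++;
    }
    return n;
}
```

The two conditions disagree only when np and blocks are clear and both
preempt and out_of_time are set (two combinations, one for each value
of sleep). That is exactly the preempted, budget-exhausted case
described above; every other combination behaves as before.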
On Fri, Aug 24, 2012 at 6:02 PM, Jeremy Erickson <jerickso at cs.unc.edu> wrote:
> On Fri, Aug 24, 2012 at 4:24 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>>
>>
>> No, GSN-EDF implements link-based scheduling to support non-preemptive
>> sections. A job is in the ready queue if it is not _linked_. A linked job
>> may still be scheduled, either while it is non-preemptive or when the actual
>> system is still catching up to the ideal system (e.g., if the rescheduling
>> IPI is still in flight).
>>
>> Thus, queued jobs may consume budget since they could still be scheduled
>> (for some short time).
>>
>> - Björn
>
>
> So what's been happening to me is that I've been getting unlucky and having
> jobs get unlinked right before they run out of budget. Apparently something
> about my scheduler and/or tests makes that more likely than it usually is,
> but it should have a nonzero probability even with the normal scheduler and
> tests. When that happens, it triggers the BUG_ON in arm_enforcement_timer
> in budget.c as soon as the (now-exhausted) job is taken off the ready queue
> and scheduled.
>
> I think this is a minor bug that should be fixed in mainline LITMUS^RT. I
> have attached two possible fixes: the first triggers a reschedule when
> trying to set the enforcement timer for an expired job (without setting the
> timer), and the second simply removes the BUG_ON (as there's a reasonable
> situation where a timer can be set for a task that just ran out of budget,
> even though the task is preemptible.) A more complicated fix would be to
> check for exhausted jobs when pulling off the ready queue, but that would
> require a change in each scheduler.
>
> These patches do appear to solve the particular issue I was having.
>
> -Jeremy Erickson
--
Jonathan Herman
Department of Computer Science at UNC Chapel Hill