[LITMUS^RT] Budget Consumed By Jobs In Ready Queue

Jonathan Herman hermanjl at cs.unc.edu
Tue Sep 4 19:06:20 CEST 2012


This is caused by preempted jobs that have exhausted their budget
being re-added to the ready queues. The culprit is this check in
GSN-EDF (which Jeremy's work is based on):

        /* Any task that is preemptable and either exhausts its execution
         * budget or wants to sleep completes. We may have to reschedule after
         * this. Don't do a job completion if we block (can't have timers
         * running for blocked jobs). Preemption go first for the same reason.
         */
        if (!np && (out_of_time || sleep) && !blocks && !preempt)
                job_completion(entry->scheduled, !sleep);

Because the out_of_time job was preempted, the !preempt term is false
and job_completion is not called. Changing to the more complicated:

  if (!np && !blocks && ((sleep && !preempt) || out_of_time))

fixes the issue. The call to job_completion will call unlink(), which
will remove the task from the ready queue. I have done something
similar in all my scheduling plugins without realizing it. Does anyone
know why we weren't calling job_completion on preempted out_of_time
jobs? If there isn't a good reason, this is a bug.
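
To see exactly which cases the change affects, the two conditions can
be compared over every combination of the five flags. The sketch below
is standalone C, not LITMUS^RT code: the function names are made up for
illustration, and the flags are treated as plain booleans rather than
being derived from scheduler state.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as quoted above: a pending preemption suppresses completion. */
static bool completes_old(bool np, bool out_of_time, bool sleep,
                          bool blocks, bool preempt)
{
        return !np && (out_of_time || sleep) && !blocks && !preempt;
}

/* Proposed condition: only the sleep case is still gated on !preempt. */
static bool completes_new(bool np, bool out_of_time, bool sleep,
                          bool blocks, bool preempt)
{
        return !np && !blocks && ((sleep && !preempt) || out_of_time);
}

/* Walk all 32 flag combinations and count where the conditions disagree.
 * Every disagreement should be a preemptable, non-blocking job that is
 * both preempted and out of budget, with the new condition completing it.
 */
static int count_disagreements(void)
{
        int n = 0;

        for (int bits = 0; bits < 32; bits++) {
                bool np          = (bits >> 0) & 1;
                bool out_of_time = (bits >> 1) & 1;
                bool sleep       = (bits >> 2) & 1;
                bool blocks      = (bits >> 3) & 1;
                bool preempt     = (bits >> 4) & 1;

                bool was = completes_old(np, out_of_time, sleep, blocks, preempt);
                bool now = completes_new(np, out_of_time, sleep, blocks, preempt);

                if (was != now) {
                        assert(!np && !blocks && preempt && out_of_time);
                        assert(now && !was);
                        n++;
                }
        }
        return n;
}
```

Under these assumptions the two conditions differ only when a
preemptable, non-blocking job is preempted and out of budget at the
same time (with sleep either set or clear). Every other combination
behaves as before.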

On Fri, Aug 24, 2012 at 6:02 PM, Jeremy Erickson <jerickso at cs.unc.edu> wrote:
> On Fri, Aug 24, 2012 at 4:24 AM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>>
>>
>> No, GSN-EDF implements link-based scheduling to support non-preemptive
>> sections. A job is in the ready queue if it is not _linked_. A linked job
>> may still be scheduled, either while it is non-preemptive or when the actual
>> system is still catching up to the ideal system (e.g., if the rescheduling
>> IPI is still in flight).
>>
>> Thus, queued jobs may consume budget since they could still be scheduled
>> (for some short time).
>>
>> - Björn
>
>
> So what's been happening to me is that I've been getting unlucky and having
> jobs get unlinked right before they run out of budget.  Apparently something
> about my scheduler and/or tests makes that more likely than it usually is,
> but it should have a nonzero probability even with the normal scheduler and
> tests.  When that happens, it triggers the BUG_ON in arm_enforcement_timer
> in budget.c as soon as the (now-exhausted) job is taken off the ready queue
> and scheduled.
>
> I think this is a minor bug that should be fixed in mainline LITMUS^RT.  I
> have attached two possible fixes: the first triggers a reschedule when
> trying to set the enforcement timer for an expired job (without setting the
> timer), and the second simply removes the BUG_ON (as there's a reasonable
> situation where a timer can be set for a task that just ran out of budget,
> even though the task is preemptible.)  A more complicated fix would be to
> check for exhausted jobs when pulling off the ready queue, but that would
> require a change in each scheduler.
>
> These patches do appear to solve the particular issue I was having.
>
> -Jeremy Erickson
>



-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill



