[LITMUS^RT] preparing a new release

Jonathan Herman hermanjl at cs.unc.edu
Fri Nov 30 23:12:19 CET 2012


This is caused by my scripts elevating the ftcat processes for sched_trace
to SCHED_FIFO. Apparently the SCHED_FIFO rebalancing timer can trigger a
reschedule for two reasons:
1. Too much SCHED_FIFO work was done in the last period
2. Too little SCHED_FIFO work was done in the last period (this is what got
me)
My SCHED_FIFO ftcat processes were getting starved, causing the rebalancing
timer to remotely reschedule a running rtspin in the vain hope that ftcat
would be selected to run.
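
For anyone who wants to check which real-time budget their machine is
configured with, here is a minimal sketch in Python. It is not part of
LITMUS^RT or of my scripts; the function names rt_share and read_rt_settings
are made up for illustration. It only reads the two sysctl files described
in Documentation/scheduler/sched-rt-group.txt and computes the share of each
period that SCHED_FIFO/SCHED_RR tasks may use (assuming the usual defaults of
950000 us out of 1000000 us, that is the 95% mentioned below).

```python
from pathlib import Path

# Standard location of the RT throttling sysctls on Linux.
PROC_DIR = Path("/proc/sys/kernel")


def rt_share(runtime_us, period_us):
    """Fraction of each period that real-time tasks may consume.

    A negative runtime (e.g. -1) means throttling is disabled, so
    real-time tasks may use the whole period.
    """
    if runtime_us < 0:
        return 1.0
    return min(runtime_us / period_us, 1.0)


def read_rt_settings(proc_dir=PROC_DIR):
    """Return (runtime_us, period_us) as currently configured."""
    runtime = int((proc_dir / "sched_rt_runtime_us").read_text())
    period = int((proc_dir / "sched_rt_period_us").read_text())
    return runtime, period


if __name__ == "__main__":
    runtime, period = read_rt_settings()
    share = rt_share(runtime, period)
    print(f"RT tasks may use {share:.0%} of each {period} us period")
```

A share of 100% means throttling has been switched off (runtime set to -1),
in which case the rebalancing timer should not be a factor at all.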

I wouldn't worry about this issue. False alarm.


On Fri, Nov 30, 2012 at 2:21 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:

> It does still crash, yes. It looks like this is being caused by the
> real-time balancing timer in Linux
> (see Documentation/scheduler/sched-rt-group.txt), which forces Linux
> real-time (i.e., SCHED_FIFO) tasks to relinquish 5% of the CPU to
> SCHED_OTHER tasks. It works like so:
> 1. A timer fires periodically.
> 2. The timer scans ALL cpus.
> 3. If any CPU has real-time work exceeding some threshold over the last
> timer period, the currently running task on that CPU is rescheduled.
>
> Step 3 is what caused our BUG. The BUG is hit if a remote processor, in
> this case the processor on which the rebalancing timer fired, triggers a
> reschedule of a SCHED_LITMUS task.
>
> This can be disabled with:
> echo -1 > /proc/sys/kernel/sched_rt_runtime_us
>
> I will spend some time figuring out why this timer is considering
> SCHED_LITMUS work when rebalancing. Somehow our work is counting towards
> the 95% of CPU time that SCHED_FIFO tasks can use. That would also explain
> why the BUG is only hit when the system is fully utilized.
>
>
> On Thu, Nov 29, 2012 at 5:39 PM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
>
>>
>> On Nov 29, 2012, at 9:20 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
>>
>> > I have also confirmed that this only happens when sched-trace is running
>> (not overhead tracing, just scheduling). Every commit I checked back to the
>> 3.0 merge where my sched-trace still ran had this same issue.
>>
>> Yikes. Sounds like it could be a bug in the Feather-Trace triggers.
>>
>> Could you please try the following: edit arch/x86/Kconfig to remove
>> ARCH_HAS_FEATHER_TRACE. This should disable the asm Feather-Trace hacks and
>> replace them with a tame default implementation. Does it still crash?
>>
>> Thanks,
>> Björn
>>
>>
>> _______________________________________________
>> litmus-dev mailing list
>> litmus-dev at lists.litmus-rt.org
>> https://lists.litmus-rt.org/listinfo/litmus-dev
>>
>
>
>
> --
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
>



-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill