[LITMUS^RT] preparing a new release

Fri Nov 30 23:15:36 CET 2012

This is an issue that has hit me before. I think running ftcat as SCHED_FIFO is useful. Is there any way we can fix this? Perhaps we should silently ignore the resched event instead of BUG()'ing?

-Glenn
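A minimal user-space sketch of Glenn's suggestion, for illustration only. Instead of BUG()'ing when another CPU asks to reschedule a SCHED_LITMUS task, a request that did not come from the LITMUS plugin itself is dropped. All names here (`enum task_class`, `honor_resched_request`, `ignored_foreign_resched`) are hypothetical and are not LITMUS^RT or kernel APIs. The plugin's own remote preemptions are assumed to still be honored. The counter is an optional addition so that ignored requests can be reported later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for scheduling classes (the kernel uses
 * struct sched_class pointers, not an enum). */
enum task_class { CLASS_FAIR, CLASS_FIFO, CLASS_LITMUS };

/* Hypothetical count of foreign remote reschedule requests that were dropped. */
static unsigned long ignored_foreign_resched;

/*
 * Decide whether a reschedule request for the task running on target_cpu,
 * issued from this_cpu, should be honored. from_plugin is true when the
 * request originates from the LITMUS plugin itself. A remote request from
 * anywhere else (e.g. the Linux RT period timer) aimed at a SCHED_LITMUS
 * task is ignored and counted instead of being treated as fatal.
 */
static bool honor_resched_request(enum task_class cls, bool from_plugin,
                                  int this_cpu, int target_cpu)
{
    assert(this_cpu >= 0 && target_cpu >= 0);
    if (cls == CLASS_LITMUS && !from_plugin && this_cpu != target_cpu) {
        ignored_foreign_resched++;
        return false; /* leave the LITMUS task alone */
    }
    return true;      /* plugin requests, local requests, non-LITMUS tasks */
}
```

For example, `honor_resched_request(CLASS_LITMUS, false, 0, 1)` returns false and bumps the counter. The same call with `from_plugin` set, or with `this_cpu == target_cpu`, returns true. The sketch does not address whether dropping such requests is safe for the Linux side, which still expects its throttled SCHED_FIFO tasks to get the CPU eventually.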

On Nov 30, 2012, at 5:12 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:

> This is caused by my scripts elevating ftcat for sched_trace to SCHED_FIFO. Apparently the SCHED_FIFO rebalancing timer can trigger a reschedule for two reasons:
> 1. Too much SCHED_FIFO work was done in the last period
> 2. Too little SCHED_FIFO work was done in the last period (this is what got me)
> My SCHED_FIFO ftcats were getting starved, causing the rebalancing timer to remotely reschedule a running rtspin in the vain hope that ftcat would be selected to run.
> 
> I wouldn't worry about this issue. False alarm.
> 
> 
> On Fri, Nov 30, 2012 at 2:21 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
> It does still crash, yes. It looks like this is being caused by the real-time balancing timer in Linux (see Documentation/scheduler/sched-rt-group.txt), which forces Linux real-time (i.e., SCHED_FIFO) tasks to relinquish 5% of the CPU to SCHED_OTHER tasks. It works as follows:
> 1. A timer fires periodically.
> 2. The timer scans ALL cpus.
> 3. If any CPU has real-time work exceeding some threshold over the last timer period, the currently running task on that CPU is rescheduled.
> 
> Step 3 is what caused our BUG. The BUG is hit if a remote processor, in this case the processor on which the rebalancing timer fired, triggers a reschedule of a SCHED_LITMUS task.
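The timer behavior described in this thread, including the "too little work" case from the later message, can be pictured with a toy model. This is a simplified sketch under stated assumptions, not the kernel's actual code. `struct rt_rq_model` and `rt_period_timer_model` are hypothetical, times are in microseconds, and each period's accounting is simply reset to zero (the real implementation carries unused or excess runtime over). What it illustrates is that the CPU to poke is chosen from RT accounting alone, so whatever happens to be running there, such as a SCHED_LITMUS rtspin, gets rescheduled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU real-time accounting state (hypothetical, not struct rt_rq). */
struct rt_rq_model {
    long long rt_time;  /* SCHED_FIFO time consumed in the last period (us) */
    bool throttled;     /* SCHED_FIFO tasks on this CPU are held back */
    bool need_resched;  /* set when the timer pokes this CPU */
};

/*
 * One firing of the periodic timer. It runs on a single CPU but scans ALL
 * CPUs. runtime_us is the per-period budget (sched_rt_runtime_us).
 * Returns the number of CPUs that were poked.
 */
static int rt_period_timer_model(struct rt_rq_model rqs[], int nr_cpus,
                                 long long runtime_us)
{
    int poked = 0;

    assert(nr_cpus >= 0);
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        struct rt_rq_model *rq = &rqs[cpu];

        if (!rq->throttled && rq->rt_time > runtime_us) {
            /* Reason 1: too much RT work, so hold SCHED_FIFO tasks back. */
            rq->throttled = true;
            rq->need_resched = true;
            poked++;
        } else if (rq->throttled && rq->rt_time < runtime_us) {
            /* Reason 2: too little RT work (the starved ftcat case), so
             * release SCHED_FIFO tasks and poke whatever is running now. */
            rq->throttled = false;
            rq->need_resched = true;
            poked++;
        }
        rq->rt_time = 0; /* start a fresh accounting period */
    }
    return poked;
}
```

With a budget of 950000 us, a CPU that ran 960000 us of SCHED_FIFO work is throttled and poked on one firing. On the next firing its consumed time is below budget, so it is unthrottled and poked again, regardless of which scheduling class the task now running there belongs to.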
> 
> This can be disabled with:
> echo -1 > /proc/sys/kernel/sched_rt_runtime_us
> 
> I will spend some time figuring out why this timer considers SCHED_LITMUS work when rebalancing. Somehow our work is counting towards the 95% of CPU time that SCHED_FIFO tasks may use. That is also why the BUG is hit only when the system is fully utilized.
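The 95%/5% split comes from two tunables described in Documentation/scheduler/sched-rt-group.txt: `sched_rt_runtime_us` (default 950000) and `sched_rt_period_us` (default 1000000). The sketch below works through that arithmetic. As a hypothetical illustration of the suspicion above, it also shows why charging SCHED_LITMUS time to the same budget would trip it only on a (nearly) fully utilized CPU. The helper names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* RT share of each period in per-mille, or -1 if throttling is disabled
 * (sched_rt_runtime_us = -1, as in the echo command above). */
static long rt_share_permille(long runtime_us, long period_us)
{
    assert(period_us > 0);
    if (runtime_us < 0)
        return -1;
    return runtime_us * 1000 / period_us;
}

/* Time per period left for non-RT (e.g. SCHED_OTHER) work. */
static long non_rt_reserve_us(long runtime_us, long period_us)
{
    assert(period_us > 0);
    return runtime_us < 0 ? 0 : period_us - runtime_us;
}

/*
 * Hypothetical: would a CPU exceed the RT budget if SCHED_LITMUS time were
 * charged to it alongside SCHED_FIFO time? This mirrors the suspicion in
 * the thread; it is not how the kernel is known to account LITMUS work.
 */
static bool over_rt_budget(long fifo_us, long litmus_us, long runtime_us)
{
    if (runtime_us < 0)
        return false; /* throttling disabled: no budget to exceed */
    return fifo_us + litmus_us > runtime_us;
}
```

With the defaults, `rt_share_permille(950000, 1000000)` is 950 and `non_rt_reserve_us(950000, 1000000)` is 50000 (50 ms per second). A CPU kept fully busy by SCHED_LITMUS work (1000000 us per period) would exceed the 950000 us budget, whereas one at 90% utilization would not.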
> 
> 
> On Thu, Nov 29, 2012 at 5:39 PM, Björn Brandenburg <bbb at mpi-sws.org> wrote:
> 
> On Nov 29, 2012, at 9:20 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
> 
> > I have also confirmed that this only happens when sched-trace is running (not overhead tracing, just scheduling). Every commit I checked, back to the 3.0 merge (the oldest point at which my sched-trace setup still ran), had this same issue.
> 
> Yikes. Sounds like it could be a bug in the Feather-Trace triggers.
> 
> Could you please try the following: edit arch/x86/Kconfig to remove ARCH_HAS_FEATHER_TRACE. This should disable the asm Feather-Trace hacks and replace them with a tame default implementation. Does it still crash?
> 
> Thanks,
> Björn
> 
> 
> _______________________________________________
> litmus-dev mailing list
> litmus-dev at lists.litmus-rt.org
> https://lists.litmus-rt.org/listinfo/litmus-dev
> 
> 
> 
> -- 
> Jonathan Herman
> Department of Computer Science at UNC Chapel Hill
> 
