[LITMUS^RT] Increasing time allocated to the scheduler

Mon Jan 26 23:46:06 CET 2015

Looks like that fixed it! Thanks. It’s taken me a while to track down the source of my errors. It’s nice to know the fix was easy :)

Aaron Block
Assistant Professor of Computer Science

Austin College | Math and Computer Science
900 North Grand Avenue, Suite 61592 | Sherman, Texas 75090
Phone 903.813.2563
austincollege.edu<http://austincollege.edu/>

On Jan 26, 2015, at 2:06 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:

On Jan 26, 2015, at 2:57 PM, Aaron Block <ablock at austincollege.edu> wrote:

A plugin I am writing right now has a lot of logic in it for what to do when a
job completes. The plugin works correctly, except that it crashes when the
system load is high. I might be barking up the wrong tree, but I suspect
that my scheduling logic is "taking too long" and this is causing the
kernel to crash. I was wondering if anybody has an idea of how I could
increase the amount of time allocated to the scheduler (if there is even
such a value).

Thanks,

--Aaron

Hi Aaron,

Do you have “CONFIG_LOCKUP_DETECTOR” set?  (It’s under the Kernel Hacking > Kernel debugging submenu).  Linux’s soft lockup detector works by running a background high-priority SCHED_FIFO task that runs every few seconds.  Every time it executes, the task resets a counter.  In parallel, using NMIs, an interrupt handler increments this counter.  If this counter reaches a great enough value (~20 seconds by default, I think), the NMI interrupt handler will crash the system.  In systems heavily loaded with Litmus tasks, the SCHED_LITMUS tasks starve the soft lockup detector for CPU time.  The NMI interrupt handler inevitably crashes the system.  I’ve wasted days figuring this one out.
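
To make the mechanism described above concrete, here is a toy Python model of it. This is only a sketch of the description in this message, not kernel code: the name simulate_watchdog, the one-second tick, and the threshold of 20 are assumptions chosen for illustration.

```python
def simulate_watchdog(ticks, watchdog_runs, threshold=20):
    """Toy model of the counter scheme described above.

    ticks         -- number of (assumed one-second) timer ticks to simulate
    watchdog_runs -- function tick -> bool; True if the watchdog task
                     got CPU time during that tick
    threshold     -- counter value at which the handler gives up
                     (assumed to be 20, matching the "~20 seconds" above)

    Returns the tick at which the lockup would be reported, or None.
    """
    counter = 0
    for tick in range(1, ticks + 1):
        if watchdog_runs(tick):
            counter = 0        # watchdog task ran: counter is reset
        else:
            counter += 1       # interrupt handler: counter keeps growing
        if counter >= threshold:
            return tick        # handler decides the CPU is locked up
    return None


# Lightly loaded system: the watchdog task runs every 4 ticks.
light_load = simulate_watchdog(100, lambda t: t % 4 == 0)

# Heavily loaded system: real-time tasks starve the watchdog from tick 11 on.
starved = simulate_watchdog(100, lambda t: t <= 10)
```

In this model light_load is None (the counter never gets past 3), while starved is 30: the counter starts growing at tick 11 and reaches the threshold 20 ticks later, even though nothing is actually hung.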

Workaround: Don’t set CONFIG_LOCKUP_DETECTOR.
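
If it is unclear whether the option is enabled in a given build, the kernel's config file can be inspected. The following is a minimal sketch; the helper name config_option_state is made up for this example, and where the config file lives (for instance the .config in the build tree) depends on your setup, so the sample text below stands in for the file contents.

```python
import re


def config_option_state(config_text, option):
    """Return the value assigned to `option` in kernel config text
    (e.g. 'y' or 'm'), or None if the option is unset or absent.

    Disabled options normally appear as a comment of the form
    '# CONFIG_FOO is not set', which this function treats as unset.
    """
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    for line in config_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    return None


# Sample config text; in practice, read this from your kernel's config file.
SAMPLE = """\
CONFIG_PREEMPT=y
# CONFIG_LOCKUP_DETECTOR is not set
"""

state = config_option_state(SAMPLE, "CONFIG_LOCKUP_DETECTOR")
if state is None:
    print("CONFIG_LOCKUP_DETECTOR is not set")
else:
    print("CONFIG_LOCKUP_DETECTOR=" + state)
```

With the sample text above, the option is reported as not set, which is the state the workaround asks for.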

I’ve thought about submitting a patch to Litmus where the on_tick API causes the lockup detector counter to be reset.  I haven’t gotten around to it yet though.

Of course, maybe CONFIG_LOCKUP_DETECTOR isn’t the source of the problem you are seeing.

-Glenn