[LITMUS^RT] Increasing time allocated to the scheduler
Aaron Block
ablock at austincollege.edu
Mon Jan 26 23:46:06 CET 2015
Looks like that fixed it! Thanks. It’s taken me a while to track down the source of my errors. It’s nice to know the fix was easy :)
Aaron Block
Assistant Professor of Computer Science
Austin College | Math and Computer Science
900 North Grand Avenue, Suite 61592 | Sherman, Texas 75090
Phone 903.813.2563
austincollege.edu
On Jan 26, 2015, at 2:06 PM, Glenn Elliott <gelliott at cs.unc.edu> wrote:
On Jan 26, 2015, at 2:57 PM, Aaron Block <ablock at austincollege.edu> wrote:
A plugin I am writing right now has a lot of logic in it for what to do when a
job completes. The plugin works correctly, except that it crashes when the
system load is high. I might be barking up the wrong tree, but I suspect
that my scheduling logic is "taking too long" and this is causing the
kernel to crash. I was wondering if anybody has an idea of how I could
increase the amount of time allocated to the scheduler (if there is even
such a value).
Thanks,
--Aaron
Hi Aaron,
Do you have “CONFIG_LOCKUP_DETECTOR” set? (It’s under the Kernel Hacking > Kernel debugging submenu.)

Linux’s soft lockup detector works by running a background high-priority SCHED_FIFO task that runs every few seconds. Every time it executes, the task resets a counter. In parallel, using NMIs, an interrupt handler increments this counter. If this counter reaches a great enough value (~20 seconds by default, I think), the NMI interrupt handler will crash the system.

In systems heavily loaded with Litmus tasks, the SCHED_LITMUS tasks starve the soft lockup detector for CPU time. The NMI interrupt handler inevitably crashes the system. I’ve wasted days figuring this one out.
Workaround: Don’t set CONFIG_LOCKUP_DETECTOR.
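
A quick way to confirm whether the option is enabled is to scan the kernel's config text. The sketch below is only illustrative: the function name is hypothetical, and it assumes the usual kernel .config conventions (an enabled option appears as CONFIG_X=y, a disabled one as "# CONFIG_X is not set"). Where the config text comes from (for example the .config in the kernel build tree) depends on your setup.

```python
def lockup_detector_enabled(config_text: str) -> bool:
    """Return True if CONFIG_LOCKUP_DETECTOR=y appears in kernel config text.

    Hypothetical helper, not part of LITMUS^RT or the kernel tooling.
    """
    for line in config_text.splitlines():
        # Enabled options are written as "CONFIG_NAME=y"; disabled ones
        # are written as comments ("# CONFIG_NAME is not set").
        if line.strip() == "CONFIG_LOCKUP_DETECTOR=y":
            return True
    return False


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python check_config.py /path/to/.config
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            enabled = lockup_detector_enabled(f.read())
        print("CONFIG_LOCKUP_DETECTOR is", "set" if enabled else "not set")
```

If this reports the option as set, rebuilding the kernel with it disabled is the workaround described above.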
I’ve thought about submitting a patch to Litmus where the on_tick API causes the lockup detector counter to be reset. I haven’t gotten around to it yet though.
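
The counter behavior described in this message, and the effect of resetting the counter on every tick, can be illustrated with a toy model. This is a sketch in plain Python, not kernel code: it mirrors the description in the message rather than the kernel's actual implementation, and the function name, the one-step-per-second granularity, and the parameters are all assumptions made for illustration.

```python
def first_lockup(steps, watchdog_runs_at, threshold=20, reset_on_tick=False):
    """Return the step at which the counter reaches `threshold`, or None.

    steps            -- number of simulated time steps (think: seconds)
    watchdog_runs_at -- set of steps at which the watchdog task gets the CPU
    threshold        -- counter value treated as a lockup (~20 in the message)
    reset_on_tick    -- model the proposed patch: reset on every scheduler tick
    """
    counter = 0
    for t in range(1, steps + 1):
        counter += 1  # periodic handler increments the counter
        if reset_on_tick or t in watchdog_runs_at:
            counter = 0  # watchdog task (or tick hook) resets it
        if counter >= threshold:
            return t  # detector would fire here
    return None


if __name__ == "__main__":
    healthy = set(range(4, 101, 4))  # watchdog runs every 4 steps
    starved = {4, 8}                 # watchdog starved after step 8
    print(first_lockup(100, healthy))
    print(first_lockup(100, starved))
    print(first_lockup(100, starved, reset_on_tick=True))
```

In this model a regularly scheduled watchdog never trips the threshold, a starved one eventually does, and resetting on every tick avoids the trip even when the watchdog task never runs.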
Of course, maybe CONFIG_LOCKUP_DETECTOR isn’t the source of the problem you are seeing.
-Glenn