[LITMUS^RT] preparing a new release
Björn Brandenburg
bbb at mpi-sws.org
Thu Nov 29 18:42:15 CET 2012
On Nov 29, 2012, at 5:52 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
> What is the following line defending against:
>
> litmus/preempt.c:
> 31│ /* Litmus tasks should never be subject to a remote
> 32│ * set_tsk_need_resched(). */
> 33│ BUG_ON(is_realtime(tsk));
> 34│ //TRACE_TASK(tsk, "SUPERBAD"); /* I added this */
It defends against misuse of set_tsk_need_resched(). You can't safely use set_tsk_need_resched() for non-local tasks without acquiring the task's corresponding runqueue lock.
>
> I keep hitting this when I test with a full schedule under GSN-EDF. Oddly, when I debug using gdb and view t->comm it is "rtspin", but this is not the case in the trace log. If I remove the BUG_ON so that TRACE_TASK is hit, I get the following lines:
> 158079 P0 [sched_state_will_schedule at litmus/preempt.c:34]: (kworker/0:0/0:0) SUPERBAD
> 158080 P0 [sched_state_will_schedule at litmus/preempt.c:37]: (kworker/0:0/0:0) set_tsk_need_resched() ret:ffffffff810268d4
>
What is the symbolic name of ret:ffffffff810268d4? You might want to look at __builtin_return_address(1) instead. Do you have a backtrace?
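For illustration, here is a minimal userspace sketch of capturing a return address with the GCC builtin. The helper names who_called_me() and demo_caller() are hypothetical and do not come from litmus/preempt.c. Level 0 is used because, per the GCC documentation, a nonzero level such as __builtin_return_address(1) walks up the stack and is only dependable when frame pointers are available.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from LITMUS^RT): returns the address that
 * this function will return to, i.e. a location inside its caller.
 * noinline keeps the compiler from folding it into the caller.
 */
__attribute__((noinline)) static void *who_called_me(void)
{
    return __builtin_return_address(0);
}

/* Hypothetical caller used only to demonstrate the helper. */
__attribute__((noinline)) static void *demo_caller(void)
{
    void *ret = who_called_me();

    /* Raw address only; it still has to be mapped to a symbol name. */
    printf("called from %p\n", ret);
    return ret;
}
```

Inside the kernel, a raw address like the one in the trace can be printed symbolically with the printk format specifier %pS, or resolved offline against the matching vmlinux image, for example with addr2line -e vmlinux.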
> Which, as far as I can tell, is only possible if tsk->comm == "kworker/0:0". But then why is there no pid? This is on the current staging of liblitmus and litmus-rt.
Is it reproducible? Can you bisect the recent patches to see where it crept in?
>
>
> Ideas? I'm assuming race condition, as Glenn suggested, because usually if it's insane, it's a race.
That does indeed look quite strange. What happens just before this bug? A context switch? A migration? Nonsensical races and panics can also be an indicator of stack corruption.
Thanks,
Björn