[LITMUS^RT] preparing a new release

Jonathan Herman hermanjl at cs.unc.edu
Thu Nov 29 21:20:36 CET 2012


Backtrace:
#0  delay_tsc (loops=2127949) at arch/x86/lib/delay.c:69
#1  0xffffffff8127fcef in __delay (loops=<optimized out>) at arch/x86/lib/delay.c:112
#2  0xffffffff8127fd2c in __const_udelay (xloops=<optimized out>) at arch/x86/lib/delay.c:126
#3  0xffffffff814a2e93 in panic (fmt=<optimized out>) at kernel/panic.c:152
#4  0xffffffff81005541 in oops_end (flags=70, regs=<optimized out>, signr=11) at arch/x86/kernel/dumpstack.c:243
#5  0xffffffff810056a8 in die (str=0xffffffff815ef64a "invalid opcode", regs=0xffff88003f003d28, err=0) at arch/x86/kernel/dumpstack.c:305
#6  0xffffffff81002044 in do_trap (trapnr=6, signr=4, str=0xffffffff815ef64a "invalid opcode", regs=0xffff88003f003d28, error_code=<optimized out>, info=0xffff88003f003c88) at arch/x86/kernel/traps.c:177
#7  0xffffffff81002355 in do_invalid_op (regs=0xffff88003f003d28, error_code=0) at arch/x86/kernel/traps.c:218
#8  <signal handler called>
#9  0xffffffff8126382d in sched_state_will_schedule (tsk=0xffff880039d40fa0) at litmus/preempt.c:33
#10 0xffffffff810268d4 in set_tsk_need_resched (tsk=0xffff880039d40fa0) at include/linux/sched.h:2461
#11 resched_task (p=<optimized out>) at kernel/sched.c:1189
#12 0xffffffff8102cdb7 in sched_rt_rq_enqueue (rt_rq=<optimized out>) at kernel/sched_rt.c:326
#13 do_sched_rt_period_timer (overrun=1, rt_b=<optimized out>) at kernel/sched_rt.c:598
#14 sched_rt_period_timer (timer=0xffffffff81838ba8) at kernel/sched.c:177
#15 0xffffffff81061973 in __run_hrtimer (timer=0xffffffff81838ba8, now=0xffff88003f003f50) at kernel/hrtimer.c:1310
#16 0xffffffff81062a8b in hrtimer_interrupt (dev=<optimized out>) at kernel/hrtimer.c:1398
#17 0xffffffff814ad996 in local_apic_timer_interrupt () at arch/x86/kernel/apic/apic.c:834
#18 smp_apic_timer_interrupt (regs=<optimized out>) at arch/x86/kernel/apic/apic.c:861
#19 <signal handler called>
#20 0x00cf9b000000ffff in ?? ()

I have also confirmed that this only happens when sched-trace is running
(not overhead tracing, just scheduling tracing). Every commit I checked,
back to the 3.0 merge, on which my sched-trace still ran had this same issue.


On Thu, Nov 29, 2012 at 12:42 PM, Björn Brandenburg <bbb at mpi-sws.org> wrote:

>
> On Nov 29, 2012, at 5:52 PM, Jonathan Herman <hermanjl at cs.unc.edu> wrote:
>
> > What is the following line defending against:
> >
> > litmus/preempt.c:
> >   31│                 /* Litmus tasks should never be subject to a remote
> >   32│                  * set_tsk_need_resched(). */
> >   33│                 BUG_ON(is_realtime(tsk));
> >   34│                 //TRACE_TASK(tsk, "SUPERBAD"); /* I added this */
>
> It defends against misuse of set_tsk_need_resched(). You can't safely use
> set_tsk_need_resched() for non-local tasks without acquiring the task's
> corresponding runqueue lock.
>
> >
> > I keep hitting this when I test with a full schedule under GSN-EDF.
> > Oddly, when I debug using gdb and view t->comm it is "rtspin", but this is
> > not the case in the trace log. If I remove the BUG_ON so that TRACE_TASK is
> > hit, I get the following lines:
> > 158079 P0 [sched_state_will_schedule at litmus/preempt.c:34]: (kworker/0:0/0:0) SUPERBAD
> > 158080 P0 [sched_state_will_schedule at litmus/preempt.c:37]: (kworker/0:0/0:0) set_tsk_need_resched() ret:ffffffff810268d4
> >
>
> What is the symbolic name of ret:ffffffff810268d4? You might want to look
> at __builtin_return_address(1) instead. Do you have a backtrace?
>
> > Which, as far as I can tell, is only possible if tsk->comm ==
> > "kworker/0:0". But then why is there no pid? This is on the current staging
> > of liblitmus and litmus-rt.
>
> Is it reproducible? Can you bisect the recent patches to see where it
> crept in?
>
> >
> >
> > Ideas? I'm assuming a race condition, as Glenn suggested, because usually
> > if it's insane, it's a race.
>
> Looks indeed quite strange. What happens just before this bug? A context
> switch? A migration? Nonsensical races / panics can also be an indicator of
> stack corruption.
>
> Thanks,
> Björn



-- 
Jonathan Herman
Department of Computer Science at UNC Chapel Hill