[LITMUS^RT] Invalid Opcode Issue

Tue Feb 9 09:21:30 CET 2016

> On 09 Feb 2016, at 06:55, Meng Xu <xumengpanda at gmail.com> wrote:
> 
> One quick thought:
> 
> Did you try it on a fresh, unpatched Linux kernel instead of LITMUS? That would at least pin down whether the issue comes from LITMUS or from Linux.
> 
> If a fresh Linux kernel cannot run on Xen either, it is definitely a bug in Xen or in Linux.
> 
> Meng
> 
> On Tue, Feb 9, 2016 at 12:17 AM, Geoffrey Tran <gtran at isi.edu> wrote:
> Hello all,
> 
> I have been working with Litmus-RT inside a Xen guest and have run into the following issue: the guest crashes with the following
> trace, as output by Xen:
> 
> [25623.603388] ------------[ cut here ]------------
> [25623.603499] kernel BUG at drivers/xen/events/events_base.c:1209!
> [25623.603531] invalid opcode: 0000 [#1] PREEMPT SMP 
> [25623.603564] Modules linked in: x86_pkg_temp_thermal joydev coretemp ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64
> [25623.603666] CPU: 0 PID: 31 Comm: xenwatch Tainted: G             L  4.1.3+ #2
> [25623.603700] task: ffff8801f4abe5c0 ti: ffff8801f4ba0000 task.ti: ffff8801f4ba0000
> [25623.603735] RIP: e030:[<ffffffff8147e169>]  [<ffffffff8147e169>] xen_send_IPI_one+0x59/0x60
> [25623.603784] RSP: e02b:ffff8801f4ba3a78  EFLAGS: 00010086
> [25623.603809] RAX: ffff8801f5ec8240 RBX: 0000000000000000 RCX: 0000000000000001
> [25623.604058] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 00000000ffffffff
> [25623.604058] RBP: ffff8801f4ba3a78 R08: 0000000000000000 R09: fffffffffffffd3d
> [25623.604058] R10: 0000000000000001 R11: 0000000000050108 R12: 0000000000000001
> [25623.604058] R13: ffff8801f5e92c60 R14: ffff88001bc21970 R15: 0000000000012c60
> [25623.604058] FS:  00007f5c92344700(0000) GS:ffff8801f5e00000(0000) knlGS:0000000000000000
> [25623.604058] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [25623.604058] CR2: 00007f5c92357000 CR3: 00000001d97df000 CR4: 0000000000042660
> [25623.604058] Stack:
> [25623.604058]  ffff8801f4ba3a88 ffffffff81013fb0 ffff8801f4ba3aa8 ffffffff8139ff45
> [25623.604058]  ffff8801f4ba3b18 ffff880000000001 ffff8801f4ba3af8 ffffffff8139ceb8
> [25623.604058]  0000000000000000 ffffffff817b0bc0 ffffffff819f54c2 0000000000000113
> [25623.604058] Call Trace:
> [25623.604058]  [<ffffffff81013fb0>] xen_smp_send_reschedule+0x10/0x20
> [25623.604058]  [<ffffffff8139ff45>] litmus_reschedule+0x85/0xc0
> [25623.604058]  [<ffffffff8139ceb8>] preempt_if_preemptable+0xa8/0x150
> [25623.604058]  [<ffffffff813a5e91>] check_for_preemptions+0x201/0x3b0
> [25623.604058]  [<ffffffff813a633e>] gsnedf_release_jobs+0x3e/0x60
> [25623.604058]  [<ffffffff813a2163>] on_release_timer+0x73/0x80
> [25623.604058]  [<ffffffff810ddb16>] __run_hrtimer+0x76/0x290
> [25623.604058]  [<ffffffff813a20f0>] ? arm_release_timer_on+0x300/0x300
> [25623.604058]  [<ffffffff810de6d3>] hrtimer_interrupt+0x113/0x290
> [25623.604058]  [<ffffffff8176a9f1>] ? _raw_spin_unlock_irqrestore+0x21/0x40
> [25623.604058]  [<ffffffff810de8a5>] __hrtimer_peek_ahead_timers+0x55/0x60
> [25623.604058]  [<ffffffff810de9cb>] hrtimer_cpu_notify+0x11b/0x240
> [25623.604058]  [<ffffffff8176a9ae>] ? _raw_spin_unlock_irq+0x1e/0x40
> [25623.604058]  [<ffffffff8109693d>] notifier_call_chain+0x4d/0x70
> [25623.604058]  [<ffffffff810969fe>] __raw_notifier_call_chain+0xe/0x10
> [25623.604058]  [<ffffffff810774b0>] __cpu_notify+0x20/0x40
> [25623.604058]  [<ffffffff81077595>] cpu_notify_nofail+0x15/0x20
> [25623.604058]  [<ffffffff81756615>] _cpu_down+0x155/0x2b0
> [25623.604058]  [<ffffffff81481d20>] ? xenbus_thread+0x2a0/0x2a0
> [25623.604058]  [<ffffffff817567a5>] cpu_down+0x35/0x50
> [25623.604058]  [<ffffffff81479772>] handle_vcpu_hotplug_event+0x72/0xf0
> [25623.604058]  [<ffffffff81481dc7>] xenwatch_thread+0xa7/0x170
> [25623.604058]  [<ffffffff810b4d60>] ? prepare_to_wait_event+0x100/0x100
> [25623.604058]  [<ffffffff81095ff9>] kthread+0xc9/0xe0
> [25623.604058]  [<ffffffff81095f30>] ? flush_kthread_worker+0x90/0x90
> [25623.604058]  [<ffffffff8176b622>] ret_from_fork+0x42/0x70
> [25623.604058]  [<ffffffff81095f30>] ? flush_kthread_worker+0x90/0x90
> [25623.604058] Code: ff ff 5d c3 bf 0b 00 00 00 48 63 f1 31 d2 e8 af 31 b8 ff 85 c0 79 eb 89 c2 89 ce 48 c7 c7 c8 d3 a1 81 31 c0 e8 05 15 2e 00 5d c3 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 53 48 83 ec 08 
> [25623.604058] RIP  [<ffffffff8147e169>] xen_send_IPI_one+0x59/0x60
> [25623.604058]  RSP <ffff8801f4ba3a78>
> [25623.604058] ---[ end trace 702caae929567a1d ]---
> [25623.604058] note: xenwatch[31] exited with preempt_count 1
> 
> I’m not sure whether the issue lies with Xen or Litmus-RT, but thought to try here first.  I am adding and removing VCPUs from the VM at runtime, per
> http://backdrift.org/how-to-hot-addremove-vcpus-from-a-xen-domain, so that may be a factor here.  The issue pops up intermittently, so I have not
> been able to nail down steps to reproduce it.

Hi Geoffrey,

None of the schedulers in LITMUS^RT supports CPU hot-plugging. The code has never been tested with CPU hot-plugging, and it is known not to work with it. Simply put, we never had the resources or the motivation to pay attention to CPU hot-plugging.
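
On the experiment side, one possible guard is to check that the set of online CPUs did not change during a run. The following is only a minimal sketch and not part of LITMUS^RT: it assumes the guest exposes the standard sysfs file /sys/devices/system/cpu/online, and the helper names (parse_cpu_list, read_online_cpus, cpu_set_diff) are made up for illustration.

```python
# Hypothetical helpers for illustration only; not part of LITMUS^RT or Xen.

ONLINE_PATH = "/sys/devices/system/cpu/online"  # standard Linux sysfs file


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,5" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or stray commas
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_online_cpus(path=ONLINE_PATH):
    """Return the CPUs currently online according to sysfs."""
    with open(path) as f:
        return parse_cpu_list(f.read())


def cpu_set_diff(before, after):
    """Return (removed, added) CPU lists between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    return removed, added
```

Call read_online_cpus() before activating a plugin and again after the run, then pass both snapshots to cpu_set_diff(). If either returned list is non-empty, a VCPU was added or removed in between and the run should be discarded. Note that this does not catch a CPU that went offline and came back between the two snapshots.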

I’d be happy to merge patches adding CPU hot-plugging support, but any such effort will have to be spearheaded by someone outside of MPI-SWS.

- Björn