[LITMUS^RT] switch_sched_plugin() bug?

Felipe Cerqueira felipeqcerqueira at gmail.com
Mon Sep 24 00:27:39 CEST 2012


Hello everyone,

I think there is a bug in the function switch_sched_plugin(). A
deadlock sometimes occurs when the scheduler plugin is changed while
I run task sets in sequence.

I'm running LITMUS^RT on a dual-core system, which may be why the
deadlock occurs so easily: smp_call_function_many() turns into
smp_call_function_single().
Consider this behaviour of switch_sched_plugin():

PROC 0 (the one which is trying to change the scheduler plugin):
1) Sets cannot_use_plugin to 1
2) Sends an IPI to PROC 1 to call synch_on_plugin_switch()

PROC 1:
3) Receives the IPI
4) Increments cannot_use_plugin to 2
5) Waits while cannot_use_plugin > 0
    ... blocked.

PROC 0:
6) Still inside smp_call_function_single()
7) Hangs at csd_lock_wait() ... blocked.

Deadlock.

PROC 0 is supposed to release the other processor afterwards (by
resetting cannot_use_plugin to 0), but it is blocked trying to obtain
the csd lock.
PROC 1, on the other hand, calls csd_unlock() only after the IPI
function returns, which never happens while it is still waiting.
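
The circular wait above can be checked with a small model. This is only
a user-space sketch in plain C, not kernel code: struct switch_model,
model_step() and model_deadlocks() are hypothetical names made up for
illustration, and csd_locked merely stands in for CSD_FLAG_LOCK. It
assumes the state at step 7: the counter is 2, the csd is locked,
PROC 0 is in csd_lock_wait() and PROC 1 is in the wait loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these types exist in the kernel. */
enum proc0_state { P0_IN_CSD_LOCK_WAIT, P0_DONE };
enum proc1_state { P1_IN_SYNCH_LOOP, P1_DONE };

struct switch_model {
    int cannot_use_plugin;   /* barrier counter from the steps above */
    bool csd_locked;         /* stands in for CSD_FLAG_LOCK */
    enum proc0_state p0;
    enum proc1_state p1;
};

/* Advance whichever processor is able to move; return false if none can. */
static bool model_step(struct switch_model *m)
{
    assert(m != NULL);
    if (m->p0 == P0_IN_CSD_LOCK_WAIT && !m->csd_locked) {
        /* PROC 0 gets past csd_lock_wait() and later releases the barrier. */
        m->cannot_use_plugin = 0;
        m->p0 = P0_DONE;
        return true;
    }
    if (m->p1 == P1_IN_SYNCH_LOOP && m->cannot_use_plugin == 0) {
        /* The IPI function returns, and only then is the csd unlocked. */
        m->csd_locked = false;
        m->p1 = P1_DONE;
        return true;
    }
    return false;
}

/* True if the model gets stuck before both processors have finished. */
static bool model_deadlocks(struct switch_model m)
{
    while (model_step(&m))
        ;
    return m.p0 != P0_DONE || m.p1 != P1_DONE;
}
```

Starting from { 2, true, P0_IN_CSD_LOCK_WAIT, P1_IN_SYNCH_LOOP },
neither branch of model_step() is enabled, so model_deadlocks() returns
true. With csd_locked set to false instead, PROC 0 moves first, resets
the counter, and both processors finish.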

A comment in kernel/smp.c says that functions called via IPI should be
non-blocking.
Perhaps in the smp_call_function_many() case (on a system with more
processors) the csd lock is released by some other processor and the
deadlock doesn't happen.

Is there another way to implement this cannot_use_plugin barrier?
Or is the implementation right and the bug is in my kernel version?
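
One possible direction, as an untested sketch only: let the IPI function
acknowledge the switch and return at once, so that it never blocks, and
move the waiting out of IPI context. The code below is a user-space
model using C11 atomics, not kernel code. Apart from cannot_use_plugin,
every name in it (acks, begin_switch, ack_plugin_switch, all_acked,
end_switch, plugin_usable) is hypothetical. It also assumes that every
entry point into the plugin would check plugin_usable() first, because
the other processors are no longer held inside the IPI function;
whether that is acceptable for LITMUS^RT is an open question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int cannot_use_plugin; /* 1 while a switch is in progress */
static atomic_int acks;              /* hypothetical acknowledgement counter */

/* Switching processor: forbid use of the plugin before sending the IPIs. */
static void begin_switch(void)
{
    atomic_store(&acks, 0);
    atomic_store(&cannot_use_plugin, 1);
}

/* Hypothetical IPI function: acknowledge and return, never spin. */
static void ack_plugin_switch(void)
{
    atomic_fetch_add(&acks, 1);
}

/* Switching processor: true once every other processor has acknowledged. */
static bool all_acked(int other_cpus)
{
    assert(other_cpus >= 0);
    return atomic_load(&acks) == other_cpus;
}

/* Switching processor: allow use of the (new) plugin again. */
static void end_switch(void)
{
    atomic_store(&cannot_use_plugin, 0);
}

/* Assumed check that each processor makes before calling into the plugin. */
static bool plugin_usable(void)
{
    return atomic_load(&cannot_use_plugin) == 0;
}
```

Because ack_plugin_switch() returns immediately, the processor that
sent the IPI never waits on a function that is itself waiting for the
sender.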

Here are the stack traces of both threads:

#0  csd_lock_wait (data=<optimized out>) at kernel/smp.c:104
104        while (data->flags & CSD_FLAG_LOCK)
(gdb) bt
#0  csd_lock_wait (data=<optimized out>) at kernel/smp.c:104
#1  csd_lock (data=<optimized out>) at kernel/smp.c:110
#2  smp_call_function_single (cpu=1,
    func=0xffffffff812b4990 <synch_on_plugin_switch>, info=0x0,
    wait=<optimized out>) at kernel/smp.c:306
#3  0xffffffff8108e942 in smp_call_function_many (mask=0xffffffff8184ead8,
    func=0xffffffff812b4990 <synch_on_plugin_switch>, info=0x0, wait=false)
    at kernel/smp.c:439
#4  0xffffffff8108e9c6 in smp_call_function (
    func=0xffffffff812b4990 <synch_on_plugin_switch>, info=0x0, wait=0)
    at kernel/smp.c:495
#5  0xffffffff812b57d8 in switch_sched_plugin (plugin=0xffffffff817ce900)
    at litmus/litmus.c:409
#6  0xffffffff812b5908 in proc_write_curr (file=<optimized out>,
    buffer=<optimized out>, count=<optimized out>, data=<optimized out>)
    at litmus/litmus.c:577
#7  0xffffffff8119269b in proc_file_write (file=0xffff88007b8a90c0,
    buffer=0x7f36114a8000 "C=D", count=3, ppos=<optimized out>)
    at fs/proc/generic.c:224
#8  0xffffffff8118d073 in proc_reg_write (file=<optimized out>,
    buf=0x7f36114a8000 "C=D", count=3, ppos=0xffff88007b10df48)
    at fs/proc/inode.c:185
#9  0xffffffff81138b08 in vfs_write (file=0xffff88007b8a90c0,
    buf=0x7f36114a8000 "C=D", count=<optimized out>, pos=0xffff88007b10df48)
    at fs/read_write.c:349
#10 0xffffffff81138e4a in sys_write (fd=<optimized out>,
    buf=0x7f36114a8000 "C=D", count=3) at fs/read_write.c:401
#11 0xffffffff8100aff2 in ?? () at arch/x86/kernel/entry_64.S:487
#12 0x0000000000000246 in ?? ()
#13 0x00000000ffffffff in ?? ()
#14 0x00007f3611454030 in ?? ()
#15 0x00007f3611494700 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000003 in ?? ()
#18 0x0000000000000003 in ?? ()
#19 0x00007f360fe58100 in ?? ()
#20 0x0000000000000033 in ?? ()
#21 0x0000000000010246 in ?? ()
#22 0x00007fff25969cf0 in ?? ()
#23 0x000000000000002b in ?? ()
#24 0x0000000000000000 in ?? ()

(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  synch_on_plugin_switch (info=0x0) at litmus/litmus.c:390
390        while (atomic_read(&cannot_use_plugin) > 0)
(gdb) bt
#0  synch_on_plugin_switch (info=0x0) at litmus/litmus.c:390
#1  0xffffffff8108ebde in generic_smp_call_function_single_interrupt ()
    at kernel/smp.c:250
#2  0xffffffff81024857 in smp_call_function_single_interrupt (
    regs=<optimized out>) at arch/x86/kernel/smp.c:239
#3  0xffffffff8100bb33 in ?? () at arch/x86/kernel/entry_64.S:1013
#4  0xffff880001d03f40 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb)

Thanks,
Felipe



