[LITMUS^RT] non-rt sync releases (and race condition in do_release_ts() found?)

Thu Jan 10 00:26:01 CET 2013

Hello All,

I am performing a comparison between Litmus and non-real-time scheduling.  In order to make an apples-to-apples comparison, I need synchronous task set releases, even in a non-real-time setting.  I've pushed a branch "prop/non-rt-sync-release" to GitHub.  This functionality *can* be achieved through a POSIX real-time barrier placed in a shared memory page, but with this patch I can avoid divergent non-real-time and real-time code branches in my application code.  Since this functionality can be replicated in user-space, this might not be a good patch for mainline Litmus, but I wanted to share it nonetheless.

By the way, I noticed that the sync release code was revised in the latest version of Litmus, and I think that I may have identified a race condition that can lead to a bad pointer dereference:

1) Task 1 does do_wait_for_ts_release().
2) Task 2 does do_wait_for_ts_release().

** At this point, each task has a list node in the task_release_list. **

3) do_release_ts() is called by Task 3.
4) Task 3 wakes up Task 1.
5) Task 1 resumes and exits do_wait_for_ts_release().

** Task 1's list node is popped from Task 1's stack. **

6) Task 1 makes some function call, pushing data onto its stack and overwriting the memory that held its list node.
7) Task 3 attempts to iterate to the next list node in task_release_list.  *CRASH* Task 3's pointer (pos) to Task 1's list node is no longer valid.

I hit a crash in KVM where the ts_release_wait pointer, wait, is dereferenced (inside the list_for_each loop) in do_release_ts().  However, I couldn't easily reproduce the crash.  I think we probably need to be using list_for_each_safe() in do_release_ts().
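To illustrate why the _safe iterator matters in the scenario above, here is a minimal user-space sketch.  It is not the Litmus code: the list helpers are simplified stand-ins for those in <linux/list.h>, and struct waiter, wake_and_clobber(), and release_all_safe() are hypothetical names made up for this example.  The memset() simulates the woken task reusing the stack memory that held its list node.  Because list_for_each_safe() reads the next pointer before the loop body runs, the iteration never touches a node after it has been clobbered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The successor is cached in n before the body runs, so the body may
 * invalidate the node that pos points to. */
#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

/* Hypothetical waiter record; in the scenario above it would live on
 * the waiting task's stack. */
struct waiter {
    struct list_head list;
    int id;
};

/* Simulate "wake the task, task returns, stack memory gets reused". */
static void wake_and_clobber(struct waiter *w, int *woken)
{
    (*woken)++;
    memset(w, 0xA5, sizeof(*w)); /* w->list.next is now garbage */
}

/* Queue all waiters, then release them; returns the number woken. */
int release_all_safe(struct waiter *waiters, int count)
{
    struct list_head head, *pos, *n;
    int woken = 0, i;

    assert(count >= 0);
    list_init(&head);
    for (i = 0; i < count; i++) {
        waiters[i].id = i;
        list_add_tail(&waiters[i].list, &head);
    }

    list_for_each_safe(pos, n, &head) {
        struct waiter *w = list_entry(pos, struct waiter, list);
        wake_and_clobber(w, &woken); /* pos is invalid after this call */
    }
    return woken;
}
```

With plain list_for_each(), the loop would advance via pos->next after wake_and_clobber() had already overwritten it, which is the bad dereference described above.  Note that this sketch only models the ordering within a single thread; whether list_for_each_safe() alone is sufficient in do_release_ts() also depends on how the list is locked and on whether waiters still in the list can go away concurrently.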

-Glenn