[LITMUS^RT] non-rt sync releases (and race condition in do_release_ts() found?)

Thu Jan 10 16:46:34 CET 2013

On Jan 10, 2013, at 12:26 AM, Glenn Elliott <gelliott at cs.unc.edu> wrote:

> By the way, I noticed that the sync release code was revised in the latest version of Litmus, and I think that I may have identified a race condition that can lead to a bad pointer dereference:
> 
> 1) Task 1 does do_wait_for_ts_release().
> 2) Task 2 does do_wait_for_ts_release().
> 
> ** At this point, each task has a list node in the task_release_list. **
> 
> 3) do_release_ts() is called by Task 3.
> 4) Task 3 wakes up Task 1.
> 5) Task 1 resumes and exits do_wait_for_ts_release().
> 
> ** Task 1's list node is popped from Task 1's stack. **
> 
> 6) Task 1 makes some function call, pushing data to its stack.
> 7) Task 3 attempts to iterate to the next list node in task_release_list.  *CRASH* Task 3's pointer (pos) to Task 1's list node is no longer valid.
> 
> I hit a crash in KVM where the ts_release_wait pointer, wait, is dereferenced (inside the list_for_each loop) in do_release_ts().  However, I couldn't easily reproduce the crash.  I think we probably need to be using list_for_each_safe() in do_release_ts().
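
To illustrate the suggestion above, here is a minimal user-space sketch of why caching the successor pointer helps. It is an assumption-laden stand-in, not the LITMUS^RT code and not necessarily the fix that was pushed: struct waiter, wake_and_invalidate() and release_all_safe() are hypothetical names, the list helpers are simplified re-implementations of the kernel's list API, and the memset() merely simulates Task 1 reusing its stack after being woken.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Same shape as the classic kernel macro: 'n' holds the successor
 * before the loop body runs, so the body may invalidate 'pos'. */
#define list_for_each_safe(pos, n, head) \
    for (pos = (head)->next, n = pos->next; pos != (head); \
         pos = n, n = pos->next)

/* Hypothetical waiter record; in the scenario described above it
 * would live on the waiting task's stack. */
struct waiter {
    int id;
    struct list_head list;
};

/* Simulates "wake the task, which then returns and reuses its stack":
 * after this call the node's next/prev pointers are garbage. */
static void wake_and_invalidate(struct waiter *w)
{
    memset(w, 0xFF, sizeof(*w));
}

/* Queues 'nwaiters' nodes, releases them all, returns how many were woken. */
int release_all_safe(int nwaiters)
{
    struct list_head head;
    struct waiter waiters[8];
    struct list_head *pos, *n;
    int i, woken = 0;

    assert(nwaiters >= 0 && nwaiters <= 8);
    list_init(&head);
    for (i = 0; i < nwaiters; i++) {
        waiters[i].id = i;
        list_add_tail(&waiters[i].list, &head);
    }

    list_for_each_safe(pos, n, &head) {
        /* Open-coded container_of(): recover the waiter from its list node. */
        struct waiter *w =
            (struct waiter *)((char *)pos - offsetof(struct waiter, list));
        wake_and_invalidate(w);  /* 'pos' must not be dereferenced after this */
        woken++;
    }
    return woken;
}
```

With plain list_for_each(), the loop would advance via pos->next after the body and read the overwritten node. The safe variant only protects the current node: it still assumes the cached successor stays valid until its turn, which holds here because a waiter that has not been woken yet is still blocked.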

I've hit the same bug and pushed a fix to https://github.com/LITMUS-RT/litmus-rt/commits/prop/misc-fixes.

I've also included a patch that reimplements the plugin switching code; the old implementation caused lockups on my machine.

If there are no objections, I'll merge these patches into staging.

- Björn