[LITMUS^RT] [PATCH] EXPLANATION: Architecture dependent uncachable control page.

Christopher Kenna cjk at cs.unc.edu
Sun Oct 7 21:25:53 CEST 2012


I encountered some problems with the LITMUS^RT control page. To elaborate upon
what is stated in the patch's commit message (maybe go read that now), imagine
that we have a control page C that is mapped in both the kernel address space
and the user address space (like we already do in LITMUS^RT), and the following
kernel system call and userland code interact:

Kernel system call S():
    1) Read C->val and print it using TRACE().
    2) C->val := 12345

User code:
    1) C->val := 98765
    2) S()
    3) printf(C->val)

On x86 (tested on an Intel i7), the result is that the kernel sees the value
98765 from the userland and prints it to the LITMUS^RT log using TRACE, while
the userland code sees the kernel value 12345 and prints it using printf().
This happened in 100% of my test cases.

On ARM (tested on an ODROID-X / Samsung Exynos4412), the result is
non-deterministic. Usually, each address space prints only the value that it
wrote (12345 in the kernel and 98765 in the userland), but adding sleep()
between steps 1 and 2 of the userland code results in the kernel occasionally
printing the "correct" value.

I explored a few solutions to the problem. One was to restrict the task to run
on a single CPU. That did not work. I also tried inserting general memory
barriers in various places, but this also had no effect.

What did work was making the mapping of the control page uncachable in both
the user and kernel address spaces. One theory that could explain why this
fixes the problem involves caches indexed or tagged with virtual addresses.
Shared memory accessed via different virtual addresses may need to be made
uncached, because the cache would contain different entries for each virtual
address range. If
any of these mappings are writable, it causes a coherency problem because
modifications made through one mapping aren't visible through the cache entries
for other mappings. However, the documentation I found for the Cortex-A9 says
that the data cache is Physically Indexed and Physically Tagged (PIPT), and
that the instruction cache is Virtually Indexed and Physically Tagged
(VIPT). Since the control page is data, I thought it should be using the
PIPT cache, and the "aliasing" of virtual addresses should not be an issue.
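
As a rough userspace illustration of the aliasing setup (not code from the
patch), the sketch below maps one page at two different virtual addresses and
replays the write/read sequence from the example above. The names
alias_experiment() and struct alias_result are hypothetical. Both mappings
here are ordinary cached user mappings of a temporary file, so this only
models the "same physical page, two virtual addresses" part. It does not
reproduce a kernel mapping or any difference in memory attributes between
the two mappings.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical result record: what each alias observed. */
struct alias_result {
    int seen_by_second; /* value read through the second mapping */
    int seen_by_first;  /* value read back through the first mapping */
};

/* Map one page twice and replay the sequence from the example:
 * write via the first alias, read and overwrite via the second,
 * then read again via the first. Returns 0 on success, -1 on error. */
static int alias_experiment(struct alias_result *out)
{
    FILE *backing = tmpfile();
    if (backing == NULL)
        return -1;

    long page = sysconf(_SC_PAGESIZE);
    int fd = fileno(backing);
    if (page <= 0 || ftruncate(fd, (off_t)page) != 0) {
        fclose(backing);
        return -1;
    }

    size_t len = (size_t)page;
    /* Two shared mappings of the same file offset: distinct virtual
     * addresses backed by the same physical page. */
    void *m1 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *m2 = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (m1 == MAP_FAILED || m2 == MAP_FAILED) {
        if (m1 != MAP_FAILED)
            munmap(m1, len);
        if (m2 != MAP_FAILED)
            munmap(m2, len);
        fclose(backing);
        return -1;
    }

    volatile int *first = m1;  /* plays the role of the user mapping */
    volatile int *second = m2; /* plays the role of the other mapping */

    first[0] = 98765;                /* "user" step 1 */
    out->seen_by_second = second[0]; /* "S()" step 1 */
    second[0] = 12345;               /* "S()" step 2 */
    out->seen_by_first = first[0];   /* "user" step 3 */

    munmap(m1, len);
    munmap(m2, len);
    fclose(backing);
    return 0;
}
```

If the two aliases are coherent, the second mapping should observe 98765 and
the first should then observe 12345, which matches the x86 behaviour
described above.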

Taking the above into account, if anyone can offer an explanation as to why
this patch fixes the problem, I am very interested to know.

The patch itself is not that complicated. I added a configuration option that
enables the uncachable control page. Memory allocation is done in the usual
way. I then create a new virtual address mapping that has the uncachable page
protection bits set with vmap(), and use that as the virtual address of the
control page. I had to move the freeing and vunmap() calls into a workqueue
because vunmap() cannot run in interrupt context. On my machine,
exit_litmus() is called sometime after some softirq work, which would (and did)
trigger a BUG inside of the vunmap() code (BUG_ON(in_interrupt())).
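
The deferral pattern can be modelled in plain userspace C. This is only a
sketch of the idea, not the patch's code: struct ctrl_page,
in_atomic_context, queue_release(), release_mapping() and run_pending() are
all hypothetical names, free() stands in for vunmap() plus page freeing, and
a simple linked list stands in for the kernel workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's in_interrupt() state. */
static bool in_atomic_context = false;

/* Hypothetical control-page bookkeeping. The list link is embedded so
 * that queueing needs no allocation in the restricted context. */
struct ctrl_page {
    void *mapping;                  /* stand-in for the vmap()ed address */
    struct ctrl_page *next_pending; /* link in the deferred-release list */
};

static struct ctrl_page *pending_head = NULL;

/* Stand-in for vunmap() plus freeing: must not run in atomic context. */
static void release_mapping(struct ctrl_page *cp)
{
    assert(!in_atomic_context);
    free(cp->mapping);
    cp->mapping = NULL;
}

/* Exit path: may run in atomic context, so it only queues the request. */
static void queue_release(struct ctrl_page *cp)
{
    cp->next_pending = pending_head;
    pending_head = cp;
}

/* Worker: runs later in ordinary context and drains the queue.
 * Returns the number of mappings released. */
static int run_pending(void)
{
    int count = 0;
    while (pending_head != NULL) {
        struct ctrl_page *cp = pending_head;
        pending_head = cp->next_pending;
        cp->next_pending = NULL;
        release_mapping(cp);
        count++;
    }
    return count;
}
```

The point of the split is that the exit path never calls the release
function directly. It only records the request, and the actual unmapping
happens once a context that is allowed to do it picks the request up.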

I tested by running the LITMUS^RT unit tests on ARM and x86. I also did a test
run with the new functionality enabled on x86 to ensure that it works; however,
x86 does not seem to have this problem so it is off for this architecture by
default.

Comments welcome.

Online:
https://github.com/LITMUS-RT/litmus-rt/commit/3c6a29e4ebf4111b4d279dbe83febccd2fe4d499

Christopher Kenna (1):
  Architecture dependent uncachable control page.

 arch/arm/Kconfig          |    3 +
 include/litmus/litmus.h   |    2 +
 include/litmus/rt_param.h |    3 +
 litmus/ctrldev.c          |  150 +++++++++++++++++++++++++++++++++++++++++++--
 litmus/litmus.c           |   33 ++++++----
 5 files changed, 176 insertions(+), 15 deletions(-)

-- 
1.7.9.5




