[LITMUS^RT] [PATCH] EXPLANATION: Architecture dependent uncachable control page.

Christopher Kenna cjk at cs.unc.edu
Wed Oct 10 22:07:27 CEST 2012


On Wed, Oct 10, 2012 at 8:56 AM, Manohar Vanga <mvanga at mpi-sws.org> wrote:
> So it was printing 12345 in the kernel before you assigned it?? Or did you just write the sequence of steps in the system call backwards by mistake?

You're right, I made a mistake. The first iteration would print a 0.
After that it's what I said.

> I would have assumed it was some sort of issue with the write buffer not being flushed out. It can apparently be forced with the 'dsb' instruction. However, looking at the source code, it seems like mb() falls down to a dsb instruction followed by a sync in the case that CONFIG_SMP has been set. That makes this issue less probable :P
>
> If this had been the issue, it would actually explain the problem as cacheable memory seems to be buffered while using pgprot_noncached() seems to set it to non-buffered. This can be seen in arch/arm/include/asm/pgtable.h. In fact from what I see you can also use DMA coherent memory (using dma_alloc_coherent()) for allocating your control page. This might also explain why adding a sleep was making it work correctly; the write buffer was being flushed in that time. However, none of this applies as you say you already tried to use memory barriers (which flush the write buffer when CONFIG_SMP is set, according to arch/arm/include/asm/system.h), and it still didn't work :P

I am not very familiar with the order of memory operations on ARM, so
I was trying what I thought was the most heavy-handed form of memory
barrier. I could have gotten it wrong, so you might want to see if you
can reproduce it and/or fix it with memory barriers before giving up
on this.
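
For anyone who wants to try reproducing this, here is a minimal
userspace sketch of the write-then-barrier pattern under discussion.
It is only an analogy, not the LITMUS^RT code: a forked child stands in
for the kernel side, an anonymous shared mapping stands in for the
control page, and the names fake_ctrl_page and read_after_barrier are
made up for illustration. It uses C11 fences rather than the kernel's
mb(), and because the mapping is ordinary cacheable memory it says
nothing about how pgprot_noncached() behaves.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the control page: one data word plus a flag. */
struct fake_ctrl_page {
    int value;         /* plain data written by the "kernel" side */
    atomic_int ready;  /* set once value has been written */
};

/* Returns the value the reader observes, or -1 on setup failure. */
int read_after_barrier(void)
{
    struct fake_ctrl_page *p = mmap(NULL, sizeof(*p),
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p->value = 0;
    atomic_init(&p->ready, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(*p));
        return -1;
    }

    if (pid == 0) {
        /* Writer (assumed analogue of the syscall path). */
        p->value = 12345;
        /* Full fence: order the data write before the flag write. */
        atomic_thread_fence(memory_order_seq_cst);
        atomic_store_explicit(&p->ready, 1, memory_order_relaxed);
        _exit(0);
    }

    /* Reader (assumed analogue of the userspace task). */
    while (atomic_load_explicit(&p->ready, memory_order_relaxed) == 0)
        ;  /* spin until the writer sets the flag */
    /* Full fence: order the flag read before the data read. */
    atomic_thread_fence(memory_order_seq_cst);
    int seen = p->value;

    waitpid(pid, NULL, 0);
    munmap(p, sizeof(*p));
    return seen;
}
```

With both fences in place the reader should see 12345 once the flag is
set. The interesting experiment on the ARM board would be the
kernel-side equivalent of this on a cacheable control page.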

> So could it be that the cache is not PIPT (even though the documentation clearly states it :D)? Perhaps we could try flushing the data cache before returning from the syscall and check? (I believe it can be done with the flush_dcache_page() function).

Maybe, but I doubt that the documentation is wrong. If it were, then
many assumptions about the cache would be incorrect in other parts of
the kernel, and surely someone would have seen data corruption
elsewhere. I have found that when something happens that I cannot
explain, it is usually because I have made an incorrect assumption,
not because the hardware is at fault. (Or, put simply,
because I've screwed up.) I just don't know what I've screwed up here
yet. Also, I tried flush_dcache_page() as well as a few other cache
flush functions, but it did not help either. Maybe it's not the
cache...?

> Keep me updated! I'll let you know about my results once I set things up :-)

Thanks.



