Flushing shared memory from CPU cache

Wed Nov 21 09:51:00 GMT 2018

On 20.11.2018 16:47, Florian Weimer wrote:
> * Konstantin Kharlamov:
> 
>> On 20.11.2018 16:20, Florian Weimer wrote:
>>> * Konstantin Kharlamov:
>>>
>>>> I have 2 processes and a shared memory segment between them. Process α
>>>> writes into the shm, then notifies process β that it's done writing. β
>>>> gets the notification and starts reading from the shm, and occasionally
>>>> sees uninitialized memory.
>>>>
>>>> I think it's because α's changes are still in the CPU cache. I have
>>>> searched a lot but didn't find any documented behavior. Supposedly,
>>>> calling `munmap()` in α before notifying β would help, but
>>>> there's no documentation on that matter.
>>>>
>>>> I was about to report a bug, but that requires making a testcase, so I
>>>> wanted to ask beforehand whether anybody knows anything about this
>>>> *(e.g. if it's known, expected behavior, then I probably won't need to
>>>> make a testcase, only to report the lack of documentation)*.
>>>
>>> Do you use atomics, following the rules of the memory model?  Either C
>>> atomics from <stdatomic.h>, C++ atomics from <atomic>, or the GCC atomic
>>> builtins?
>>>
>>>     <https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/_005f_005fatomic-Builtins.html>
>>>
>>> If not, then what you are observing is not unusual at all.
>>
>> Do you mean casting the result of the `mmap()` call to e.g.
>> `std::atomic<void*>`?
> 
> Well, not void *, but some integer type, but that's the idea, yes.
> 
>> Yeah, as a temporary workaround I did that (and it worked), but it's
>> likely undefined behavior; besides, I imagine it would flush on every
>> write through the pointer, so e.g. copying a 1M buffer would mean
>> 10⁶ atomic writes. Whereas I only need to flush the shm once, after
>> process α is done copying.
> 
> You can use fences in combination with the relaxed memory order.  There
> is no need for a system call.
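
A minimal sketch of the fence-plus-relaxed-order approach, assuming Linux/POSIX (`mmap` with `MAP_SHARED | MAP_ANONYMOUS`, plus `fork`) and a busy-wait standing in for the real notification mechanism, which the thread doesn't show. `Region`, `fenced_write`, `fenced_read` and `fence_demo` are hypothetical names. Only a one-word integer flag is atomic; the 1 MiB buffer is copied with a plain `memcpy`, and the ordering costs one release fence on the writing side and one acquire fence on the reading side instead of one atomic operation per byte.

```cpp
#include <atomic>
#include <cstring>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// An atomic living in memory shared between processes should be lock-free;
// otherwise it might depend on a lock that exists in only one process.
static_assert(std::atomic<int>::is_always_lock_free,
              "std::atomic<int> must be lock-free to be shared between processes");

// Hypothetical layout of the shared region.
struct Region {
    std::atomic<int> ready{0};  // the only atomic object in the region
    char buf[1 << 20];          // 1 MiB, written with ordinary non-atomic stores
};

// α: copy with plain stores, then ONE release fence, then a relaxed flag store.
static void fenced_write(Region* r, const char* src) {
    std::memcpy(r->buf, src, sizeof r->buf);
    std::atomic_thread_fence(std::memory_order_release);
    r->ready.store(1, std::memory_order_relaxed);
}

// β: relaxed flag load, then ONE acquire fence, then plain reads.
static bool fenced_read(Region* r, const char* expected) {
    while (r->ready.load(std::memory_order_relaxed) == 0) {
        // busy-wait for brevity; real code would block on the notification
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return std::memcmp(r->buf, expected, sizeof r->buf) == 0;
}

// Forks a child that plays α while the parent plays β.
// Returns true if β saw the complete buffer and α exited cleanly.
bool fence_demo() {
    static char src[1 << 20];
    std::memset(src, 'y', sizeof src);

    void* mem = mmap(nullptr, sizeof(Region), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    Region* r = new (mem) Region;  // constructs `ready` with the value 0

    pid_t pid = fork();
    if (pid < 0) { munmap(mem, sizeof(Region)); return false; }
    if (pid == 0) {                // child: α
        fenced_write(r, src);
        _exit(0);
    }
    bool ok = fenced_read(r, src); // parent: β
    int status = 0;
    waitpid(pid, &status, 0);
    munmap(mem, sizeof(Region));
    return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling `fence_demo()` from `main` should return `true`. The release fence before the relaxed store, paired with the acquire fence after the relaxed load that observes it, is what makes the plain `memcpy` writes visible to β; neither fence does anything useful without the atomic flag between them.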

Thanks for your help! The problem is solved. To summarize what I learned:

I was told on IRC that x86_64 has cache coherency (I'm not sure whether
in general or only for shared memory; from a cursory search it's probably
specifically about shared memory).

There was a theory that what actually happens is instruction reordering,
so process β gets notified before the copying has actually finished. So I
removed the previous hack with std::atomic<char*> and inserted a memory
barrier after the copying is done, in my case:

	std::atomic_thread_fence(std::memory_order_seq_cst);

…and the problem seems fixed! At least, I ran a test 100 times in a
loop and couldn't reproduce it anymore, whereas before it happened in
≈1 of 8 runs.
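
The fix above places a fence only on α's side. In the C++ memory model a fence orders plain memory accesses only in combination with an atomic operation that the other side observes, followed by a matching fence (or an acquire load) on the reading side. The sketch below assumes the notification travels through relaxed atomic counters in the same shared mapping (the post doesn't say how β is actually notified) and repeats the handoff for a number of rounds, loosely mirroring the 100-run loop. `Handoff`, `writer`, `reader` and `handoff_demo` are hypothetical names, the busy-waits stand in for a real blocking mechanism, and `fork`/`MAP_ANONYMOUS` assume Linux/POSIX.

```cpp
#include <atomic>
#include <new>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static_assert(std::atomic<unsigned>::is_always_lock_free,
              "process-shared atomics should be lock-free");

// Hypothetical region: two counters for a ping-pong handoff plus a buffer.
struct Handoff {
    std::atomic<unsigned> published{0};  // last round α finished writing
    std::atomic<unsigned> consumed{0};   // last round β finished reading
    char buf[4096];
};

// α: wait until β is done with the previous round, fill the buffer,
// then issue a seq_cst fence before publishing the round number.
static void writer(Handoff* h, unsigned rounds) {
    for (unsigned r = 1; r <= rounds; ++r) {
        while (h->consumed.load(std::memory_order_relaxed) != r - 1) {}
        std::atomic_thread_fence(std::memory_order_seq_cst);  // β's reads of r-1 happen before these writes
        for (char& c : h->buf) c = static_cast<char>(r);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // the fence from the post
        h->published.store(r, std::memory_order_relaxed);
    }
}

// β: wait for each round, fence, check the buffer, fence, acknowledge.
// Returns the number of rounds in which β saw a stale or partial buffer.
static unsigned reader(Handoff* h, unsigned rounds) {
    unsigned bad = 0;
    for (unsigned r = 1; r <= rounds; ++r) {
        while (h->published.load(std::memory_order_relaxed) != r) {}
        std::atomic_thread_fence(std::memory_order_seq_cst);  // matching fence on β's side
        for (char c : h->buf) {
            if (c != static_cast<char>(r)) { ++bad; break; }
        }
        std::atomic_thread_fence(std::memory_order_seq_cst);  // reads finish before the ack
        h->consumed.store(r, std::memory_order_relaxed);
    }
    return bad;
}

// Runs `rounds` handoffs between a forked child (α) and the parent (β).
// Returns the number of bad rounds, or `rounds + 1` if setup fails.
unsigned handoff_demo(unsigned rounds) {
    void* mem = mmap(nullptr, sizeof(Handoff), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return rounds + 1;
    Handoff* h = new (mem) Handoff;  // both counters start at 0

    pid_t pid = fork();
    if (pid < 0) { munmap(mem, sizeof(Handoff)); return rounds + 1; }
    if (pid == 0) { writer(h, rounds); _exit(0); }  // child: α
    unsigned bad = reader(h, rounds);                // parent: β
    waitpid(pid, nullptr, 0);
    munmap(mem, sizeof(Handoff));
    return bad;
}
```

`handoff_demo(100)` should return 0. The second counter, `consumed`, keeps α from overwriting the buffer while β is still reading it; a one-shot flag doesn't need it, but a repeated handoff does.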