This is the mail archive of the
mailing list for the glibc project.
Re: gcc 4.1 implements compiler builtins for atomic ops
From: Ulrich Drepper <firstname.lastname@example.org>
Date: Sun, 26 Jun 2005 17:50:16 -0700
> But the real problem with your argumentation is that there is no reason
> why the locking code should have a higher probability of defects in the
> processor than all the other parts combined.
From working and talking with folks who have to deal with such
processor bugs, I come away with an opinion which differs from yours.
I've seen atomic operation bugs that resulted from any number of
problems. For example, I know of one case where atomic operations
failed unless done within a single instruction cache line due to
a problem with a NUMA gateway implementation. If an instruction
cache miss was generated during the atomic operation, you'd get
corruption in the memory the atomic operation was on.
No amount of microcode is going to fix bugs like that, yet a vDSO
page or library based implementation could handle that properly.
And since you don't want GCC aligning every inline atomic operation
to an I-cache line, calling out to a function or similar is much more
efficient in this case.
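A rough sketch of the out-of-line approach, assuming GCC and a target with 64-byte I-cache lines (both assumptions; this is not glibc's or the kernel's actual code): the atomic add lives in one non-inlined, line-aligned function, and callers reach it through a function pointer that a library or vDSO could repoint at a workaround variant on affected hardware. The names `atomic_add_outofline`, `atomic_add` and `bump` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*atomic_add_fn)(long *, long);

/* Hypothetical out-of-line atomic add (GCC-specific attributes).
   noinline keeps every caller going through this single copy, so only one
   instruction sequence has to be placed carefully.  aligned(64) starts the
   function on a 64-byte boundary (64 is an assumed I-cache line size); a
   body this small should then fit within one line. */
static __attribute__((noinline, aligned(64))) long
atomic_add_outofline(long *p, long v)
{
    return __sync_add_and_fetch(p, v);  /* returns the new value */
}

/* Hypothetical dispatch slot: a runtime could point this at a different
   implementation on buggy hardware without recompiling any caller. */
static atomic_add_fn atomic_add = atomic_add_outofline;

/* Example caller: bump a shared counter n times through the slot. */
static void bump(long *counter, int n)
{
    assert(counter != NULL);
    for (int i = 0; i < n; i++)
        atomic_add(counter, 1);
}
```

The trade-off is an indirect call per operation in exchange for having exactly one copy of the sensitive instruction sequence to place and, if needed, replace.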
At the urging of another posting here, I read the GCC documentation on
the builtins. And sadly, the GCC atomic builtin memory ordering
semantics are very suboptimal. You don't need hard ordering if you
just want a counter to update atomically and don't care in what
order other memory operations occur with respect to that atomic operation.
Some processors pay a huge cost for memory barriers, so avoiding
them for simple things such as an atomic counter used to collect
statistics or for reference counting is really needed.
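The difference can be sketched in C. Most of the `__sync` builtins that GCC 4.1 introduced are documented as full barriers; the relaxed variants below use C11 `<stdatomic.h>`, which (like GCC's later `__atomic` builtins, added in GCC 4.7) postdates this message and is swapped in only to show an atomic update with no ordering attached. The counters and `struct obj` helpers are hypothetical. In this common refcounting pattern only the increment is fully relaxed; the final decrement still uses release ordering plus an acquire fence so that the thread freeing the object sees every other thread's writes to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* GCC 4.1 style: __sync_fetch_and_add also acts as a full memory barrier. */
static long stat_full = 0;
static void stat_inc_full(void)
{
    __sync_fetch_and_add(&stat_full, 1);
}

/* Relaxed: the add itself is atomic, but no surrounding loads or stores
   are ordered against it.  Enough for a statistics counter. */
static atomic_long stat_relaxed = 0;
static void stat_inc_relaxed(void)
{
    atomic_fetch_add_explicit(&stat_relaxed, 1, memory_order_relaxed);
}

/* Hypothetical reference-counted object. */
struct obj {
    atomic_int refs;
};

/* Taking an extra reference needs no ordering: the caller already holds one. */
static void obj_get(struct obj *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference and may free o. */
static bool obj_put(struct obj *o)
{
    int old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_release);
    assert(old > 0);
    if (old == 1) {
        /* Pair with the release decrements of other threads before freeing. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```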
This also applies to atomic operations on bitmaps and stuff like that.
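For bitmaps the same idea looks roughly like this: a relaxed atomic OR or AND flips one bit without disturbing its neighbours in the same word and without ordering any other memory accesses. A minimal sketch, again using C11 atomics with hypothetical helper names; code that uses a bit as a lock or a "data ready" flag would need acquire/release ordering instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Words that fell back to a hidden lock would defeat the point. */
static_assert(ATOMIC_LONG_LOCK_FREE == 2, "unsigned long atomics must be lock-free");

static unsigned long bit_mask(size_t bit)
{
    return 1UL << (bit % BITS_PER_WORD);
}

static void bitmap_set(atomic_ulong *map, size_t bit)
{
    atomic_fetch_or_explicit(&map[bit / BITS_PER_WORD], bit_mask(bit),
                             memory_order_relaxed);
}

static void bitmap_clear(atomic_ulong *map, size_t bit)
{
    atomic_fetch_and_explicit(&map[bit / BITS_PER_WORD], ~bit_mask(bit),
                              memory_order_relaxed);
}

/* Sets the bit and returns its previous value. */
static bool bitmap_test_and_set(atomic_ulong *map, size_t bit)
{
    unsigned long old = atomic_fetch_or_explicit(&map[bit / BITS_PER_WORD],
                                                 bit_mask(bit),
                                                 memory_order_relaxed);
    return (old & bit_mask(bit)) != 0;
}

static bool bitmap_test(atomic_ulong *map, size_t bit)
{
    unsigned long w = atomic_load_explicit(&map[bit / BITS_PER_WORD],
                                           memory_order_relaxed);
    return (w & bit_mask(bit)) != 0;
}
```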
We actually have a document in the Linux kernel which tries to
document precisely all of these cases and issues. It's called