This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: [PATCH] PPC atomic.h add compare_exchange_val forms


Hi,

> I've struggled with this idea for HPPA since the architecture reference
> explicitly states that only a single lock word is allowed on the
> cacheline (stride is 64-128 bytes wide depending on the processor).
> 
> Padding to cacheline size was attempted, but static locks would have to
> pad to maximum cacheline size. This seemed to be wasteful and problematic 
> for backwards binary compatibility, it also gave the linker some
> headaches.
> 
> Do you just live with the fact that two locks _could_ reside on the same
> cacheline?

Yes, I have always wondered the same thing.  In fact, if any write to the 
same cache line (not just to the exact reserved address) also clears the 
reservation, then you could certainly slow things down by accessing items 
in the structure that happen to fall in the same cache line as the atomic_t 
(for example, even items immediately *before* it in the case of locks).

For example, doesn't the latest linuxthreads code use the following:

struct _pthread_fastlock
{
   long int __status;   /* "free", "taken", or head of the wait queue */
   int __spinlock;      /* spin lock used when compare_and_swap is not
                           available */
};

I would guess both of these fields end up using some form of load-and-reserve 
approach on powerpc: atomic increment/decrement for the __spinlock field 
versus compare_and_swap for the __status field, so operations on the two 
fields could theoretically interfere with each other.
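
To make that concrete, here is a minimal sketch (my own illustration, not 
the actual glibc or kernel code; the name cas_sketch is made up) of a 
compare_and_swap built on lwarx/stwcx.  Any store that lands in the same 
reservation granule between the two instructions makes the stwcx. fail and 
sends it around the loop again:

static inline int
cas_sketch (volatile int *mem, int oldval, int newval)
{
  int prev;
  __asm__ __volatile__ (
    "1: lwarx   %0,0,%2\n"     /* load word and set the reservation */
    "   cmpw    %0,%3\n"       /* is it the value we expected? */
    "   bne-    2f\n"          /* no: give up and return what we saw */
    "   stwcx.  %4,0,%2\n"     /* store only if the reservation survived */
    "   bne-    1b\n"          /* reservation was cleared: retry */
    "2:"
    : "=&r" (prev), "+m" (*mem)
    : "r" (mem), "r" (oldval), "r" (newval)
    : "cr0", "memory");
  return prev;
}

(Memory barriers are omitted to keep the sketch short.)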

Luckily, I do not think the __spinlock field is ever used if the platform 
has compare_and_swap (then only the __status field is used).  Isn't that right?  

But if anything were ever written to the __spinlock field by one thread 
while another was fighting to hold a reservation on the __status field, it 
could slow down progress.

So some padding around mutex locks and atomic types might improve 
performance (by reducing meaningless reservation clears).  
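
As a rough sketch of what such padding could look like (not proposed glibc 
code; CACHE_LINE_SIZE and padded_atomic_t are made-up names, and 128 is 
just an assumed worst case), each lock word could be forced onto its own 
line:

#define CACHE_LINE_SIZE 128   /* assumed worst-case line/granule size */

typedef struct
{
  volatile int counter;                        /* the word actually reserved */
  char __pad[CACHE_LINE_SIZE - sizeof (int)];  /* keep neighbors off the line */
} padded_atomic_t __attribute__ ((aligned (CACHE_LINE_SIZE)));

Of course this bakes a worst-case line size into the data layout, which is 
exactly the static-lock bloat and binary compatibility problem described 
above.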

On many ppc32 machines the cache line size is 32 bytes, and my quick count 
puts the size of pthread_mutex_t at roughly 24 bytes, so two pthread_mutex_t 
cannot both fit entirely within the same cache line.  But this is certainly 
possible for the simple atomic_t as defined in the kernel, for example.
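
If anyone wants to repeat that quick count on their own target (the size 
varies by ABI and glibc version), something like the following is enough:

#include <pthread.h>
#include <stdio.h>

int
main (void)
{
  /* Print the mutex size for this build; compare it against the
     cache line size of the machine at hand.  */
  printf ("sizeof (pthread_mutex_t) = %zu\n", sizeof (pthread_mutex_t));
  return 0;
}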

I just have no idea of the magnitude of the impact, if any.  I would guess 
that as cache line sizes get larger (128 bytes and beyond), the probability 
of having multiple atomic types fall into the same cache line gets larger, 
and writes to other addresses in that cache line should become more 
frequent as well.

Perhaps this only impacts performance slightly, if at all.  I don't know.

Has anyone at IBM ever measured this impact, or even found an application 
that by chance ended up in live-lock caused by having two atomic_t on the 
same cache line?

Kevin




