This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: PATCH: Handle various cache size
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Ulrich Drepper <drepper at redhat dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 7 Sep 2010 21:16:27 -0700
- Subject: Re: PATCH: Handle various cache size
- References: <822402900.362631283916267067.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com><919901228.362701283916382367.JavaMail.root@zmail06.collab.prod.int.phx2.redhat.com>
On Tue, Sep 7, 2010 at 8:26 PM, Ulrich Drepper <drepper@redhat.com> wrote:
> ----- "H.J. Lu" <hjl.tools@gmail.com> wrote:
>> If we want to provide the raw cache size
>> values, we should introduce new values so that we don't have to
>> change all those optimized string/memory functions,
>
> This all means that the values have to be used in a way to be influenced by the odd values. Why is this the case? I would expect the code to not compare the cache size value for equality. And if you need to have power-of-two values then I expect the code to easily handle this through some shifts.
Sure. But we have to do it in every call to such functions.
> Masking the value in cacheinfo means to unconditionally deprive users of the information which can deal with these odd values to use them. They'd run sub-optimally.
Those values are internal to glibc. Their "users" are the optimized
string/memory functions, and they won't run sub-optimally if we round
those values up to a multiple of 256 bytes.
>
> Have you actually looked at every piece of code using the information?
Yes.
> And: why isn't the x86-64 code affected?
>
Only the 32-bit memset-sse2.S uses the cache size as one of its counters:
L(128bytesormore_nt_start):
	sub	%ebx, %ecx
	ALIGN (4)
L(128bytesormore_shared_cache_loop):
	prefetcht0	0x3c0(%edx)
	prefetcht0	0x380(%edx)
	sub	$0x80, %ebx
	movdqa	%xmm0, (%edx)
	movdqa	%xmm0, 0x10(%edx)
	movdqa	%xmm0, 0x20(%edx)
	movdqa	%xmm0, 0x30(%edx)
	movdqa	%xmm0, 0x40(%edx)
	movdqa	%xmm0, 0x50(%edx)
	movdqa	%xmm0, 0x60(%edx)
	movdqa	%xmm0, 0x70(%edx)
	add	$0x80, %edx
	cmp	$0x80, %ebx
	jae	L(128bytesormore_shared_cache_loop)
	cmp	$0x80, %ecx
	jb	L(shared_cache_loop_end)
where EBX holds the cache size. This loop assumes the shared cache size
is a multiple of 128 bytes.
--
H.J.