This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: Use aligned stores in memset
On 18/08/2017 06:10, Florian Weimer wrote:
> On 08/18/2017 08:51 AM, Rajalakshmi Srinivasaraghavan wrote:
>>
>>
>> On 08/18/2017 11:51 AM, Florian Weimer wrote:
>>> On 08/18/2017 07:11 AM, Rajalakshmi Srinivasaraghavan wrote:
>>>> * sysdeps/powerpc/powerpc64/power8/memset.S: Store byte by byte
>>>> for unaligned inputs if size is less than 8.
>>>
>>> This makes me rather nervous. powerpc64le was supposed to have
>>> reasonably efficient unaligned loads and stores. GCC happily generates
>>> them, too.
>>
>> This is meant ONLY for caching inhibited accesses. Caching Inhibited
>> accesses are required to be Guarded and properly aligned.
>
> The intent is to support memset for such memory regions, right? This
> change is insufficient. You have to fix GCC as well because it will
> inline memset of unaligned pointers, like this:
>
> typedef long __attribute__ ((aligned(1))) long_unaligned;
>
> void
> clear (long_unaligned *p)
> {
>   memset (p, 0, sizeof (*p));
> }
>
> clear:
>         li 9,0
>         std 9,0(3)
>         blr
>
> That's why I think your change is not useful in isolation.

POWER8 does have fast unaligned memory access, and in fact unaligned accesses
could be used to provide a faster memcpy/memmove implementation (I created
one some time ago that I never sent upstream [1]). Unaligned accesses are
used extensively in some optimized str* implementations I created for POWER8.
They also allow GCC to use unaligned accesses for builtin mem* operations
without issue in *most* cases.

The problem is that memset/memcpy/memmove *specifically* are used in some
userland drivers for DMA (if I recall correctly, in some XORG drivers). For
these specific use cases, unaligned accesses, especially vector ones, will
cause the kernel to trap on *every* unaligned instruction, leading to abysmal
performance. That's why I pushed commit
87868c2418fb74357757e3b739ce5b76b17a8929 to fix this very issue for POWER7
memcpy.

We already discussed this same issue some time ago [2] while trying to
overcome this limitation. I think ideally the drivers that rely on aligned
mem* operations should use their own mem* operations (similar to how dpdk
does [3]).
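
As a rough illustration of what such a driver-local routine could look like,
here is a minimal C sketch. It is not taken from glibc, dpdk, or any existing
driver, and the name aligned_memset is hypothetical. It assumes the target
region only tolerates naturally aligned stores, so it uses byte stores up to
the first 8-byte boundary, aligned doubleword stores for the bulk, and byte
stores for the tail. The volatile qualifiers are there so the compiler
performs each store as written instead of merging them or substituting a
call to memset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only: fill N bytes at DST with
   the byte value C using only naturally aligned stores.  */
static void *
aligned_memset (void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;
  unsigned char byte = (unsigned char) c;

  /* Head: byte stores (always aligned) until P is 8-byte aligned.  */
  while (n > 0 && ((uintptr_t) p & 7) != 0)
    {
      *p++ = byte;
      n--;
    }

  /* Bulk: aligned doubleword stores of the replicated byte.  */
  uint64_t pattern = 0x0101010101010101ULL * byte;
  volatile uint64_t *q = (volatile uint64_t *) p;
  while (n >= 8)
    {
      *q++ = pattern;
      n -= 8;
    }

  /* Tail: remaining bytes, again one at a time.  */
  p = (volatile unsigned char *) q;
  while (n > 0)
    {
      *p++ = byte;
      n--;
    }

  return dst;
}
```

A driver would call this in place of memset for its cache-inhibited
mappings only and keep using the regular memset everywhere else, so ordinary
memory still benefits from the fast unaligned paths.
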
[1] https://github.com/zatrazz/glibc/commits/memopt-power8
[2] https://sourceware.org/ml/libc-alpha/2015-01/msg00130.html
[3] http://dpdk.org/browse/dpdk/tree/lib/librte_eal/common/include/arch/ppc_64/rte_memcpy.h