This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] powerpc: Use aligned stores in memset



On 18/09/2017 10:54, Florian Weimer wrote:
> On 09/13/2017 03:12 PM, Tulio Magno Quites Machado Filho wrote:
>>> So I think the implementation constraint on the mem* functions is wrong.
>>>   It leads to a slower implementation of the mem* function for most of
>>> userspace which does not access device memory, and even for device
>>> memory, it is probably not what you want.
>> Makes sense.  But as there is nothing in the standard allowing or prohibiting
>> the usage of mem* functions to access caching-inhibited memory, I thought it
>> would make sense to provide functions that are as generic as possible.
> 
> But I have shown that you aren't doing that because of the GCC optimization which inlines the memset call.
> 
> But I won't continue this conversation as I don't see it particularly useful to anyone.  In the end, you are the architecture maintainers, and you should do what you think is best.
> 
> Thanks,
> Florian

I think one way to provide a slightly better memcpy implementation for POWER8
and still be able to avoid unaligned accesses on non-cacheable memory
is to use tunables.

The branch azanella/memcpy-power8 [1] has a POWER8 memcpy optimization which
uses unaligned loads and stores.  I created it some time ago but never actually
sent it upstream.  It shows better performance on both bench-memcpy and
bench-memcpy-random (about 10% on the latter) and mixed results on
bench-memcpy-large (which is mainly dominated by memory throughput, and on the
environment I am using, a shared PowerKVM instance, the results do not seem to
be reliable).

It could use some tuning, especially on the size ranges I used for unrolling
the loads/stores.  It also does not handle unaligned accesses that cross a page
boundary (which tend to be quite slow on current hardware, but are also
uncommon with the usual current page size of 64k).
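
To make the cross-page case concrete, here is a minimal sketch (not code from
the branch) of how one could test whether an access of a given length touches
more than one page.  The helper name crosses_page and its parameters are
hypothetical and only illustrate the arithmetic; the page size is assumed to
be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return nonzero if an access of LEN bytes starting
   at ADDR touches more than one page.  PAGE_SIZE must be a power of two.  */
static int
crosses_page (uintptr_t addr, size_t len, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);

  if (len == 0)
    return 0;

  /* Offset of ADDR inside its page, and bytes left until the page ends.  */
  size_t offset = addr & (page_size - 1);
  size_t remaining = page_size - offset;

  return len > remaining;
}
```

With 64k pages, a 16-byte access at offset 0xff8 stays inside one page, while
the same access would cross a boundary with 4k pages, which is why the case
should be less frequent on the usual 64k configuration.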

This first patch does not enable this option as a default for POWER8; it just
adds it to the string tests as an option.  The second patch changes the
selection to:

  1. If glibc is configured with tunables, set the new implementation as the
     default for ISA 2.07 (power8).

  2. Also if tunables are enabled, add the parameter glibc.tune.aligned_memopt
     to disable the new implementation selection.

So programs that rely on aligned loads can set:

GLIBC_TUNABLES=glibc.tune.aligned_memopt=1

And then the memcpy ifunc selection would pick the power7 one, which uses
only aligned loads and stores.
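
The selection logic described above can be sketched roughly as follows.  This
is a standalone illustration, not the ifunc code from the patch: the enum, the
function names, and the hand-written parsing of the GLIBC_TUNABLES string are
all assumptions made for the example (glibc has its own tunables framework
for this), and only the single-character values 0 and 1 are recognised.  It
also simplifies the non-POWER8 case to the power7 variant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for the two memcpy variants discussed here.  */
enum memcpy_impl { MEMCPY_POWER7_ALIGNED, MEMCPY_POWER8_UNALIGNED };

/* Return nonzero if TUNABLES (a colon-separated name=value list, as in
   the GLIBC_TUNABLES environment variable) sets
   glibc.tune.aligned_memopt to 1.  */
static int
aligned_memopt_set (const char *tunables)
{
  static const char name[] = "glibc.tune.aligned_memopt=";
  const size_t name_len = sizeof name - 1;

  if (tunables == NULL)
    return 0;

  for (const char *p = tunables; *p != '\0';)
    {
      /* Find the end of the current name=value entry.  */
      const char *end = strchr (p, ':');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);

      if (len == name_len + 1 && strncmp (p, name, name_len) == 0)
        return p[name_len] == '1';

      p += len;
      if (*p == ':')
        p++;
    }
  return 0;
}

/* Pick the unaligned POWER8 variant only on ISA 2.07 and only when the
   tunable does not request aligned accesses.  */
static enum memcpy_impl
select_memcpy (int has_isa_2_07, const char *tunables)
{
  if (has_isa_2_07 && !aligned_memopt_set (tunables))
    return MEMCPY_POWER8_UNALIGNED;
  return MEMCPY_POWER7_ALIGNED;
}
```

A caller could pass getenv ("GLIBC_TUNABLES") as the second argument; with
the setting shown above the sketch returns the power7 variant even on POWER8.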

This is an RFC patch, and if the idea sounds good to the powerpc arch
maintainers I can work on finishing the patch with more comments and send it
upstream.  I tried to apply the same unaligned idea to memset and memmove, but
I could not get any real improvement in either.

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8

Attachment: bench-memcpy-random.out
Description: Text document

Attachment: bench-memcpy.out
Description: Text document

