[PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold

Feng Xue OS fxue@os.amperecomputing.com
Thu Aug 8 03:56:00 GMT 2019


This version disables DC ZVA on emag.

Feng
------
    * sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
    Disable DC ZVA code if this macro is defined as zero.
    * sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD):
    Change to zero to disable using DC ZVA.
---
 ChangeLog                                 |  7 +++++++
 sysdeps/aarch64/multiarch/memset_base64.S | 12 ++++++++++--
 sysdeps/aarch64/multiarch/memset_emag.S   | 12 +++++++-----
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index dbdb85d..ba27f96 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2019-08-08  Feng Xue  <fxue@os.amperecomputing.com>
+
+       * sysdeps/aarch64/multiarch/memset_base64.S (DC_ZVA_THRESHOLD):
+       Disable DC ZVA code if this macro is defined as zero.
+       * sysdeps/aarch64/multiarch/memset_emag.S (DC_ZVA_THRESHOLD):
+       Change to zero to disable using DC ZVA.
+
 2019-07-25  Florian Weimer  <fweimer@redhat.com>

        [BZ #24677]
diff --git a/sysdeps/aarch64/multiarch/memset_base64.S b/sysdeps/aarch64/multiarch/memset_base64.S
index 9a62325..c0cccba 100644
--- a/sysdeps/aarch64/multiarch/memset_base64.S
+++ b/sysdeps/aarch64/multiarch/memset_base64.S
@@ -23,6 +23,7 @@
 # define MEMSET __memset_base64
 #endif

+/* To disable DC ZVA, set this threshold to 0. */
 #ifndef DC_ZVA_THRESHOLD
 # define DC_ZVA_THRESHOLD 512
 #endif
@@ -91,11 +92,12 @@ L(set96):
        .p2align 4
 L(set_long):
        stp     val, val, [dstin]
+       bic     dst, dstin, 15
+#if DC_ZVA_THRESHOLD
        cmp     count, DC_ZVA_THRESHOLD
        ccmp    val, 0, 0, cs
-       bic     dst, dstin, 15
        b.eq    L(zva_64)
-
+#endif
        /* Small-size or non-zero memset does not use DC ZVA. */
        sub     count, dstend, dst

@@ -105,7 +107,11 @@ L(set_long):
         * count is less than 33 bytes, so as to bypass 2 unneccesary stps.
         */
        sub     count, count, 64+16+1
+
+#if DC_ZVA_THRESHOLD
+       /* Align loop on 16-byte boundary, this might be friendly to i-cache. */
        nop
+#endif

 1:     stp     val, val, [dst, 16]
        stp     val, val, [dst, 32]
@@ -121,6 +127,7 @@ L(set_long):
        stp     val, val, [dstend, -16]
        ret

+#if DC_ZVA_THRESHOLD
        .p2align 3
 L(zva_64):
        stp     val, val, [dst, 16]
@@ -173,6 +180,7 @@ L(zva_64):
 1:     stp     val, val, [dstend, -32]
        stp     val, val, [dstend, -16]
        ret
+#endif

 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)
diff --git a/sysdeps/aarch64/multiarch/memset_emag.S b/sysdeps/aarch64/multiarch/memset_emag.S
index 1c1fabc..c2aed62 100644
--- a/sysdeps/aarch64/multiarch/memset_emag.S
+++ b/sysdeps/aarch64/multiarch/memset_emag.S
@@ -21,12 +21,14 @@
 # define MEMSET __memset_emag

 /*
- * Using dc zva to zero memory does not produce better performance if
+ * Using DC ZVA to zero memory does not produce better performance if
  * memory size is not very large, especially when there are multiple
- * processes/threads contending memory/cache. Here we use a somewhat
- * large threshold to trigger usage of dc zva.
-*/
-# define DC_ZVA_THRESHOLD 1024
+ * processes/threads contending memory/cache. Here we set threshold to
+ * zero to disable using DC ZVA, which is good for multi-process/thread
+ * workloads.
+ */
+
+# define DC_ZVA_THRESHOLD 0

 # include "./memset_base64.S"
 #endif
--
1.8.3.1
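
The compile-time gating used in the patch above can be illustrated with a rough C sketch. This is not the glibc implementation: `sketch_memset` and `zva_path_taken` are hypothetical names invented for illustration, and the "DC ZVA" branch is simulated with a plain `memset` call. Only the `DC_ZVA_THRESHOLD` macro and the `#if` pattern are taken from the patch. With the macro defined as zero, the preprocessor drops the size/value check and the zeroing path entirely, so no runtime comparison is executed.

```c
#include <stddef.h>
#include <string.h>

/* Mirrors the override in memset_emag.S: zero disables the DC ZVA path.  */
#ifndef DC_ZVA_THRESHOLD
# define DC_ZVA_THRESHOLD 0
#endif

/* Hypothetical counter, for illustration only: incremented each time
   the simulated DC ZVA path is taken.  */
static int zva_path_taken;

/* Hypothetical stand-in for the assembly routine.  */
static void *
sketch_memset (void *dst, int val, size_t count)
{
#if DC_ZVA_THRESHOLD
  /* Compiled in only for a non-zero threshold: large zeroing requests
     go to the (here simulated) DC ZVA path.  */
  if (count >= DC_ZVA_THRESHOLD && val == 0)
    {
      zva_path_taken++;
      return memset (dst, 0, count);
    }
#endif
  /* Small-size or non-zero memset: plain store loop.  */
  unsigned char *p = dst;
  for (size_t i = 0; i < count; i++)
    p[i] = (unsigned char) val;
  return dst;
}
```

Building the same sketch with -DDC_ZVA_THRESHOLD=512 would re-enable the branch, which corresponds to the default in memset_base64.S.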

________________________________________
From: Siddhesh Poyarekar <siddhesh@gotplt.org>
Sent: Wednesday, August 7, 2019 10:12:48 PM
To: Wilco Dijkstra; 'GNU C Library'; Feng Xue OS
Cc: nd
Subject: Re: [PATCH] aarch64: Add tunable glibc.memset.dc_zva_threshold

On 06/08/19 9:47 PM, Wilco Dijkstra wrote:
> Hi Feng,
>
>> I still hope this tuning of DC ZVA can work for other aarch64 processors.
>> Since we focus on emag and have no other aarch64 machines on hand,
>> it would be better if someone with other aarch64 hardware were willing to test this.
>
> I don't believe this kind of tunable is useful in general. DC ZVA exists because
> it gives a speedup - quite significantly so on the latest microarchitectures, but it
> improves gcc_r performance as well on older cores like Cortex-A57.
>
> If you find that it doesn't help emag, the best option is to avoid DC ZVA
> altogether - this is even faster as you don't have to execute the runtime check.
> Or you could use a tunable to select between fixed settings of the DC ZVA.
>
> In fact it might be useful to have a generic tunable which allows one to choose
> specific ifuncs, eg. glibc.memset=__memset_no_dczva.

This is an interesting idea.  Although just for this specific case, it
might be sufficient to implement the glibc.cpu.hwcaps tunable from x86
and have "dczva" as a capability that can be turned on or off with + or -.

But your first suggestion is probably the easiest; drop dc zva
completely for ampere.
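
For illustration, an hwcaps-style switch of the kind mentioned above might be evaluated roughly as follows. This is a sketch under assumptions, not glibc's tunables parser: the function `dczva_enabled`, the capability name "dczva" and the acceptance of a leading "+" are all hypothetical, chosen only to match the wording of this mail.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: walk a comma-separated capability list such as
   "-dczva" or "foo,+dczva" and report whether "dczva" ends up enabled.
   A leading '-' disables the capability, a leading '+' (or no prefix)
   enables it; other names are ignored.  */
static bool
dczva_enabled (const char *spec, bool default_on)
{
  bool on = default_on;
  const char *p = spec;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");   /* Length of the current item.  */
      const char *name = p;
      size_t nlen = len;
      bool enable = true;

      if (nlen > 0 && (*name == '+' || *name == '-'))
        {
          enable = (*name == '+');
          name++;
          nlen--;
        }

      if (nlen == 5 && strncmp (name, "dczva", 5) == 0)
        on = enable;

      p += len;
      if (*p == ',')
        p++;
    }

  return on;
}
```

An ifunc selector could consult such a flag once at startup and pick a memset variant without the DC ZVA path when the capability is switched off.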

Siddhesh


