[PATCH v3] malloc: Optimize small memory zeroing for calloc

Guo, Wangyang wangyang.guo@intel.com
Fri Nov 29 02:59:22 GMT 2024


On 11/29/2024 10:22 AM, H.J. Lu wrote:
> On Fri, Nov 29, 2024, 9:41 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
>     On 11/29/2024 6:02 AM, H.J. Lu wrote:
>
>     > On Fri, Nov 29, 2024 at 12:24 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>     >> Hi H.J.,
>     >>
>     >> +static __always_inline void
>     >> +clear_small_memory (INTERNAL_SIZE_T *mem, unsigned long nclears)
>
>     >> to avoiding unpredictable branches in this benchmark.
>     > Wangyang, can you try memset only on Xeon like this?
>
>     Using only memset does not work well on the Xeon platform.
>
>     Test Platform: Xeon-8380
>     Bench: bench-calloc-thread
>     Ratio: New / Original time_per_iteration (Lower is Better)
>
>     Threads#   | Ratio
>     -----------|------
>     1 thread   | 1.018
>     4 threads  | 1.015
>
>
> Please try my v4 patch with and without ISA level 3.

Which build options do I need to apply?