[PATCH v3] malloc: Optimize small memory zeroing for calloc
Guo, Wangyang
wangyang.guo@intel.com
Fri Nov 29 02:59:22 GMT 2024
On 11/29/2024 10:22 AM, H.J. Lu wrote:
> On Fri, Nov 29, 2024, 9:41 AM Guo, Wangyang <wangyang.guo@intel.com> wrote:
>
> On 11/29/2024 6:02 AM, H.J. Lu wrote:
>
> > On Fri, Nov 29, 2024 at 12:24 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> >> Hi H.J.,
> >>
> >> +static __always_inline void
> >> +clear_small_memory (INTERNAL_SIZE_T *mem, unsigned long nclears)
>
> >> to avoiding unpredictable branches in this benchmark.
> > Wangyang, can you try memset only on Xeon like this?
>
> Using only memset does not work well on the Xeon platform.
>
> Test Platform: Xeon-8380
> Bench: bench-calloc-thread
> Ratio: New / Original time_per_iteration (Lower is Better)
>
> Threads#   | Ratio
> -----------|------
> 1 thread   | 1.018
> 4 threads  | 1.015
>
>
> Please try my v4 patch with and without ISA level 3.
Which build options do I need to apply?