[PATCH v4] malloc: Optimize small memory zeroing for calloc
H.J. Lu
hjl.tools@gmail.com
Sat Nov 30 04:07:55 GMT 2024
On Sat, Nov 30, 2024, 2:51 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi H.J.,
>
> +static __always_inline void *
> +clear_memory (void *mem, unsigned long clearsize)
> +{
> + /* Unroll clear memory size up to 9 * INTERNAL_SIZE_T bytes. We know
> + that contents have an odd number of INTERNAL_SIZE_T-sized words;
> + minimally 3 words. */
> + INTERNAL_SIZE_T *d = (INTERNAL_SIZE_T *) mem;
> + unsigned long nclears = clearsize / sizeof (INTERNAL_SIZE_T);
> +
> + if (nclears > 9)
> + return memset (d, 0, clearsize);
> +
> + /* Use overlapping stores with 1 branch, instead of up to 3. */
> + *(d + 0) = 0;
> + *(d + 1) = 0;
> + *(d + 2) = 0;
> + *(d + nclears - 2) = 0;
> + *(d + nclears - 2 + 1) = 0;
> + if (nclears > 3)
>
> Should be nclears > 5, right?
>
Will fix.
> + {
> + *(d + 3) = 0;
> + *(d + 3 + 1) = 0;
> + *(d + nclears - 4) = 0;
> + *(d + nclears - 4 + 1) = 0;
> + }
> + else if (nclears < 3)
> + __builtin_unreachable ();
> +
> + return mem;
> +}
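For reference, here is a standalone sketch of the same overlapping-store scheme, with the second group of stores guarded by `nclears > 5`. With `> 3`, `nclears == 5` takes the branch and rewrites words 1 to 4, which the first five stores have already cleared. The result is still correct, but those four stores are redundant. The sketch makes a few assumptions. `INTERNAL_SIZE_T` is taken to be `size_t` (glibc's default). Plain `static inline` and an `assert` replace `__always_inline` and `__builtin_unreachable`. `clears_exactly` is a hypothetical test helper that checks each size is fully cleared without touching the neighbouring words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: INTERNAL_SIZE_T is size_t, as in glibc's default build.  */
typedef size_t INTERNAL_SIZE_T;

/* Sketch of the patch's clear_memory with the threshold at nclears > 5.
   Precondition (from the patch comment): clearsize is an odd number of
   INTERNAL_SIZE_T words, at least 3.  */
static inline void *
clear_memory (void *mem, unsigned long clearsize)
{
  INTERNAL_SIZE_T *d = (INTERNAL_SIZE_T *) mem;
  unsigned long nclears = clearsize / sizeof (INTERNAL_SIZE_T);

  assert (nclears >= 3 && nclears % 2 == 1);

  if (nclears > 9)
    return memset (d, 0, clearsize);

  /* First three and last two words: enough on their own for 3 and 5.  */
  d[0] = 0;
  d[1] = 0;
  d[2] = 0;
  d[nclears - 2] = 0;
  d[nclears - 1] = 0;
  if (nclears > 5)
    {
      /* Fill the middle gap for 7 and 9 words.  */
      d[3] = 0;
      d[4] = 0;
      d[nclears - 4] = 0;
      d[nclears - 3] = 0;
    }
  return mem;
}

/* Hypothetical test helper: clear NWORDS words inside a buffer guarded by
   one sentinel word on each side, and return 1 if exactly those words
   became zero.  */
static int
clears_exactly (unsigned long nwords)
{
  INTERNAL_SIZE_T buf[16];

  if (nwords + 2 > sizeof buf / sizeof buf[0])
    return 0;
  for (size_t i = 0; i < nwords + 2; i++)
    buf[i] = (INTERNAL_SIZE_T) -1;          /* sentinel pattern */

  clear_memory (buf + 1, nwords * sizeof (INTERNAL_SIZE_T));

  if (buf[0] != (INTERNAL_SIZE_T) -1
      || buf[nwords + 1] != (INTERNAL_SIZE_T) -1)
    return 0;                                /* wrote outside the region */
  for (size_t i = 1; i <= nwords; i++)
    if (buf[i] != 0)
      return 0;                              /* missed a word */
  return 1;
}
```

Words 3 and 4 coincide with `nclears - 4` and `nclears - 3` for 7 words and are disjoint from them for 9, so one extra branch covers both sizes.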
>
> This interface makes much more sense indeed. Still, it's hard to see how
> targets could do much better than memset.
>
memset needs one indirect branch and more conditional
branches than this inline sequence.
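The following sketch shows which requests take the inline path rather than memset. It assumes the common 64-bit layout: SIZE_SZ is 8, chunk sizes are multiples of 2 * SIZE_SZ, and the minimum chunk is 4 * SIZE_SZ. It also assumes that calloc computes clearsize as chunksize - SIZE_SZ. Under those assumptions, every chunk clears an odd number of words, at least 3, and only chunks of 32 to 80 bytes stay within the 9-word limit. Larger chunks still go to memset. The constants mirror glibc's names but are local assumptions. `words_to_clear` and `all_counts_odd_and_at_least_3` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions (typical 64-bit glibc): 8-byte SIZE_SZ, chunk sizes that are
   multiples of 2 * SIZE_SZ, and a minimum chunk of 4 * SIZE_SZ.  */
enum { SIZE_SZ = 8, CHUNK_ALIGN = 2 * SIZE_SZ, MIN_CHUNK = 4 * SIZE_SZ };

/* Words cleared for a chunk of size CSZ.  The user area starts at
   chunk + 2 * SIZE_SZ and runs into the next chunk's prev_size field,
   so clearsize = csz - SIZE_SZ (assumed caller behaviour).  */
static unsigned long
words_to_clear (size_t csz)
{
  return (csz - SIZE_SZ) / SIZE_SZ;
}

/* Return 1 if every valid chunk size up to LIMIT yields an odd word
   count of at least 3, matching the patch comment's precondition.  */
static int
all_counts_odd_and_at_least_3 (size_t limit)
{
  for (size_t csz = MIN_CHUNK; csz <= limit; csz += CHUNK_ALIGN)
    {
      unsigned long n = words_to_clear (csz);
      if (n < 3 || n % 2 == 0)
        return 0;
    }
  return 1;
}
```

For example, a 32-byte chunk clears 3 words, an 80-byte chunk clears 9, and a 96-byte chunk (11 words) falls back to memset.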
H.J.
More information about the Libc-alpha mailing list