[PATCH v4] malloc: Optimize small memory zeroing for calloc

H.J. Lu hjl.tools@gmail.com
Sat Nov 30 04:07:55 GMT 2024


On Sat, Nov 30, 2024, 2:51 AM Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:

> Hi H.J.,
>
> +static __always_inline void *
> +clear_memory (void *mem, unsigned long clearsize)
> +{
> +  /* Unroll clear memory size up to 9 * INTERNAL_SIZE_T bytes.  We know
> +     that contents have an odd number of INTERNAL_SIZE_T-sized words;
> +     minimally 3 words.  */
> +  INTERNAL_SIZE_T *d = (INTERNAL_SIZE_T *) mem;
> +  unsigned long nclears = clearsize / sizeof (INTERNAL_SIZE_T);
> +
> +  if (nclears > 9)
> +    return memset (d, 0, clearsize);
> +
> +  /* Use overlapping stores with 1 branch, instead of up to 3.  */
> +  *(d + 0) = 0;
> +  *(d + 1) = 0;
> +  *(d + 2) = 0;
> +  *(d + nclears - 2) = 0;
> +  *(d + nclears - 2 + 1) = 0;
> +  if (nclears > 3)
>
> Should be nclears > 5, right?
>

Will fix.


> +    {
> +      *(d + 3) = 0;
> +      *(d + 3 + 1) = 0;
> +      *(d + nclears - 4) = 0;
> +      *(d + nclears - 4 + 1) = 0;
> +    }
> +  else if (nclears < 3)
> +    __builtin_unreachable ();
> +
> +  return mem;
> +}
>
> This interface makes much more sense indeed. Still, it's hard to see how
> targets could do much better than memset.
>

memset needs 1 indirect branch and more conditional branches than
the inlined stores.

H.J.
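
For reference, a minimal standalone sketch of the overlapping-store idea
with the threshold changed to nclears > 5, as suggested in the review.
It is not the committed glibc code: clear_small_sketch, word_t and
check_clear_small are hypothetical names, word_t stands in for
INTERNAL_SIZE_T, and the sketch assumes nclears is odd and at least 3,
as the patch comment states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for INTERNAL_SIZE_T.  */
typedef size_t word_t;

/* Zero clearsize bytes at mem.  Assumes clearsize / sizeof (word_t)
   is odd and >= 3; anything above 9 words falls back to memset.  */
static void *
clear_small_sketch (void *mem, size_t clearsize)
{
  word_t *d = (word_t *) mem;
  size_t nclears = clearsize / sizeof (word_t);

  if (nclears > 9)
    return memset (d, 0, clearsize);

  /* Words 0..2 plus the last two words: enough for 3 and 5 words.
     For 3 words the trailing stores overlap words 1 and 2.  */
  d[0] = 0;
  d[1] = 0;
  d[2] = 0;
  d[nclears - 2] = 0;
  d[nclears - 1] = 0;
  if (nclears > 5)
    {
      /* Words 3..4 plus nclears-4 and nclears-3: fills the middle
         for 7 and 9 words.  For 7 words both pairs hit words 3..4.  */
      d[3] = 0;
      d[4] = 0;
      d[nclears - 4] = 0;
      d[nclears - 3] = 0;
    }
  return mem;
}

/* Return 1 if exactly the first nclears words were zeroed and the
   guard words after them were left untouched, 0 otherwise.  */
static int
check_clear_small (size_t nclears)
{
  word_t buf[16];
  for (size_t i = 0; i < 16; i++)
    buf[i] = (word_t) -1;
  clear_small_sketch (buf, nclears * sizeof (word_t));
  for (size_t i = 0; i < 16; i++)
    if (buf[i] != (i < nclears ? 0 : (word_t) -1))
      return 0;
  return 1;
}
```

Calling check_clear_small with 3, 5, 7 and 9 exercises each inline
path; 11 exercises the memset fallback. With nclears > 3 the inner
block would also run for 5 words, where it only repeats stores
already made, so nclears > 5 saves that redundant work.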
