This is the mail archive of the mailing list for the glibc project.
Re: Variations of memset()
On Fri, Aug 4, 2017 at 12:54 PM, Carlos O'Donell <email@example.com> wrote:
> On 08/04/2017 03:11 PM, Carlos O'Donell wrote:
>> On 08/04/2017 03:02 PM, Matthew Wilcox wrote:
>>> Here's the sample usage from the symbios driver:
>>> - for (i = 0 ; i < 64 ; i++)
>>> - tp->luntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
>>> + memset32(tp->luntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
>>> I expect a lot of users would be of this type; simply replacing the
>>> explicit for-loop equivalent with a library call.
>> Have you measured the performance of this kind of conversion when using a
>> simple application and a library implementing your various memset routines?
>> In the kernel is one thing; outside of the kernel we have dynamic linking
>> and no inlining across the shared object boundary.
> I want to reiterate that measuring the performance of various options in
> userspace is going to be relevant (particularly when they vary from the kernel):
> * Application doing the naive loop above (-O0).
> * Application doing the naive loop above ([-O2,-O3] + <vectorize options>).
> * Application calling memset32 (-O0).
> * Application calling memset32 (-O3).
> <vectorize options>="-ftree-vectorize [-msse2,-mavx] -fopt-info-missed=missed.all"
> You need to split the memset32 into another DSO to simulate this accurately.
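The no-inlining property Carlos describes can be crudely approximated in a single file with a volatile function pointer, which the compiler cannot inline through. This is only a stand-in for a real shared-object boundary (a faithful measurement still needs the separate DSO he asks for); the function names here are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Naive implementation standing in for the proposed library routine. */
static void *memset32_impl(uint32_t *s, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i] = v;
    return s;
}

/* A volatile function pointer blocks inlining and constant
 * propagation, roughly modelling a call across a DSO boundary. */
static void *(*volatile memset32_call)(uint32_t *, uint32_t, size_t) =
    memset32_impl;

/* The two variants one would time against each other. */
static void fill_loop(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)   /* naive loop; may be vectorized */
        buf[i] = 0xdeadbeefu;
}

static void fill_call(uint32_t *buf, size_t n)
{
    memset32_call(buf, 0xdeadbeefu, n);   /* opaque library-style call */
}
```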
These functions aren't very useful on x86-64, where wmemset,
a.k.a. memset32, is implemented via memset:
Dump of assembler code for function __wmemset_sse2_unaligned:
0x0000000000000020 <+0>: shl $0x2,%rdx
0x0000000000000024 <+4>: movd %esi,%xmm0
0x0000000000000028 <+8>: mov %rdi,%rax
0x000000000000002b <+11>: pshufd $0x0,%xmm0,%xmm0
0x0000000000000030 <+16>: jmp 0x64 <__memset_sse2_unaligned+20>
End of assembler dump.
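A portable sketch of what the assembly above does (not glibc's actual code): `shl $0x2` scales the word count to a byte count, `movd` plus `pshufd $0` broadcast the 32-bit value across a vector register, and the `jmp` then reuses memset's bulk-store loop. Here the broadcast is into a 64-bit pattern instead of an XMM register:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Illustrative: broadcast v into a wider pattern, then store in bulk,
 * mirroring the movd/pshufd + shared-memset-path idea shown above. */
static uint32_t *memset32_broadcast(uint32_t *s, uint32_t v, size_t n)
{
    uint64_t pattern = ((uint64_t)v << 32) | v;  /* like pshufd $0 */
    size_t i = 0;
    for (; i + 1 < n; i += 2)                    /* 8-byte stores */
        memcpy(&s[i], &pattern, sizeof pattern);
    if (i < n)                                   /* odd trailing word */
        s[i] = v;
    return s;
}
```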