This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp


On Mon, Feb 03, 2014 at 05:16:13PM +0100, Florian Weimer wrote:
> On 02/03/2014 03:43 PM, Ondřej Bílka wrote:
> 
> >And there is a third factor: memcmp with small constant arguments
> >could be inlined. This is not the case now, but a patch would be welcome.
> 
> Inlining memcmp in GCC has historically been a bad decision.
> Perhaps we could make an exception for memcmp calls with known
> alignment and really small sizes.  In terms of GCC optimizations,
> dispatching to a few versions specialized for certain lengths, and a
> version that only delivers an unordered, boolean result promises
> significant wins as well.
> 
That is a problem in gcc: builtins are often badly optimized. A second
problem is that the expansion needs to be small, or you will lose when you
inline cold code.

Also, making that a builtin adds unnecessary complexity; adding these
conditions to a header is simpler.

In addition to constant sizes, when you know that the size is always at
least 8 and a mismatch is likely within the first 8 bytes, you could use
the inlined version below.

There is no need for a specialized unordered case when you do a comparison;
gcc is smart enough to optimize these, as well as the memcmp(x,y,n) > 0
case. The following:

int foo (int x)
{
  if (x>0) return 1;
  if (x<0) return -1;
  return 0;
}

int bar(int x){
  if (foo(x))
    return 4;
  else
   return 2;
}

gets optimized to

bar:
.LFB1:
        .cfi_startproc
        cmpl    $1, %edi
        sbbl    %eax, %eax
        andl    $-2, %eax
        addl    $4, %eax
        ret
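
The same reasoning applies when only one outcome of a three-way result is
consumed. A minimal sketch, assuming hypothetical helpers cmp3, is_greater
and is_equal (none of them are glibc names): once cmp3 is inlined, the
compiler is free to drop the distinction between the outcomes the caller
never looks at.

```c
#include <stdint.h>

/* Hypothetical three-way compare of two 64-bit keys: returns 1, -1 or 0,
   the same shape of result that memcmp produces.  */
static inline int cmp3(uint64_t a, uint64_t b)
{
  if (a > b) return 1;
  if (a < b) return -1;
  return 0;
}

/* Caller that only needs the "greater" outcome.  */
int is_greater(uint64_t a, uint64_t b)
{
  return cmp3(a, b) > 0;
}

/* Caller that only needs equality (the unordered, boolean case).  */
int is_equal(uint64_t a, uint64_t b)
{
  return cmp3(a, b) == 0;
}
```

Compiling this with optimization and reading the assembly for is_greater
and is_equal is the easiest way to check what a given gcc version does.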

And the expansion that I talked about is here. I could make it
cross-platform with a check whether unaligned loads are OK and bswap is
reasonably fast.



#include <stdint.h>
#include <string.h>
#undef memcmp

#define memcmp(x, y, n) \
({ \
  const void *__x = (x), *__y = (y); \
  size_t __n = (n); \
  int __ret; \
  /* Fast path only when n >= 8 is known to hold at compile time.  */ \
  if (__builtin_constant_p (__n >= 8) && __n >= 8) \
    { \
      /* bswap makes integer order match byte order on little endian.  */ \
      uint64_t __a = __builtin_bswap64 (*((const uint64_t *) __x)); \
      uint64_t __b = __builtin_bswap64 (*((const uint64_t *) __y)); \
      if (__a > __b) \
        __ret = 1; \
      else if (__a < __b) \
        __ret = -1; \
      else \
        __ret = __memcmp (__x + 8, __y + 8, __n - 8); \
    } \
  else \
    __ret = __memcmp (__x, __y, __n); \
  __ret; \
})

int foo(char *x, char *y){
  if (memcmp(x,y,10) > 0)
    return 15;
  else
   return 42;
}
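
For trying the idea outside glibc, here is a minimal standalone sketch of
the same 8-byte prefix check. It assumes GCC or Clang (for
__builtin_bswap64 and the predefined __BYTE_ORDER__ macro). The names
load_be64 and memcmp_prefix8 are hypothetical, the loads go through memcpy
instead of a pointer cast so alignment does not matter, and the tail falls
back to the public memcmp rather than the internal __memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes as a big-endian integer, so that unsigned integer order
   matches byte-wise (memcmp) order.  memcpy keeps the load valid for
   unaligned pointers.  */
static inline uint64_t load_be64(const void *p)
{
  uint64_t v;
  memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap64(v);   /* not needed on big-endian targets */
#endif
  return v;
}

/* Same sign contract as memcmp, but the caller must guarantee n >= 8.  */
static inline int memcmp_prefix8(const void *x, const void *y, size_t n)
{
  assert(n >= 8);
  uint64_t a = load_be64(x);
  uint64_t b = load_be64(y);
  if (a != b)
    return a > b ? 1 : -1;
  /* First 8 bytes are equal; compare the remaining n - 8 bytes.  */
  return memcmp((const char *) x + 8, (const char *) y + 8, n - 8);
}
```

Like the macro, this only guarantees the sign of the result, which is all
that memcmp itself promises.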

