This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
Hi, I am now looking how implement unaligned strncmp. The first step is gather and analyze input data. To get them I used a simple program that you use with commands gcc -fPIC shared strncmp.c -o strncmp.so LD_PRELOAD=./strncmp.so bash 2> data # now run something to gather data #include <stdio.h> int strncmp(unsigned char *x,unsigned char *y,int n){ fprintf(stderr, "n:%i ",n); int i; for (i=0;i<n;i++) { if (x[i]!=y[i]) {fprintf(stderr,"dif\n"); return x[i]-y[i];} if (!x[i]) {fprintf(stderr,"same\n"); return 0;} } fprintf(stderr,"size\n"); return 0; } Now when I looked at data I noticed that n is almost always less than 64, probability of mismatch is around 4/5 and ending at n is 1/5 which I got from 21:15:14:~$ grep dif data | wc -l 102495 21:15:18:~$ grep same data | wc -l 0 21:15:24:~$ grep size data | wc -l 25031 Now given this information I needed to decide how make hot case in strncmp as fast as possible. A relevant part how this is done in strcmp checks first 64 bytes in roughtly following way: strcmp(char *a, char *b) { int or = (((unsigned long) a) | ((unsigned long) b)) & 4095; if (or <= 4096 - 64) /* Does not cross page boundary. */ { /* Function mask16 (a, b) return x such that for 0 <= i < 16 x & (1 << i) == (a[i] != b[i] || a[i] == 0) << i */ unsigned long m = mask16 (a, b); /* Check bytes a[0]..a[15]. */ if (m) return a[ffs (m)] - b[ffs (m)]; m = mask_next48 (a, b); /* Check bytes a[16]..a[63]. */ if (m) return a[ffs (m)] - b[ffs (m)]; } Given these data a solution that adds least overhead is to use mask which has first n bits 1 to filter out matches after n-1'th byte in mask. We first check if there is match(4/5 of cases) then if n<64 and as we checked 64 bytes there is no match. On x64 there is little problem that shifts are done modulo 64 so we must check mask again wnen n>=64. This could be done with following code. strncmp(char *a, char *b, size_t n) if (!n) return 0; int or = (((unsigned long) a) | ((unsigned long) b)) & 4095; if (or <= 4096 - 64) { unsigned long xm = 1L << n; xm = xm ^ (xm - 1); unsigned long m = mask16 (a, b); if (m & xm) return a[ffs (m & xm)] - b[ffs (m & xm)]; m = m | mask_next48 (a, b); if (m & xm) return a[ffs (m & xm)] - b[ffs (m & xm)]; if (n <= 64) return 0; if (m) /* This is needed as x64 does 64bit shifts modulo 64. */ return a[ffs (m)] - b[ffs (m)]; } For larger strings there is a loop case. In strcmp I use a counter to detect crossing a page. For strcmp I could get nearly identical performance by storing in that counter minimum of crossing page and exceeding n bytes. A calculations to archieve this are bit involved so I would like somebody to check that I did not ommit any case. These are at strcmp and strncmp skeletons that I attached. >From code review perspective I do not know which is better, First possibility is check first a c language skeletons, let gcc generate assembly and send diff that adds actual detection. This has disadvantage of duplication between strcmp and strncmp. Second one would be simulate c computatitons in assembly and add it to strcmp-sse2-unaligned.S while ensuring that we do not accidentaly modify additional register. What do you prefer?
Attachment:
strcmp_skeleton.c
Description: Text document
Attachment:
strncmp_skeleton.c
Description: Text document
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |