strncasecmp bugfix and improvements
Mon Apr 6 22:30:00 GMT 2009
I'm going to pick on strncasecmp specifically, but strlwr, strupr, and strcasecmp have similar problems. All of these routines needlessly rely on an OPTIONAL extended ctype table which might not be built in and might not work properly. The changes I am suggesting avoid requiring the extended ctype table and generally improve the generated code.
I am looking at both newlib 1.12 (installed with my Nios2 tools) and newlib 1.17.0. The pertinent string/ sources have not changed between these versions. The ctype/ sources HAVE changed but do not affect the issue at hand. Test builds are using gcc/g++ version 3.4.1 (nios2) and gcc/g++ version 3.4.4 (pc).
43 #include <string.h>
44 #include <ctype.h>
47 _DEFUN (strncasecmp, (s1, s2, n),
48 _CONST char *s1 _AND
49 _CONST char *s2 _AND
50 size_t n)
52 if (n == 0)
53 return 0;
55 while (n-- != 0 && tolower(*s1) == tolower(*s2))
57 if (n == 0 || *s1 == '\0' || *s2 == '\0')
63 return tolower(*(unsigned char *) s1) - tolower(*(unsigned char *) s2);
There are several problems with the current implementation. Line 55 contains a bug: tolower is possibly called with argument values in the range −128 to +127. Other problems result in needlessly suboptimal size and performance.
It is redundant to call tolower in the return statement. The compiler CANNOT optimize these calls away, even if they are pure macros/inlines, because the arguments differ due to the aforementioned bug. The simplest possible fix is to use unsigned char in the while statement as well as in the return statement:
55 while (n-- != 0 && tolower(*(unsigned char *) s1) == tolower(*(unsigned char *) s2))
It is redundant to check *s2 == '\0' in line 57; this test is unreachable. Furthermore, using unsigned char consistently here will likely result in better generated code.
It is redundant to check n == 0 in three places. Let's assume it is self-evident that, on average, it is much more likely that the current value of n is nonzero than that the current value of n is zero.
I propose the following implementation which fixes the bug and improves both size and performance:
int strncasecmp ( const char *s1, const char *s2, size_t n )
const unsigned char *ucs1 = (const unsigned char *) s1;
const unsigned char *ucs2 = (const unsigned char *) s2;
int d = 0;
for ( ; n != 0; n--)
const int c1 = tolower(*ucs1++);
const int c2 = tolower(*ucs2++);
if (((d = c1 - c2) != 0) || (c2 == '\0'))
We could test either c1 or c2 in the inner if. Testing c2 here avoids a register copy on 2-operand instruction architectures.
On both Nios2 and x86, the new implementation is both smaller and faster than the original. On Nios2, the generated code is reduced from 54 to 28 instructions! The savings on x86 (with -O3) is similar. And, of course, tolower is being called only with values in the unsigned char range, thus fixing the bug.
Can anyone tell me why we have
#define isupper(c) ((__ctype_ptr__)[(unsigned)((c)+1)]&_U)
#define isupper(c) ((__ctype_ptr__)[(int)(c)+1]&_U)
Given the extended ctype table, it seems the first implementation works by accident. On any architecture where address size is bigger than unsigned, the first implementation does NOT work; arguments in the range -127 to -2 result in a wild fetch past the end of the ctype table. The older implementation (newlib 1.12) is even worse; under the same condition it doesn't work for EOF either.
More information about the Newlib