This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: Suggestion for improvement to strupr and strlwr functions.


J.D McLaughlin wrote:
Hi,

Examining the source code to C's string functions, I noticed that strupr()
calls toupper() on each char in turn if isupper() returns false (or maybe
if islower() returns true). However, the toupper() function itself makes
this same check beforehand to decide whether or not it needs to do anything,
and vice versa for the strlwr() function. It looks as if it would be
better to avoid the duplicated check by just calling toupper() for every
character in the array. I could be wrong, though: if the string had fewer
than half its characters in lower case, it would be constantly calling
toupper() and isupper() for characters where only isupper() was needed.

I'm not sure what the statistical properties are of strings generally
input to strupr, but I thought I'd share this idea in case in the
average-case scenario it's more efficient.

James McLaughlin.



James,


If you look at newlib's <ctype.h>, which gets included, you will see that isupper/islower are macros and toupper/tolower are essentially inlined functions. So yes, in the normal GNUC situation the islower/isupper invocations end up being redundant.
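To make the "macro check plus inlined conversion" arrangement concrete, here is a minimal sketch. It is not newlib's actual header: the sk_ names, the flag values, and the table are made up for illustration, and it assumes an ASCII execution character set.

```c
/* Illustrative names only; none of these are newlib identifiers. */
#define SK_UPPER 0x01
#define SK_LOWER 0x02

static unsigned char sk_table[256];   /* one flag byte per character code */

/* Fill the table for the ASCII letters (assumes an ASCII execution set). */
static void sk_init(void)
{
    int c;
    for (c = 'A'; c <= 'Z'; ++c)
        sk_table[c] |= SK_UPPER;
    for (c = 'a'; c <= 'z'; ++c)
        sk_table[c] |= SK_LOWER;
}

/* Macro classification: a table load and a bit test, no function call. */
#define sk_islower(c) ((sk_table[(unsigned char)(c)] & SK_LOWER) != 0)

/* Inlinable conversion that repeats the same test internally. */
static inline int sk_toupper(int c)
{
    return sk_islower(c) ? c - ('a' - 'A') : c;
}

/* Guarded loop in the style discussed in this thread. */
static char *sk_strupr(char *s)
{
    char *a;
    for (a = s; *a != '\0'; ++a)
        if (sk_islower(*a))            /* first test, made by the caller */
            *a = (char)sk_toupper(*a); /* second test, inside the helper */
    return s;
}
```

Because both tests are visible to the compiler in the same translation unit, an optimizer is free to fold them into one. Note that sk_init() has to be called once before any of the other helpers are used.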

However, with x86 gcc (and mn10300 gcc), for example, the compiler has everything it needs to recognize that the second isupper/islower check is redundant, and it simply optimizes that check out. Here's a snippet from the x86 compiled code of strupr.o:

  1d:   8b bb 00 00 00 00       mov    0x0(%ebx),%edi
                        1f: R_386_GOT32 __ctype_ptr
  23:   90                      nop
    {
      if (islower (*a))
  24:   8b 07                   mov    (%edi),%eax
  26:   0f be d2                movsbl %dl,%edx
  29:   f6 04 02 02             testb  $0x2,(%edx,%eax,1)
  2d:   74 05                   je     34 <strupr+0x34>
        *a = toupper (*a);
  2f:   8d 42 e0                lea    0xffffffe0(%edx),%eax
  32:   88 01                   mov    %al,(%ecx)
  34:   8a 51 01                mov    0x1(%ecx),%dl
  37:   41                      inc    %ecx
  38:   84 d2                   test   %dl,%dl
  3a:   75 e8                   jne    24 <strupr+0x24>
      ++a;
    }

Notice how the testb (islower) is only done once per loop iteration. The toupper itself has been reduced to the single lea, which subtracts 0x20 from the character.

In the case of a non-GNUC compiler, the toupper call will be a function call which will very likely be more costly than an isupper/islower macro check. In that particular situation, savings/wastage depend on the input string as you have noted.
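For comparison, the two loop shapes under discussion can be sketched as below, using only the standard <ctype.h> interfaces. The function names upr_guarded, upr_unguarded and variants_agree are made up for this example and are not newlib code. Both variants produce the same string; they differ only in how many checks and calls are made per character, and in that the unguarded one stores to every byte.

```c
#include <ctype.h>
#include <string.h>

/* Variant A: check first, convert only lowercase characters. */
static char *upr_guarded(char *s)
{
    char *a;
    for (a = s; *a != '\0'; ++a)
        if (islower((unsigned char)*a))
            *a = (char)toupper((unsigned char)*a);
    return s;
}

/* Variant B: convert every character and let toupper decide. */
static char *upr_unguarded(char *s)
{
    char *a;
    for (a = s; *a != '\0'; ++a)
        *a = (char)toupper((unsigned char)*a);
    return s;
}

/* Returns 1 if both variants produce the same string for `in`
   (inputs longer than 63 characters are truncated for the comparison). */
static int variants_agree(const char *in)
{
    char x[64], y[64];
    strncpy(x, in, sizeof x - 1);
    x[sizeof x - 1] = '\0';
    strncpy(y, in, sizeof y - 1);
    y[sizeof y - 1] = '\0';
    return strcmp(upr_guarded(x), upr_unguarded(y)) == 0;
}
```

Which variant is cheaper with an out-of-line toupper depends on the mix of characters in the input, so timing both on representative strings would be the way to decide.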

The function itself is non-standard (not ANSI, POSIX, C99, or Single Unix) and isn't likely to be that commonly used.

-- Jeff J.



