[PATCH] string: Add page tests for test-strncasecmp and test-strncpy

Mon Jun 1 18:53:35 GMT 2020

Thanks for the review, it helped me find some bugs, I'll fix it and add 
more comments to the code:

On 29/05/2020 18:13, Paul A. Clarke wrote:
> On Fri, May 29, 2020 at 10:53:05AM -0300, Raphael Moreira Zinsly via Libc-alpha wrote:
>> Adds tests to check if strings placed at page bondaries are
>> handled correctly by strncasecmp and strncpy similar to tests
>> for strncmp and strnlen.  This should catch regressions in the
>> optmized functions.
>> ---
> [snip]
>> diff --git a/string/test-strncasecmp.c b/string/test-strncasecmp.c
>> index 6a9c27beae..0399f37117 100644
>> --- a/string/test-strncasecmp.c
>> +++ b/string/test-strncasecmp.c
>> @@ -137,6 +137,44 @@ do_test (size_t align1, size_t align2, size_t n, size_t len, int max_char,
> 
> These tests could use some comments, since there seems to be some
> arbitrary choices, making it difficult to determine what is being
> tested...
> 
>> +static void
>> +do_page_tests (void)
>> +{
>> +  size_t i, offset1, offset2;
>> +  char *s1, *s2, *s3;
>> +  int exp_result;
>> +
>> +  s1 = (char *) buf1;
>> +  for (i = 0; i < page_size - 1; i++)
>> +    s1[i] = 23;
> 
> Magic number. Does this 23 have any significance?

No, but it actually would be better to use memset and '\1' as I did for 
the strncpy test.

> 
>> +  s1[i] = 0;
>> +
>> +  s3 = strdup (s1);
>> +  offset2 = 2636;
> 
> Magic number.  How was it chosen and why?

These offsets were picked from the strncmp test, that came originally 
from https://sourceware.org/bugzilla/show_bug.cgi?id=12597

> 
>> +
>> +  for (i = 0; i < 64; ++i)
> 
> Does 64 have some signficance?
> It might be a cache line length, in that you are trying to cross
> the page boundary at every offset in a cache line?  And if that's
> the case, maybe 128 would be better, since it covers more
> architectures?  (Or, maybe even query the cache line size?)

Yes, I liked the suggestion of using 128.

> 
>> +    {
>> +      offset1 = 3988 + i;
> 
> Another magic number >
>> +
>> +      if (offset1 >= page_size
>> +	  || offset2 >= page_size)
>> +	return;
> 
> Maybe a comment before this stating something like
> "make sure we don't run past the end of the buffer"?
> 
> Also, offset2 is fixed, so that part of this test need
> not be in the loop.  And, if you pick a good offset2,
> as I suggest below, there is no need to test it at all.
> If the constant to which offset2 is set just happens to
> be greater than page_size, then this entire test does
> nothing.

There is a bug here, it should be "continue" instead of return.

> 
>> +
>> +      s1 = (char *) buf1;
>> +      s2 = s3;
> 
> Maybe a comment "reset pointers to base addresses" or the like?
> 
>> +      s1 += offset1;
>> +      s2 += offset2;
> 
> So, offset1 ranges from [3988..3988+63] ? offset2 is fixed at 2636.
> Since offset2 is fixed, why not just set "s3 = s2 + offset2"
> before the loop and then this increment is unnecessary.

Agreed, thanks for the suggestion.

> 
> Maybe we need to note somewhere that "page_size" is actually
> *2* pages and that doing a strncasecmp from an offset which is
> less than a full page to page_size is guaranteed to cross a page
> boundary.
> It might be a good idea to write code that ensures that the
> offsets are both indeed less than a single page size.
> Something like "offset2 = page_size / 4" and change the
> 3988 to something relative to page_size ("3 * page_size / 4"?).
> 

Agreed.

>> +
>> +      exp_result= *s1;
> 
> strncasecmp man page says it returns "an integer less than, equal to,
> or greater than zero".  It doesn't seem to be appropriate to look for
> a specific value.

This is a valid point, but all other functions besides do_random_tests 
look for a specific value, I'm just keeping the behavior.

> 
>> +
>> +      FOR_EACH_IMPL (impl, 0)
>> +	{
>> +	  check_result (impl, s1, s2, page_size, -exp_result);
>> +	  check_result (impl, s2, s1, page_size, exp_result);
>> +	}
>> +    }
>> +}
> [snip]
>> diff --git a/string/test-strncpy.c b/string/test-strncpy.c
>> index c978753ad8..30c69dd34b 100644
>> --- a/string/test-strncpy.c
>> +++ b/string/test-strncpy.c
>> @@ -155,6 +155,42 @@ do_test (size_t align1, size_t align2, size_t len, size_t n, int max_char)
> [snip]
>> +static void
>> +do_page_tests (void)
>> +{
>> +  size_t i, len, start_offset, offset;
>> +  CHAR *s1, *s2;
>> +  /* Calculate the null character offset.  */
>> +  size_t last_offset = (page_size / sizeof (CHAR)) - 1;
>> +
>> +  s2 = (CHAR *) buf1;
>> +  s1 = (CHAR *) buf2;
>> +  MEMSET (s1, '\1', last_offset);
>> +  s1[last_offset] = '\0';
>> +
>> +  /* Place short strings ending at page boundary.  */
>> +  offset = last_offset;
>> +  for (i = 0; i < 128; i++)
> 
> Is 128 enough? Why was this chosen?

It should be SMALL_CHAR.

> 
>> +    {
>> +      offset--;
>> +      len = last_offset - offset;
>> +      FOR_EACH_IMPL (impl, 0)
>> +	do_one_test (impl, s2, (CHAR *) (s1 + offset), len, len);
>> +    }
>> +
>> +  /* Place long strings ending at page boundary.  */
>> +  start_offset = (last_offset + 1) / 2;
>> +  for (i = 0; i < 64; ++i)
> 
> Is 64 enough? Why was this chosen?

This was intended to test each alignment in a cache line, it could use 
128 to cover more architectures as suggested for test-strncacecmp.

> 
>> +    {
>> +      offset = start_offset + i;
>> +      if (offset >= (last_offset + 1))
>> +	break;
>> +      len = last_offset - offset;
>> +      FOR_EACH_IMPL (impl, 0)
>> +	do_one_test (impl, s2, (CHAR *) (s1 + offset), page_size, len);
>> +    }
> 
> Would it be bad to collapse these 2 loops into one that tests every
> length from 1..page_size?

We could collapse both into one loop with 128 iterations, I don't think 
using page_size would be bad but we would cover many alignments that are 
already covered by other tests.

> 
>> +}
> [snip]
> 
> PC
> 

Best regards,
-- 
Raphael Moreira Zinsly
IBM
Linux on Power Toolchain