This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)



On 19/05/2017 13:14, H.J. Lu wrote:
> On Fri, May 19, 2017 at 8:10 AM, Adhemerval Zanella
> <adhemerval.zanella@linaro.org> wrote:
>>
>>
>> On 19/05/2017 11:45, Adhemerval Zanella wrote:
>>>
>>>
>>> On 19/05/2017 10:44, H.J. Lu wrote:
>>>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>> Ping (with the page_size/2 fix included).
>>>>>>
>>>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>>>>> This patch fixes the regression added by 23d2770 for final address
>>>>>>> overflow calculation.  The subtraction of the considered size (16)
>>>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>>>>> overflow check will not take in consideration an invalid size (since
>>>>>>> the subtraction will be negative).  Also, the lea instruction also
>>>>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>>>>> to check for overflow.
>>>>>>>
>>>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>>>>> is first check and a sub instruction is issued.  In case of resulting
>>>>>>> negative size, CF will be set by the sub instruction and a NULL
>>>>>>> result will be returned.  The patch also add similar tests reported
>>>>>>> in bug report.
>>>>>>>
>>>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>>>>
>>>>>>>       [BZ# 21182]
>>>>>>>       * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>>>>       near end of a page.
>>>>>>>       * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>>>>       overflow calculation.
>>>>>>> ---
>>>>>>>  ChangeLog                                 | 7 +++++++
>>>>>>>  string/test-memchr.c                      | 6 ++++++
>>>>>>>  sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>>>>> index d64d10c..87a077b 100644
>>>>>>> --- a/string/test-memchr.c
>>>>>>> +++ b/string/test-memchr.c
>>>>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>>>>        do_test (0, i, i + 1, i + 1, 0);
>>>>>>>      }
>>>>>>>
>>>>>>> +  /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>>>>> +     with address near end of the page.  */
>>>>>>> +  for (i = 2; i < 16; ++i)
>>>>>>> +    /* page_size is in fact getpagesize() * 2.  */
>>>>>>> +    do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>>>>> +
>>>>>>>    do_random_tests ();
>>>>>>>    return ret;
>>>>>>>  }
>>>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> index 910679c..e41f324 100644
>>>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>>>>
>>>>>>>  # ifndef USE_AS_RAWMEMCHR
>>>>>>>       jnz     L(match_case2_prolog1)
>>>>>>> -     lea     -16(%edx), %edx
>>>>>>>          /* Calculate the last acceptable address and check for possible
>>>>>>>             addition overflow by using satured math:
>>>>>>>             edx = ecx + edx
>>>>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>>>>       add     %ecx, %edx
>>>>>>>       sbb     %eax, %eax
>>>>>>>       or      %eax, %edx
>>>>>>> +     sub     $16, %edx
>>>>>>>       jbe     L(return_null)
>>>>>>>       lea     16(%edi), %edi
>>>>>>>  # else
>>>>>>>
>>>>>
>>>>> Looks good.
>>>>>
>>>>>
>>>>
>>>> Just a thought.  Is this approach
>>>>
>>>>         /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>>>>            instead of "(edx + ecx) - 16" to void possible addition
>>>>            overflow.  */
>>>>         neg     %ecx
>>>>         add     $16, %ecx
>>>>         sub     %ecx, %edx
>>>>         jbe     L(return_null)
>>>>
>>>> a little bit better?
>>>
>>> I do not have a strong preference here, although imho it is simpler to
>>> understand why the saturated math would work here and it is the same
>>> strategy used on other arch implementations.  May you might outline in
>>> comment that ecx is always in the range of [0,64) so '-ecx + 16' can
>>> possible underflow.  Is is this change just for micro-optimization?
>>>
>>
>> Sorry, my bad here.  I made some confusion, in fact what I meant was ecx
>> was indeed in range [0,16) (is the and instruction for alignment that
>> uses 64) and your suggestion *can't* overflow.  I would recommend
>> you to just outline the possible value of 'edx' in comments.
> 
> Here is a patch for SSE2 memchr.   OK for master?
> 

LGTM, thanks.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]