This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Fri, 19 May 2017 12:10:12 -0300
- Subject: Re: [PATCH] Fix i686 memchr overflow calculation (BZ#21182)
- Authentication-results: sourceware.org; auth=none
- References: <1489512487-24860-1-git-send-email-adhemerval.zanella@linaro.org> <40ea40ff-763f-e5ff-4bf1-1d24db403df3@linaro.org> <CAMe9rOoyTNL6jmmpSQPSG-8cQGYo4eL78=z+kQkvgPw8ZjFC9A@mail.gmail.com> <CAMe9rOrxU79HmdvLDv=hq9Pkb7+Dg8iB2ozWr4QVs_ZduT4yYA@mail.gmail.com> <5d284d70-b6e5-b92e-185a-bfede9777ad8@linaro.org>
On 19/05/2017 11:45, Adhemerval Zanella wrote:
>
>
> On 19/05/2017 10:44, H.J. Lu wrote:
>> On Tue, Mar 28, 2017 at 2:47 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Tue, Mar 28, 2017 at 12:22 PM, Adhemerval Zanella
>>> <adhemerval.zanella@linaro.org> wrote:
>>>> Ping (with the page_size/2 fix included).
>>>>
>>>> On 14/03/2017 14:28, Adhemerval Zanella wrote:
>>>>> This patch fixes the regression added by 23d2770 for final address
>>>>> overflow calculation. The subtraction of the considered size (16)
>>>>> at line 120 is at wrong place, for sizes less than 16 subsequent
>>>>> overflow check will not take in consideration an invalid size (since
>>>>> the subtraction will be negative). Also, the lea instruction also
>>>>> does not raise the carry flag (CF) that is used in subsequent jbe
>>>>> to check for overflow.
>>>>>
>>>>> The fix is to follow x86_64 logic from 3daef2c where the overflow
>>>>> is first check and a sub instruction is issued. In case of resulting
>>>>> negative size, CF will be set by the sub instruction and a NULL
>>>>> result will be returned. The patch also add similar tests reported
>>>>> in bug report.
>>>>>
>>>>> Checked on i686-linux-gnu and x86_64-linux-gnu.
>>>>>
>>>>> [BZ# 21182]
>>>>> * string/test-memchr.c (do_test): Add BZ#21182 checks for address
>>>>> near end of a page.
>>>>> * sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
>>>>> overflow calculation.
>>>>> ---
>>>>> ChangeLog | 7 +++++++
>>>>> string/test-memchr.c | 6 ++++++
>>>>> sysdeps/i386/i686/multiarch/memchr-sse2.S | 2 +-
>>>>> 3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/string/test-memchr.c b/string/test-memchr.c
>>>>> index d64d10c..87a077b 100644
>>>>> --- a/string/test-memchr.c
>>>>> +++ b/string/test-memchr.c
>>>>> @@ -210,6 +210,12 @@ test_main (void)
>>>>> do_test (0, i, i + 1, i + 1, 0);
>>>>> }
>>>>>
>>>>> + /* BZ#21182 - wrong overflow calculation for i686 implementation
>>>>> + with address near end of the page. */
>>>>> + for (i = 2; i < 16; ++i)
>>>>> + /* page_size is in fact getpagesize() * 2. */
>>>>> + do_test (page_size/2 - i, i, i, 1, 0x9B);
>>>>> +
>>>>> do_random_tests ();
>>>>> return ret;
>>>>> }
>>>>> diff --git a/sysdeps/i386/i686/multiarch/memchr-sse2.S b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> index 910679c..e41f324 100644
>>>>> --- a/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> +++ b/sysdeps/i386/i686/multiarch/memchr-sse2.S
>>>>> @@ -117,7 +117,6 @@ L(crosscache):
>>>>>
>>>>> # ifndef USE_AS_RAWMEMCHR
>>>>> jnz L(match_case2_prolog1)
>>>>> - lea -16(%edx), %edx
>>>>> /* Calculate the last acceptable address and check for possible
>>>>> addition overflow by using satured math:
>>>>> edx = ecx + edx
>>>>> @@ -125,6 +124,7 @@ L(crosscache):
>>>>> add %ecx, %edx
>>>>> sbb %eax, %eax
>>>>> or %eax, %edx
>>>>> + sub $16, %edx
>>>>> jbe L(return_null)
>>>>> lea 16(%edi), %edi
>>>>> # else
>>>>>
>>>
>>> Looks good.
>>>
>>>
>>
>> Just a thought. Is this approach
>>
>> /* Calculate "edx + ecx - 16" by using "edx - (16 - ecx)"
>> instead of "(edx + ecx) - 16" to void possible addition
>> overflow. */
>> neg %ecx
>> add $16, %ecx
>> sub %ecx, %edx
>> jbe L(return_null)
>>
>> a little bit better?
>
> I do not have a strong preference here, although imho it is simpler to
> understand why the saturated math would work here and it is the same
> strategy used on other arch implementations. May you might outline in
> comment that ecx is always in the range of [0,64) so '-ecx + 16' can
> possible underflow. Is is this change just for micro-optimization?
>
Sorry, my bad here. I made some confusion, in fact what I meant was ecx
was indeed in range [0,16) (is the and instruction for alignment that
uses 64) and your suggestion *can't* overflow. I would recommend
you to just outline the possible value of 'edx' in comments.