This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] X86-64: Add _dl_runtime_resolve_avx[512]_opt [BZ #20508]


On Tue, Aug 30, 2016 at 1:30 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Mon, Aug 29, 2016 at 5:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Mon, Aug 29, 2016 at 4:07 PM, Richard Henderson <rth@twiddle.net> wrote:
>>> On 08/26/2016 10:18 AM, H.J. Lu wrote:
>>>>
>>>> +       vpcmpeqd %xmm8, %xmm8, %xmm8
>>>> +       vorpd %ymm9, %ymm10, %ymm10
>>>> +       vptest %ymm10, %ymm8
>>>
>>>
>>> No need to create a mask of all -1; use vptest ymm10, ymm10.
>>>
>>
>> ymm8 isn't all -1.  Only the lower 128 bits are all -1:
>>
>>
>> (gdb) p/x $ymm8
>> $4 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
>>     0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {
>>     0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff,
>>     0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0,
>>     0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff,
>>     0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff,
>>     0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {
>>     0xffffffffffffffffffffffffffffffff, 0x00000000000000000000000000000000}}
>> (gdb)
>>
>> ymm10 (ymm0|..|ymm7) has
>>
>> (gdb) p/x $ymm10
>> $2 = {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {
>>     0x8000000000000000, 0x8000000000000000, 0x0, 0x0}, v32_int8 = {0x6d,
>>     0x79, 0x72, 0x6f, 0x7f, 0x74, 0x6f, 0x73, 0x77, 0x6f, 0x6f, 0x67, 0x6f,
>>     0xff, 0x6f, 0xff, 0x0 <repeats 16 times>}, v16_int16 = {0x796d, 0x6f72,
>>     0x747f, 0x736f, 0x6f77, 0x676f, 0xff6f, 0xff6f, 0x0, 0x0, 0x0, 0x0, 0x0,
>>     0x0, 0x0, 0x0}, v8_int32 = {0x6f72796d, 0x736f747f, 0x676f6f77,
>>     0xff6fff6f, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x736f747f6f72796d,
>>     0xff6fff6f676f6f77, 0x0, 0x0}, v2_int128 = {
>>     0xff6fff6f676f6f77736f747f6f72796d, 0x00000000000000000000000000000000}}
>>
>> Since
>>
>> vptest %ymm10, %ymm8
>>
>> IF (SRC[255:0] BITWISE AND NOT DEST[255:0] = 0)
>> THEN CF = 1;
>> ELSE CF = 0;
>>
>> this ignores the lower 128 bits of ymm10 and sets CF = 0
>> only if the upper 128 bits of ymm10 aren't zero.  If we use
>>
>> vptest ymm10, ymm10
>>
>> CF is always 1 and we will always preserve ymm0-ymm7 even
>> when the upper 128 bits are zero.
>>
>
> Here is the updated patch to add PRESERVE_BND_REGS_PREFIX
> before branches.  Otherwise bound registers will be cleared.  OK
> for master?
>

Any comments? I will check it in next week if there is no objection.


-- 
H.J.
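
The CF argument quoted above can be checked with a small bit-level model. This is only a sketch of the quoted VPTEST pseudocode, not code from the patch: registers are modeled as 256-bit Python integers, the function name vptest_cf and the value ymm10_upper_set are hypothetical, and ymm10_upper_zero is the v2_int128 value from the gdb dump. In AT&T syntax, "vptest %ymm10, %ymm8" has SRC = ymm10 and DEST = ymm8.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def vptest_cf(src, dest):
    """Model of the quoted pseudocode: CF = 1 iff (SRC AND NOT DEST) == 0."""
    return 1 if (src & ~dest & MASK256) == 0 else 0


# ymm8 after "vpcmpeqd %xmm8, %xmm8, %xmm8": the VEX-encoded 128-bit form
# sets the low 128 bits to all ones and zeroes the upper 128 bits.
YMM8 = MASK128

# ymm10 as shown in the gdb dump: upper 128 bits are zero.
ymm10_upper_zero = 0xFF6FFF6F676F6F77736F747F6F72796D

# Hypothetical value with one bit set in the upper 128 bits.
ymm10_upper_set = ymm10_upper_zero | (1 << 200)

# vptest %ymm10, %ymm8: only the upper half of ymm10 influences CF.
print(vptest_cf(ymm10_upper_zero, YMM8))  # 1
print(vptest_cf(ymm10_upper_set, YMM8))   # 0

# vptest %ymm10, %ymm10: x AND NOT x is always 0, so CF is always 1.
print(vptest_cf(ymm10_upper_zero, ymm10_upper_zero))  # 1
print(vptest_cf(ymm10_upper_set, ymm10_upper_set))    # 1
```

With the half-ones mask in ymm8, CF distinguishes the two cases; testing ymm10 against itself cannot, which matches the objection in the quoted reply.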

