This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] x86-64: Add p{read,write}[v]64 to syscalls.list [BZ #20348]


On Thu, Jul 14, 2016 at 3:45 AM, Adhemerval Zanella
<adhemerval.zanella@linaro.org> wrote:
>
>
> On 13/07/2016 18:27, H.J. Lu wrote:
>> On Wed, Jul 13, 2016 at 8:25 AM, Adhemerval Zanella
>> <adhemerval.zanella@linaro.org> wrote:
>>>
>>>
>>> On 12/07/2016 17:19, H.J. Lu wrote:
>>>> On Tue, Jul 12, 2016 at 8:49 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>> On Tue, Jul 12, 2016 at 8:19 AM, Adhemerval Zanella
>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>> On 12/07/2016 16:03, H.J. Lu wrote:
>>>>>>> On Tue, Jul 12, 2016 at 7:04 AM, Adhemerval Zanella
>>>>>>> <adhemerval.zanella@linaro.org> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/07/2016 14:26, H.J. Lu wrote:
>>>>>>>>> 64-bit off_t in pread64, preadv, pwrite64 and pwritev syscalls is pased
>>>>>>>>> in one 64-bit register for both x32 and x86-64.  Since the inline
>>>>>>>>> asm statement only passes long, which is 32-bit for x32, in registers,
>>>>>>>>> 64-bit off_t is truncated to 32-bit on x32.  Since __ASSUME_PREADV and
>>>>>>>>> __ASSUME_PWRITEV are defined unconditionally, these syscalls can be
>>>>>>>>> implemented in syscalls.list to pass 64-bit off_t in one 64-bit register.
>>>>>>>>>
>>>>>>>>> Tested on x86-64 and x32 with off_t > 4GB on pread64/pwrite64 and
>>>>>>>>> preadv64/pwritev64.
>>>>>>>>>
>>>>>>>>> OK for master?
>>>>>>>>>
>>>>>>>>> H.J.
>>>>>>>>> ---
>>>>>>>>>       [BZ #20348]
>>>>>>>>>       * sysdeps/unix/sysv/linux/x86_64/syscalls.list: Add pread64,
>>>>>>>>>       preadv64, pwrite64 and pwritev64.
>>>>>>>>> ---
>>>>>>>>>  sysdeps/unix/sysv/linux/x86_64/syscalls.list | 4 ++++
>>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/sysdeps/unix/sysv/linux/x86_64/syscalls.list b/sysdeps/unix/sysv/linux/x86_64/syscalls.list
>>>>>>>>> index d09d101..bcf6370 100644
>>>>>>>>> --- a/sysdeps/unix/sysv/linux/x86_64/syscalls.list
>>>>>>>>> +++ b/sysdeps/unix/sysv/linux/x86_64/syscalls.list
>>>>>>>>> @@ -6,6 +6,10 @@ msgctl               -       msgctl          i:iip   __msgctl        msgctl
>>>>>>>>>  msgget               -       msgget          i:ii    __msgget        msgget
>>>>>>>>>  msgrcv               -       msgrcv          Ci:ibnii __msgrcv       msgrcv
>>>>>>>>>  msgsnd               -       msgsnd          Ci:ibni __msgsnd        msgsnd
>>>>>>>>> +pread64              -       pread64         Ci:ipii __libc_pread    __libc_pread64 __pread64 pread64 __pread pread
>>>>>>>>> +preadv64     -       preadv          Ci:ipii preadv64        preadv
>>>>>>>>> +pwrite64     -       pwrite64        Ci:ipii __libc_pwrite   __pwrite64 pwrite64 __pwrite pwrite
>>>>>>>>> +pwritev64    -       pwritev         Ci:ipii pwritev64       pwritev
>>>>>>>>>  shmat                -       shmat           i:ipi   __shmat         shmat
>>>>>>>>>  shmctl               -       shmctl          i:iip   __shmctl        shmctl
>>>>>>>>>  shmdt                -       shmdt           i:s     __shmdt         shmdt
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is pretty much what I suggested [1] with the difference that my
>>>>>>>> idea is to re-add the auto-generated wrappers just for x32.  I would
>>>>>>>> prefer to keep x86_64 continue to use default implementation and
>>>>>>>> work on fix {INLINE,INTERNAL}_SYSCALL to work with 64-bit arguments
>>>>>>>> in x32.
>>>>>>>
>>>>>>> syscalls.list is the preferred way to implement a system call, not
>>>>>>> {INLINE,INTERNAL}_SYSCALL.  There is no reason only to do it
>>>>>>> for x32.  As for {INLINE,INTERNAL}_SYSCALL with 64-bit argument,
>>>>>>> they are only used in p{read,write}[v]64 and it is incorrect to pass long
>>>>>>> as 64-bit integer to x32 syscall if the argument is long or pointer.
>>>>>>
>>>>>> The idea I am trying to push with all these consolidation are twofold:
>>>>>>
>>>>>>  1. Remove the complexity implementation files and way to call syscalls
>>>>>>     inside GLIBC and make easier to implement new ports
>>>>>
>>>>> That is fine.
>>>>>
>>>>>>  2. Remove the redundant sysdep-cancel.h requirement for each port
>>>>>>     which basically implementations pic/nopic function calls in assembly.
>>>>>>     This is also remove implementation complexity and make easier for
>>>>>>     new port implementation.
>>>>>>
>>>>>> Also, for 2. it also helps the long standing pthread cancellation
>>>>>> (bz#12683) by focusing all cancellation calls in only one implementation.
>>>>>>
>>>>>> I do get the idea the auto-generation call is currently preferred way
>>>>>> to implementation syscalls, but I think for *cancellable* way we should
>>>>>> push to implement using SYSCALL_CANCEL (which is in turn based on
>>>>>> INTERNAL_SYSCALL).
>>>>>
>>>>> That is fine also.  But on x86-64, we should use syscalls.list if possible,
>>>>> especially for cancellation calls.
>>>>>
>>>>> With {INLINE,INTERNAL}_SYSCALL:
>>>>>
>>>>> 0000000000000000 <__libc_pread>:
>>>>>    0: 8b 05 00 00 00 00     mov    0x0(%rip),%eax        # 6 <__libc_pread+0x6>
>>>>>    6: 49 89 ca             mov    %rcx,%r10
>>>>>    9: 85 c0                 test   %eax,%eax
>>>>>    b: 75 2b                 jne    38 <__libc_pread+0x38>
>>>>>    d: 48 63 ff             movslq %edi,%rdi
>>>>>   10: b8 11 00 00 00       mov    $0x11,%eax
>>>>>   15: 0f 05                 syscall
>>>>>   17: 48 3d 00 f0 ff ff     cmp    $0xfffffffffffff000,%rax
>>>>>   1d: 77 01                 ja     20 <__libc_pread+0x20>
>>>>>   1f: c3                   retq
>>>>>   20: 48 8b 15 00 00 00 00 mov    0x0(%rip),%rdx        # 27 <__libc_pread+0x27>
>>>>>   27: f7 d8                 neg    %eax
>>>>>   29: 64 89 02             mov    %eax,%fs:(%rdx)
>>>>>   2c: 48 c7 c0 ff ff ff ff mov    $0xffffffffffffffff,%rax
>>>>>   33: c3                   retq
>>>>>   34: 0f 1f 40 00           nopl   0x0(%rax)
>>>>>   38: 41 55                 push   %r13
>>>>>   3a: 41 54                 push   %r12
>>>>>   3c: 49 89 cd             mov    %rcx,%r13
>>>>>   3f: 55                   push   %rbp
>>>>>   40: 53                   push   %rbx
>>>>>   41: 49 89 d4             mov    %rdx,%r12
>>>>>   44: 48 89 f5             mov    %rsi,%rbp
>>>>>   47: 89 fb                 mov    %edi,%ebx
>>>>>   49: 48 83 ec 18           sub    $0x18,%rsp
>>>>>   4d: e8 00 00 00 00       callq  52 <__libc_pread+0x52>
>>>>>   52: 4d 89 ea             mov    %r13,%r10
>>>>>   55: 41 89 c0             mov    %eax,%r8d
>>>>>   58: 4c 89 e2             mov    %r12,%rdx
>>>>>   5b: 48 89 ee             mov    %rbp,%rsi
>>>>>   5e: 48 63 fb             movslq %ebx,%rdi
>>>>>   61: b8 11 00 00 00       mov    $0x11,%eax
>>>>>   66: 0f 05                 syscall
>>>>>   68: 48 3d 00 f0 ff ff     cmp    $0xfffffffffffff000,%rax
>>>>>   6e: 77 1d                 ja     8d <__libc_pread+0x8d>
>>>>>   70: 44 89 c7             mov    %r8d,%edi
>>>>>   73: 48 89 44 24 08       mov    %rax,0x8(%rsp)
>>>>>   78: e8 00 00 00 00       callq  7d <__libc_pread+0x7d>
>>>>>   7d: 48 8b 44 24 08       mov    0x8(%rsp),%rax
>>>>>   82: 48 83 c4 18           add    $0x18,%rsp
>>>>>   86: 5b                   pop    %rbx
>>>>>   87: 5d                   pop    %rbp
>>>>>   88: 41 5c                 pop    %r12
>>>>>   8a: 41 5d                 pop    %r13
>>>>>   8c: c3                   retq
>>>>>   8d: 48 8b 15 00 00 00 00 mov    0x0(%rip),%rdx        # 94 <__libc_pread+0x94>
>>>>>   94: f7 d8                 neg    %eax
>>>>>   96: 64 89 02             mov    %eax,%fs:(%rdx)
>>>>>   99: 48 c7 c0 ff ff ff ff mov    $0xffffffffffffffff,%rax
>>>>>   a0: eb ce                 jmp    70 <__libc_pread+0x70>
>>>>>
>>>>> With syscalls.list:
>>>>>
>>>>> Disassembly of section .text:
>>>>>
>>>>> 0000000000000000 <__libc_pread>:
>>>>>    0: 83 3d 00 00 00 00 00 cmpl   $0x0,0x0(%rip)        # 7 <__libc_pread+0x7>
>>>>>    7: 75 17                 jne    20 <__pread64_nocancel+0x17>
>>>>>
>>>>> 0000000000000009 <__pread64_nocancel>:
>>>>>    9: 49 89 ca             mov    %rcx,%r10
>>>>>    c: b8 11 00 00 00       mov    $0x11,%eax
>>>>>   11: 0f 05                 syscall
>>>>>   13: 48 3d 01 f0 ff ff     cmp    $0xfffffffffffff001,%rax
>>>>>   19: 0f 83 00 00 00 00     jae    1f <__pread64_nocancel+0x16>
>>>>>   1f: c3                   retq
>>>>>   20: 48 83 ec 08           sub    $0x8,%rsp
>>>>>   24: e8 00 00 00 00       callq  29 <__pread64_nocancel+0x20>
>>>>>   29: 48 89 04 24           mov    %rax,(%rsp)
>>>>>   2d: 49 89 ca             mov    %rcx,%r10
>>>>>   30: b8 11 00 00 00       mov    $0x11,%eax
>>>>>   35: 0f 05                 syscall
>>>>>   37: 48 8b 3c 24           mov    (%rsp),%rdi
>>>>>   3b: 48 89 c2             mov    %rax,%rdx
>>>>>   3e: e8 00 00 00 00       callq  43 <__pread64_nocancel+0x3a>
>>>>>   43: 48 89 d0             mov    %rdx,%rax
>>>>>   46: 48 83 c4 08           add    $0x8,%rsp
>>>>>   4a: 48 3d 01 f0 ff ff     cmp    $0xfffffffffffff001,%rax
>>>>>   50: 0f 83 00 00 00 00     jae    56 <__pread64_nocancel+0x4d>
>>>>>   56: c3                   retq
>>>>>
>>>>> This one is much better.
>>>
>>> I think SYSCALL_CANCEL can be slight improved by using INTERNAL_SYSCALL
>>> instead of INLINE_SYSCALL and checking the error outside of cancellation
>>> handling.  I see a slight improvement, however sysdep-cancel.h is
>>> still slight better.
>>>
>>> However the idea is I am trying to push in the end is to remove the
>>> sysdep-cancel.h because the new cancellation scheme will require it
>>> to basically do a function call instead of calling the cancellation
>>> enable/disable (which will be also removed).
>>>
>>>>>
>>>>>>>
>>>>>>>> Also, I think we should remove the first try to fix LO_HI_LONG [2],
>>>>>>>> since 64 bits argument does not work correct in x32 anyway.
>>>>>>>
>>>>>>> I think LO_HI_LONG should be defined only if __WORDSIZE != 64
>>>>>>> and p{read,write}[v]64 should be added to wordsize-64/syscalls.list.
>>>>>>
>>>>>> Indeed this is something I will get back now that I see x32 fails
>>>>>> with current implementation.  I got the wrong idea all ILP32 would
>>>>>> use the compat code (as MIPS64n64).
>>>>
>>>> x86/entry/syscalls/syscall_64.tbl has
>>>>
>>>> 17 common pread64 sys_pread64
>>>> 295 64 preadv sys_preadv
>>>> 534 x32 preadv compat_sys_preadv64
>>>>
>>>> compat_sys_preadv64 takes 64-bit offset in one piece.
>>>>
>>>>>> About the patch, based on current timeframe I think your solution is
>>>>>> the safest one for x86.
>>>>
>>>> I will check it in.
>>>>
>>>>>> However I do would like to re-enable it on x86, including x32, when 2.25
>>>>>> opens. The patch below changes slight on how {INLINE,INTERNAL}_SYSCALL
>>>>>> works on x32: int/long/pointer should be passed as before and uint64_t
>>>>>> arguments should be passed as register all well without casting.
>>>>>> If you have time I would like to check if it would be acceptable for
>>>>>> 2.25. It shows no regression on x32, including the tst-preadwrite{64}
>>>>>> testcase you sent earlier:
>>>>>
>>>>> I will give it a try.
>>>>
>>>> I got no regressions on x32.  But on x86-64, I got
>>>>
>>>> Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init:
>>>> Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
>>>> FAIL: nptl/tst-stack4
>>>>
>>>> It doesn't fail every time.  I am wondering if typeof returns the
>>>> correct type on
>>>> signed/unsigned integer constants.
>>>>
>>>
>>> I think the issue is not really related to this patch.  I also see this
>>> very issue in i386, powerpc64le and aarch64 in a inconsistent way
>>>
>>> Regarding typeof type for signed/unsigned, it looks like gcc gets it
>>> right:
>>>
>>> $ cat test.c
>>> #include <stdio.h>
>>>
>>> unsigned char uc;
>>> signed char sc;
>>>
>>> int foo_1 ()
>>> {
>>>   typeof (uc) c = 255;
>>>   return c > 128;
>>> }
>>>
>>> int foo_2 ()
>>> {
>>>   typeof (sc) c = 255;
>>>   return c > 128;
>>> }
>>>
>>> int
>>> main (void)
>>> {
>>>   printf ("%i\n%i\n", foo_1 (), foo_2 ());
>>>   return 0;
>>> }
>>> $ gcc test.c
>>> $ ./a.out
>>> 1
>>> 0
>>>
>>
>> That is not what I meant:
>>
>> hjl@gnu-6 tmp]$ cat y.c
>> #define NUM (-214)
>> typedef typeof (NUM) t1;
>>
>> t1
>> foo1 ()
>> {
>>   return NUM;
>> }
>>
>> typedef long int t2;
>>
>> t2
>> foo2 ()
>> {
>>   return NUM;
>> }
>> [hjl@gnu-6 tmp]$ gcc -S -O2 y.c
>> [hjl@gnu-6 tmp]$ cat y.s
>> .file "y.c"
>> .text
>> .p2align 4,,15
>> .globl foo1
>> .type foo1, @function
>> foo1:
>> .LFB0:
>> .cfi_startproc
>> movl $-214, %eax
>> ^^^^^^^^^^^^^^^^^^^^
>> ret
>> .cfi_endproc
>> .LFE0:
>> .size foo1, .-foo1
>> .p2align 4,,15
>> .globl foo2
>> .type foo2, @function
>> foo2:
>> .LFB1:
>> .cfi_startproc
>> movq $-214, %rax
>> ^^^^^^^^^^^^^^^^^^^^^
>> ret
>> .cfi_endproc
>> .LFE1:
>> .size foo2, .-foo2
>> .ident "GCC: (GNU) 6.1.1 20160621 (Red Hat 6.1.1-3)"
>> .section .note.GNU-stack,"",@progbits
>> [hjl@gnu-6 tmp]$
>>
>>
>
> Right, I see your point now.  However do you think this could be an issue
> with x32 using {INTERNAL,INLINE}_SYSCALL? I see that these are for internal
> usage, so we can be judicious on how to use and the constraints about them.

We need to use long int in {INTERNAL,INLINE}_SYSCALL.  So far it
isn't a problem since we never inline a syscall with 64-bit argument.
If we do need to do it in the future, we can do something similar to
sysdeps/unix/sysv/linux/x86_64/x32/times.c to use long long on
the 64-bit argument.

-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]