This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] mips/o32: fix internal_syscall5/6/7

On 16/08/2017 18:15, Aurelien Jarno wrote:
> On 2017-08-16 11:13, Adhemerval Zanella wrote:
>> On 16/08/2017 10:44, Joseph Myers wrote:
>>> On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
>>>> On Tue, 15 Aug 2017, Joseph Myers wrote:
>>>>> In which case having a volatile integer variable with value 4, declaring a 
>>>>> VLA whose size is that variable, and storing a pointer to that VLA in a 
>>>>> variable, would be an alternative to alloca to force a frame pointer, but 
>>>>> with deallocation happening when the scope ends rather than the function 
>>>>> ending (and the syscall macro has its own scope, so using it inside a loop 
>>>>> wouldn't be a problem).
>>>>  I suspect using volatile variables will cause unnecessary memory traffic.  
>>>> Passing the size specifier through an empty `asm' might give better code; 
>>>> also I think we can use 0 as the size requested, not to decrease the stack 
>>>> pointer unnecessarily, e.g.:
>>> Sure, as long as (a) the compiler can't know the size is actually constant 
>>> and (b) it can't know the VLA isn't actually used, as if it can tell 
>>> either of those things it can optimize away the variable stack allocation.
>>>>  Also I wonder if there's actually a dependable way to have GCC itself 
>>>> allocate the argument space we require.  For example if we set `s' to 1 
>>>> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
>>>> valid to place arguments #5 and #6 at respectively without the subsequent 
>>>> $sp adjustment we currently have in the syscall `asm' or would it be UB?
>>> You can't tell whether the compiler might have allocated other variables 
>>> on the stack after the dynamic adjustment - that is, whether any 
>>> particular offset from sp is in fact unused or not.
>> What about the below? I can use some help to see if I am handling all the
>> required ABI requirements for the __libc_do_syscall, but on an qemu emulated
> Do we actually have to follow the ABI requirements if we control both
> the caller of __libc_do_syscall and the function itself? The i386 and
> arm versions seem to pass as much as possible in the right registers and
> the other values another way.
> For MIPS, it means we can pass v0, a0-a3 in the correct registers and
> use __libc_do_syscall to just setup the values on the stack. Something
> like that for example:

We do not really need to follow the ABI requirements; the only requirement is
that the backtrace unwinds correctly so that cancellation works.  However, to
allow this optimization we would need to take care of a different calling
convention for internal symbols (I noted that for PIC code MIPS adds
a GOT reference plus an R_MIPS_JALR, which the linker might relax later).

I think we should aim for simplicity and use as much C support as we can,
and optimize this with more asm hackery only if we really need to squeeze
specific cycles out of the syscall (which I really think is overkill for
most, if not all, of them).
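
To illustrate what "as much C support as we can" could look like, here is a
rough, portable sketch, not the code from the patch.  The name
do_syscall_sketch is hypothetical, and it goes through the generic syscall()
wrapper from <unistd.h> instead of MIPS assembly, so it only shows the shape
of the idea: the caller makes an ordinary C call, the compiler places the
arguments according to the normal calling convention (on o32 that would put
arguments 5 and up on the stack), and the unwind information is generated by
the compiler rather than written by hand.  Returning -errno on failure is a
choice made for this sketch only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical stand-in for an out-of-line syscall helper.  It is an
   ordinary C function, so argument passing and unwind info are left
   entirely to the compiler.  Assumption: syscall() is used here only
   to keep the sketch portable; a real helper would issue the syscall
   instruction itself.  */
__attribute__ ((noinline)) long
do_syscall_sketch (long nr, long a1, long a2, long a3, long a4,
                   long a5, long a6, long a7)
{
  long ret = syscall (nr, a1, a2, a3, a4, a5, a6, a7);
  /* Sketch-only convention: fold errno into a negative return value.  */
  return ret == -1 ? -errno : ret;
}

/* Small self-check: closing an invalid descriptor must fail with EBADF.  */
void
do_syscall_sketch_self_check (void)
{
  assert (do_syscall_sketch (SYS_close, -1, 0, 0, 0, 0, 0, 0) == -EBADF);
}
```

The extra call costs a few cycles per syscall compared to passing everything
in registers by hand, which is the trade-off being argued here.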

Currently, with this patch, __libc_do_syscall is called on pread, pwrite,
lseek, llseek, ppoll, posix_fadvise, posix_fallocate, sync_file_range,
fallocate, preadv, pwritev, preadv2, pwritev2, select, pselect, mmap,
readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg, msgsnd, msgrcv,
msgget, msgctl, semop, semget, semctl, semtimedop, shmat, shmdt, shmget,
and shmctl.  All of them, with the possible exception of posix_fadvise and
the sysv ctl calls, are blocking calls, for which trying to save some cycles
really won't make any difference IMHO.  Also, the context switch is usually
the largest factor in the latency.

> ENTRY(__libc_do_syscall)
>        PTR_SUBU sp, 32
>        cfi_adjust_cfa_offset(32)
>        .set noreorder
>        REG_S s2, 16(sp)
>        REG_S s3, 20(sp)
>        REG_S s4, 24(sp)
>        syscall
>        .set reorder
>        PTR_SUBU sp, -32
>        cfi_adjust_cfa_offset(-32)
>        ret
> END (__libc_do_syscall)
> On the caller side the 5th and following arguments should be passed in
> s2, s3, s4. s1 can be used to save ra around the subroutine call.
