This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Fix p{readv,writev}{64} consolidation implementation
- From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- To: Chris Metcalf <cmetcalf at mellanox dot com>, libc-alpha at sourceware dot org, Mike Frysinger <vapier at gentoo dot org>
- Date: Wed, 15 Jun 2016 17:17:49 -0300
- Subject: Re: [PATCH] Fix p{readv,writev}{64} consolidation implementation
- Authentication-results: sourceware.org; auth=none
- References: <1465941275-3459-1-git-send-email-adhemerval dot zanella at linaro dot org> <20160615053714 dot GM4053 at vapier dot lan> <57615BFB dot 2050101 at linaro dot org> <e9d7c072-9fd7-e4ab-5b65-6974a7597c77 at mellanox dot com>
On 15/06/2016 15:21, Chris Metcalf wrote:
> On 6/15/2016 9:45 AM, Adhemerval Zanella wrote:
>> On 15/06/2016 02:37, Mike Frysinger wrote:
>>> On 14 Jun 2016 18:54, Adhemerval Zanella wrote:
>>>> This patch fixes the p{readv,writev}{64} consolidation implementation
>>>> from commits 4e77815 and af5fdf5. Different from pread/pwrite
>>>> implementation, preadv/pwritev implementation does not require
>>>> __ALIGNMENT_ARG because kernel syscall prototypes define
>>>> the high and low part of the off_t, if it is the case, directly
>>>> (different from pread/pwrite where the architecture ABI for passing
>>>> 64-bit values must be in consideration for passsing the arguments).
>>> i had looked at that specifically but thought it ok because the old code
>>> was using the alignment arg. was the old code broken too ?
>>>
>>> this is what the preadv code looked like:
>>> -ssize_t
>>> -__libc_preadv (int fd, const struct iovec *vector, int count, off_t offset)
>>> -{
>>> - assert (sizeof (offset) == 4);
>>> - return SYSCALL_CANCEL (preadv, fd,
>>> - vector, count, __ALIGNMENT_ARG
>>> - __LONG_LONG_PAIR (offset >> 31, offset));
>>> -}
>>>
>>> although i guess this isn't too surprising as this code was in the
>>> generic sysdeps dir which currently doesn't have as many users as
>>> we wish it did :).
>> I though it too, but before the consolidation patch only nios2 and tile 32-bits
>> used the generic preadv.c implementation. And only tile 32-bits defines
>> __ASSUME_ALIGNED_REGISTER_PAIRS (so __ALIGNMENT_ARG will pad the argument with 0).
>> However since the syscall is defined as in linux source:
>>
>> fs/read_write.c:
>>
>> 991 SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
>> 992 unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
>> 993 {
>> 994 loff_t pos = pos_from_hilo(pos_h, pos_l);
>> 995
>> 996 return do_preadv(fd, vec, vlen, pos, 0);
>> 997 }
>
> Actually, the relevant code at fs/read_write.c:1162 is:
>
> COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
> const struct compat_iovec __user *,vec,
> compat_ulong_t, vlen, u32, pos_low, u32, pos_high)
> {
> loff_t pos = ((loff_t)pos_high << 32) | pos_low;
>
> return do_compat_preadv64(fd, vec, vlen, pos, 0);
> }
>
> but it amounts to the same thing as far the arguments are concerned.
This is for 64-bits kernel that provides the 32-bit compat syscalls.
For 32-bits kernels only it the code I referenced above.
>
>> The idea is not really to align the argument to zero pass, but rather to split
>> the possible 64-bits argument in high and low as required (as the default
>> implementation was doing [2]). On tile, it is working because the preadv.c
>> offset is 32-bits and thus the high word is indeed zero, but it is passing
>> one superfluous argument.
>
> No, what happens is that instead of passing r3=pos_low=lo, r4=pos_high=0, we
> are passing r3=pos_low=0, r4=pos_high=lo, r5=0. This means that we get a crazy
> high offset and either fail or (for a sufficiently large file) get the wrong data.
> Filed as bug 20261.
I was referring to *old* behaviour (pre-consolidation) implementation
(the sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c), which did:
--
return SYSCALL_CANCEL (preadv, fd,
vector, count, __ALIGNMENT_ARG
__LONG_LONG_PAIR (offset >> 31, offset));
--
It has 2 issue:
1. It passed one superfluous argument to preadv. On tilepro build
I noted that is using internal_syscall6, which means it is passing
{ fd, vector, cound, 0, offset, offset>>31 }. It is not wrong,
since for this code off_t will be the old 32-bit value, but the
semantic is wrong.
2. __LONG_LONG_PAIR is not correct for big-endian.
Now for BZ#20261 I do not think it applicable since this is a fix for
consolidation done in development phase, it does not appear in any
released version.
Now, your analysis is correct for *current* code and it is contains the
__LONG_LONG_PAIR issue due the SYSCALL_LL usage. I will change it
and post a second version.
>
> It appears to be true for both preadv and pwritev, though I only tested preadv.
>
>> Also my understanding is this generic implementation
>> does work correctly in every architecture because __LONG_LONG_PAIR relies
>> on endianess.
>
> In fact that's another bug; __LONG_LONG_PAIR is intended only to
> be used if you are simulating passing a 64-bit value in 32-bit registers,
> where the ABI naturally splits the 64 bits into a high and low part
> according to endianness.
>
> In this example we are calling into the kernel where it expects a pos_low
> and a pos_high in a particular order. On a big-endian machine, the
> __LONG_LONG_PAIR will put the high part first and the kernel will
> see the wrong offset.
>