This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Linux: Implement interfaces for memory protection keys
On 06/11/2017 20:03, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>>> Oh, I was worried that these arguments were expected to be true
>>> 64-bit values eventually. Not today, but perhaps in the future.
>>> Then using unsigned long would be no good on x32. But this does not
>>> seem to be the case, so we can just use unsigned int, to make clear
>>> that these arguments are actually 32 bits only.
>>
>> Even they were 64 bits argument, current {INLINE,INTERNAL}_SYSCALL on
>> x32 pass them correctly in syscall arguments.
>
> But there would only be 32 bits to pass.
>
>>>> So I think there is no reason to use a different symbol signature than
>>>> kernel.
>>>
>>> I still think using unsigned long should be avoided.
>>
>> Is your objection because it will have different argument types for 32
>> and 64 bits?
>
> Yes, if 64 bits were needed, then it would have to be unsigned long
> long on x32. But we need only 32 actually (0 and 3 bits for the
> upcoming POWER support), so unsigned int is the right type, I think.
Right, I would like to try deviate as little as possible from the
kernel but I think we can have a different signature for these function.
>
>> Currently for x86 indeed it does not make difference since for
>> 'wrpkru' instruction on x86_64 the high-order 32-bits of RCX, RDX
>> and RAX are ignored.
>
> The size of the PKRU register does not matter. pkey_alloc does not
> use a bitmap of keys, it returns a key number (which happens to be an
> index into PKRU on x86-64, but this is not exposed in the API).
>
>> But if we aim to make it a generic API I think we should stick to
>> what kernel provides for main reason we can not foresee what kind of
>> usage future architecture might do with this value.
>
> Well, it's a generic interface, so if there's an expectation that the
> generic code might eventually use a flag beyond 32 bits, we'd better
> using uint64_t today. But that seems quite an overkill.
Ok, seems a reasonable approach.
>
>>>>> +/* Compare the two numbers LEFT and RIGHT and report failure if they
>>>>> + are different. */
>>>>> +#define TEST_COMPARE(left, right) \
>>>>> + ({ \
>>>>> + __typeof__ (left) __left_value = (left); \
>>>>> + __typeof__ (right) __right_value = (right); \
>>>>> + _Static_assert (sizeof (__left_value) <= sizeof (long long), \
>>>>> + "left value fits into long long"); \
>>>>> + _Static_assert (sizeof (__right_value) <= sizeof (long long), \
>>>>> + "right value fits into long long"); \
>>>>> + if (__left_value != __right_value \
>>>>> + || ((__left_value > 0) != (__right_value > 0))) \
>>>>> + support_test_compare_failure \
>>>>> + (__FILE__, __LINE__, \
>>>>> + #left, __left_value, __left_value > 0, \
>>>>> + #right, __right_value, __right_value > 0); \
>>>>> + })
>>>>> +
>>>>
>>>> I think the macro name should be more explicit since it is not only
>>>> comparing two number but also their size.
>>>
>>> Huh? No, it checks that conversion to long long plus the sign bit
>>> is not lossy and that the signs match. So I think the name is
>>> accurate.
>>
>> Yeah, but it ties some type (long long) and also some sign (since it
>> is comparing if both sides are positive) to a quite generic name.
>> Maybe TEST_COMPARE_LL_INT_POSITIVE?
>
> It uses long long as a proxy for intmax_t. I don't want to include
> <inttypes.h> due to namespace pollution.
>
> The compile-time check is there to prevent the accidental use of the
> macro with 128-bit integer types.
Ok, would you mind adding this comment on macro implementation?
>
>>> Sorry, I disagree with that. Premature generalization usually does
>>> not work out because you don't know which parts to generalize.
>>>
>>> If we get key revocation on pkey_alloc, pkey_set would have to turn
>>> into a vsyscall anyway (where the code sequence is known to the
>>> kernel, and the revocation code can override the PKRU value the
>>> pkey_set implementation has read, thus closing the race).
>>
>> Alright, that is fair point. I am thinking more about the sense of
>> what should be generic such a maximum number of bits for the mask,
>> the possible mask value and to set them on the internal flag. But
>> I think if it were the case in the future we can work a more generic
>> implementation.
>
> I just saw POWER patches fly by. There, the executable permission
> seems to be in a separate register which cannot be updated directly
> from userspace. 8-/
It is at v9 now [1] (and you seems to be already aware since you replied
about key reuse issue). And they are solving the executable permission
userspace issue it by adding a new syscall (__NR_pkey_modify).
Reading powerpc patchset it seems to me that it will be worth add
some generalization on arch-pkey.h and pkey_{set,get}
- Based on current patchset, powerpc seems to define different
value for PKEY_DISABLE_ACCESS, so I think it would be worth
to parametrize by moving it to arch-defined header.
- What about adding this on arch-pkey.h
---
#define HAVE_ARCH_PKEY 1
#define NR_PKEYS 16
#define PKEY_BITS_PER_PKEY 2
static inline uint32_t
pkey_shift (uint32_t pkey)
{
return pkey * PKEY_BITS_PER_PKEY;
}
---
And define as pkey_get:
---
uint32_t pkey_get (int pkey, unsigned long flags)
{
#if HAVE_ARCH_PKEY
uint32_t mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);
pkey_reg_t pkey_reg = pkey_read ();
pkey_reg_t pkey_shifted = pkey_reg >> pkey_shift (pkey);
return pkey_shifted & mask;
#else
return -1;
#endif
}
---
With this a possible future powerpc patch would just need to
add code on arch-pkey.h instead of reimplement pkey_set.c as
well.
- I would expect something similar to pkey_set, but also issuing
the syscall for executable permission masking.
[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-November/165602.html
>
>> What about new pkey_mprotect (..., 0) after pkey_set (0, ...)? Will
>> it follow the new key setting for key 0 or kernel will use a default
>> value?
>
> It's important that the keys are basically additional tag bits on a
> page. In isolation they do not grant or reject access. These tag
> bits only select which access rights associated with the current
> thread should be applied to the page. So pkey_mprotect and pkey_set
> are completely indepedent: pkey_mprotect changes the tag bits on the
> page (but nothing else) and does not depend on the current thread's
> access rights, and pkey_set changes the current thread's access rights
> only (without changing the page tag bits set by pkey_mprotect before
> or after). Crucially, these keys do not any additional protection
> flags to pages. The tag bits gain meaning only in combination with
> the access rights of a running thread.
My understanding from documentation it key 0 is default one and it
is assigned to any memory regions (through mmap) which has not been
explicitly assigned through pkey_mprotect. My questioning whether we
should filter out key 0 is to:
1. Issue an error when application tries to use a key which has
not been previously allocated by pkey_alloc (since key 0 is
pre-allocated by the kernel).
2. Avoid changing default masking bits for pages which has not
been explicitly assigned through a pkey_mprotect.
However this does not prevent uses from issuing a RDPKRU/WRPKRU
directly.
>
>> My question is of you think we need to filter and/or document key 0
>> (and 1 as you indicate) as being special?
>
> No, we shouldn't do that. The kernel will do it for us as part of
> pkey_alloc, and that should be sufficient.
>