This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Linux: Implement interfaces for memory protection keys

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: Florian Weimer <fw at deneb dot enyo dot de>
Cc: libc-alpha at sourceware dot org
Date: Tue, 7 Nov 2017 17:59:26 -0200
Subject: Re: [PATCH] Linux: Implement interfaces for memory protection keys
Authentication-results: sourceware.org; auth=none
References: <20171104174948.A79F6400EA2DF@oldenburg.str.redhat.com> <ddcb66a6-b0d8-a820-a12d-970fa2021b37@linaro.org> <5f7602dd-daf1-36ec-3040-19cfd17f8fc2@redhat.com> <521ff046-f859-e8f4-3e75-f2a6e0d527b0@linaro.org> <878tfjm5ds.fsf@mid.deneb.enyo.de>

On 06/11/2017 20:03, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>> Oh, I was worried that these arguments were expected to be true
>>> 64-bit values eventually.  Not today, but perhaps in the future.  
>>> Then using unsigned long would be no good on x32.  But this does not
>>> seem to be the case, so we can just use unsigned int, to make clear
>>> that these arguments are actually 32 bits only.
>>
>> Even they were 64 bits argument, current {INLINE,INTERNAL}_SYSCALL on
>> x32 pass them correctly in syscall arguments.
> 
> But there would only be 32 bits to pass.
> 
>>>> So I think there is no reason to use a different symbol signature than
>>>> kernel.
>>>
>>> I still think using unsigned long should be avoided.
>>
>> Is your objection because it will have different argument types for 32
>> and 64 bits?
> 
> Yes, if 64 bits were needed, then it would have to be unsigned long
> long on x32.  But we need only 32 actually (0 and 3 bits for the
> upcoming POWER support), so unsigned int is the right type, I think.

Right, I would like to try deviate as little as possible from the
kernel but I think we can have a different signature for these function.

> 
>> Currently for x86 indeed it does not make difference since for
>> 'wrpkru' instruction on x86_64 the high-order 32-bits of RCX, RDX
>> and RAX are ignored.
> 
> The size of the PKRU register does not matter.  pkey_alloc does not
> use a bitmap of keys, it returns a key number (which happens to be an
> index into PKRU on x86-64, but this is not exposed in the API).
> 
>> But if we aim to make it a generic API I think we should stick to
>> what kernel provides for main reason we can not foresee what kind of
>> usage future architecture might do with this value.
> 
> Well, it's a generic interface, so if there's an expectation that the
> generic code might eventually use a flag beyond 32 bits, we'd better
> using uint64_t today.  But that seems quite an overkill.

Ok, seems a reasonable approach.

> 
>>>>> +/* Compare the two numbers LEFT and RIGHT and report failure if they
>>>>> +   are different.  */
>>>>> +#define TEST_COMPARE(left, right)                                       \
>>>>> +  ({                                                                    \
>>>>> +    __typeof__ (left) __left_value = (left);                            \
>>>>> +    __typeof__ (right) __right_value = (right);                         \
>>>>> +    _Static_assert (sizeof (__left_value) <= sizeof (long long),        \
>>>>> +                    "left value fits into long long");                  \
>>>>> +    _Static_assert (sizeof (__right_value) <= sizeof (long long),       \
>>>>> +                    "right value fits into long long");                 \
>>>>> +    if (__left_value != __right_value                                   \
>>>>> +        || ((__left_value > 0) != (__right_value > 0)))                 \
>>>>> +      support_test_compare_failure                                      \
>>>>> +        (__FILE__, __LINE__,                                            \
>>>>> +         #left, __left_value, __left_value > 0,                         \
>>>>> +         #right, __right_value, __right_value > 0);                     \
>>>>> +  })
>>>>> +
>>>>
>>>> I think the macro name should be more explicit since it is not only
>>>> comparing two number but also their size.
>>>
>>> Huh?  No, it checks that conversion to long long plus the sign bit
>>> is not lossy and that the signs match.  So I think the name is
>>> accurate.
>>
>> Yeah, but it ties some type (long long) and also some sign (since it
>> is comparing if both sides are positive) to a quite generic name.
>> Maybe TEST_COMPARE_LL_INT_POSITIVE?
> 
> It uses long long as a proxy for intmax_t.  I don't want to include
> <inttypes.h> due to namespace pollution.
> 
> The compile-time check is there to prevent the accidental use of the
> macro with 128-bit integer types.

Ok, would you mind adding this comment on macro implementation?

> 
>>> Sorry, I disagree with that.  Premature generalization usually does
>>> not work out because you don't know which parts to generalize.
>>>
>>> If we get key revocation on pkey_alloc, pkey_set would have to turn
>>> into a vsyscall anyway (where the code sequence is known to the
>>> kernel, and the revocation code can override the PKRU value the
>>> pkey_set implementation has read, thus closing the race).
>>
>> Alright, that is fair point. I am thinking more about the sense of
>> what should be generic such a maximum number of bits for the mask,
>> the possible mask value and to set them on the internal flag.  But
>> I think if it were the case in the future we can work a more generic
>> implementation.
> 
> I just saw POWER patches fly by.  There, the executable permission
> seems to be in a separate register which cannot be updated directly
> from userspace. 8-/

It is at v9 now [1] (and you seems to be already aware since you replied
about key reuse issue). And they are solving the executable permission
userspace issue it by adding a new syscall (__NR_pkey_modify).

Reading powerpc patchset it seems to me that it will be worth add
some generalization on arch-pkey.h and pkey_{set,get}

  - Based on current patchset, powerpc seems to define different
    value for PKEY_DISABLE_ACCESS, so I think it would be worth
    to parametrize by moving it to arch-defined header.

  - What about adding this on arch-pkey.h

   ---
   #define HAVE_ARCH_PKEY          1
   #define NR_PKEYS                16
   #define PKEY_BITS_PER_PKEY      2

   static inline uint32_t
   pkey_shift (uint32_t pkey)
   {
     return pkey * PKEY_BITS_PER_PKEY;
   }
   ---

   And define as pkey_get:

   ---
   uint32_t pkey_get (int pkey, unsigned long flags)
   {
   #if HAVE_ARCH_PKEY
     uint32_t mask = (PKEY_DISABLE_ACCESS|PKEY_DISABLE_WRITE);

     pkey_reg_t pkey_reg = pkey_read ();
     pkey_reg_t pkey_shifted = pkey_reg >> pkey_shift (pkey);

     return pkey_shifted & mask;
   #else
     return -1;
   #endif
   }
   ---

   With this a possible future powerpc patch would just need to
   add code on arch-pkey.h instead of reimplement pkey_set.c as
   well.

   - I would expect something similar to pkey_set, but also issuing
     the syscall for executable permission masking.   
 

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-November/165602.html

> 
>> What about new pkey_mprotect (..., 0) after pkey_set (0, ...)? Will
>> it follow the new key setting for key 0 or kernel will use a default
>> value?
> 
> It's important that the keys are basically additional tag bits on a
> page.  In isolation they do not grant or reject access.  These tag
> bits only select which access rights associated with the current
> thread should be applied to the page.  So pkey_mprotect and pkey_set
> are completely indepedent: pkey_mprotect changes the tag bits on the
> page (but nothing else) and does not depend on the current thread's
> access rights, and pkey_set changes the current thread's access rights
> only (without changing the page tag bits set by pkey_mprotect before
> or after).  Crucially, these keys do not any additional protection
> flags to pages.  The tag bits gain meaning only in combination with
> the access rights of a running thread.

My understanding from documentation it key 0 is default one and it
is assigned to any memory regions (through mmap) which has not been 
explicitly assigned through pkey_mprotect. My questioning whether we
should filter out key 0 is to:

  1. Issue an error when application tries to use a key which has
     not been previously allocated by pkey_alloc (since key 0 is
     pre-allocated by the kernel).

  2. Avoid changing default masking bits for pages which has not
     been explicitly assigned through a pkey_mprotect.

However this does not prevent uses from issuing a RDPKRU/WRPKRU
directly.

> 
>> My question is of you think we need to filter and/or document key 0
>> (and 1 as you indicate) as being special?
> 
> No, we shouldn't do that.  The kernel will do it for us as part of
> pkey_alloc, and that should be sufficient.
>

Follow-Ups:
- Re: [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Florian Weimer

References:
- [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Florian Weimer
- Re: [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Adhemerval Zanella
- Re: [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Florian Weimer
- Re: [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Adhemerval Zanella
- Re: [PATCH] Linux: Implement interfaces for memory protection keys
  - From: Florian Weimer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]