Re: Directions for arc4random
On 25/10/2018 16:13, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> On 22/10/2018 18:21, Florian Weimer wrote:
>>> I would like to resubmit my arc4random patch.
>>>
>>> Most of the code, and the difficult-to-review parts, are for providing
>>> fork safety even if the process is forked by a system call, and not a
>>> call to the glibc function fork.
>>>
>>> Rich Felker said that he'd prefer to use a proper lock for this in musl:
>>>
>>> <https://www.openwall.com/lists/musl/2018/07/02/5>
>>>
>>> This would make arc4random far less scalable if one global state is
>>> used. But the implementation would be greatly simplified by this,
>>> especially if we do not provide async-signal-safety for arc4random. (If
>>> we always lock around arc4random invocations and the entire
>>> cryptographic operations, then I think this would detect fork misuse due
>>> to hangs on that lock.)
>
> (I believe this is wrong—locks won't help us, the bit streams will still
> not converge.)
Do you mean this won't help us with async-signal-safety for arc4random, or
that locking over the arc4random crypto operation does not suffice to provide
fork detection?
>
>>> I would like to know if I should resubmit my last, very complex
>>> approach, or if I should switch to the simple, lock-based approach.
>>
>> It has been on my backlog list for some time. The first problem I see is
>> the huge size, which makes reviewing quite complicated.
>>
>> The first step to simplify the patch submission is to remove all x86
>> optimizations for AES-NI/SSE2, as you already indicated in your initial
>> submission.
>
> It's AES-NI and RDRAND, demonstrating the use of both CPU extension
> hooks to verify that they are usable. I think they do not substantially
> complicate review of the rest of the patch.
My point is that this change is orthogonal to arc4random and could be sent
as a separate patch. The original patch is already quite complex, and moving
the arch-specific bits to a separate patch simplifies review of both changes.
>
>> Another thing is that I tend to agree with Rich: the lock-free fork/clone
>> detection is quite complicated and possibly troublesome for maintainability.
>> Since we have a long-standing issue with a non-async-signal-safe fork
>> implementation (BZ#4737), and the fork implementation still contains some
>> lock contention on multithreaded fork (due to atfork list handling, the
>> libio internal list, and malloc), I think such lock-free complexity would
>> not yield much gain.
>
> The lock is eliminated not from fork (which doesn't have to care if we
> are willing to reseed in the new process), but from regular arc4random
> calls.
Right, my confusion here; this is indeed a different topic.
Checking another arc4random implementation (the OpenBSD one, which seems to
support Linux), its fork handling seems sketchy: it uses an atfork handler
and detects a fork by checking a global variable set by the handler plus a
comparison against the current pid (obtained with getpid). It incurs an
extra syscall, does not handle pid wraparound, and requires a global lock
over arc4random for the fork-detection operation.
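Something like this rough sketch is what I mean (all names are made up for
illustration, and this is not the actual OpenBSD code):

#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t rng_forked;  /* Set in the child by atfork.  */
static pid_t rng_pid = -1;                /* Pid that seeded the state.  */

static void
rng_atfork_child (void)
{
  rng_forked = 1;
}

uint32_t
arc4random_sketch (void)
{
  pthread_mutex_lock (&rng_lock);
  /* Extra getpid call on every invocation, and pid reuse after a wrap
     can defeat the check.  */
  if (rng_forked || rng_pid != getpid ())
    {
      /* ... reseed the global state; on first use register the handler
         with pthread_atfork (NULL, NULL, rng_atfork_child) ... */
      rng_forked = 0;
      rng_pid = getpid ();
    }
  uint32_t ret = 0;  /* ... generate output from the global state ... */
  pthread_mutex_unlock (&rng_lock);
  return ret;
}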
MADV_WIPEONFORK might help in the above scenario; it still requires a lock
over the state, but it simplifies the fork detection (and we would still
need to handle kernels without MADV_WIPEONFORK support).
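For reference, a rough sketch of how the MADV_WIPEONFORK variant could look
(names invented for illustration, not taken from your patch): keep the state
on its own anonymous mapping, let the kernel zero it in the child, and treat
an uninitialized state as "needs reseeding":

#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
# define MADV_WIPEONFORK 18  /* Linux >= 4.14.  */
#endif

struct rng_state
{
  int initialized;           /* Zeroed in the child after fork.  */
  unsigned char key[32];
};

static struct rng_state *rng_state;

/* Map a dedicated page for the state and ask the kernel to wipe it on
   fork.  Returns 0 on success; on kernels without MADV_WIPEONFORK the
   madvise call fails and we must fall back to pid/atfork detection.  */
static int
rng_state_alloc (void)
{
  size_t pagesize = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pagesize, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  if (madvise (p, pagesize, MADV_WIPEONFORK) != 0)
    {
      munmap (p, pagesize);
      return -1;
    }
  rng_state = p;
  return 0;
}

/* Under the global lock: if !rng_state->initialized (first use, or the
   page was wiped by a fork), reseed before generating output.  */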
I think the issues with the described fork-handling approach without
MADV_WIPEONFORK can be fixed, so the question is whether we should make
multithreaded performance an implementation focus. The current arc4random
interface, without user-controllable and/or detached PRNG state, imposes
extra resource state and at least some synchronization. I would expect a
performant implementation to dissociate the state from the interface, as
C++ <random> does for instance, which might require no synchronization at
all depending on the usage.
So do you expect real-world scenarios to require a highly performant
arc4random? I don't have a strong opinion here, so I am fine if we treat
performance as a QoI matter for arc4random.
>
> On the other hand, if we want to add backtracking protection, we need
> either state that is fully confined to each thread, separately seeded,
> or a global lock. Backtracking protection requires serializing all
> operations on the state because the output from one operation is not
> predictable prior to completing the computation and replaces the
> existing state, while in the counter-based approach, you can just skip
> ahead.
>
>> So I would prefer Rich's suggestion of using simpler locks, albeit with
>> less scalability. As you already described in your patch, current kernel
>> support for MADV_WIPEONFORK already provides a proper way to handle it.
>> Also, the lock-free optimization remains a future option if the current
>> lock turns out to be a problem in real-world usage.
>
> Rich doesn't want to use MADV_WIPEONFORK because it essentially wastes a
> page and the corresponding PTEs for a rarely used feature. The other
> approach with the paired locks is similar because it also needs a
> dedicated page.
Right, but if I understood correctly, at least in our case glibc will only
pay for the extra page if the program actually calls arc4random, right?
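(To illustrate what I mean by paying only on use, reusing the hypothetical
rng_state_alloc from the sketch above: the mapping is only created on the
first call, so a program that never calls arc4random never maps the page.)

#include <pthread.h>
#include <stdint.h>

static pthread_once_t rng_once = PTHREAD_ONCE_INIT;
static int rng_have_wipeonfork;

static void
rng_init_once (void)
{
  /* Reached only on the first arc4random call.  */
  rng_have_wipeonfork = rng_state_alloc () == 0;
}

uint32_t
arc4random_lazy_sketch (void)
{
  pthread_once (&rng_once, rng_init_once);
  /* ... take the global lock, reseed if needed, generate output ... */
  return 0;
}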
>
>> I also think it would be profitable to discuss how to properly handle
>> BZ#4737 if fork scalability is an issue. You recently brought up whether
>> there is a way to detect if the current function has been called (directly
>> or indirectly) from a signal handler [1]; I am not sure whether your idea
>> was to handle BZ#4737 or something else (with a proper way to do that we
>> could fix the atfork list scalability issue).
>>
>> [1] https://sourceware.org/ml/libc-help/2016-02/msg00029.html
>
> Calling this recently stretches things a bit. 8-)
>
> Wrapper functions for signal handlers are problematic because we cannot
> atomically set handler flags and the function, so a handler could run
> with the wrong flags.
I think it is still doable, albeit it would make sigaction less scalable
and signal handling a bit more costly. However, I think this discussion
deserves a separate thread.