Re: Directions for arc4random
On 25/10/2018 16:13, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> On 22/10/2018 18:21, Florian Weimer wrote:
>>> I would like to resubmit my arc4random patch.
>>>
>>> Most of the code, and the difficult-to-review parts, are for providing
>>> fork safety even if the process is forked by a system call, and not a
>>> call to the glibc function fork.
>>>
>>> Rich Felker said that he'd prefer to use a proper lock for this in musl:
>>>
>>> <https://www.openwall.com/lists/musl/2018/07/02/5>
>>>
>>> This would make arc4random far less scalable if one global state is
>>> used. But the implementation would be greatly simplified by this,
>>> especially if we do not provide async-signal-safety for arc4random. (If
>>> we always lock around arc4random invocations and the entire
>>> cryptographic operations, then I think this would detect fork misuse due
>>> to hangs on that lock.)
>
> (I believe this is wrong—locks won't help us, the bit streams will still
> not converge.)
Do you mean this won't help us with async-signal-safety for arc4random, or
that locking over the arc4random crypto operation does not suffice to provide
fork detection?
>
>>> I would like to know if I should resubmit my last, very complex
>>> approach, or if I should switch to the simple, lock-based approach.
>>
>> It has been on my backlog list for some time. The first problem I see is
>> the huge size, which makes reviewing quite complicated.
>>
>> The first step to simplify the patch submission is to remove all x86
>> optimizations for AES-NI/SSE2, as you already indicated in your initial
>> submission.
>
> It's AES-NI and RDRAND, demonstrating the use of both CPU extension
> hooks to verify that they are usable. I think they do not substantially
> complicate review of the rest of the patch.
My point is that this change is orthogonal to arc4random and could be sent
as a separate patch. The original patch is already quite complex, and moving
the arch-specific bits to a separate patch simplifies review of both changes.
>
>> Another thing is that I tend to agree with Rich: the lock-free fork/clone
>> detection is quite complicated and possibly troublesome for maintainability.
>> Since we have a long-standing issue with a non-async-signal-safe fork
>> implementation (BZ#4737), and the fork implementation still contains some
>> lock contention on multithreaded fork (due to atfork list handling, the
>> libio internal list, and malloc), I think such lock-free complexity would
>> not yield much gain.
>
> The lock is eliminated not from fork (which doesn't have to care if we
> are willing to reseed in the new process), but from regular arc4random
> calls.
Right, my confusion here; this is indeed a different topic.
Checking another arc4random implementation (the OpenBSD one, which seems to
support Linux), its fork handling seems sketchy: it uses an atfork handler
and detects a fork by checking a global variable set by the handler plus a
comparison against the current pid (obtained with getpid). It incurs an
extra syscall, does not handle pid wraparound, and requires a global lock
over arc4random for the fork-detection operation.
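Something like this rough sketch is what I mean (all names are made up for
illustration, and this is not the actual OpenBSD code):

#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

static pthread_mutex_t rng_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t rng_forked;  /* Set in the child by atfork.  */
static pid_t rng_pid = -1;                /* Pid that seeded the state.  */

static void
rng_atfork_child (void)
{
  rng_forked = 1;
}

uint32_t
arc4random_sketch (void)
{
  pthread_mutex_lock (&rng_lock);
  /* Extra getpid call on every invocation, and pid reuse after a wrap
     can defeat the check.  */
  if (rng_forked || rng_pid != getpid ())
    {
      /* ... reseed the global state; on first use register the handler
         with pthread_atfork (NULL, NULL, rng_atfork_child) ... */
      rng_forked = 0;
      rng_pid = getpid ();
    }
  uint32_t ret = 0;  /* ... generate output from the global state ... */
  pthread_mutex_unlock (&rng_lock);
  return ret;
}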
MADV_WIPEONFORK might help in the above scenario; it still requires a lock
over the state, but it simplifies the fork detection (and we would still
need to handle kernels without MADV_WIPEONFORK support).
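For reference, a rough sketch of how the MADV_WIPEONFORK variant could look
(names invented for illustration, not taken from your patch): keep the state
on its own anonymous mapping, let the kernel zero it in the child, and treat
an uninitialized state as "needs reseeding":

#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
# define MADV_WIPEONFORK 18  /* Linux >= 4.14.  */
#endif

struct rng_state
{
  int initialized;           /* Zeroed in the child after fork.  */
  unsigned char key[32];
};

static struct rng_state *rng_state;

/* Map a dedicated page for the state and ask the kernel to wipe it on
   fork.  Returns 0 on success; on kernels without MADV_WIPEONFORK the
   madvise call fails and we must fall back to pid/atfork detection.  */
static int
rng_state_alloc (void)
{
  size_t pagesize = sysconf (_SC_PAGESIZE);
  void *p = mmap (NULL, pagesize, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  if (madvise (p, pagesize, MADV_WIPEONFORK) != 0)
    {
      munmap (p, pagesize);
      return -1;
    }
  rng_state = p;
  return 0;
}

/* Under the global lock: if !rng_state->initialized (first use, or the
   page was wiped by a fork), reseed before generating output.  */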
I think the issues with the described fork-handling approach without
MADV_WIPEONFORK can be fixed, so the question is whether we should make
multithreaded performance an implementation focus. The current arc4random
interface, without user-controllable and/or detached PRNG state, imposes
extra resource state and at least some synchronization. I would expect a
performant implementation to dissociate the state from the interface, as
C++ <random> does for instance, which might require no synchronization at
all depending on the usage.
So do you expect real-world scenarios to require a highly performant
arc4random? I don't have a strong opinion here, so I am fine if we treat
performance as a QoI matter for arc4random.
>
> On the other hand, if we want to add backtracking protection, we need
> either state that is fully confined to each thread, separately seeded,
> or a global lock. Backtracking protection requires serializing all
> operations on the state because the output from one operation is not
> predictable prior to completing the computation and replaces the
> existing state, while in the counter-based approach, you can just skip
> ahead.
>
>> So I would prefer Rich's suggestion of using simpler locks, albeit with
>> less scalability. As you already described in your patch, current kernel
>> support for MADV_WIPEONFORK already provides a proper way to handle it.
>> Also, the lock-free optimization remains a future option if the current
>> lock turns out to be a problem in real-world usage.
>
> Rich doesn't want to use MADV_WIPEONFORK because it essentially wastes a
> page and the corresponding PTEs for a rarely used feature. The other
> approach with the paired locks is similar because it also needs a
> dedicated page.
Right, but if I understood correctly, at least in our case glibc will only
pay for the extra page if the program actually calls arc4random, right?
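(To illustrate what I mean by paying only on use, reusing the hypothetical
rng_state_alloc from the sketch above: the mapping is only created on the
first call, so a program that never calls arc4random never maps the page.)

#include <pthread.h>
#include <stdint.h>

static pthread_once_t rng_once = PTHREAD_ONCE_INIT;
static int rng_have_wipeonfork;

static void
rng_init_once (void)
{
  /* Reached only on the first arc4random call.  */
  rng_have_wipeonfork = rng_state_alloc () == 0;
}

uint32_t
arc4random_lazy_sketch (void)
{
  pthread_once (&rng_once, rng_init_once);
  /* ... take the global lock, reseed if needed, generate output ... */
  return 0;
}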
>
>> I also think it would be profitable to discuss how to properly handle
>> BZ#4737 if fork scalability is an issue. You recently brought up whether
>> there is a way to detect if the current function has been called (directly
>> or indirectly) from a signal handler [1]; I am not sure whether your idea
>> was to handle BZ#4737 or something else (with a proper way to do that we
>> could fix the atfork list scalability issue).
>>
>> [1] https://sourceware.org/ml/libc-help/2016-02/msg00029.html
>
> Calling this recently stretches things a bit. 8-)
>
> Wrapper functions for signal handlers are problematic because we cannot
> atomically set handler flags and the function, so a handler could run
> with the wrong flags.
I think it is still doable, albeit it would make sigaction less scalable
and signal handling a bit more costly. However, I think this discussion
deserves a separate thread.