This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Directions for arc4random



On 29/10/2018 15:55, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 25/10/2018 16:13, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> On 22/10/2018 18:21, Florian Weimer wrote:
>>>>> I would like to resubmit my arc4random patch.
>>>>>
>>>>> Most of the code, and the difficult-to-review parts, are for providing
>>>>> fork safety even if the process is forked by a system call, and not a
>>>>> call to the glibc function fork.
>>>>>
>>>>> Rich Felker said that he'd prefer to use a proper lock for this in musl:
>>>>>
>>>>>   <https://www.openwall.com/lists/musl/2018/07/02/5>
>>>>>
>>>>> This would make arc4random far less scalable if one global state is
>>>>> used.  But the implementation would be greatly simplified by this,
>>>>> especially if we do not provide async-signal-safety for arc4random.  (If
>>>>> we always lock around arc4random invocations and the entire
>>>>> cryptographic operations, then I think this would detect fork misuse due
>>>>> to hangs on that lock.)
>>>
>>> (I believe this is wrong—locks won't help us, the bit streams will still
>>> not converge.)
>>
>> Do you mean this won't help us on async-signal-safety for arc4random or
>> locking over arc4random crypto operation is not suffice to provide fork
>> detection?
> 
> The latter.
> 
>>>>> I would like to know if I should resubmit my last, very complex
>>>>> approach, or if I should switch to the simple, lock-based approach.
>>>>
>>>> It was my backlog list for some time. The first problem I see is the 
>>>> the huge size which makes reviewing quite complicate.
>>>>
>>>> The first step to try simplify patch submission is remove all x86
>>>> optimizations for AES-NI/SSE2 as you already indicate in your initial
>>>> submission.
>>>
>>> It's AES-NI and RDRAND, demonstrating the use of both CPU extension
>>> hooks to verify that they are usable.  I think they do not substantially
>>> complicate review of the rest of the patch.
>>
>> My point is the change is orthogonal to the arc4random and it could be
>> send in a different patch.  The original patch is already quite
>> complex and moving the arch-specific bits to a separate patch
>> simplifies review on both changes.
> 
> Sure, but then you don't see how the arch hooks are to be used, and to
> review those bits, you'll have to look at the arch patch anyway.

But on generic version only the architecture would be reviewed and won't
prematurely tied to an specific ABI. I will check how hard would be to
support aarch64 on current proposal.

> 
>>>> Another thing is I tend to agree with Rich, the lock-free fork/clone
>>>> detection is quite complicated and possible troublesome for maintainability. 
>>>> Since we have a long-standing issue of a no async-signal-safe fork 
>>>> implementation (BZ#4737) and fork implementation still contain some 
>>>> lock contention on multithread fork (due atfork list handling, libio 
>>>> internal list, and malloc) I think such lock-free complexity would not 
>>>> yield much gain.
>>>
>>> The lock is eliminated not from fork (which doesn't have to care if we
>>> are willing to reseed in the new process), but from regular arc4random
>>> calls.
>>
>> Right, my confusion here, this is indeed another topic.  
>>
>> Checking on other arc4random implementation (OpenBSD one which seems to 
>> support Linux), its fork handling seems sketchy: it uses a atfork handler 
>> and detect a fork by checking a global variable set by the handler plus 
>> against current pid (get with getpid).  It incurs in an extra syscall, it 
>> does not handle pid wrap around, and requires a global lock over arc4random 
>> on fork detect operation.
> 
> The OpenBSD implementation on OpenBSD uses minhert(2), which gives a
> zeroed pool after fork, and it should therefore reseed in the new
> process.  This is really subtle, a few comments in the code certainly
> wouldn't hurt.

Agree, although the fork handling scheme I described the is the one used
on Linux (lib/libcrypto/arc4random/arc4random_linux.h and it does not support
MADV_WIPEONFORK).

> 
>> The MADV_WIPEONFORK might help the above scenario, but it will still 
>> requires a lock over the symbol but simplifies the fork detection (but we 
>> will still need to handle kernels without MADV_WIPEONFORK support).
> 
> Right, and the double counters cover those kernels.
> 
>> I think the issues with the described fork handling approach without
>> MADV_WIPEONFORK can be fixed, so the question is whether we should set
>> multithread performance as implementation focus. The current arc4random
>> interface, without user-controllable and/or detached PRNGs state, imposes
>> more resource state and at least some synchronization. I would expect
>> a performant implementation to dissociate if from the interface, as C++
>> random for instance, which might require no synchronization at all 
>> depending of the usage.
>>
>> So do you expect realword scenarios to require high performant arc4random?
>> I don't have a strong opinion here, so I am ok we think performance should
>> be a QoI for arc4random.
> 
> We can't have too many generators if each of them needs to be seeded
> separately from /dev/urandom.  Anything finer-grained than thread-local
> will likely be problematic.

At least C++ interfaces allows you use a different seeder depending of the 
requirement, but I am not sure if we would require less secure PRNG besides 
for testing and/or benchmarks.

> 
>> Right, but if I understood correct, at least for our case glibc will only
>> pay for the extra page if the program actually calls arc4random,
>> right?
> 
> Yes, but I want to use arc4random to expand the initial AT_RANDOM cookie
> into more bytes, for more ASLR for mapping shared objects, and for the
> heap protector.  So all programs would use it, unless we add further
> complication.

Right, so AT_RANDOM cookie extension might be potentially used on dlopen
and heap protector on malloc.  I am not sure if dlopen should be considered
a scenario for push for an optimized arc4random, but malloc most likely is.
Do you plan to push the heap protector as default feature or as interposing
malloc for debugging (I am not sure if you already discussed it on mailing
list)?

> 
> We could avoid that once we get to that, but it's going to add even more
> complexity.
> 
>>>> I also think it would profitable to discuss how to properly handle BZ#4737
>>>> if fork scalability is an issue. You recently brought if there is a way
>>>> to detect if the current function has been called (directly or indirectly) 
>>>> from a signal handler [1], not sure if were your idea to handle BZ#4737 or 
>>>> something else (with a proper way we could fix the atfork list scalability 
>>>> issue).
>>>>
>>>> [1] https://sourceware.org/ml/libc-help/2016-02/msg00029.html
>>>
>>> Calling this recently stretches things a bit. 8-)
>>>
>>> Wrapper functions for signal handlers are problematic because we cannot
>>> atomically set handler flags and the function, so a handler could run
>>> with the wrong flags.
>>
>> I think it still doable, albeit it would make sigaction less scalable
>> and signal handling a bit more costly. However I think this discussion is
>> worth another topic.
> 
> Agreed, I'd like to read your thoughts on that.
> 
> Thanks,
> Florian
> 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]