29437 – arc4random is too slow

Bug 29437 - arc4random is too slow

Summary: arc4random is too slow

Status:	NEW

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc (show other bugs)
Version:	2.36

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2022-08-02 10:46 UTC by Miroslav Lichvar
Modified:	2023-01-09 11:44 UTC (History)
CC List:	6 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Miroslav Lichvar 2022-08-02 10:46:15 UTC

The new arc4random using getrandom() seems to have a significant impact on performance of chronyd operating as an NTP server. On an Intel E3-1220 CPU, I see that the maximum number of requests per second dropped by about 25%. That would be an issue for some public NTP servers.

chronyd needs a random generator for the low-order bits below the clock precision in the receive and transmit timestamps to make them less predictable and make off-path attacks more difficult. The generator doesn't necessarily have to be cryptographically secure.

The assumption in the code is that arc4random is fast and buffered, as it is on FreeBSD for instance. It is preferred over getrandom(), for which chrony has to implement its own buffering. With arc4random using getrandom() and no buffering the number of syscalls needed per NTP request is now almost doubled.

This can be fixed easily in the chrony source code, but I thought it might be a good example of a real-world application impacted by this change and there could be others.

If the plan is to fix this by adding getrandom() to the Linux vDSO, would it make sense to wait for it to be available before releasing the new arc4random?

Comment 1 Jason A. Donenfeld 2022-08-02 22:57:10 UTC

I'm working on the vDSO thing. If that lands, it should become really fast. We'll see what happens. But thanks for the report; this is a useful data point.

Comment 2 Jason A. Donenfeld 2022-08-02 22:58:49 UTC

Out of curiosity, though, what kernel version are you seeing the huge speed hit? getrandom() is quite fast on newer kernels.

Comment 3 Miroslav Lichvar 2022-08-03 06:28:30 UTC

I was testing it on 5.18.15-200.fc36.x86_64.

The issue is the overhead of making the syscall. For each timestamp only about 6-22 random bits are needed, but as there is no buffering, the number of syscalls is increased significantly.

The NTP server is basically a loop of

- poll()
- recvmmsg()
- for each valid request:
  - arc4random() for RX timestamp from cmsg
  - arc4random() for TX timestamp taken below
  - clock_gettime() - TX timestamp
  - sendmsg()

With a vDSO clock_gettime() and arc4random calling getrandom() there are now three syscalls in the inner loop instead of one.

Comment 4 Cristian Rodríguez 2022-08-06 13:54:17 UTC

(In reply to Miroslav Lichvar from comment #0)

> This can be fixed easily in the chrony source code, but I thought it might
> be a good example of a real-world application impacted by this change and
> there could be others.

You need a PRNG, seeded by a single call to arc4random_buf.. (xoroshiro128+?)
People are unfortunately rejecting your usecase as invalid because this does not require a csprng. :-( (to me that sums up everything that is wrong, you known you don't need one, but notmal averege joe programmer has no idea of all the subtleties ans is stuck with using random() ..)

Comment 5 Miroslav Lichvar 2022-08-08 11:05:34 UTC

I need a PRNG that is fast and not as easily predictable as random(). That's how arc4random() is documented on FreeBSD and other systems supported by chrony. As I understand it, it's not supposed to be used to generate long-term keys, for instance.

If the reason for adding arc4random() was source compatibility, that won't work well for applications that need high performance. They will still need to support another PRNG.

If new applications assume arc4random() is a high-quality CSPRNG and generate long-term keys with it, they might have security issues on the other systems where this is not the case.

I'm not sure what are the benefits of the current implementation.

Comment 6 Adhemerval Zanella 2022-08-08 12:29:31 UTC

(In reply to Miroslav Lichvar from comment #5)
> I need a PRNG that is fast and not as easily predictable as random(). That's
> how arc4random() is documented on FreeBSD and other systems supported by
> chrony. As I understand it, it's not supposed to be used to generate
> long-term keys, for instance.

Both OpenBSD (and FreeBSD which base its implementation on OpenBSD) [1] and MacOS [2] (which used to be based on OpenBSD, but now has its own implementation) state they are CSRNG (OpenBSD doc is not really clear in the sentence it describes some of the internal, but it is used as CSRNG in the code).

However, the OpenBSD implementation (and all implementations based on the user-managed buffer, like FreeBSDB and libbsd) uses heuristics to trigger the buffer reseed, which is not *suffice*. 

For example, when a virtual machine is forked, restored, or duplicated, it's imperative that the RNG doesn't generate the same outputs. For this reason, there's a small protocol between hypervisors and the kernel that indicates this has happened, alongside some ID, which the RNG uses to reseed immediately, so as not to return the same numbers.

So the kernel becomes the focal point on how to correctly generate and manage cryptographic secure RNG, and one of the drives of arc4random inclusion is it should not provide worse RNG than getrandom syscall.

> 
> If the reason for adding arc4random() was source compatibility, that won't
> work well for applications that need high performance. They will still need
> to support another PRNG.

That one of the reason yes.

> 
> If new applications assume arc4random() is a high-quality CSPRNG and
> generate long-term keys with it, they might have security issues on the
> other systems where this is not the case.

Yes, but it is an improvement for the Linux ecosystem nonetheless. Other systems, like Windows and MacOS are also moving on same direction.

> 
> I'm not sure what are the benefits of the current implementation.

The benefits is to proper provide CSRNG, regaless on how the environment is deployed or configured, to avoid each library deploy wrong and insecure user-based alternatives.

The vDSO discussion to speed up getrandom is still ongoing the main issue seems on who and how should be the one responsible to buffer management. The performance of arc4random was indeed initially taking in consideration [3] (I even spent considerable time to adjust libgcrypt chacha20 optimized implementations).

I wonder if we should add a GNU extension, arc4random_insecure or whatever, similar to what systemd does with 'random_bytes' that uses a user-managed buffer plus chacha20 or any other block cipher.  And document it properly that it was added as PRNG focused on speed. 

[1] https://man.openbsd.org/arc4random.3
[2] https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/arc4random.3.html
[3] https://lore.kernel.org/linux-crypto/Ytx8GKSZfRt+ZrEO@zx2c4.com/
[4] https://github.com/systemd/systemd/blob/main/src/basic/random-util.c

Comment 7 Florian Weimer 2022-08-08 12:35:11 UTC

(In reply to Adhemerval Zanella from comment #6)
> For example, when a virtual machine is forked, restored, or duplicated, it's
> imperative that the RNG doesn't generate the same outputs. For this reason,
> there's a small protocol between hypervisors and the kernel that indicates
> this has happened, alongside some ID, which the RNG uses to reseed
> immediately, so as not to return the same numbers.

Except that this doesn't work because if the suspend/resume happens a little bit later, the unprotected application buffer to which the random data has been written is restore multiple times. So clearly the actual rules around this are different and much more complicated.

Comment 8 Florian Weimer 2022-08-08 12:37:15 UTC

(In reply to Cristian Rodríguez from comment #4)
> (In reply to Miroslav Lichvar from comment #0)
> 
> > This can be fixed easily in the chrony source code, but I thought it might
> > be a good example of a real-world application impacted by this change and
> > there could be others.
> 
> You need a PRNG, seeded by a single call to arc4random_buf.. (xoroshiro128+?)
> People are unfortunately rejecting your usecase as invalid because this does
> not require a csprng. :-( (to me that sums up everything that is wrong, you
> known you don't need one, but notmal averege joe programmer has no idea of
> all the subtleties ans is stuck with using random() ..)

Absolutely not. The idea behind arc4random is that an entire distribution can use this function without worrying performance or properties of the generated stream of randomness. It was designed for a search-and-replace of calls to random, without case-by-case reviews if the change is actually required for security reasons. If the glibc implementation does not meet that, we have failed.

Comment 9 Adhemerval Zanella 2022-08-08 12:38:40 UTC

(In reply to Florian Weimer from comment #7)
> (In reply to Adhemerval Zanella from comment #6)
> > For example, when a virtual machine is forked, restored, or duplicated, it's
> > imperative that the RNG doesn't generate the same outputs. For this reason,
> > there's a small protocol between hypervisors and the kernel that indicates
> > this has happened, alongside some ID, which the RNG uses to reseed
> > immediately, so as not to return the same numbers.
> 
> Except that this doesn't work because if the suspend/resume happens a little
> bit later, the unprotected application buffer to which the random data has
> been written is restore multiple times. So clearly the actual rules around
> this are different and much more complicated.

I am not following, which buffer are you referring? The kernel one application will be poll with arc4random/getrandom or the possible one maintained by application itself with a previous arc4random call? For the latter, it is an application vulnerability where it is caching CSRNG state.

Comment 10 Jason A. Donenfeld 2022-08-08 12:39:21 UTC

> I wonder if we should add a GNU extension, arc4random_insecure or whatever

Oh, no. Don't do that. No more footguns! Even if you call it "_insecure" people will still use it.

Comment 11 Adhemerval Zanella 2022-08-08 12:42:46 UTC

(In reply to Florian Weimer from comment #8)
> (In reply to Cristian Rodríguez from comment #4)
> > (In reply to Miroslav Lichvar from comment #0)
> > 
> > > This can be fixed easily in the chrony source code, but I thought it might
> > > be a good example of a real-world application impacted by this change and
> > > there could be others.
> > 
> > You need a PRNG, seeded by a single call to arc4random_buf.. (xoroshiro128+?)
> > People are unfortunately rejecting your usecase as invalid because this does
> > not require a csprng. :-( (to me that sums up everything that is wrong, you
> > known you don't need one, but notmal averege joe programmer has no idea of
> > all the subtleties ans is stuck with using random() ..)
> 
> Absolutely not. The idea behind arc4random is that an entire distribution
> can use this function without worrying performance or properties of the
> generated stream of randomness. It was designed for a search-and-replace of
> calls to random, without case-by-case reviews if the change is actually
> required for security reasons. If the glibc implementation does not meet
> that, we have failed.

I disagree that we have failed, the main problem is that arc4random implementation from OpenBSD (which is copied on multiple systems and libraries) did not take into consideration the problems Jason has raised, and thus now callers expect a certain performance on any possible system (even if the cryptographic secure part can not be safely enforced).

So the question is should we trade performance or correctness?

Comment 12 Florian Weimer 2022-08-08 12:43:36 UTC

(In reply to Adhemerval Zanella from comment #9)
> (In reply to Florian Weimer from comment #7)
> > (In reply to Adhemerval Zanella from comment #6)
> > > For example, when a virtual machine is forked, restored, or duplicated, it's
> > > imperative that the RNG doesn't generate the same outputs. For this reason,
> > > there's a small protocol between hypervisors and the kernel that indicates
> > > this has happened, alongside some ID, which the RNG uses to reseed
> > > immediately, so as not to return the same numbers.
> > 
> > Except that this doesn't work because if the suspend/resume happens a little
> > bit later, the unprotected application buffer to which the random data has
> > been written is restore multiple times. So clearly the actual rules around
> > this are different and much more complicated.
> 
> I am not following, which buffer are you referring? The kernel one
> application will be poll with arc4random/getrandom or the possible one
> maintained by application itself with a previous arc4random call?

If suspended right after arc4random () returms, this

printf ("%u\n", (unsigned int) arc4random ());

will print the same number after each resumption.

Comment 13 Florian Weimer 2022-08-08 12:44:26 UTC

(In reply to Adhemerval Zanella from comment #11)
> I disagree that we have failed, the main problem is that arc4random
> implementation from OpenBSD (which is copied on multiple systems and
> libraries) did not take into consideration the problems Jason has raised,
> and thus now callers expect a certain performance on any possible system
> (even if the cryptographic secure part can not be safely enforced).

I have been told that they did take these issues into account. They just reached different conclusions.

Comment 14 Jason A. Donenfeld 2022-08-08 12:45:13 UTC

> So the question is should we trade performance or correctness?

Or do neither, and just hold your horses until we can sort this vDSO stuff out in the kernel? That might take some time, but remember: arc4random() didn't even exist in glibc a few weeks ago. It was you all who rushed this in over my objections. Don't now follow that bad decision making to its next bad step of introducing intentional correctness issues. Instead just chill out for a little bit, don't take further actions, and let's see what is possible in the kernel.

Just stop spinning your wheels over this stuff for a while, okay?

Comment 15 Adhemerval Zanella 2022-08-08 12:51:24 UTC

(In reply to Florian Weimer from comment #12)
> (In reply to Adhemerval Zanella from comment #9)
> > (In reply to Florian Weimer from comment #7)
> > > (In reply to Adhemerval Zanella from comment #6)
> > > > For example, when a virtual machine is forked, restored, or duplicated, it's
> > > > imperative that the RNG doesn't generate the same outputs. For this reason,
> > > > there's a small protocol between hypervisors and the kernel that indicates
> > > > this has happened, alongside some ID, which the RNG uses to reseed
> > > > immediately, so as not to return the same numbers.
> > > 
> > > Except that this doesn't work because if the suspend/resume happens a little
> > > bit later, the unprotected application buffer to which the random data has
> > > been written is restore multiple times. So clearly the actual rules around
> > > this are different and much more complicated.
> > 
> > I am not following, which buffer are you referring? The kernel one
> > application will be poll with arc4random/getrandom or the possible one
> > maintained by application itself with a previous arc4random call?
> 
> If suspended right after arc4random () returms, this
> 
> printf ("%u\n", (unsigned int) arc4random ());
> 
> will print the same number after each resumption.

So explicit_bzero will not clear all the buffer if the machine is stop/resumed in the middle of the operation.  It is still used an hardening mechanism.

Comment 16 Jason A. Donenfeld 2022-08-08 12:56:40 UTC

1) It's not just VM forks. It's all reseeds, and the kernel is in the best position to determine when that's necessary. I've made this point over and over in that email thread. Can you just go back and re-read my emails? I do not appreciate you wasting time here by rehashing that.

2) The VM fork case you pointed out is solved with another mechanism we're working on in the kernel, meant for use with ephemeral keys, and if you gimp arc4random(), you'll jeopardize that.

3) Please stop this discussion and just sit still or contribute something useful to the upstream kernel discussion. You're CC'd on that thread and can help move things forward there. Spinning wheels here just wastes my time.