Bug 16522 - On sha* password generation, select hash rounds to achieve given computation time based on hash computation speed
Summary: On sha* password generation, select hash rounds to achieve given computation time based on hash computation speed
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: crypt
Version: 2.18
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-02-03 17:11 UTC by David Jaša
Modified: 2018-01-26 20:50 UTC
6 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description David Jaša 2014-02-03 17:11:20 UTC
When glibc encodes a password using sha256 or sha512 method and no indication of desired rounds is given (e.g. in login.defs or libuser.conf), the rounds count is set to 5000:
https://sourceware.org/git/?p=glibc.git;a=blob;f=crypt/sha256-crypt.c#l86
https://sourceware.org/git/?p=glibc.git;a=blob;f=crypt/sha512-crypt.c#l86
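
(For reference, the rounds selection logic looks roughly like the sketch below. This is a paraphrase of crypt/sha512-crypt.c, not verbatim glibc code: the helper name select_rounds is invented for illustration, and the exact ROUNDS_MIN/ROUNDS_MAX clamp values should be checked against the source linked above.)

#include <stdlib.h>
#include <string.h>

/* Paraphrased rounds selection: a "rounds=N$" prefix in the salt is
   honoured (clamped to the allowed range), otherwise the fixed
   default of 5000 is used.  */
#define ROUNDS_DEFAULT 5000UL
#define ROUNDS_MIN     1000UL
#define ROUNDS_MAX     999999999UL

static unsigned long
select_rounds (const char **saltp)
{
  const char *salt = *saltp;
  unsigned long rounds = ROUNDS_DEFAULT;

  if (strncmp (salt, "rounds=", 7) == 0)
    {
      char *endp;
      unsigned long requested = strtoul (salt + 7, &endp, 10);
      if (*endp == '$')
        {
          *saltp = endp + 1;        /* skip the rounds= prefix */
          if (requested < ROUNDS_MIN)
            requested = ROUNDS_MIN;
          if (requested > ROUNDS_MAX)
            requested = ROUNDS_MAX;
          rounds = requested;
        }
    }
  return rounds;
}
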
This behaviour is becoming suboptimal in light of the recent rise of hash-crunching on GPUs and dedicated chips (driven by cryptocurrency mining but usable for password cracking as well). IMO glibc should employ an approach similar to cryptsetup's: choose a password hash computation time that is as long as possible without causing user inconvenience or DoS potential. For example, cryptsetup/LUKS targets 1 s:
# cryptsetup --help
...
Default PBKDF2 iteration time for LUKS: 1000 (ms)

This yields the following theoretical and practical iteration counts on my 2010 machine:
# cryptsetup benchmark
...
PBKDF2-sha256     204161 iterations per second
PBKDF2-sha512     134570 iterations per second
# cryptsetup luksDump /dev/sda2 | grep -B1 Iterations
Key Slot 0: ENABLED
	Iterations:         	160803
--
Key Slot 1: ENABLED
	Iterations:         	154588

In other words, cryptsetup/LUKS with default settings gives about 30x better protection to a user's password than glibc with default settings. While glibc must take into account DoS scenarios that do not apply to cryptsetup (such as an attacker trying to log in over as many ssh connections as possible), these considerations should result in a shorter computation time (my guess: 500 ms instead of 1000) instead of a fixed rounds count that actually provides a constantly decreasing security level.
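
(The ~30x figure follows directly from the numbers above: 160803 / 5000 ≈ 32 and 154588 / 5000 ≈ 31 times as many iterations as glibc's default.)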

The most conservative approach wrt compatibility would be to measure how long password verification takes on the slowest 5% of devices where glibc is deployed and use that (or a value within an order of magnitude); this way glibc behaviour on slow machines won't change, while fast machines will get much better protection.


The problem was spotted on glibc 2.18 (Fedora 20), but a quick look at the source code (linked above) shows that git HEAD is still affected.
Comment 1 Carlos O'Donell 2014-02-03 18:04:27 UTC
(In reply to David Jaša from comment #0)
> In other words, cryptsetup/LUKS with default settings gives about 30x better
> protection to a user's password than glibc with default settings. While
> glibc must take into account DoS scenarios that do not apply to cryptsetup
> (such as an attacker trying to log in over as many ssh connections as
> possible), these considerations should result in a shorter computation time
> (my guess: 500 ms instead of 1000) instead of a fixed rounds count that
> actually provides a constantly decreasing security level.

Agreed. Do you have a patch to make this work?
Comment 2 Rich Felker 2014-02-04 06:21:03 UTC
I'm a bit concerned about this proposal. What happens when your hashes are shared between multiple machines (e.g. a very fast server and multiple thin clients) or when you're setting up a VE image for cpu-limited hosting or a system image to run on a lower-end machine using a higher-end one? I think it's flawed to assume that the machine on which hashes will later be validated is as capable as the machine on which the original hashes are generated. Whether this is an acceptable flaw (i.e. whether the benefit is worth dealing with the side effects of this flaw) is a matter for discussion.
Comment 3 David Jaša 2014-02-04 15:26:00 UTC
(In reply to Carlos O'Donell from comment #1)
> ...
> Agreed. Do you have a patch to make this work?

I prefer to leave code writing to people who can actually do it. :)

(In reply to Rich Felker from comment #2)
> I'm a bit concerned about this proposal. What happens when your hashes are
> shared between multiple machines (e.g. a very fast server and multiple thin
> clients) 

A very fast server these days means a lot of cores, doesn't it? If so, then the performance gap is still big, but manageable. I'd still prefer "reasonably secure by default even if it causes noticeable nuisance on the worst 10 % of devices" over "it works everywhere b

> or when you're setting up a VE image for cpu-limited hosting or a
> system image to run on a lower-end machine using a higher-end one? 

Aren't you using ssh keys for such use cases anyway? What is their complexity (time to verify login)?

> I think
> it's flawed to assume that the machine on which hashes will later be
> validated is as capable as the machine on which the original hashes are
> generated. Whether this is an acceptable flaw (i.e. whether the benefit is
> worth dealing with the side effects of this flaw) is a matter for discussion.

Consider partial disk encryption: on my system, cryptsetup uses PBKDF2 with 155-160k iterations to achieve a 1 s verification time. When I set the maximum rounds count for crypt/sha512 (10M), the verification takes 16 s, which works out to roughly 650k rounds per 1 s of running time, so the default 5k rounds should take about 5/650 = 1/130 ≈ 0.008 s. If the crypt/sha512 to PBKDF2 running-time ratio is constant and users choose the same password for disk encryption and user account verification, the much cheaper account hash gives an attacker a side channel to the disk encryption passphrase, reducing its effective strength by a factor of 650/5 = 130 ≈ 2^7, i.e. about 7 bits...

IMO a good target would be to have bearable performance on a Raspberry Pi. The cryptsetup FAQ [1] states:
> Note: Passphrase iteration is determined by cryptsetup depending on CPU
> power. On a slow device, this may be lower than you want. I recently
> benchmarked this on a Raspberry Pi and it came out at about 1/15 of the
> iteration count for a typical PC. If security is paramount, you may want to
> increase the time spent in iteration, at the cost of a slower unlock later.

So e.g. choosing a 0.3 s target time on a recent computer (= 195k rounds on my system) would result in a 4.5 s verification time on a Raspberry Pi. Slow but bearable, and _way_ more secure than the default 5k.
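
(Worked out from the numbers above: ~650k rounds/s x 0.3 s ≈ 195k rounds on my machine, and 0.3 s x 15 ≈ 4.5 s on a roughly 15x slower Raspberry Pi.)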

[1] http://code.google.com/p/cryptsetup/wiki/FrequentlyAskedQuestions#2._Setup
Comment 4 David Jaša 2014-02-04 15:33:56 UTC
(In reply to David Jaša from comment #3)
> ...
> 
> A very fast server these days means a lot of cores, doesn't it? If so, then
> the performance gap is still big, but manageable. I'd still prefer
> "reasonably secure by default even if it causes noticeable nuisance on the
> worst 10 % of devices" over "it works everywhere b

... but it's easily breakable by casual attackers.
Comment 5 Rich Felker 2014-02-04 19:14:52 UTC
I see we're working with very different versions of "bearable". To me, 0.3s is hardly "bearable" and 4.5s is utterly atrocious. And this is all assuming there's only a single password being validated at a time and that you don't care about 0.3 to 4.5 seconds of 100% cpu load interfering with whatever else is running on the system.

Yes, for ssh login, most people should be using public key authentication, not passwords. But password hashes are used for lots of purposes other than unix account login. In cases where password hashes are being used, their strength really needs to be tuned to the application, which is highly dependent on things like the number of users, expected cpu load, etc., and not just a function of the raw cpu speed. So I'm skeptical of attempts to automate choosing a strength parameter based on the latter...
Comment 6 David Jaša 2014-02-06 16:13:39 UTC
> I see we're working with very different versions of "bearable".

I'm assuming that:
  * password verification is a fairly infrequent event:
    * local users log in (or unlock the screen) only every now and then
    * web app users log in once through a web form and from then on their
      browser uses cookies to persist the session
  * password verification takes place in the context of other delays such as:
    * password typing (local login)
    * network latency (web app)
That makes me confident that a delay that is noticeable but not really long is OK. That should be around 0.5 s.

> But password hashes are used for lots of purposes other than unix account
> login.

What are those other purposes? Password hashes should by definition be as slow as possible without annoying their users. Where resistance to brute-force reversal is not important, they should be replaced by plain fast hashes...

> In cases where password hashes are being used, their strength really needs
> to be tuned to the application,

For that case, an application can always override the rounds=%i parameter to a value that suits it (though an ms=%i parameter similar to cryptsetup's would make more sense).
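
(For illustration, a minimal sketch of that per-application override as it already works today, not the proposed change; the salt string is a hard-coded placeholder, real code must generate a random salt. Build with e.g. "gcc override.c -lcrypt".)

#define _GNU_SOURCE
#include <crypt.h>
#include <stdio.h>

int
main (void)
{
  static struct crypt_data data;   /* large; keep it off the stack */
  /* "$6$" selects sha512-crypt, "rounds=625000$" overrides the 5000
     default; the resulting hash string records the rounds count.  */
  const char *setting = "$6$rounds=625000$0123456789abcdef";
  const char *hash = crypt_r ("example password", setting, &data);
  if (hash != NULL)
    printf ("%s\n", hash);
  return 0;
}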

The point is that standard procedures are long-lived while computers (and dedicated devices) keep getting more powerful, so an unchanged procedure should somehow produce stronger results at a later time. Basing the number of rounds on a target time thus seems a good choice, as even a barely noticeable 0.1 s yields a stronger result than the default rounds count. Would RFEs to base the rounds on running time, and to use that as the default method, be less controversial with you?
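
(A rough sketch of what such a time-based default could look like — an assumption about the proposal, not existing glibc behaviour: time one crypt_r() call at a probe rounds count, scale linearly to the target time, and never drop below the current 5000 default.)

#define _GNU_SOURCE
#include <crypt.h>
#include <stdio.h>
#include <time.h>

static double
now_sec (void)
{
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return ts.tv_sec + ts.tv_nsec / 1e9;
}

static unsigned long
calibrate_rounds (double target_sec)
{
  static struct crypt_data data;
  const unsigned long probe = 100000;   /* arbitrary probe count */
  char setting[64];

  snprintf (setting, sizeof setting, "$6$rounds=%lu$0123456789abcdef", probe);
  double t0 = now_sec ();
  crypt_r ("calibration", setting, &data);
  double elapsed = now_sec () - t0;

  unsigned long rounds = (unsigned long) (probe * target_sec / elapsed);
  if (rounds < 5000)                    /* never below today's default */
    rounds = 5000;
  return rounds;
}

int
main (void)
{
  printf ("rounds for a ~0.5 s target: %lu\n", calibrate_rounds (0.5));
  return 0;
}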
Comment 7 David Jaša 2014-02-06 16:43:55 UTC
Increasing the number of rounds creates the following extra work for the user and the attacker:

rounds    user time   user work factor   passwords/s   attacker work factor
5000      << 0.1 s    1                  ~190          1
625000    ~ 1 s       125                ~3            ~60
5000000   ~ 8 s       1000               0.16-0.19     ~1000-1200
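
(Derivation: the user work factor is rounds / 5000, e.g. 625000 / 5000 = 125; the attacker work factor is the drop in passwords per second, e.g. ~190 / ~3 ≈ 60 and ~190 / 0.16-0.19 ≈ 1000-1200.)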

The question is how this would fare against a GPU- or FPGA/ASIC-equipped attacker (which seems quite likely). [1] suggests that sha512 is less friendly to GPUs and HW chips than sha256, and that HW development will focus on sha256 for some time as that is what cryptocurrency mining requires, so pretty much any increase in rounds should be safe short- and medium-term. But given the inertia inherent in such low-level base-system things, it seems a good time to look for alternatives to both the parameters of the current schemes and the schemes themselves.

[1] http://www.openwall.com/presentations/Passwords12-The-Future-Of-Hashing/Passwords12-The-Future-Of-Hashing.pdf
Comment 8 Rich Felker 2014-02-06 17:30:45 UTC
No, what I'm objecting to is precisely the idea of making the behavior of this function dependent on the speed of the cpu it runs on. I consider that bad practice (non-determinism) and also an invalid assumption (that the cpu the hash is generated on will have any relation to the cpu it's later verified on). Tuning this via run time rather than raw cpu speed would be even worse, since an attacker could artificially load the cpu to increase the run time and thereby control the rounds parameter.
Comment 9 David Jaša 2014-02-07 16:31:50 UTC
IMO basing the hash strength on CPU speed is good as long as a sensible minimum is set. If the minimum is raised from the current minimum to the current default, the worst-case scenario is that security stays the same as the status quo, while on average it will be better. This behaviour is good enough IMO for the time being.

I see another problem here though: once sha512 hashing gets really cheap, keeping the minimum number of rounds low will create "pockets" of weak password hashes on systems that were using the minimum, even though hashes will still be safe on average (and it will take quite some time to update all affected systems). It would therefore be helpful to have an idea of what the slowest devices running glibc are these days, so that the minimum rounds count can be tailored to them and that gap kept as small as possible.
Comment 10 David Jaša 2014-02-07 16:36:50 UTC
Addendum to comment 7: the passwords/s numbers are per core, so if an attacker were using my own system, her effective work factor would be half of the one in the table.
Comment 11 Carlos O'Donell 2014-02-07 16:44:53 UTC
(In reply to David Jaša from comment #9)
> IMO basing the hash strength on CPU speed is good as long as a sensible
> minimum is set. If the minimum is raised from the current minimum to the
> current default, the worst-case scenario is that security stays the same as
> the status quo, while on average it will be better. This behaviour is good
> enough IMO for the time being.

Agreed.
 
> I see another problem here though: once sha512 hashing gets really cheap,
> keeping the minimum number of rounds low will create "pockets" of weak
> password hashes on systems that were using the minimum, even though hashes
> will still be safe on average (and it will take quite some time to update
> all affected systems). It would therefore be helpful to have an idea of what
> the slowest devices running glibc are these days, so that the minimum rounds
> count can be tailored to them and that gap kept as small as possible.

I think the default count will need to increase to meet newer security requirements. Keeping the count low for these systems does not offer them the best security. If they need to loosen the security of their systems, they need to do so consciously by changing the compile-time constants (with tunables we plan to put all of them into a single file which you might edit before building glibc... at your own risk).

I don't fully agree with Rich's determinism statement. However, I also have no evidence to back up my claim that a completely deterministic count is useful or not useful. Other than to say that that the count should be selected at program startup and remain constant for the life of the program. That way it can be inspected by a debugger and used to reproduce a similar environment. Either setup is deterministic in that case, one is just determined by the CPU, and both can act in some ways as timing oracles (revealing some information about the system to an attacker). Making things random or constant time removes the timing oracle.