linux-kernel - Re: [PATCH] random: Don't overwrite CRNG state in crng

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170209041931.xgkmysquazppiewx@thunk.org>
Date:   Wed, 8 Feb 2017 23:19:31 -0500
From:   Theodore Ts'o <tytso@....edu>
To:     Alden Tondettar <alden.tondettar@...il.com>
Cc:     Arnd Bergmann <arnd@...db.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] random: Don't overwrite CRNG state in crng_initialize()

On Wed, Feb 08, 2017 at 08:31:26PM -0700, Alden Tondettar wrote:
> The new non-blocking system introduced in commit e192be9d9a30 ("random:
> replace non-blocking pool with a Chacha20-based CRNG") can under
> some circumstances report itself initialized while it still contains
> dangerously little entropy, as follows:
> 
> Approximately every 64th call to add_interrupt_randomness(), the "fast"
> pool of interrupt-timing-based entropy is fed into one of two places. At
> calls numbered <= 256, the fast pool is XORed into the primary CRNG state.
> At call 256, the CRNG is deemed initialized, getrandom(2) is unblocked,
> and reading from /dev/urandom no longer gives warnings.
> 
> At calls > 256, the fast pool is fed into the input pool, leaving the CRNG
> untouched.
> 
> The problem arises between call number 256 and 320. If crng_initialize()
> is called at this time, it will overwrite the _entire_ CRNG state with
> 48 bytes generated from the input pool.

So in practice this isn't a problem because crng_initialize is called
in early init.   For reference, the ordering of init calls are:

	"early",  <--- crng_initialize is here()
	"core",   <---- ftrace is initialized here()
	"postcore",
	"arch",
	"subsys",  <---- acpi_init is here()
	"fs",
	"device",  <---- device probing is here
	"late",

So in practice, call 256 typically happens **well** after
crng_initialize.  You can see where it is the boot messages, which is
after 2.5 seconds into the boot:

[    2.570733] rtc_cmos 00:02: alarms up to one month, y3k, 114 bytes nvram, hpet irqs
[    2.570863] usbcore: registered new interface driver i2c-tiny-usb
[    2.571035] device-mapper: uevent: version 1.0.3
[    2.571215] random: fast init done  <-------------
[    2.571316] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-devel@...hat.com
[    2.571678] device-mapper: multipath round-robin: version 1.1.0 loaded
[    2.571728] intel_pstate: Intel P-state driver initializing
[    2.572331] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[    2.572462] intel_pstate: HWP enabled
[    2.572464] sdhci: Secure Digital Host Controller Interface driver

When is crng_initialize() called?  Sometime *before* 0.05 seconds into
the boot on my laptop:

[    0.054529] ftrace: allocating 29140 entries in 114 pages

> In short, the situation is:
> 
> A) No usable hardware RNG or arch_get_random() (or we don't trust it...)
> B) add_interrupt_randomness() called 256-320 times but other
>    add_*_randomness() functions aren't adding much entropy.
> C) then crng_initialize() is called
> D) not enough calls to add_*_randomness() to push the entropy
>    estimate over 128 (yet)
> E) getrandom(2) or /dev/urandom used for something important
> 
> Based on a few experiments with VMs, A) through D) can occur easily in
> practice. And with no HDD we have a window of about a minute or two for
> E) to happen before add_interrupt_randomness() finally pushes the
> estimate over 128 on its own.

How did you determine when crng_initialize() was being called?  On a
VM generally there are fewer interrupts than on real hardware.  On
KVM, for I see the random: fast_init message being printed 3.6 seconds
into the boot.

On Google Compute Engine, the fast_init message happens 52 seconds into the
boot.

So what VM where you using?  I'm trying to figure out whether this is
hypothetical or real problem, and on what systems.

	     	     	      	     - Ted