linux-ext4 - Re: Linux 5.3-rc8


Message-ID: <20190910115635.GB2740@mit.edu>
Date:   Tue, 10 Sep 2019 07:56:35 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     "Ahmed S. Darwish" <darwish.07@...il.com>
Cc:     Andreas Dilger <adilger.kernel@...ger.ca>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>, zhangjs <zachary@...shancloud.com>,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Linux 5.3-rc8

On Tue, Sep 10, 2019 at 06:21:07AM +0200, Ahmed S. Darwish wrote:
> 
> The commit b03755ad6f33 (ext4: make __ext4_get_inode_loc plug), [1]
> which was merged in v5.3-rc1, *always* leads to a blocked boot on my
> system due to low entropy.
> 
> The hardware is not a VM: it's a Thinkpad E480 (i5-8250U CPU), with
> a standard Arch user-space.

Hmm, I'm not seeing this on a Dell XPS 13 (model 9380) running Debian
Bullseye (Testing) with an rc4+ kernel.

This could be because Debian is simply doing more I/O; or it could be
because I don't have some package installed which is trying to read
from /dev/random or calling getrandom(2).  Previously, Fedora ran into
blocking issues because of some FIPS compliance patches to some
userspace daemons.  So it's going to be very user space dependent and
package dependent.

> It seems that batching the directory lookup I/O requests (which are
> possibly a lot during boot) is minimizing sources of disk-activity-
> induced entropy? [2] [3]
> 
> Can this even be considered a user-space breakage? I'm honestly not
> sure. On my modern RDRAND-capable x86, just running rng-tools rngd(8)
> early-on fixes the problem. I'm not sure about the status of older
> CPUs though.

You can probably also fix this problem by adding random.trust_cpu=true
to the boot command line, or by enabling CONFIG_RANDOM_TRUST_CPU.
This obviously assumes that you trust Intel's implementation of
RDRAND, but that's true regardless of whether you use rngd
or the kernel config option.

As far as whether it's considered user-space breakage; that's tough.
File system performance improvements can cause a reduced amount of
I/O, and that can cause less entropy to be collected, and depending on
a complex combination of kernel config options, distribution-specific
patches, and what packages are loaded, that could potentially cause
boot hangs waiting for entropy.  Does that mean we can't make any
file system performance improvements?  Surely that doesn't seem like
the right answer.

It would be useful to figure out what process is blocking waiting on
entropy, since in general, trying to rely on cryptographic entropy in
early boot, especially if it is to generate cryptographic keys, is
going to be more dangerous compared to a "just in time" approach to
generating crypto keys.  So this could also be considered a userspace
bug, depending on your point of view...

					- Ted
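
As an illustration of the points above about getrandom(2) blocking in
early boot and the "just in time" approach to key generation, here is a
minimal sketch in Python.  It is not taken from the thread: the helper
names entropy_pool_ready() and generate_key_just_in_time() are
hypothetical, and it assumes Linux with Python 3.6 or later, where
os.getrandom() and os.GRND_NONBLOCK are available.

```python
import os


def entropy_pool_ready():
    """Return True if getrandom(2) would return without blocking.

    Hypothetical helper: GRND_NONBLOCK makes getrandom(2) fail with
    EAGAIN (surfaced by Python as BlockingIOError) instead of blocking
    when the kernel's random pool has not been initialized yet.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


def generate_key_just_in_time(nbytes=32):
    """Hypothetical helper: ask for key material only when it is needed.

    Without GRND_NONBLOCK this call blocks until the pool is
    initialized, so calling it late (on first use of the key) rather
    than during early boot makes a long wait much less likely.
    """
    return os.getrandom(nbytes)


if __name__ == "__main__":
    if entropy_pool_ready():
        key = generate_key_just_in_time()
        print("got", len(key), "bytes of key material")
    else:
        print("random pool not initialized; deferring key generation")
```

A daemon that probes in this way and defers key generation would not
stall the boot, and logging the "not initialized" case is one way to
find out which process is the one waiting on entropy.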