linux-ext4 - Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <C5BE6F0F-15B1-404B-A490-5B4E5C8C61A0@amacapital.net>
Date:   Fri, 20 Sep 2019 12:52:28 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     Willy Tarreau <w@....eu>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Ahmed S. Darwish" <darwish.07@...il.com>,
        Lennart Poettering <mzxreary@...inter.de>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Alexander E. Patrakov" <patrakov@...il.com>,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Matthew Garrett <mjg59@...f.ucam.org>,
        lkml <linux-kernel@...r.kernel.org>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Linux API <linux-api@...r.kernel.org>,
        linux-man <linux-man@...r.kernel.org>
Subject: Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()



> On Sep 20, 2019, at 12:37 PM, Willy Tarreau <w@....eu> wrote:
> 
> On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
>> Perhaps userland could register a helper that takes over and does
>> something better?
> 
> If userland sees the failure it can do whatever the developer/distro
> packager thought suitable for the system facing this condition.
> 
>> But I think the kernel really should do something
>> vaguely reasonable all by itself.
> 
> Definitely, that's what Linus' proposal was doing. Sleeping for some time
> is what I call "vaguely reasonable".

I don’t buy it. We have existing programs that can deadlock on boot. Just throwing -EAGAIN at them in a syscall that didn’t previously block does not strike me as reasonable.

> 
>> If nothing else, we want the ext4
>> patch that provoked this whole discussion to be applied,
> 
> Oh absolutely!
> 
>> which means
>> that we need to unbreak userspace somehow, and returning garbage it to
>> is not a good choice.
> 
> It depends how it's used. I'd claim that we certainly use randoms for
> other things (such as ASLR/hashtables) *before* using them to generate
> long lived keys thus we can have a bit more time to get some more
> entropy before reaching the point of producing these keys.

The problem is that we don’t know what userspace is doing with the output from getrandom(..., 0), so I think we have to be conservative. New kernels need to work with old user code. It’s okay if they’re slower to boot than they could be.

> 
>> Here are some possible approaches that come to mind:
>> 
>> int count;
>> while (crng isn't inited) {
>>  msleep(1);
>> }
>> 
>> and modify add_timer_randomness() to at least credit a tiny bit to
>> crng_init_cnt.
> 
> Without a timeout it's sure we'll still face some situations where
> it blocks forever, which is the current problem.

The point is that we keep the timer running by looping like this, which should cause add_timer_randomness() to get called continuously, which should prevent the deadlock.  I assume the deadlock is because we go into nohz-idle and we sit there with nothing happening at all.

> 
>> Or we do something like intentionally triggering readahead on some
>> offset on the root block device.
> 
> You don't necessarily have such a device, especially when you're
> in an initramfs. It's precisely where userland can be smarter. When
> the caller is sfdisk for example, it does have more chances to try
> to perform I/O than when it's a tiny http server starting to present
> a configuration page.

What I mean is: allow user code to register a usermode helper that helps get entropy. Or just convince distros to bundle some useful daemon that starts at early boot and lives in the initramfs.