Message-Id: <C5BE6F0F-15B1-404B-A490-5B4E5C8C61A0@amacapital.net>
Date: Fri, 20 Sep 2019 12:52:28 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Willy Tarreau <w@....eu>
Cc: Andy Lutomirski <luto@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Ahmed S. Darwish" <darwish.07@...il.com>,
Lennart Poettering <mzxreary@...inter.de>,
"Theodore Y. Ts'o" <tytso@....edu>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
"Alexander E. Patrakov" <patrakov@...il.com>,
Michael Kerrisk <mtk.manpages@...il.com>,
Matthew Garrett <mjg59@...f.ucam.org>,
lkml <linux-kernel@...r.kernel.org>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
linux-man <linux-man@...r.kernel.org>
Subject: Re: [PATCH RFC v4 1/1] random: WARN on large getrandom() waits and introduce getrandom2()

> On Sep 20, 2019, at 12:37 PM, Willy Tarreau <w@....eu> wrote:
>
> On Fri, Sep 20, 2019 at 12:22:17PM -0700, Andy Lutomirski wrote:
>> Perhaps userland could register a helper that takes over and does
>> something better?
>
> If userland sees the failure it can do whatever the developer/distro
> packager thought suitable for the system facing this condition.
>
>> But I think the kernel really should do something
>> vaguely reasonable all by itself.
>
> Definitely, that's what Linus' proposal was doing. Sleeping for some time
> is what I call "vaguely reasonable".

I don’t buy it. We have existing programs that can deadlock on boot.
Just throwing -EAGAIN at them in a syscall that didn’t previously block
does not strike me as reasonable.
>
>> If nothing else, we want the ext4
>> patch that provoked this whole discussion to be applied,
>
> Oh absolutely!
>
>> which means
>> that we need to unbreak userspace somehow, and returning garbage to it
>> is not a good choice.
>
> It depends how it's used. I'd claim that we certainly use randoms for
> other things (such as ASLR/hashtables) *before* using them to generate
> long lived keys thus we can have a bit more time to get some more
> entropy before reaching the point of producing these keys.

The problem is that we don’t know what userspace is doing with the
output from getrandom(..., 0), so I think we have to be conservative.
New kernels need to work with old user code. It’s okay if they’re
slower to boot than they could be.
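
As an illustration of the compatibility concern, here is a minimal
sketch of what an old caller might look like. It is hypothetical and not
taken from any real program: legacy_make_key(), eagain_source() and
filled_source() are made-up names, and the random source is passed in as
a function pointer (with the same signature as getrandom(2)) only so the
two kernel behaviours can be compared side by side.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical: same signature as getrandom(2), so a stand-in can be used. */
typedef ssize_t (*random_fn)(void *buf, size_t len, unsigned int flags);

/* Stand-in for a kernel that fails early-boot requests with EAGAIN. */
ssize_t eagain_source(void *buf, size_t len, unsigned int flags)
{
    (void)buf; (void)len; (void)flags;
    errno = EAGAIN;
    return -1;
}

/* Stand-in for a kernel whose pool is ready: fills the whole buffer. */
ssize_t filled_source(void *buf, size_t len, unsigned int flags)
{
    (void)flags;
    memset(buf, 0xA5, len);     /* fixed pattern, NOT random; demo only */
    return (ssize_t)len;
}

/*
 * Legacy-style caller: passes flags == 0 and assumes the call either
 * blocks until it succeeds or fails for a fatal reason.
 * Returns 0 on success, -1 on failure.
 */
int legacy_make_key(random_fn source, unsigned char *key, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = source(key + done, len - done, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry */
            return -1;          /* EAGAIN ends up here, treated as fatal */
        }
        done += (size_t)n;
    }
    return 0;
}
```

A real program would pass getrandom from <sys/random.h> as the source.
With eagain_source() the caller gives up instead of waiting, which is
the kind of behaviour change for old user code that is at issue here.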
>
>> Here are some possible approaches that come to mind:
>>
>> int count;
>> while (crng isn't inited) {
>>         msleep(1);
>> }
>>
>> and modify add_timer_randomness() to at least credit a tiny bit to
>> crng_init_cnt.
>
> Without a timeout it's sure we'll still face some situations where
> it blocks forever, which is the current problem.

The point is that we keep the timer running by looping like this, which
should cause add_timer_randomness() to get called continuously, which
should prevent the deadlock. I assume the deadlock is because we go into
nohz-idle and we sit there with nothing happening at all.
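
The termination argument can be sketched with a toy userspace model.
This is not kernel code and does not model how the kernel actually
estimates entropy: struct toy_crng, the toy_* functions, and the credit
and threshold numbers are all assumptions made up for illustration. The
only thing it shows is that if every wakeup of the wait loop credits a
nonzero amount, the loop must finish.

```c
#include <stdbool.h>

/* Toy model; every name and number here is hypothetical. */
struct toy_crng {
    unsigned int init_cnt;   /* stands in for crng_init_cnt */
    unsigned int threshold;  /* credit needed before we call it inited */
};

/* Models one timer event being credited to the init counter. */
void toy_timer_fires(struct toy_crng *c, unsigned int credit)
{
    c->init_cnt += credit;
}

bool toy_crng_ready(const struct toy_crng *c)
{
    return c->init_cnt >= c->threshold;
}

/*
 * Models the wait loop: each 1 ms sleep arms a timer, and the timer
 * expiring is itself an event that gets credited.
 * Returns the number of wakeups it took to become ready.
 */
unsigned int toy_wait_for_crng(struct toy_crng *c, unsigned int credit_per_tick)
{
    unsigned int ticks = 0;

    while (!toy_crng_ready(c)) {
        /* msleep(1) would go here; the model only counts the wakeup. */
        toy_timer_fires(c, credit_per_tick);
        ticks++;
    }
    return ticks;
}
```

With a threshold of 64 and a credit of 1 per wakeup, the model becomes
ready after 64 wakeups. A credit of 0 makes the loop spin forever, which
corresponds to the situation where nothing is ever credited.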
>
>> Or we do something like intentionally triggering readahead on some
>> offset on the root block device.
>
> You don't necessarily have such a device, especially when you're
> in an initramfs. It's precisely where userland can be smarter. When
> the caller is sfdisk for example, it does have more chances to try
> to perform I/O than when it's a tiny http server starting to present
> a configuration page.

What I mean is: allow user code to register a usermode helper that helps
get entropy. Or just convince distros to bundle some useful daemon that
starts at early boot and lives in the initramfs.