linux-ext4 - Re: Linux 5.3-rc8

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6ae36cda-5045-6873-9727-1d36bf45b84e@gmail.com>
Date:   Tue, 17 Sep 2019 21:58:31 +0500
From:   "Alexander E. Patrakov" <patrakov@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Martin Steigerwald <martin@...htvoll.de>
Cc:     Willy Tarreau <w@....eu>, Matthew Garrett <mjg59@...f.ucam.org>,
        "Ahmed S. Darwish" <darwish.07@...il.com>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        Vito Caputo <vcaputo@...garu.com>,
        Lennart Poettering <mzxreary@...inter.de>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.cz>, Ray Strode <rstrode@...hat.com>,
        William Jon McCann <mccann@....edu>,
        zhangjs <zachary@...shancloud.com>, linux-ext4@...r.kernel.org,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: Linux 5.3-rc8

17.09.2019 21:27, Linus Torvalds пишет:
> On Tue, Sep 17, 2019 at 12:33 AM Martin Steigerwald <martin@...htvoll.de> wrote:
>>
>> So yes, that would it make it harder to abuse the API, but not
>> impossible. Which may still be good, I don't know.
> 
> So the real problem is not people abusing the ABI per se. Yes, I was a
> bit worried about that too, but it's not the cause of the immediate
> issue.
> 
> The real problem is that "getrandom(0)" is really _convenient_ for
> people who just want random numbers - and not at all the "secure"
> kind.
> 
> And it's convenient, and during development and testing, it always
> "just works", because it doesn't ever block in any normal situation.
> 
> And then you deploy it, and on some poor users machine it *does*
> block, because the program now encounters the "oops, no entropy"
> situation that it never ever encountered on the development machine,
> because the testing there was mainly done not during booting, but the
> developer also probably had a much more modern machine that had
> rdrand, and that quite possibly also had more services enabled at
> bootup etc so even without rdrand it got tons of entropy.
> 
> That's why
> 
>   (a) killing the process is _completely_ silly.  It misses the whole
> point of the problem in the first place and only makes things much
> worse.
> 
>   (b) we should just change getrandom() and add that GRND_SECURE flag
> instead. Because the current API is fundamentally confusing. If you
> want secure random numbers, you should really deeply _know_ about it,
> and think about it, rather than have it be the "oh, don't even bother
> passing any flags, it's secure by default".
> 
>   (c) the timeout approach isn't wonderful, but it at least helps with
> the "this was never tested under those circumstances" kind of problem.
> 
> Note that the people who actually *thought* about getrandom() and use
> it correctly should already handle error returns (even for the
> blocking version), because getrandom() can already return EINTR. So
> the argument that we should cater primarily to the secure key people
> is not all that strong. We should be able to return EINTR, and the
> people who *thought* about blocking and about entropy should be fine.
> 
> And gdm and other silly random users that never wanted entropy in the
> first place, just "random" random numbers, wouldn't be in the
> situation they are now.
> 
> That said - looking at some of the problematic traces that Ahmed
> posted for his bootup problem, I actually think we can use *another*
> heuristic to solve the problem. Namely just looking at how much
> randomness the caller wants.
> 
> The processes that ask for randomness for an actual secure key have a
> very fundamental constraint: they need enough randomness for the key
> to be secure in the first place.
> 
> But look at what gnome-shell and gnome-session-b does:
> 
>      https://lore.kernel.org/linux-ext4/20190912034421.GA2085@darwi-home-pc/
> 
> and most of them already set GRND_NONBLOCK, but look at the
> problematic one that actually causes the boot problem:
> 
>      gnome-session-b-327   4.400620: getrandom(16 bytes, flags = 0)
> 
> and here the big clue is: "Hey, it only asks for 128 bits of randomness".
> 
> Does anybody believe that 128 bits of randomness is a good basis for a
> long-term secure key? Even if the key itself contains than that, if
> you are generating a long-term secure key in this day and age, you had
> better be asking for more than 128 bits of actual unpredictable base
> data. So just based on the size of the request we can determine that
> this is not hugely important.
> 
> Compare that to the case later on for something that seems to ask for
> actual interesting randomness. and - just judging by the name -
> probably even has a reason for it:
> 
>        gsd-smartcard-388   51.433924: getrandom(110 bytes, flags = 0)
>        gsd-smartcard-388   51.433936: getrandom(256 bytes, flags = 0)
> 
> big difference.
> 
> End result: I would propose the attached patch.
> 
> Ahmed, can you just verify that it works for you (obviously with the
> ext4 plugging reinstated)? It looks like it should "obviously" fix
> things, but still...

I have looked at the patch, but have not tested it.

I am worried that the getrandom delays will be serialized, because 
processes sometimes run one after another. If there are enough 
chained/dependent processes that ask for randomness before it is ready, 
the end result is still a too-big delay, essentially a failed boot.

In other words: your approach of adding delays only makes sense for 
heavily parallelized boot, which may not be the case, especially for 
embedded systems that don't like systemd.

-- 
Alexander E. Patrakov