Message-ID: <871a40ac-2e69-a0db-39a3-4f17abbd8b6b@gmail.com>
Date:   Tue, 17 Sep 2019 17:46:56 +0500
From:   "Alexander E. Patrakov" <patrakov@...il.com>
To:     "Ahmed S. Darwish" <darwish.07@...il.com>,
        "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Martin Steigerwald <martin@...htvoll.de>, Willy Tarreau <w@....eu>,
        Matthew Garrett <mjg59@...f.ucam.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Vito Caputo <vcaputo@...garu.com>,
        Lennart Poettering <mzxreary@...inter.de>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.cz>, Ray Strode <rstrode@...hat.com>,
        William Jon McCann <mccann@....edu>,
        zhangjs <zachary@...shancloud.com>, linux-ext4@...r.kernel.org,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: Linux 5.3-rc8

17.09.2019 17:30, Ahmed S. Darwish wrote:
> On Tue, Sep 17, 2019 at 08:11:56AM -0400, Theodore Y. Ts'o wrote:
>> On Tue, Sep 17, 2019 at 09:33:40AM +0200, Martin Steigerwald wrote:
>>> Willy Tarreau - 17.09.19, 07:24:38 CEST:
>>>> On Mon, Sep 16, 2019 at 06:46:07PM -0700, Matthew Garrett wrote:
>>>>>> Well, the patch actually made getrandom() return an error too, but
>>>>>> you seem more interested in the hypotheticals than in arguing
>>>>>> actualities.
>>>>> If you want to be safe, terminate the process.
>>>>
>>>> This is an interesting approach. At least it will cause bug reports in
>>>> applications using getrandom() in an unreliable way and they will
>>>> check for other options. Because one of the issues with systems that
>>>> fail to finish booting is that usually the user doesn't know what
>>>> process is hanging.
>>>
>>
>> I would be happy with a change which changes getrandom(0) to send a
>> kill -9 to the process if it is called too early, with a new flag,
>> getrandom(GRND_BLOCK) which blocks until entropy is available.  That
>> leaves it up to the application developer to decide what behavior they
>> want.
>>
> 
> Yup, I'm convinced that's the sanest option too. I'll send a final RFC
> patch tonight implementing the following:
> 
> config GETRANDOM_CRNG_ENTROPY_MAX_WAIT_MS
> 	int
> 	default 3000
> 	help
> 	  Default max wait in milliseconds, for the getrandom(2) system
> 	  call when asking for entropy from the urandom source, until
> 	  the Cryptographic Random Number Generator (CRNG) gets
> 	  initialized.  Any process exceeding this duration for entropy
> 	  wait will get killed by the kernel. The maximum wait can be
> 	  overridden through the "random.getrandom_max_wait_ms" kernel
> 	  boot parameter. Rationale follows.
> 
> 	  When the getrandom(2) system call was created, it came with
> 	  the clear warning: "Any userspace program which uses this new
> 	  functionality must take care to assure that if it is used
> 	  during the boot process, that it will not cause the init
> 	  scripts or other portions of the system startup to hang
> 	  indefinitely."
> 
> 	  Unfortunately, due to multiple factors, including not having
> 	  this warning written in a scary enough language in the
> 	  manpages, and due to glibc since v2.25 implementing a BSD-like
> 	  getentropy(3) in terms of getrandom(2), modern user-space is
> 	  calling getrandom(2) in the boot path everywhere.
> 
> 	  Embedded Linux systems were first hit by this, and reports of
> 	  embedded systems "getting stuck at boot" began to be
> 	  common. Over time, the issue began to even creep into consumer
> 	  level x86 laptops: mainstream distributions, like Debian
> 	  Buster, began to recommend installing haveged as a workaround,
> 	  just to let the system boot.
> 
> 	  Filesystem optimizations in EXT4 and XFS exacerbated the
> 	  problem, due to aggressive batching of IO requests, and thus
> 	  minimizing sources of entropy at boot. This led to large
> 	  delays until the kernel's Cryptographic Random Number
> 	  Generator (CRNG) got initialized, and thus to reports of
> 	  getrandom(2) being stuck indefinitely at boot.
> 
> 	  Solve this problem by setting a conservative upper bound for
> 	  getrandom(2) wait. Kill the process, instead of returning an
> 	  error code, because otherwise crypto-sensitive applications
> 	  may revert to less secure mechanisms (e.g. /dev/urandom). We
> 	  __deeply encourage__ system integrators and distribution
> 	  builders not to considerably increase this value: during
> 	  system boot, you either have entropy, or you don't. And if you
> 	  didn't have entropy, it will stay like this forever, because
> 	  if you had, you wouldn't have blocked in the first place. It's
> 	  an atomic "either/or" situation, with no middle ground. Please
> 	  think twice.
> 
> 	  Ideally, systems would be configured with hardware random
> 	  number generators, and/or configured to trust the CPU-provided
> 	  RNGs (CONFIG_RANDOM_TRUST_CPU) or boot-loader provided ones
> 	  (CONFIG_RANDOM_TRUST_BOOTLOADER).  In addition, userspace
> 	  should generate cryptographic keys only as late as possible,
> 	  when they are needed, instead of during early boot.  (For
> 	  non-cryptographic use cases, such as dictionary seeds or MIT
> 	  Magic Cookies, other mechanisms such as /dev/urandom or
> 	  random(3) may be more appropriate.)
> 
> Sounds good?
> 
> thanks,
> 
> --
> Ahmed Darwish
> http://darwish.chasingpointers.com
> 

This would fail the litmus test that started this thread, re-explained 
below.

0. Linus applies your patch.
1. A kernel release happens, and it boots fine.
2. Ted Ts'o invents yet another brilliant ext4 optimization, and it gets 
merged.
3. Somebody discovers that the new kernel kills all his processes, up to 
and including gnome-session, and that's obviously a regression.
4. Linus is forced to revert (2), nobody wins.
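
For reference, an application can already bound its own wait from
userspace instead of relying on the kernel to enforce one. The following
is a minimal Python sketch, not part of any patch in this thread: it
polls getrandom(2) with GRND_NONBLOCK and gives up after a deadline. The
function name getrandom_bounded, the read and poll_ms parameters, and the
reuse of the 3000 ms default from the proposed Kconfig text are all
illustrative assumptions.

```python
import os
import time


def getrandom_bounded(nbytes, max_wait_ms=3000, read=os.getrandom, poll_ms=50):
    """Return nbytes from getrandom(2), waiting at most max_wait_ms.

    Hypothetical helper: `read` is injectable so the early-boot case
    (CRNG not yet initialized) can be simulated without rebooting.
    """
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while True:
        try:
            # GRND_NONBLOCK makes the call fail with EAGAIN (surfaced by
            # Python as BlockingIOError) instead of sleeping in the kernel
            # until the CRNG is initialized.
            return read(nbytes, os.GRND_NONBLOCK)
        except BlockingIOError:
            if time.monotonic() >= deadline:
                # No fallback to a weaker source here; the caller decides.
                raise TimeoutError(
                    "CRNG not initialized after %d ms" % max_wait_ms
                )
            time.sleep(poll_ms / 1000.0)


if __name__ == "__main__":
    # On a fully booted system this returns immediately.
    print(getrandom_bounded(16).hex())
```

What the caller does on TimeoutError (exit, retry later, defer key
generation) is exactly the policy question debated above; the sketch
only makes the wait visible and bounded in the application itself.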

-- 
Alexander E. Patrakov

