Message-ID: <b2a5b4a7-4aee-558c-5558-549fd2998165@zx2c4.com>
Date:   Fri, 3 Dec 2021 17:47:41 +0100
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Dominik Brodowski <linux@...inikbrodowski.net>
Cc:     Theodore Ts'o <tytso@....edu>, "Ivan T. Ivanov" <iivanov@...e.de>,
        Ard Biesheuvel <ardb@...nel.org>, linux-efi@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, hsinyi@...omium.org
Subject: Re: [PATCH v4] random: fix crash on multiple early calls to
 add_bootloader_randomness()

On 12/3/21 16:39, Jason A. Donenfeld wrote:
> Hi Dominik,
> 
> Thanks for your analysis. Some more questions:
> 
> On Fri, Dec 3, 2021 at 8:59 AM Dominik Brodowski
> <linux@...inikbrodowski.net> wrote:
>> On subsequent calls to add_bootloader_randomness() and then to
>> add_hwgenerator_randomness(), crng_fast_load() will be skipped. Instead,
>> wait_event_interruptible() (which makes no sense for the init process)
>> and then credit_entropy_bits() will be called. If the entropy count for
>> that second seed is large enough, that proceeds to crng_reseed().
>> However, crng_reseed() may depend on workqueues being available, which
>> is not the case early during boot.
> 
> It sounds like *the* issue you've identified is that crng_reseed()
> calls into workqueue functions too early in init, right? The bug is
> about paths into crng_reseed() that might cause that?
> 
> If so, then specifically, are you referring to crng_reseed()'s call to
> numa_crng_init()? In other words, the cause of the bug would be
> 6c1e851c4edc ("random: fix possible sleeping allocation from irq
> context")? If that's the case, then I wonder if the problem you're
> seeing goes away if you revert both 6c1e851c4edc ("random: fix
> possible sleeping allocation from irq context") and its primary
> predecessor, 8ef35c866f88 ("random: set up the NUMA crng instances
> after the CRNG is fully initialized"). These fix an actual bug, so I'm
> not suggesting we actually revert these in the tree, but for the
> purpose of testing, I'm wondering if this is actually the root cause
> of the bug you're seeing.

If the above holds, I wonder if something more basic like the below 
would do the trick -- deferring the bringup of the secondary pools until 
later on in rand_initialize.

diff --git a/drivers/char/random.c b/drivers/char/random.c
index c81485e2f126..e71b34bf9a2a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -832,7 +832,6 @@ static void __init crng_initialize_primary(struct crng_state *crng)
  	_extract_entropy(&input_pool, &crng->state[4], sizeof(__u32) * 12, 0);
  	if (crng_init_try_arch_early(crng) && trust_cpu) {
  		invalidate_batched_entropy();
-		numa_crng_init();
  		crng_init = 2;
  		pr_notice("crng done (trusting CPU's manufacturer)\n");
  	}
@@ -840,13 +839,13 @@ static void __init crng_initialize_primary(struct crng_state *crng)
  }

  #ifdef CONFIG_NUMA
-static void do_numa_crng_init(struct work_struct *work)
+static void numa_crng_init(void)
  {
  	int i;
  	struct crng_state *crng;
  	struct crng_state **pool;

-	pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL|__GFP_NOFAIL);
+	pool = kcalloc(nr_node_ids, sizeof(*pool), GFP_KERNEL | __GFP_NOFAIL);
  	for_each_online_node(i) {
  		crng = kmalloc_node(sizeof(struct crng_state),
  				    GFP_KERNEL | __GFP_NOFAIL, i);
@@ -861,13 +860,6 @@ static void do_numa_crng_init(struct work_struct *work)
  		kfree(pool);
  	}
  }
-
-static DECLARE_WORK(numa_crng_init_work, do_numa_crng_init);
-
-static void numa_crng_init(void)
-{
-	schedule_work(&numa_crng_init_work);
-}
  #else
  static void numa_crng_init(void) {}
  #endif
@@ -977,7 +969,6 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
  	spin_unlock_irqrestore(&crng->lock, flags);
  	if (crng == &primary_crng && crng_init < 2) {
  		invalidate_batched_entropy();
-		numa_crng_init();
  		crng_init = 2;
  		process_random_ready_list();
  		wake_up_interruptible(&crng_init_wait);
@@ -1787,6 +1778,7 @@ int __init rand_initialize(void)
  {
  	init_std_data(&input_pool);
  	crng_initialize_primary(&primary_crng);
+	numa_crng_init();
  	crng_global_init_time = jiffies;
  	if (ratelimit_disable) {
  		urandom_warning.interval = 0;
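
For illustration only, the ordering question discussed above can be modeled in user space. This is a rough sketch, not kernel code: struct boot_model, reseed_old(), reseed_new(), rand_initialize_new() and model_selfcheck() are hypothetical names invented here, and the model assumes the failure is simply "work was queued before workqueues were usable", which is the hypothesis under test in this thread rather than a confirmed root cause.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct boot_model {
	bool workqueues_ready;  /* workqueue subsystem is usable */
	bool numa_pools_ready;  /* per-node pools have been set up */
	int crng_init;          /* mirrors the 0/1/2 init level */
};

/*
 * Old ordering: a reseed that completes initialization queues the NUMA
 * setup. Returns false when that would happen before workqueues are usable.
 */
static bool reseed_old(struct boot_model *m)
{
	bool ok = m->workqueues_ready;

	if (ok)
		m->numa_pools_ready = true;
	m->crng_init = 2;
	return ok;
}

/* Proposed ordering: a reseed only marks the crng as initialized. */
static bool reseed_new(struct boot_model *m)
{
	m->crng_init = 2;
	return true;
}

/* Proposed ordering: the init function sets up the pools synchronously. */
static void rand_initialize_new(struct boot_model *m)
{
	m->numa_pools_ready = true;
}

/* Walks both orderings through a reseed that happens early in boot. */
static void model_selfcheck(void)
{
	struct boot_model before = { 0 }, after = { 0 };

	assert(!reseed_old(&before));      /* too early: no workqueues yet */
	assert(!before.numa_pools_ready);

	assert(reseed_new(&after));        /* no workqueue dependency */
	rand_initialize_new(&after);
	assert(after.numa_pools_ready && after.crng_init == 2);
}
```

Calling model_selfcheck() from a test program exercises both orderings. The model says nothing about whether the allocation in numa_crng_init() is safe at the point where rand_initialize() runs; that still has to be checked against the real boot sequence.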