Message-ID: <20230615074844.GC10301@redhat.com>
Date:   Thu, 15 Jun 2023 08:48:44 +0100
From:   "Richard W.M. Jones" <rjones@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Aaron Thompson <dev@...ont.org>, linux-kernel@...r.kernel.org,
        Alexandre Belloni <alexandre.belloni@...tlin.com>
Subject: Re: printk.time causes rare kernel boot hangs

On Thu, Jun 15, 2023 at 09:40:40AM +0200, Alexandre Belloni wrote:
> Hello,
> 
> On 14/06/2023 18:34:30+0100, Richard W.M. Jones wrote:
> > 
> > FWIW attached is a test program that runs the qemu instances in
> > parallel (up to 8 threads), which seems to be a quicker way to hit the
> > problem for me.  Even on Intel, with this test I can hit the bug in a
> > few hundred iterations.
> > 
> 
> I'm just chiming in to say that we do hit the same issue on the Yocto
> Project CI. We are using qemu 8.0.0 on Intel hardware and a 6.1 kernel.
> 
> I see that f31dcb152a3d0816e2f1deab4e64572336da197d hasn't been
> backported so it may not be the culprit. However, this seems to have
> started happening when we switched from 5.15 to 6.1.

I don't know if it's related or not, or even valid, but it was pointed
out to me[1] that you can get the exact same failure this way:

  - Linux git @ b6dad5178ceaf23f369c3711062ce1f2afc33644
  - Revert f31dcb152a3d0816e2f1deab4e64572336da197d
  - Add the following patch:

diff --git a/init/main.c b/init/main.c
index af50044deed5..c2774865a83f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1552,6 +1552,7 @@ static noinline void __init kernel_init_freeable(void)
 
 	cad_pid = get_pid(task_pid(current));
 
+	msleep(1);
 	smp_prepare_cpus(setup_max_cpus);
 
 	workqueue_init();

So is sleeping in kernel_init_freeable() valid?  It doesn't look like
an atomic context to me.  And is it a coincidence that the failure
looks precisely the same?
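
For anyone who wants to count hangs with a kernel built this way, here
is a minimal sketch of a parallel boot harness.  It is not the test
program mentioned in the quoted text above, only an illustration of the
same idea: run the boot command many times, up to 8 at once, and treat
any run that exceeds a timeout as a hang.  The function names, the
timeout value and the assumption that the boot command exits on its own
after a successful boot are all mine; substitute whatever qemu command
line you normally use.

```python
# Sketch only: NOT the test program attached earlier in this thread.
# Assumption: the boot command exits by itself after a successful boot,
# so a run that is still alive after `timeout` seconds counts as a hang.
import subprocess
from concurrent.futures import ThreadPoolExecutor


def boot_once(cmd, timeout):
    """Run one boot; return True if it finished in time, False if it hung."""
    try:
        subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when this expires
        )
        return True
    except subprocess.TimeoutExpired:
        return False


def count_hangs(cmd, iterations, workers=8, timeout=60.0):
    """Run `cmd` `iterations` times, `workers` at a time; return the hang count."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: boot_once(cmd, timeout), range(iterations)))
    return results.count(False)


# Hypothetical usage, with `qemu_cmd` being your own argv list:
#   print(count_hangs(qemu_cmd, iterations=1000))
```

Threads are enough here because each worker just blocks waiting for its
child process.  The timeout needs to be comfortably longer than a
normal boot on a loaded host, or slow boots get counted as hangs.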

Rich.

[1] https://news.ycombinator.com/item?id=36336059

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
