Date: Fri, 2 Feb 2024 06:56:28 -1000
From: Tejun Heo <tj@...nel.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
	Lai Jiangshan <jiangshanlai@...il.com>,
	Petr Mladek <pmladek@...e.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	John Ogness <john.ogness@...utronix.de>,
	Sergey Senozhatsky <senozhatsky@...omium.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	linux-kernel@...r.kernel.org, rcu@...r.kernel.org
Subject: Re: [BUG] workqueues and printk not playing nice since next-20240130

Hello,

On Fri, Feb 02, 2024 at 08:35:51AM -0800, Paul E. McKenney wrote:
> Good point, and if this sort of thing happens frequently, perhaps there
> should be an easy way of doing this.  One crude hack that might come
> pretty close would be to redefine the barrier() macro to be smp_mb().
> 
> But as noted earlier, -ENOREPRODUCE on today's -next.  I will try the
> next several -next releases.  But if they all get -ENOREPRODUCE, I owe
> everyone on CC an apology for having sent this report out before trying
> next-20240202.  :-/

I think I saw that problem too but could reproduce it with or without the
workqueue changes, so I did the lazy thing "oh well, somebody is gonna fix
that" and just tested as-is. It's a bit worrying that ppl don't seem to
already know what the culprit is. Hmm... I can't reproduce it anymore
either.

So, there is some chance that this may really be a subtle breakage. If you
ever see it happening again, triggering sysrq-t and capturing the dmesg
output (network should still work fine, so these shouldn't be too difficult)
may help. sysrq-t has a workqueue state dump at the end, which should clearly
indicate whether anything is stalled in workqueue.
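
A minimal sketch of that capture step, assuming root access on the affected
machine (for example over ssh, since the network should still work). Writing
"t" to /proc/sysrq-trigger has the same effect as pressing SysRq-t, and dmesg
then contains the task dump. The function name, its parameters and the output
path are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path

# Standard procfs interface for triggering SysRq commands (requires root).
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def capture_task_dump(out_path, trigger=SYSRQ_TRIGGER, dmesg_cmd=("dmesg",)):
    """Trigger sysrq-t, then save the kernel log to out_path.

    `trigger` and `dmesg_cmd` are parameters only so the function can be
    exercised without root; the defaults are what a real run would use.
    """
    # Equivalent to pressing SysRq-t: ask the kernel to dump task state.
    Path(trigger).write_text("t")

    # Read the kernel ring buffer, which now includes the dump.
    log = subprocess.run(
        list(dmesg_cmd), check=True, capture_output=True, text=True
    ).stdout

    Path(out_path).write_text(log)
    return log
```

Run as root, capture_task_dump("/tmp/sysrq-t.log") would leave the full log in
that file. If the dump is large, the kernel ring buffer may wrap and lose the
earliest lines, so a bigger log buffer may be needed.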

That said, another data point. In my test setup, I use the earlyprintk boot
option, which enables console output well before workqueue becomes
operational, so having no console output at all is highly unlikely to be
indicative of a workqueue problem. My memory is hazy but it seems like I can
no longer reproduce the problem on the same git commit. Maybe it was a
problem on the qemu side?

Thanks.

-- 
tejun
