Message-ID: <Zy4WKKq18GunXa6S@localhost.localdomain>
Date: Fri, 8 Nov 2024 14:46:16 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Mingcong Bai <jeffbai@...c.io>
Cc: Thorsten Leemhuis <regressions@...mhuis.info>,
	Linux regressions mailing list <regressions@...ts.linux.dev>,
	LKML <linux-kernel@...r.kernel.org>,
	"Paul E. McKenney" <paulmck@...nel.org>, rcu <rcu@...r.kernel.org>,
	sakiiily@...c.io, Kexy Biscuit <kexybiscuit@...c.io>
Subject: Re: [Regression] wifi problems since tg3 started throwing rcu stall
 warnings

On Fri, Nov 08, 2024 at 12:29:40AM +0800, Mingcong Bai wrote:
> Hi Frederic,
> 
> <snip>
> 
> > Sorry for the lag, I still don't understand how this specific commit
> > can produce this issue. Can you please retry with and without this
> > commit reverted?
> 
> Just tested v6.12-rc6 with and without the revert. Without the revert, the
> touchpad and the wireless adapter both stopped working, whereas with the
> revert, both devices function as normal.
> 
> I have attached the dmesg for both kernels below. Unlike the log we got last
> time, there is no direct reference to tg3 any more, but the NMI backtrace
> still pointed to NetworkManager and net/netlink-related functions (perhaps a
> debug kernel would be more helpful?). Here's a snippet:
> 
> [   10.337720] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P683 } 21 jiffies s: 781 root: 0x0/T
> [   10.339168] rcu: blocking rcu_node structures (internal RCU debug):
> [   10.591480] loop0: detected capacity change from 0 to 8
> [   11.777733] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 21 jiffies s: 1077 root: 0x8/.
> [   11.779210] rcu: blocking rcu_node structures (internal RCU debug):
> [   11.780630] Sending NMI from CPU 1 to CPUs 3:
> [   11.780659] NMI backtrace for cpu 3
> [   11.780663] CPU: 3 UID: 0 PID: 1027 Comm: NetworkManager Not tainted 6.12.0-aosc-main #1
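
For anyone comparing the two dmesg logs, a rough sketch of pulling the stalled CPUs and blocked tasks out of the expedited stall lines might look like the following. It assumes each report has been rejoined onto one line (the mail client wrapped them), that "P<n>" entries are blocking task PIDs and "<n>-...." entries are CPU numbers followed by state flags. The function name parse_expedited_stall is made up for illustration and is not part of any kernel tooling.

```python
import re

# Matches the brace-delimited list in an expedited RCU stall report.
# Assumption: the report sits on a single line (mail wrapping undone).
STALL_RE = re.compile(
    r"detected expedited stalls on CPUs/tasks: \{(?P<items>[^}]*)\}"
)


def parse_expedited_stall(line):
    """Return {'cpus': [...], 'tasks': [...]} for a stall line, else None."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    cpus, tasks = [], []
    for item in m.group("items").split():
        if item.startswith("P") and item[1:].isdigit():
            # "P683" style entry: PID of a task blocking the grace period.
            tasks.append(int(item[1:]))
        else:
            # "3-...." style entry: CPU number followed by state flags.
            cpu = re.match(r"(\d+)-", item)
            if cpu:
                cpus.append(int(cpu.group(1)))
    return {"cpus": cpus, "tasks": tasks}


if __name__ == "__main__":
    sample = (
        "[   11.777733] rcu: INFO: rcu_preempt detected expedited stalls on "
        "CPUs/tasks: { 3-.... } 21 jiffies s: 1077 root: 0x8/."
    )
    print(parse_expedited_stall(sample))
```

Run over both logs, this would only summarize which CPUs and PIDs each report names; it says nothing about why the stall occurs.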

Funny, this happens on bootup and no CPU has ever gone offline, so the path
modified by this patch shouldn't have been taken. And yet this commit has
an influence to the point of reliably triggering that stall.

I'm running out of ideas. Paul, any clue?

Thanks.
