Date:   Tue, 14 Mar 2023 00:11:49 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Guenter Roeck <linux@...ck-us.net>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 6.3-rc2

On Mon, Mar 13, 2023 at 11:21:44AM -0700, Linus Torvalds wrote:
> On Mon, Mar 13, 2023 at 8:53 AM Guenter Roeck <linux@...ck-us.net> wrote:
> >
> > Warning backtraces in calls from ct_nmi_enter(),
> > seen randomly.
> 
> Hmm.
> 
> I suspect this one is a bug in the warning, not in the kernel,
> although I have no idea why it would have started happening now.
> 
> This happens from an irq event, but that check is not *supposed* to
> happen at all from interrupts:
> 
>          * We dont accurately track softirq state in e.g.
>          * hardirq contexts (such as on 4KSTACKS), so only
>          * check if not in hardirq contexts:
> 
> but I think that the ct_nmi_enter() function was called before the
> hardirq count had even been incremented.

Indeed, ct_nmi_enter() is called very early in irq_enter(), before
HARDIRQ_OFFSET is added, so hardirq_count() is still zero and the warning
triggers at:

	if (!hardirq_count()) {
		if (softirq_count()) {
			/* like the above, but with softirqs */
			DEBUG_LOCKS_WARN_ON(current->softirqs_enabled); <---- HERE
		}

So the hardirq interrupted some code that has softirqs disabled (or is
servicing them) from the preempt mask POV but not from the lockdep POV.
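
To make the mismatch concrete, here is a rough userspace sketch (not kernel
code) of the branch quoted above. struct task_model and check_would_warn()
are hypothetical names made up for illustration, and the mask values assume
the usual non-RT layout from include/linux/preempt.h:

```c
#include <stdbool.h>

/* Field masks as laid out in include/linux/preempt.h (assumed, non-RT). */
#define SOFTIRQ_MASK	0x0000ff00u
#define HARDIRQ_MASK	0x000f0000u
#define HARDIRQ_OFFSET	0x00010000u

/* Hypothetical stand-in for the two pieces of state the check compares. */
struct task_model {
	unsigned int preempt_count;	/* what hardirq_count()/softirq_count() read */
	bool softirqs_enabled;		/* lockdep's view, current->softirqs_enabled */
};

/* Models only the branch quoted above; the else branch is left out. */
static bool check_would_warn(const struct task_model *t)
{
	if (t->preempt_count & HARDIRQ_MASK)
		return false;			/* check is skipped in hardirq context */
	if (t->preempt_count & SOFTIRQ_MASK)
		return t->softirqs_enabled;	/* the two views disagree */
	return false;
}
```

With preempt_count = 0x200 and softirqs_enabled = true the model warns. If
HARDIRQ_OFFSET had already been added to preempt_count, the check would be
skipped instead.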

It says softirqs were last enabled/disabled at some random point, but the
function looks ok:

	 [   28.765386] softirqs last  enabled at (6328): [<c0103814>] vfp_sync_hwstate+0x48/0x8c
	 [   28.765575] softirqs last disabled at (6326): [<c01037cc>] vfp_sync_hwstate+0x0/0x8c

It would be interesting to see what the IRQ is interrupting. For example, does
it happen while softirqs are being serviced or just disabled? Or are we even
outside any of that? Any chance we can get a deeper stack trace? If not, at
least a print of preempt_count() would be helpful.

Both would be awesome.
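
For reading the printed value, a small userspace decoder along these lines
may help. It is only a sketch: struct pc_fields and decode_preempt_count()
are hypothetical helpers, and the field layout is assumed to be the non-RT
one from include/linux/preempt.h (8 preempt bits, 8 softirq bits, 4 hardirq
bits, 4 NMI bits, with local_bh_disable() adding 2 * SOFTIRQ_OFFSET):

```c
#include <stdbool.h>

/* Assumed layout, mirroring include/linux/preempt.h on non-RT kernels. */
#define PREEMPT_MASK	0x000000ffu
#define SOFTIRQ_MASK	0x0000ff00u
#define HARDIRQ_MASK	0x000f0000u
#define NMI_MASK	0x00f00000u
#define SOFTIRQ_OFFSET	0x00000100u	/* set while a softirq is being served */

/* Hypothetical container for the decoded fields. */
struct pc_fields {
	unsigned int preempt_depth;	/* preempt_disable() nesting */
	unsigned int bh_disable_depth;	/* local_bh_disable() nesting */
	bool serving_softirq;		/* inside softirq processing */
	unsigned int hardirq_depth;
	unsigned int nmi_depth;
};

static struct pc_fields decode_preempt_count(unsigned int pc)
{
	struct pc_fields f;

	f.preempt_depth = pc & PREEMPT_MASK;
	/* Each local_bh_disable() adds 0x200, so shift by 9 rather than 8. */
	f.bh_disable_depth = (pc & SOFTIRQ_MASK) >> 9;
	f.serving_softirq = (pc & SOFTIRQ_OFFSET) != 0;
	f.hardirq_depth = (pc & HARDIRQ_MASK) >> 16;
	f.nmi_depth = (pc & NMI_MASK) >> 20;
	return f;
}
```

Under those assumptions, a printed 0x200 would mean softirqs disabled once
and not being served, while 0x100 would mean a softirq is being served.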

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 50d4863974e7..a7d1a65e5425 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5523,6 +5523,7 @@ static noinstr void check_flags(unsigned long flags)
 	 */
 	if (!hardirq_count()) {
 		if (softirq_count()) {
+			printk("preempt_count(): %x\n", preempt_count());
 			/* like the above, but with softirqs */
 			DEBUG_LOCKS_WARN_ON(current->softirqs_enabled);
 		} else {


Thanks.
