Message-ID: <fce02d50-65a7-4aa7-8949-6a82321da292@roeck-us.net>
Date:   Mon, 13 Mar 2023 13:30:22 -0700
From:   Guenter Roeck <linux@...ck-us.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     "Paul E. McKenney" <paulmck@...nel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 6.3-rc2

On Mon, Mar 13, 2023 at 11:21:44AM -0700, Linus Torvalds wrote:
> On Mon, Mar 13, 2023 at 8:53 AM Guenter Roeck <linux@...ck-us.net> wrote:
> >
> > Warning backtraces in calls from ct_nmi_enter(),
> > seen randomly.
> 
> Hmm.
> 
> I suspect this one is a bug in the warning, not in the kernel,
> although I have no idea why it would have started happening now.
> 
> This happens from an irq event, but that check is not *supposed* to
> happen at all from interrupts:
> 
>          * We dont accurately track softirq state in e.g.
>          * hardirq contexts (such as on 4KSTACKS), so only
>          * check if not in hardirq contexts:
> 
> but I think that the ct_nmi_enter() function was called before the
> hardirq count had even been incremented.
> 
> > Sample decoded stack trace:
> 
> Hmm. That WARNING backtrace doesn't actually seem to follow the stack
> chain, so it only shows the irq stack, not where the irq happened.
> 
> > Seen if CONFIG_DEBUG_LOCK_ALLOC=y and CONFIG_CONTEXT_TRACKING_IDLE=y.
> > It seems that rcu_read_lock_sched_held() can be true when entering an interrupt.
> >
> > The problem is not seen in v6.2, but occurs randomly on ToT with various
> > arm emulations.
> 
> Strange. I must be wrong about this being a race on the warning
> itself, because that warning has been there for a long long time.
> 
> Adding in some people who might have more of a clue. I'm thinking
> Frederic and Paul might know what's up with the context tracking, but
> I don't see why this would be arm-related or have started recently.
> But I do note that PeterZ did some rcuidle tracing cleanups that do
> end up affecting arm too.
> 
> So adding PeterZ too.
> 
> Original email with full details at
> 
>    https://lore.kernel.org/lkml/d915df60-d06b-47d4-8b47-8aa1bbc2aac7@roeck-us.net/
> 
> for added peeps.
> 
> Anybody?
> 

It gets weird. Bisect log below. Reverting the identified patch does
indeed seem to fix the problem, only I have no clue why this might
be the case. The patch looks completely innocent to me. Yet, I can
reliably reproduce the problem with v6.3-rc2, but at least so far I
have not been able to reproduce it with commit f3dd0c53370 reverted
(and I am trying on five different servers in parallel).

Guenter

---
# bad: [a5c95ca18a98d742d0a4a04063c32556b5b66378] Merge tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm
# good: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
git bisect start 'a5c95ca18a98' 'v6.2'
# good: [36289a03bcd3aabdf66de75cb6d1b4ee15726438] Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 36289a03bcd3aabdf66de75cb6d1b4ee15726438
# bad: [0175ec3a28c695562a08fdccf73f2ec5ed744e2f] Merge tag 'regulator-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
git bisect bad 0175ec3a28c695562a08fdccf73f2ec5ed744e2f
# good: [cb6b2e11a42decea2afc77df73ec7326db1ac25f] devlink: Fix memleak in health diagnose callback
git bisect good cb6b2e11a42decea2afc77df73ec7326db1ac25f
# good: [3365777a6a2243f1cca5a441f2c89002d16fc580] net: phy: marvell: Use the unlocked genphy_c45_ethtool_get_eee()
git bisect good 3365777a6a2243f1cca5a441f2c89002d16fc580
# good: [700ed3bbb7a0bd5eeb805a2c2ba47a6d7b286745] ASoC: SOF: core/ipc4/mtl: Add support for PCM delay
git bisect good 700ed3bbb7a0bd5eeb805a2c2ba47a6d7b286745
# good: [4d4266e3fd321fadb628ce02de641b129522c39c] page_pool: add a comment explaining the fragment counter usage
git bisect good 4d4266e3fd321fadb628ce02de641b129522c39c
# good: [76f5aaabce492aa6991c28c96bb78b00b05d06c5] ASoC: soc-ac97: Return correct error codes
git bisect good 76f5aaabce492aa6991c28c96bb78b00b05d06c5
# good: [5661706efa200252d0e9fea02421b0a5857808c3] Merge branch 'topic/apple-gmux' into for-next
git bisect good 5661706efa200252d0e9fea02421b0a5857808c3
# bad: [603ac530f13506e6ce5db4ab953ede4d292c5327] Merge tag 'regmap-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect bad 603ac530f13506e6ce5db4ab953ede4d292c5327
# good: [b60417a9f2b890a8094477b2204d4f73c535725e] selftest: fib_tests: Always cleanup before exit
git bisect good b60417a9f2b890a8094477b2204d4f73c535725e
# bad: [064d7dcf51a82b480e953a15cca47e5df0426502] Merge tag 'sound-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect bad 064d7dcf51a82b480e953a15cca47e5df0426502
# good: [5b7c4cabbb65f5c469464da6c5f614cbd7f730f2] Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2
# good: [7933b90b42896f5b6596e6a829bb31c5121fc2a9] Merge branch 'for-linus' into for-next
git bisect good 7933b90b42896f5b6596e6a829bb31c5121fc2a9
# bad: [f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc] bpf: add missing header file include
git bisect bad f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc
# first bad commit: [f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc] bpf: add missing header file include
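
A log like the one above can be sanity-checked mechanically. The following is only a minimal sketch, not part of the original message: it assumes the log follows the usual `git bisect log` comment format ("# good: [sha] subject", "# bad: [sha] subject", "# first bad commit: [sha] subject"), and the helper name `summarize_bisect_log` and the abbreviated `SAMPLE` text are hypothetical, with the sample lines copied from the log above.

```python
import re

# Matches the comment lines emitted by `git bisect log`, for example
#   "# good: [<sha>] subject" or "# first bad commit: [<sha>] subject".
LINE_RE = re.compile(
    r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] (.*)$"
)


def summarize_bisect_log(text):
    """Return (verdict counts, first bad commit or None) for a bisect log.

    Hypothetical helper for illustration; it only reads the '#' comment
    lines and ignores the 'git bisect ...' command lines.
    """
    counts = {"good": 0, "bad": 0}
    first_bad = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # command lines, blank lines, anything else
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            counts[kind] += 1
    return counts, first_bad


# Abbreviated sample: a few lines copied from the log above.
SAMPLE = """\
# bad: [a5c95ca18a98d742d0a4a04063c32556b5b66378] Merge tag 'drm-next-2023-02-23' of git://anongit.freedesktop.org/drm/drm
# good: [c9c3395d5e3dcc6daee66c6908354d47bf98cb0c] Linux 6.2
git bisect start 'a5c95ca18a98' 'v6.2'
# bad: [f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc] bpf: add missing header file include
git bisect bad f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc
# first bad commit: [f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc] bpf: add missing header file include
"""

counts, first_bad = summarize_bisect_log(SAMPLE)
print(counts)
print(first_bad)
```

To repeat the bisection itself rather than just summarize it, a saved copy of the log can be fed back to git with `git bisect replay <logfile>` in a kernel tree.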
