Message-ID: <1c7aeddb-da26-f7c0-0e7b-620d2eb089b9@seco.com>
Date:   Thu, 23 Mar 2023 11:58:00 -0400
From:   Sean Anderson <sean.anderson@...o.com>
To:     Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org
Cc:     Madalin Bucur <madalin.bucur@....com>,
        Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()

On 3/23/23 11:39, Vladimir Oltean wrote:
> Hi,
> 
> Since commit 914f8b228ede ("soc: fsl: qbman: Add CGR update function"),
> I have started seeing the following stack trace on the NXP T1040RDB
> board:
> 
> [   10.215392] =============================
> [   10.219403] [ BUG: Invalid wait context ]
> [   10.223413] 6.2.0-rc8-07010-ga9b9500ffaac-dirty #18 Not tainted
> [   10.229338] -----------------------------
> [   10.233347] swapper/0/0 is trying to lock:
> [   10.237442] c0000000ff1cda20 (&portal->cgr_lock){+.+.}-{3:3}, at: .qman_update_cgr+0x40/0xb0
> [   10.254270] other info that might help us debug this:
> [   10.259320] context-{2:2}
> [   10.259323] no locks held by swapper/0/0.
> [   10.259327] stack backtrace:
> [   10.259329] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc8-07010-ga9b9500ffaac-dirty #18
> [   10.259336] Hardware name: fsl,T1040RDB e5500 0x80241021 CoreNet Generic
> [   10.259341] Call Trace:
> [   10.259344] [c000000002163280] [c0000000015263d0] .dump_stack_lvl+0x8c/0xd0
> [   10.273180]  (unreliable)
> [   10.288587] [c000000002163300] [c0000000000e1714] .__lock_acquire+0x24c4/0x2500
> [   10.288598] [c000000002163450] [c0000000000e24cc] .lock_acquire+0x13c/0x410
> [   10.288608] [c000000002163560] [c00000000156983c] ._raw_spin_lock_irqsave+0x6c/0x120
> [   10.297764] [c0000000021635f0] [c000000000938990] .qman_update_cgr+0x40/0xb0
> [   10.312820] [c000000002163680] [c000000000938a20] .qman_update_cgr_smp_call+0x20/0x40
> [   10.312830] [c000000002163700] [c0000000001609c8] .__flush_smp_call_function_queue+0x118/0x3f0
> [   10.324241] [c0000000021637a0] [c000000000023f04] .smp_ipi_demux_relaxed+0xb4/0xc0
> [   10.324258] [c000000002163830] [c000000000020bf4] .doorbell_exception+0x114/0x410
> [   10.338529] [c0000000021638d0] [c00000000001dde4] exc_0x280_common+0x110/0x114
> [   10.338540] --- interrupt: 280 at .e500_idle+0x30/0x6c
> [   10.338547] NIP:  c00000000001f104 LR: c00000000001f104 CTR: c00000000001f0d4
> [   10.338552] REGS: c000000002163940 TRAP: 0280   Not tainted  (6.2.0-rc8-07010-ga9b9500ffaac-dirty)
> [   10.338557] MSR:  0000000080029002
> [   10.355087] <CE,EE,ME>  CR: 24042284  XER: 00000000
> [   10.355102] IRQMASK: 0
> [   10.355102] GPR00: 0000000000000000 c000000002163be0 c000000001c0a000 c00000000001f0f4
> [   10.355102] GPR04: ffffffffffffffff
> [   10.363650] c000000002187f50 0000000000000000 00000000fd236000
> [   10.363650] GPR08: 0000000000000001 0000000000000001 0000000000000001 c000000002138f80
> [   10.363650] GPR12:
> [   10.373940] 0000000024042282 c000000002cf4000 000000007ff9382c 000000007fb2d3d0
> [   10.373940] GPR16: 000000007ff9381c 0000000000000000 0000000008d77cf3 000000007ff190dc
> [   10.373940] GPR20: 0000000000000001 000000007fb2d460 0000000000000000 000000007ffb5338
> [   10.373940] GPR24: 000000007fb2d3d4 0000000000000003 0000000000080000 c000000002187ff8
> [   10.373940] GPR28: 0000000000000001 c000000002187f50 0000000000000001 c000000002138f80
> [   10.558780] NIP [c00000000001f104] .e500_idle+0x30/0x6c
> [   10.564012] LR [c00000000001f104] .e500_idle+0x30/0x6c
> [   10.569156] --- interrupt: 280
> [   10.572210] [c000000002163be0] [c000000000008a54] .arch_cpu_idle+0x34/0xb0 (unreliable)
> [   10.580234] [c000000002163c50] [c0000000015691d8] .default_idle_call+0x98/0xf8
> [   10.587471] [c000000002163cc0] [c0000000000bda0c] .do_idle+0x13c/0x1e0
> [   10.594014] [c000000002163d60] [c0000000000bde08] .cpu_startup_entry+0x28/0x30
> [   10.601250] [c000000002163dd0] [c0000000000024f0] .rest_init+0x190/0x22c
> [   10.607963] [c000000002163e60] [c000000001d57958] .arch_post_acpi_subsys_init+0x0/0x4
> [   10.615809] [c000000002163ed0] [c000000001d58254] .start_kernel+0x8e4/0x934
> [   10.622783] [c000000002163f90] [c000000000000a5c] start_here_common+0x1c/0x20
> 
> Do you have any clues what is wrong?

Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?

If so, the problem seems to be that we're in unthreaded hardirq context
(LD_WAIT_SPIN, the "context-{2:2}" in your splat), but cgr_lock is a
spinlock_t, which is LD_WAIT_CONFIG (the "{3:3}"). Maybe we should be
using some other smp_call function? Or maybe we should be using
spin_lock (like qman_create_cgr) and not spin_lock_irqsave (like
qman_delete_cgr)?
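
To make the {2:2} vs {3:3} comparison concrete, here is a rough userspace
sketch of the rule lockdep appears to be applying. It is not the kernel's
check_wait_context(); the enum only mirrors the ordering of
enum lockdep_wait_type as I understand it with PROVE_RAW_LOCK_NESTING
enabled, and wait_context_ok() and wait_type_name() are hypothetical
helpers made up for illustration.

```c
#include <stdbool.h>

/*
 * Assumed to mirror the ordering of the kernel's enum lockdep_wait_type
 * when CONFIG_PROVE_RAW_LOCK_NESTING=y. Illustrative model only.
 */
enum wait_type {
	LD_WAIT_INV = 0,	/* not checked */
	LD_WAIT_FREE,		/* wait-free, RCU etc. */
	LD_WAIT_SPIN,		/* raw_spinlock_t, spin loops (2) */
	LD_WAIT_CONFIG,		/* spinlock_t, may sleep on PREEMPT_RT (3) */
	LD_WAIT_SLEEP,		/* sleeping locks such as mutexes */
};

/*
 * Hypothetical helper: a lock may be taken only if its wait type does
 * not exceed what the current context allows.
 */
bool wait_context_ok(enum wait_type context, enum wait_type lock)
{
	return lock <= context;
}

/* Hypothetical helper: printable name for a wait type. */
const char *wait_type_name(enum wait_type t)
{
	switch (t) {
	case LD_WAIT_FREE:	return "LD_WAIT_FREE";
	case LD_WAIT_SPIN:	return "LD_WAIT_SPIN";
	case LD_WAIT_CONFIG:	return "LD_WAIT_CONFIG";
	case LD_WAIT_SLEEP:	return "LD_WAIT_SLEEP";
	default:		return "LD_WAIT_INV";
	}
}
```

With this model, wait_context_ok(LD_WAIT_SPIN, LD_WAIT_CONFIG) is false,
which corresponds to the report: the IPI handler runs in a context that
only allows LD_WAIT_SPIN, and it takes an LD_WAIT_CONFIG lock.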

--Sean