Message-ID: <1c7aeddb-da26-f7c0-0e7b-620d2eb089b9@seco.com>
Date: Thu, 23 Mar 2023 11:58:00 -0400
From: Sean Anderson <sean.anderson@...o.com>
To: Vladimir Oltean <vladimir.oltean@....com>, netdev@...r.kernel.org
Cc: Madalin Bucur <madalin.bucur@....com>,
Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()
On 3/23/23 11:39, Vladimir Oltean wrote:
> Hi,
>
> Since commit 914f8b228ede ("soc: fsl: qbman: Add CGR update function"),
> I have started seeing the following stack trace on the NXP T1040RDB
> board:
>
> [ 10.215392] =============================
> [ 10.219403] [ BUG: Invalid wait context ]
> [ 10.223413] 6.2.0-rc8-07010-ga9b9500ffaac-dirty #18 Not tainted
> [ 10.229338] -----------------------------
> [ 10.233347] swapper/0/0 is trying to lock:
> [ 10.237442] c0000000ff1cda20 (&portal->cgr_lock){+.+.}-{3:3}, at: .qman_update_cgr+0x40/0xb0
> [ 10.254270] other info that might help us debug this:
> [ 10.259320] context-{2:2}
> [ 10.259323] no locks held by swapper/0/0.
> [ 10.259327] stack backtrace:
> [ 10.259329] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc8-07010-ga9b9500ffaac-dirty #18
> [ 10.259336] Hardware name: fsl,T1040RDB e5500 0x80241021 CoreNet Generic
> [ 10.259341] Call Trace:
> [ 10.259344] [c000000002163280] [c0000000015263d0] .dump_stack_lvl+0x8c/0xd0
> [ 10.273180] (unreliable)
> [ 10.288587] [c000000002163300] [c0000000000e1714] .__lock_acquire+0x24c4/0x2500
> [ 10.288598] [c000000002163450] [c0000000000e24cc] .lock_acquire+0x13c/0x410
> [ 10.288608] [c000000002163560] [c00000000156983c] ._raw_spin_lock_irqsave+0x6c/0x120
> [ 10.297764] [c0000000021635f0] [c000000000938990] .qman_update_cgr+0x40/0xb0
> [ 10.312820] [c000000002163680] [c000000000938a20] .qman_update_cgr_smp_call+0x20/0x40
> [ 10.312830] [c000000002163700] [c0000000001609c8] .__flush_smp_call_function_queue+0x118/0x3f0
> [ 10.324241] [c0000000021637a0] [c000000000023f04] .smp_ipi_demux_relaxed+0xb4/0xc0
> [ 10.324258] [c000000002163830] [c000000000020bf4] .doorbell_exception+0x114/0x410
> [ 10.338529] [c0000000021638d0] [c00000000001dde4] exc_0x280_common+0x110/0x114
> [ 10.338540] --- interrupt: 280 at .e500_idle+0x30/0x6c
> [ 10.338547] NIP: c00000000001f104 LR: c00000000001f104 CTR: c00000000001f0d4
> [ 10.338552] REGS: c000000002163940 TRAP: 0280 Not tainted (6.2.0-rc8-07010-ga9b9500ffaac-dirty)
> [ 10.338557] MSR: 0000000080029002
> [ 10.355087] <CE,EE,ME> CR: 24042284 XER: 00000000
> [ 10.355102] IRQMASK: 0
> [ 10.355102] GPR00: 0000000000000000 c000000002163be0 c000000001c0a000 c00000000001f0f4
> [ 10.355102] GPR04: ffffffffffffffff
> [ 10.363650] c000000002187f50 0000000000000000 00000000fd236000
> [ 10.363650] GPR08: 0000000000000001 0000000000000001 0000000000000001 c000000002138f80
> [ 10.363650] GPR12:
> [ 10.373940] 0000000024042282 c000000002cf4000 000000007ff9382c 000000007fb2d3d0
> [ 10.373940] GPR16: 000000007ff9381c 0000000000000000 0000000008d77cf3 000000007ff190dc
> [ 10.373940] GPR20: 0000000000000001 000000007fb2d460 0000000000000000 000000007ffb5338
> [ 10.373940] GPR24: 000000007fb2d3d4 0000000000000003 0000000000080000 c000000002187ff8
> [ 10.373940] GPR28: 0000000000000001 c000000002187f50 0000000000000001 c000000002138f80
> [ 10.558780] NIP [c00000000001f104] .e500_idle+0x30/0x6c
> [ 10.564012] LR [c00000000001f104] .e500_idle+0x30/0x6c
> [ 10.569156] --- interrupt: 280
> [ 10.572210] [c000000002163be0] [c000000000008a54] .arch_cpu_idle+0x34/0xb0 (unreliable)
> [ 10.580234] [c000000002163c50] [c0000000015691d8] .default_idle_call+0x98/0xf8
> [ 10.587471] [c000000002163cc0] [c0000000000bda0c] .do_idle+0x13c/0x1e0
> [ 10.594014] [c000000002163d60] [c0000000000bde08] .cpu_startup_entry+0x28/0x30
> [ 10.601250] [c000000002163dd0] [c0000000000024f0] .rest_init+0x190/0x22c
> [ 10.607963] [c000000002163e60] [c000000001d57958] .arch_post_acpi_subsys_init+0x0/0x4
> [ 10.615809] [c000000002163ed0] [c000000001d58254] .start_kernel+0x8e4/0x934
> [ 10.622783] [c000000002163f90] [c000000000000a5c] start_here_common+0x1c/0x20
>
> Do you have any clues what is wrong?
Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?

If so, the problem seems to be that we're in unthreaded hardirq context
(LD_WAIT_SPIN, the "context-{2:2}" above), but cgr_lock is a spinlock_t,
which is LD_WAIT_CONFIG (the "{3:3}"). Maybe we should be using some
other smp_call function? Or maybe we should be using spin_lock (like
qman_create_cgr) and not spin_lock_irqsave (like qman_delete_cgr)?
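
To make the wait-type comparison concrete, here is a rough userspace
sketch of the check lockdep appears to be doing. The enum and function
names are made up for illustration and are not the kernel's; the
numeric values are only assumed to match the {2:2} and {3:3} printed in
the report.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative wait types, ordered from most to least restrictive.
 * Hypothetical names; the values are assumed to line up with the
 * numbers printed in the splat.
 */
enum wait_type {
	WAIT_INV = 0,	/* not checked */
	WAIT_FREE,	/* 1: wait-free */
	WAIT_SPIN,	/* 2: spinning only, e.g. hardirq context */
	WAIT_CONFIG,	/* 3: spinlock_t, may sleep on PREEMPT_RT */
	WAIT_SLEEP,	/* 4: sleeping locks */
};

/*
 * Return true if a lock of wait type `lock` may be acquired in a
 * context whose innermost allowed wait type is `ctx`. A lock is
 * acceptable only if it is at least as restrictive as the context.
 */
static bool wait_context_ok(enum wait_type ctx, enum wait_type lock)
{
	assert(ctx != WAIT_INV && lock != WAIT_INV);
	return lock <= ctx;
}
```

With these values, wait_context_ok(WAIT_SPIN, WAIT_CONFIG) is false,
which is the combination in the report, while a lock of type WAIT_SPIN
would pass the same check.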
--Sean