Message-ID: <20230323184701.4awirfstyx2xllnz@skbuf>
Date: Thu, 23 Mar 2023 20:47:01 +0200
From: Vladimir Oltean <vladimir.oltean@....com>
To: Sean Anderson <sean.anderson@...o.com>
Cc: netdev@...r.kernel.org, Madalin Bucur <madalin.bucur@....com>,
Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()
On Thu, Mar 23, 2023 at 11:58:00AM -0400, Sean Anderson wrote:
> > Do you have any clues what is wrong?
>
> Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?

No, just CONFIG_PROVE_RAW_LOCK_NESTING.

> If so, the problem seems to be that we're in unthreaded hardirq context
> (LD_WAIT_SPIN), but the lock is LD_WAIT_CONFIG. Maybe we should be
> using some other smp_call function? Maybe we should be using
> spin_lock (like qman_create_cgr) and not spin_lock_irqsave (like
> qman_delete_cgr)?

Plain spin_lock() has the same wait context as spin_lock_irqsave(), so
switching between the two would not help by itself. Maybe you mean
raw_spin_lock(), which always has a wait context compatible with the
LD_WAIT_SPIN context we are in here.

Note: I'm not suggesting that replacing the spinlock with a raw spinlock
is the correct solution here.
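
To make the wait-context point concrete, here is a minimal sketch (the
lock and function names are made up for illustration, they are not the
qman ones) of the class lockdep assigns to each variant under
CONFIG_PROVE_RAW_LOCK_NESTING:

#include <linux/spinlock.h>

/* Illustrative only - these are not the qman locks. */
static DEFINE_SPINLOCK(config_class_lock);	/* spinlock_t     -> LD_WAIT_CONFIG */
static DEFINE_RAW_SPINLOCK(spin_class_lock);	/* raw_spinlock_t -> LD_WAIT_SPIN   */

static void wait_context_sketch(void)
{
	unsigned long flags;

	/*
	 * Both acquisitions below register the same LD_WAIT_CONFIG wait
	 * context: the class comes from the lock type, not from whether
	 * IRQs are saved around the critical section. Neither form is
	 * valid in unthreaded hardirq context under
	 * CONFIG_PROVE_RAW_LOCK_NESTING.
	 */
	spin_lock(&config_class_lock);
	spin_unlock(&config_class_lock);

	spin_lock_irqsave(&config_class_lock, flags);
	spin_unlock_irqrestore(&config_class_lock, flags);

	/* A raw spinlock registers LD_WAIT_SPIN, which that context accepts. */
	raw_spin_lock(&spin_class_lock);
	raw_spin_unlock(&spin_class_lock);
}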

FWIW, a straight conversion from spinlocks to raw spinlocks produces the
other stack trace below. It would be good if you could take a look at
that one too. The lockdep usage tracker is clean prior to commit
914f8b228ede ("soc: fsl: qbman: Add CGR update function").
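
For reference, by "straight conversion" I mean something shaped roughly
like this - a hand-written sketch with stand-in names and elided bodies,
not the actual patch I tested:

#include <linux/spinlock.h>

/* Stand-in for struct qman_portal; only the lock is shown. */
struct portal_sketch {
	raw_spinlock_t cgr_lock;		/* was: spinlock_t cgr_lock */
};

/* Analogue of qman_create_cgr(): runs in process context with IRQs on. */
static void create_cgr_sketch(struct portal_sketch *p)
{
	raw_spin_lock(&p->cgr_lock);		/* was: spin_lock() */
	/* ... add the CGR to the portal's callback list, etc. ... */
	raw_spin_unlock(&p->cgr_lock);
}

/*
 * Analogue of qman_update_cgr(): reached through smp_call_function_single(),
 * i.e. in hardirq context.
 */
static void update_cgr_sketch(struct portal_sketch *p)
{
	unsigned long irqflags;

	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);	/* was: spin_lock_irqsave() */
	/* ... issue the Modify CGR management command ... */
	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
}

The resulting splat: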
[ 56.650501] ================================
[ 56.654782] WARNING: inconsistent lock state
[ 56.659063] 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028 Not tainted
[ 56.665170] --------------------------------
[ 56.669449] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[ 56.675467] swapper/2/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[ 56.680625] ffff1dc165e124e0 (&portal->cgr_lock){?.+.}-{2:2}, at: qman_update_cgr+0x60/0xfc
[ 56.689054] {HARDIRQ-ON-W} state was registered at:
[ 56.693943] lock_acquire+0x1e4/0x2fc
[ 56.697720] _raw_spin_lock+0x5c/0xc0
[ 56.701494] qman_create_cgr+0xbc/0x2b4
[ 56.705440] dpaa_eth_cgr_init+0xc0/0x160
[ 56.709560] dpaa_eth_probe+0x6a8/0xf44
[ 56.713506] platform_probe+0x68/0xdc
[ 56.717282] really_probe+0x148/0x2ac
[ 56.721053] __driver_probe_device+0x78/0xe0
[ 56.725432] driver_probe_device+0xd8/0x160
[ 56.729724] __driver_attach+0x9c/0x1ac
[ 56.733668] bus_for_each_dev+0x74/0xd4
[ 56.737612] driver_attach+0x24/0x30
[ 56.741294] bus_add_driver+0xe4/0x1e8
[ 56.745151] driver_register+0x60/0x128
[ 56.749096] __platform_driver_register+0x28/0x34
[ 56.753911] dpaa_load+0x34/0x74
[ 56.757250] do_one_initcall+0x74/0x2f0
[ 56.761192] kernel_init_freeable+0x2ac/0x510
[ 56.765660] kernel_init+0x24/0x1dc
[ 56.769261] ret_from_fork+0x10/0x20
[ 56.772943] irq event stamp: 274366
[ 56.776441] hardirqs last enabled at (274365): [<ffffdc95dfdae554>] cpuidle_enter_state+0x158/0x540
[ 56.785601] hardirqs last disabled at (274366): [<ffffdc95dfdac1b0>] el1_interrupt+0x24/0x64
[ 56.794063] softirqs last enabled at (274330): [<ffffdc95de6104d8>] __do_softirq+0x438/0x4ec
[ 56.802609] softirqs last disabled at (274323): [<ffffdc95de616610>] ____do_softirq+0x10/0x1c
[ 56.811156]
[ 56.811156] other info that might help us debug this:
[ 56.817692] Possible unsafe locking scenario:
[ 56.817692]
[ 56.823620] CPU0
[ 56.826075] ----
[ 56.828530] lock(&portal->cgr_lock);
[ 56.832306] <Interrupt>
[ 56.834934] lock(&portal->cgr_lock);
[ 56.838883]
[ 56.838883] *** DEADLOCK ***
[ 56.838883]
[ 56.844811] no locks held by swapper/2/0.
[ 56.848832]
[ 56.848832] stack backtrace:
[ 56.853199] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028
[ 56.861917] Hardware name: LS1043A RDB Board (DT)
[ 56.866634] Call trace:
[ 56.869090] dump_backtrace+0x9c/0xf8
[ 56.872772] show_stack+0x18/0x24
[ 56.876104] dump_stack_lvl+0x60/0xac
[ 56.879788] dump_stack+0x18/0x24
[ 56.883123] print_usage_bug.part.0+0x290/0x348
[ 56.887678] mark_lock+0x77c/0x960
[ 56.891102] __lock_acquire+0xa54/0x1f90
[ 56.895046] lock_acquire+0x1e4/0x2fc
[ 56.898731] _raw_spin_lock_irqsave+0x6c/0xdc
[ 56.903112] qman_update_cgr+0x60/0xfc
[ 56.906885] qman_update_cgr_smp_call+0x1c/0x30
[ 56.911440] __flush_smp_call_function_queue+0x15c/0x2f4
[ 56.916775] generic_smp_call_function_single_interrupt+0x14/0x20
[ 56.922891] ipi_handler+0xb4/0x304
[ 56.926404] handle_percpu_devid_irq+0x8c/0x144
[ 56.930959] generic_handle_domain_irq+0x2c/0x44
[ 56.935596] gic_handle_irq+0x44/0xc4
[ 56.939281] call_on_irq_stack+0x24/0x4c
[ 56.943225] do_interrupt_handler+0x80/0x84
[ 56.947431] el1_interrupt+0x34/0x64
[ 56.951030] el1h_64_irq_handler+0x18/0x24
[ 56.955151] el1h_64_irq+0x64/0x68
[ 56.958570] cpuidle_enter_state+0x15c/0x540
[ 56.962865] cpuidle_enter+0x38/0x50
[ 56.966467] do_idle+0x218/0x2a0
[ 56.969714] cpu_startup_entry+0x28/0x2c
[ 56.973654] secondary_start_kernel+0x138/0x15c
[ 56.978209] __secondary_switched+0xb8/0xbc