Message-ID: <e0c086cd-544c-e1e9-8a76-4f56c9cb85a1@seco.com>
Date: Thu, 23 Mar 2023 17:41:04 -0400
From: Sean Anderson <sean.anderson@...o.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Madalin Bucur <madalin.bucur@....com>,
Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()
On 3/23/23 14:47, Vladimir Oltean wrote:
> On Thu, Mar 23, 2023 at 11:58:00AM -0400, Sean Anderson wrote:
>> > Do you have any clues what is wrong?
>>
>> Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?
>
> No, just CONFIG_PROVE_RAW_LOCK_NESTING.
>
>> If so, the problem seems to be that we're in unthreaded hardirq context
>> (LD_WAIT_SPIN), but the lock is LD_WAIT_CONFIG. Maybe we should be
>> using some other smp_call function? Maybe we should be using
>> spin_lock (like qman_create_cgr) and not spin_lock_irqsave (like
>> qman_delete_cgr)?
>
> Plain spin_lock() has the same wait context as spin_lock_irqsave(),
> and so, by itself, would not help. Maybe you mean raw_spin_lock() which
> always has a wait context compatible with LD_WAIT_SPIN here.
>
> Note - I'm not suggesting that replacing with a raw spinlock is the
> correct solution here.
Well, it's either this, or switch to another function (like
smp_call_function) that calls its callback in softirq/threaded hardirq
context.
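
For the raw spinlock route, something like this is what I have in mind
(untested sketch from memory of drivers/soc/fsl/qbman/qman.c, so the
context lines and variable names are approximate, not a real diff):

--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
 struct qman_portal {
 	...
-	spinlock_t cgr_lock;
+	/* raw so it may be taken from the IPI (hardirq) handler */
+	raw_spinlock_t cgr_lock;
 	...
 };

 int qman_update_cgr(struct qman_cgr *cgr, struct qm_mcc_initcgr *opts)
 {
 	...
-	spin_lock_irqsave(&p->cgr_lock, irqflags);
+	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);
 	...
-	spin_unlock_irqrestore(&p->cgr_lock, irqflags);
+	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
 	...
 }

plus the matching spin_lock_init() -> raw_spin_lock_init() and the
other cgr_lock sites in qman_create_cgr()/qman_delete_cgr().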
> FWIW, a straight conversion from spinlocks to raw spinlocks produces
> this other stack trace. It would be good if you could take a look too.
> The lockdep usage tracker is clean prior to commit 914f8b228ede ("soc:
> fsl: qbman: Add CGR update function").
Presumably you mean ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate
change"), which is the first commit to introduce a user of
qman_update_cgr_safe?
> [ 56.650501] ================================
> [ 56.654782] WARNING: inconsistent lock state
> [ 56.659063] 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028 Not tainted
> [ 56.665170] --------------------------------
> [ 56.669449] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 56.675467] swapper/2/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 56.680625] ffff1dc165e124e0 (&portal->cgr_lock){?.+.}-{2:2}, at: qman_update_cgr+0x60/0xfc
> [ 56.689054] {HARDIRQ-ON-W} state was registered at:
> [ 56.693943] lock_acquire+0x1e4/0x2fc
> [ 56.697720] _raw_spin_lock+0x5c/0xc0
I think we just need to use raw_spin_lock_irqsave in qman_create_cgr
(rough sketch after the trace).
> [ 56.701494] qman_create_cgr+0xbc/0x2b4
> [ 56.705440] dpaa_eth_cgr_init+0xc0/0x160
> [ 56.709560] dpaa_eth_probe+0x6a8/0xf44
> [ 56.713506] platform_probe+0x68/0xdc
> [ 56.717282] really_probe+0x148/0x2ac
> [ 56.721053] __driver_probe_device+0x78/0xe0
> [ 56.725432] driver_probe_device+0xd8/0x160
> [ 56.729724] __driver_attach+0x9c/0x1ac
> [ 56.733668] bus_for_each_dev+0x74/0xd4
> [ 56.737612] driver_attach+0x24/0x30
> [ 56.741294] bus_add_driver+0xe4/0x1e8
> [ 56.745151] driver_register+0x60/0x128
> [ 56.749096] __platform_driver_register+0x28/0x34
> [ 56.753911] dpaa_load+0x34/0x74
> [ 56.757250] do_one_initcall+0x74/0x2f0
> [ 56.761192] kernel_init_freeable+0x2ac/0x510
> [ 56.765660] kernel_init+0x24/0x1dc
> [ 56.769261] ret_from_fork+0x10/0x20
> [ 56.772943] irq event stamp: 274366
> [ 56.776441] hardirqs last enabled at (274365): [<ffffdc95dfdae554>] cpuidle_enter_state+0x158/0x540
> [ 56.785601] hardirqs last disabled at (274366): [<ffffdc95dfdac1b0>] el1_interrupt+0x24/0x64
> [ 56.794063] softirqs last enabled at (274330): [<ffffdc95de6104d8>] __do_softirq+0x438/0x4ec
> [ 56.802609] softirqs last disabled at (274323): [<ffffdc95de616610>] ____do_softirq+0x10/0x1c
> [ 56.811156]
> [ 56.811156] other info that might help us debug this:
> [ 56.817692] Possible unsafe locking scenario:
> [ 56.817692]
> [ 56.823620]        CPU0
> [ 56.826075]        ----
> [ 56.828530]   lock(&portal->cgr_lock);
> [ 56.832306]   <Interrupt>
> [ 56.834934]     lock(&portal->cgr_lock);
> [ 56.838883]
> [ 56.838883] *** DEADLOCK ***
> [ 56.838883]
> [ 56.844811] no locks held by swapper/2/0.
> [ 56.848832]
> [ 56.848832] stack backtrace:
> [ 56.853199] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028
> [ 56.861917] Hardware name: LS1043A RDB Board (DT)
> [ 56.866634] Call trace:
> [ 56.869090] dump_backtrace+0x9c/0xf8
> [ 56.872772] show_stack+0x18/0x24
> [ 56.876104] dump_stack_lvl+0x60/0xac
> [ 56.879788] dump_stack+0x18/0x24
> [ 56.883123] print_usage_bug.part.0+0x290/0x348
> [ 56.887678] mark_lock+0x77c/0x960
> [ 56.891102] __lock_acquire+0xa54/0x1f90
> [ 56.895046] lock_acquire+0x1e4/0x2fc
> [ 56.898731] _raw_spin_lock_irqsave+0x6c/0xdc
> [ 56.903112] qman_update_cgr+0x60/0xfc
> [ 56.906885] qman_update_cgr_smp_call+0x1c/0x30
> [ 56.911440] __flush_smp_call_function_queue+0x15c/0x2f4
> [ 56.916775] generic_smp_call_function_single_interrupt+0x14/0x20
> [ 56.922891] ipi_handler+0xb4/0x304
> [ 56.926404] handle_percpu_devid_irq+0x8c/0x144
> [ 56.930959] generic_handle_domain_irq+0x2c/0x44
> [ 56.935596] gic_handle_irq+0x44/0xc4
> [ 56.939281] call_on_irq_stack+0x24/0x4c
> [ 56.943225] do_interrupt_handler+0x80/0x84
> [ 56.947431] el1_interrupt+0x34/0x64
> [ 56.951030] el1h_64_irq_handler+0x18/0x24
> [ 56.955151] el1h_64_irq+0x64/0x68
> [ 56.958570] cpuidle_enter_state+0x15c/0x540
> [ 56.962865] cpuidle_enter+0x38/0x50
> [ 56.966467] do_idle+0x218/0x2a0
> [ 56.969714] cpu_startup_entry+0x28/0x2c
> [ 56.973654] secondary_start_kernel+0x138/0x15c
> [ 56.978209] __secondary_switched+0xb8/0xbc
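
To be concrete about the qman_create_cgr suggestion above, I mean
something like this on top of the raw conversion (again untested and
from memory, so treat it as a sketch rather than a patch):

--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
 int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
 		    struct qm_mcc_initcgr *opts)
 {
+	unsigned long irqflags;
 	...
-	raw_spin_lock(&p->cgr_lock);
+	/*
+	 * cgr_lock is also taken from hardirq context by the
+	 * smp_call_function_single() callback used by
+	 * qman_update_cgr_safe(), so it must always be taken with
+	 * interrupts disabled; otherwise lockdep (correctly) reports
+	 * the HARDIRQ-ON-W -> IN-HARDIRQ-W inconsistency above.
+	 */
+	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);
 	...
-	raw_spin_unlock(&p->cgr_lock);
+	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
 	...
 }

That would make the create path consistent with qman_update_cgr() and
qman_delete_cgr(), which already disable interrupts while holding
cgr_lock.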
--Sean