Message-ID: <e0c086cd-544c-e1e9-8a76-4f56c9cb85a1@seco.com>
Date: Thu, 23 Mar 2023 17:41:04 -0400
From: Sean Anderson <sean.anderson@...o.com>
To: Vladimir Oltean <vladimir.oltean@....com>
Cc: netdev@...r.kernel.org, Madalin Bucur <madalin.bucur@....com>,
Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()
On 3/23/23 14:47, Vladimir Oltean wrote:
> On Thu, Mar 23, 2023 at 11:58:00AM -0400, Sean Anderson wrote:
>> > Do you have any clues what is wrong?
>>
>> Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?
>
> No, just CONFIG_PROVE_RAW_LOCK_NESTING.
>
>> If so, the problem seems to be that we're in unthreaded hardirq context
>> (LD_WAIT_SPIN), but the lock is LD_WAIT_CONFIG. Maybe we should be
>> using some other smp_call function? Maybe we should be using
>> spin_lock (like qman_create_cgr) and not spin_lock_irqsave (like
>> qman_delete_cgr)?
>
> Plain spin_lock() has the same wait context as spin_lock_irqsave(),
> and so, by itself, would not help. Maybe you mean raw_spin_lock() which
> always has a wait context compatible with LD_WAIT_SPIN here.
>
> Note - I'm not suggesting that replacing with a raw spinlock is the
> correct solution here.
Well, it's either this, or switch to another function (like
smp_call_function) that calls its callback in softirq/threaded hardirq
context.
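
For the raw spinlock route, something like this is what I have in mind
(untested sketch from memory of drivers/soc/fsl/qbman/qman.c, so the
context lines and variable names are approximate, not a real diff):

--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
 struct qman_portal {
 	...
-	spinlock_t cgr_lock;
+	/* raw so it may be taken from the IPI (hardirq) handler */
+	raw_spinlock_t cgr_lock;
 	...
 };

 int qman_update_cgr(struct qman_cgr *cgr, struct qm_mcc_initcgr *opts)
 {
 	...
-	spin_lock_irqsave(&p->cgr_lock, irqflags);
+	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);
 	...
-	spin_unlock_irqrestore(&p->cgr_lock, irqflags);
+	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
 	...
 }

plus the matching spin_lock_init() -> raw_spin_lock_init() and the
other cgr_lock sites in qman_create_cgr()/qman_delete_cgr().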
> FWIW, a straight conversion from spinlocks to raw spinlocks produces
> this other stack trace. It would be good if you could take a look too.
> The lockdep usage tracker is clean prior to commit 914f8b228ede ("soc:
> fsl: qbman: Add CGR update function").
Presumably you mean ef2a8d5478b9 ("net: dpaa: Adjust queue depth on rate
change"), which is the first commit to introduce a user of
qman_update_cgr_safe?
> [ 56.650501] ================================
> [ 56.654782] WARNING: inconsistent lock state
> [ 56.659063] 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028 Not tainted
> [ 56.665170] --------------------------------
> [ 56.669449] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> [ 56.675467] swapper/2/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> [ 56.680625] ffff1dc165e124e0 (&portal->cgr_lock){?.+.}-{2:2}, at: qman_update_cgr+0x60/0xfc
> [ 56.689054] {HARDIRQ-ON-W} state was registered at:
> [ 56.693943] lock_acquire+0x1e4/0x2fc
> [ 56.697720] _raw_spin_lock+0x5c/0xc0
I think we just need to use raw_spin_lock_irqsave in qman_create_cgr
(rough sketch after the trace).
> [ 56.701494] qman_create_cgr+0xbc/0x2b4
> [ 56.705440] dpaa_eth_cgr_init+0xc0/0x160
> [ 56.709560] dpaa_eth_probe+0x6a8/0xf44
> [ 56.713506] platform_probe+0x68/0xdc
> [ 56.717282] really_probe+0x148/0x2ac
> [ 56.721053] __driver_probe_device+0x78/0xe0
> [ 56.725432] driver_probe_device+0xd8/0x160
> [ 56.729724] __driver_attach+0x9c/0x1ac
> [ 56.733668] bus_for_each_dev+0x74/0xd4
> [ 56.737612] driver_attach+0x24/0x30
> [ 56.741294] bus_add_driver+0xe4/0x1e8
> [ 56.745151] driver_register+0x60/0x128
> [ 56.749096] __platform_driver_register+0x28/0x34
> [ 56.753911] dpaa_load+0x34/0x74
> [ 56.757250] do_one_initcall+0x74/0x2f0
> [ 56.761192] kernel_init_freeable+0x2ac/0x510
> [ 56.765660] kernel_init+0x24/0x1dc
> [ 56.769261] ret_from_fork+0x10/0x20
> [ 56.772943] irq event stamp: 274366
> [ 56.776441] hardirqs last enabled at (274365): [<ffffdc95dfdae554>] cpuidle_enter_state+0x158/0x540
> [ 56.785601] hardirqs last disabled at (274366): [<ffffdc95dfdac1b0>] el1_interrupt+0x24/0x64
> [ 56.794063] softirqs last enabled at (274330): [<ffffdc95de6104d8>] __do_softirq+0x438/0x4ec
> [ 56.802609] softirqs last disabled at (274323): [<ffffdc95de616610>] ____do_softirq+0x10/0x1c
> [ 56.811156]
> [ 56.811156] other info that might help us debug this:
> [ 56.817692] Possible unsafe locking scenario:
> [ 56.817692]
> [ 56.823620]        CPU0
> [ 56.826075]        ----
> [ 56.828530]   lock(&portal->cgr_lock);
> [ 56.832306]   <Interrupt>
> [ 56.834934]     lock(&portal->cgr_lock);
> [ 56.838883]
> [ 56.838883] *** DEADLOCK ***
> [ 56.838883]
> [ 56.844811] no locks held by swapper/2/0.
> [ 56.848832]
> [ 56.848832] stack backtrace:
> [ 56.853199] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028
> [ 56.861917] Hardware name: LS1043A RDB Board (DT)
> [ 56.866634] Call trace:
> [ 56.869090] dump_backtrace+0x9c/0xf8
> [ 56.872772] show_stack+0x18/0x24
> [ 56.876104] dump_stack_lvl+0x60/0xac
> [ 56.879788] dump_stack+0x18/0x24
> [ 56.883123] print_usage_bug.part.0+0x290/0x348
> [ 56.887678] mark_lock+0x77c/0x960
> [ 56.891102] __lock_acquire+0xa54/0x1f90
> [ 56.895046] lock_acquire+0x1e4/0x2fc
> [ 56.898731] _raw_spin_lock_irqsave+0x6c/0xdc
> [ 56.903112] qman_update_cgr+0x60/0xfc
> [ 56.906885] qman_update_cgr_smp_call+0x1c/0x30
> [ 56.911440] __flush_smp_call_function_queue+0x15c/0x2f4
> [ 56.916775] generic_smp_call_function_single_interrupt+0x14/0x20
> [ 56.922891] ipi_handler+0xb4/0x304
> [ 56.926404] handle_percpu_devid_irq+0x8c/0x144
> [ 56.930959] generic_handle_domain_irq+0x2c/0x44
> [ 56.935596] gic_handle_irq+0x44/0xc4
> [ 56.939281] call_on_irq_stack+0x24/0x4c
> [ 56.943225] do_interrupt_handler+0x80/0x84
> [ 56.947431] el1_interrupt+0x34/0x64
> [ 56.951030] el1h_64_irq_handler+0x18/0x24
> [ 56.955151] el1h_64_irq+0x64/0x68
> [ 56.958570] cpuidle_enter_state+0x15c/0x540
> [ 56.962865] cpuidle_enter+0x38/0x50
> [ 56.966467] do_idle+0x218/0x2a0
> [ 56.969714] cpu_startup_entry+0x28/0x2c
> [ 56.973654] secondary_start_kernel+0x138/0x15c
> [ 56.978209] __secondary_switched+0xb8/0xbc
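
To be concrete about the qman_create_cgr suggestion above, I mean
something like this on top of the raw conversion (again untested and
from memory, so treat it as a sketch rather than a patch):

--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
 int qman_create_cgr(struct qman_cgr *cgr, u32 flags,
 		    struct qm_mcc_initcgr *opts)
 {
+	unsigned long irqflags;
 	...
-	raw_spin_lock(&p->cgr_lock);
+	/*
+	 * cgr_lock is also taken from hardirq context by the
+	 * smp_call_function_single() callback used by
+	 * qman_update_cgr_safe(), so it must always be taken with
+	 * interrupts disabled; otherwise lockdep (correctly) reports
+	 * the HARDIRQ-ON-W -> IN-HARDIRQ-W inconsistency above.
+	 */
+	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);
 	...
-	raw_spin_unlock(&p->cgr_lock);
+	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
 	...
 }

That would make the create path consistent with qman_update_cgr() and
qman_delete_cgr(), which already disable interrupts while holding
cgr_lock.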
--Sean