Date:   Thu, 23 Mar 2023 20:47:01 +0200
From:   Vladimir Oltean <vladimir.oltean@....com>
To:     Sean Anderson <sean.anderson@...o.com>
Cc:     netdev@...r.kernel.org, Madalin Bucur <madalin.bucur@....com>,
        Camelia Groza <camelia.groza@....com>
Subject: Re: Invalid wait context in qman_update_cgr()

On Thu, Mar 23, 2023 at 11:58:00AM -0400, Sean Anderson wrote:
> > Do you have any clues what is wrong?
>
> Do you have PREEMPT_RT+PROVE_RAW_LOCK_NESTING enabled?

No, just CONFIG_PROVE_RAW_LOCK_NESTING.

> If so, the problem seems to be that we're in unthreaded hardirq context
> (LD_WAIT_SPIN), but the lock is LD_WAIT_CONFIG. Maybe we should be
> using some other smp_call function? Maybe we should be using
> spin_lock (like qman_create_cgr) and not spin_lock_irqsave (like
> qman_delete_cgr)?

Plain spin_lock() has the same wait context as spin_lock_irqsave(),
so by itself it would not help. Maybe you mean raw_spin_lock(), which
always has a wait context compatible with LD_WAIT_SPIN here.

Note: I'm not suggesting that replacing the spinlock with a raw
spinlock is the correct solution here.

FWIW, a straight conversion from spinlocks to raw spinlocks produces
this other stack trace. It would be good if you could take a look too.
The lockdep usage tracker is clean prior to commit 914f8b228ede ("soc:
fsl: qbman: Add CGR update function").
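To be concrete about what I mean by a straight conversion, it is along
these lines (a rough sketch only, not the actual diff I tested; the
cgr_lock name is taken from the trace below, the surrounding code is
elided):

```c
/* Sketch of the spinlock -> raw spinlock conversion. raw_spinlock_t
 * keeps the LD_WAIT_SPIN wait context, so taking it in unthreaded
 * hardirq context is fine, unlike spinlock_t (LD_WAIT_CONFIG), which
 * becomes a sleeping lock on PREEMPT_RT.
 */
struct qman_portal {
	/* ... */
	raw_spinlock_t cgr_lock;	/* was: spinlock_t cgr_lock; */
	/* ... */
};

	/* and at each locking site, e.g. in qman_update_cgr(): */
	raw_spin_lock_irqsave(&p->cgr_lock, irqflags);
	/* ... issue the CGR update command ... */
	raw_spin_unlock_irqrestore(&p->cgr_lock, irqflags);
```

Note that this alone does not address the {HARDIRQ-ON-W} -> {IN-HARDIRQ-W}
inconsistency in the trace below: qman_create_cgr() still takes the lock
without disabling interrupts while qman_update_cgr() takes it from
hardirq context.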

[   56.650501] ================================
[   56.654782] WARNING: inconsistent lock state
[   56.659063] 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028 Not tainted
[   56.665170] --------------------------------
[   56.669449] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[   56.675467] swapper/2/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[   56.680625] ffff1dc165e124e0 (&portal->cgr_lock){?.+.}-{2:2}, at: qman_update_cgr+0x60/0xfc
[   56.689054] {HARDIRQ-ON-W} state was registered at:
[   56.693943]   lock_acquire+0x1e4/0x2fc
[   56.697720]   _raw_spin_lock+0x5c/0xc0
[   56.701494]   qman_create_cgr+0xbc/0x2b4
[   56.705440]   dpaa_eth_cgr_init+0xc0/0x160
[   56.709560]   dpaa_eth_probe+0x6a8/0xf44
[   56.713506]   platform_probe+0x68/0xdc
[   56.717282]   really_probe+0x148/0x2ac
[   56.721053]   __driver_probe_device+0x78/0xe0
[   56.725432]   driver_probe_device+0xd8/0x160
[   56.729724]   __driver_attach+0x9c/0x1ac
[   56.733668]   bus_for_each_dev+0x74/0xd4
[   56.737612]   driver_attach+0x24/0x30
[   56.741294]   bus_add_driver+0xe4/0x1e8
[   56.745151]   driver_register+0x60/0x128
[   56.749096]   __platform_driver_register+0x28/0x34
[   56.753911]   dpaa_load+0x34/0x74
[   56.757250]   do_one_initcall+0x74/0x2f0
[   56.761192]   kernel_init_freeable+0x2ac/0x510
[   56.765660]   kernel_init+0x24/0x1dc
[   56.769261]   ret_from_fork+0x10/0x20
[   56.772943] irq event stamp: 274366
[   56.776441] hardirqs last  enabled at (274365): [<ffffdc95dfdae554>] cpuidle_enter_state+0x158/0x540
[   56.785601] hardirqs last disabled at (274366): [<ffffdc95dfdac1b0>] el1_interrupt+0x24/0x64
[   56.794063] softirqs last  enabled at (274330): [<ffffdc95de6104d8>] __do_softirq+0x438/0x4ec
[   56.802609] softirqs last disabled at (274323): [<ffffdc95de616610>] ____do_softirq+0x10/0x1c
[   56.811156]
[   56.811156] other info that might help us debug this:
[   56.817692]  Possible unsafe locking scenario:
[   56.817692]
[   56.823620]        CPU0
[   56.826075]        ----
[   56.828530]   lock(&portal->cgr_lock);
[   56.832306]   <Interrupt>
[   56.834934]     lock(&portal->cgr_lock);
[   56.838883]
[   56.838883]  *** DEADLOCK ***
[   56.838883]
[   56.844811] no locks held by swapper/2/0.
[   56.848832]
[   56.848832] stack backtrace:
[   56.853199] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 6.3.0-rc2-00993-gdadb180cb16f-dirty #2028
[   56.861917] Hardware name: LS1043A RDB Board (DT)
[   56.866634] Call trace:
[   56.869090]  dump_backtrace+0x9c/0xf8
[   56.872772]  show_stack+0x18/0x24
[   56.876104]  dump_stack_lvl+0x60/0xac
[   56.879788]  dump_stack+0x18/0x24
[   56.883123]  print_usage_bug.part.0+0x290/0x348
[   56.887678]  mark_lock+0x77c/0x960
[   56.891102]  __lock_acquire+0xa54/0x1f90
[   56.895046]  lock_acquire+0x1e4/0x2fc
[   56.898731]  _raw_spin_lock_irqsave+0x6c/0xdc
[   56.903112]  qman_update_cgr+0x60/0xfc
[   56.906885]  qman_update_cgr_smp_call+0x1c/0x30
[   56.911440]  __flush_smp_call_function_queue+0x15c/0x2f4
[   56.916775]  generic_smp_call_function_single_interrupt+0x14/0x20
[   56.922891]  ipi_handler+0xb4/0x304
[   56.926404]  handle_percpu_devid_irq+0x8c/0x144
[   56.930959]  generic_handle_domain_irq+0x2c/0x44
[   56.935596]  gic_handle_irq+0x44/0xc4
[   56.939281]  call_on_irq_stack+0x24/0x4c
[   56.943225]  do_interrupt_handler+0x80/0x84
[   56.947431]  el1_interrupt+0x34/0x64
[   56.951030]  el1h_64_irq_handler+0x18/0x24
[   56.955151]  el1h_64_irq+0x64/0x68
[   56.958570]  cpuidle_enter_state+0x15c/0x540
[   56.962865]  cpuidle_enter+0x38/0x50
[   56.966467]  do_idle+0x218/0x2a0
[   56.969714]  cpu_startup_entry+0x28/0x2c
[   56.973654]  secondary_start_kernel+0x138/0x15c
[   56.978209]  __secondary_switched+0xb8/0xbc
