[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <s2soy7omtrouwk6hdrpcmid526cy7gsh6eopvt52hd77lqi5jd@yovvnge7grt6>
Date: Thu, 28 Aug 2025 17:45:21 +0900
From: Koichiro Den <den@...inux.co.jp>
To: Marc Zyngier <maz@...nel.org>
Cc: linux-arm-kernel@...ts.infradead.org, tglx@...utronix.de,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] irqchip/gic-v3-its: Fix invalid wait context lockdep
report
On Thu, Aug 28, 2025 at 08:56:01AM +0100, Marc Zyngier wrote:
> On Thu, 28 Aug 2025 04:09:00 +0100,
> Koichiro Den <den@...inux.co.jp> wrote:
> >
> > On Wed, Aug 27, 2025 at 01:48:33PM +0100, Marc Zyngier wrote:
> > > On Wed, 27 Aug 2025 08:38:48 +0100,
> > > Koichiro Den <den@...inux.co.jp> wrote:
> > > >
> > > > its_irq_set_vcpu_affinity() always runs under a raw_spin_lock wait
> > > > context, so calling kcalloc there is not permitted and RT-unsafe since
> > > > ___slab_alloc() may acquire a local lock. The below is the actual
> > > > lockdep report observed:
> > > >
> > > > =============================
> > > > [ BUG: Invalid wait context ]
> > > > 6.16.0-rc3-irqchip-next-7e28bba92c5c+ #1 Tainted: G S
> > > > -----------------------------
> > > > qemu-system-aar/2129 is trying to lock:
> > > > ffff0085b74f2178 (batched_entropy_u32.lock){..-.}-{3:3}, at: get_random_u32+0x9c/0x708
> > > > other info that might help us debug this:
> > > > context-{5:5}
> > > > 6 locks held by qemu-system-aar/2129:
> > > > #0: ffff0000b84a0738 (&vdev->igate){+.+.}-{4:4}, at: vfio_pci_core_ioctl+0x40c/0x748 [vfio_pci_core]
> > > > #1: ffff8000883cef68 (lock#6){+.+.}-{4:4}, at: irq_bypass_register_producer+0x64/0x2f0
> > > > #2: ffff0000ac0df960 (&its->its_lock){+.+.}-{4:4}, at: kvm_vgic_v4_set_forwarding+0x224/0x6f0
> > > > #3: ffff000086dc4718 (&irq->irq_lock#3){....}-{2:2}, at: kvm_vgic_v4_set_forwarding+0x288/0x6f0
> > > > #4: ffff0001356200c8 (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0xc8/0x158
> > > > #5: ffff00009eae4850 (&dev->event_map.vlpi_lock){....}-{2:2}, at: its_irq_set_vcpu_affinity+0x8c/0x528
> > > > ...
> > > > Call trace:
> > > > show_stack+0x30/0x98 (C)
> > > > dump_stack_lvl+0x9c/0xd0
> > > > dump_stack+0x1c/0x34
> > > > __lock_acquire+0x814/0xb40
> > > > lock_acquire.part.0+0x16c/0x2a8
> > > > lock_acquire+0x8c/0x178
> > > > get_random_u32+0xd4/0x708
> > > > __get_random_u32_below+0x20/0x80
> > > > shuffle_freelist+0x5c/0x1b0
> > > > allocate_slab+0x15c/0x348
> > > > new_slab+0x48/0x80
> > > > ___slab_alloc+0x590/0x8b8
> > > > __slab_alloc.isra.0+0x3c/0x80
> > > > __kmalloc_noprof+0x174/0x520
> > > > its_vlpi_map+0x834/0xce0
> > > > its_irq_set_vcpu_affinity+0x21c/0x528
> > > > irq_set_vcpu_affinity+0x160/0x1b0
> > > > its_map_vlpi+0x90/0x100
> > > > kvm_vgic_v4_set_forwarding+0x3c4/0x6f0
> > > > kvm_arch_irq_bypass_add_producer+0xac/0x108
> > > > __connect+0x138/0x1b0
> > > > irq_bypass_register_producer+0x16c/0x2f0
> > > > vfio_msi_set_vector_signal+0x2c0/0x5a8 [vfio_pci_core]
> > > > vfio_msi_set_block+0x8c/0x120 [vfio_pci_core]
> > > > vfio_pci_set_msi_trigger+0x120/0x3d8 [vfio_pci_core]
> > >
> > > Huh. I guess this is due to RT not being completely compatible with
> > > GFP_ATOMIC... Why you'd want RT and KVM at the same time is beyond
> > > me, but hey.
> >
> > For the record, I didn't run KVM on RT, though I still believe it's better
> > to conform to the wait context rule and avoid triggering the lockdep
> > splat.
>
> Then I don't understand how you get this, because I have not seen it
> so far.
>
> >
> > I don't know if there are any plans which make kmalloc with GFP_ATOMIC
> > workable under a stricter wait context (getting rid of the local lock
> > in some way?), but I think it would be nicer.
>
> GFP_ATOMIC is documented as being compatible with raw spinlocks in the
> absence of RT, making the above trace pretty odd.
I got the report on my ARM64 env with CONFIG_PROVE_LOCKING=y, which
leads to CONFIG_PROVE_RAW_LOCK_NESTING=y on ARM64. And this report says
that it was trying to acquire a local_lock_t (LD_WAIT_CONFIG) while any
raw_spinlock_t (LD_WAIT_SPIN) being held.
So I still believe we're on the same page; while I got the report on
non-RT, the report just pre-warns the danger for RT. There's no
immediate harm on non-RT.
>
> >
> > >
> > > > ...
> > > >
> > > > To avoid this, simply pre-allocate vlpi_maps when creating an ITS v4
> > > > device with LPIs allcation. The trade-off is some wasted memory
> > > > depending on nr_lpis, if none of those LPIs are never upgraded to VLPIs.
> > > >
> > > > An alternative would be to move the vlpi_maps allocation out of
> > > > its_map_vlpi() and introduce a two-stage prepare/commit flow, allowing a
> > > > caller (KVM in the lockdep splat shown above) to do the allocation
> > > > outside irq_set_vcpu_affinity(). However, this would unnecessarily add
> > > > complexity.
> > >
> > > That's debatable. It is probably fine for now, but if this was to
> > > grow, we'd need to revisit this.
> >
> > Just curious but do you have any plans to replace the current
> > irq_set_vcpu_affinity() approach with something else?
>
> Who knows. This is the Linux kernel, everything changes all the time
> without the need for a good reason. More significantly, the amount of
> *data* being associated with a VLPI could become much higher in the
> future, and add more unnecessary allocation.
Alright, thank you.
-Koichiro
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
Powered by blists - more mailing lists