lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aUnB3WNLYape2Nap@google.com>
Date: Mon, 22 Dec 2025 14:10:37 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Ankit Soni <Ankit.Soni@....com>, Marc Zyngier <maz@...nel.org>, 
	Oliver Upton <oliver.upton@...ux.dev>, Joerg Roedel <joro@...tes.org>, 
	David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>, 
	linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev, 
	kvm@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org, 
	Sairaj Kodilkar <sarunkod@....com>, Vasant Hegde <vasant.hegde@....com>, 
	Maxim Levitsky <mlevitsk@...hat.com>, Joao Martins <joao.m.martins@...cle.com>, 
	Francesco Lavra <francescolavra.fl@...il.com>, David Matlack <dmatlack@...gle.com>, 
	Naveen Rao <Naveen.Rao@....com>
Subject: Re: possible deadlock due to irq_set_thread_affinity() calling into
 the scheduler (was Re: [PATCH v3 38/62] KVM: SVM: Take and hold ir_list_lock
 across IRTE updates in IOMMU)

On Mon, Dec 22, 2025, Paolo Bonzini wrote:
> On 12/22/25 20:34, Sean Christopherson wrote:
> > On Mon, Dec 22, 2025, Paolo Bonzini wrote:
> > > notably, __setup_irq() calls wake_up_process outside desc->lock.  Therefore
> > > I'd like so much to treat it as a kernel/irq/ bug; and the simplest (perhaps
> > > too simple...) fix is to drop the wake_up_process().  The only cost is extra
> > > latency on the next interrupt after an affinity change.
> > 
> > Alternatively, what if we rework the KVM<=>IOMMU exchange to decouple updating
> > the IRTE from binding the metadata to the vCPU?  KVM already has the necessary
> > exports to do "out-of-band" updates due to the AVIC architecture requiring IRTE
> > updates on scheduling changes.
> 
> In fact this was actually my first idea, exactly because it makes
> svm->ir_list_lock a leaf lock!
> 
> I threw it away because it makes amd_ir_set_vcpu_affinity() weird, passing
> back the ir_data but not really doing anything else.  Basically its role
> becomes little more than violate abstractions, which seemed wrong.  On the
> other hand, drivers/iommu is already very much tied to the KVM vendor
> modules (in particular avic.c already calls
> amd_iommu_{,de}activate_guest_mode), so who am I to judge what the IOMMU
> driver does.

Yeah, I 100% agree the whole thing is a bit gross, but practically speaking it's
just not feasible to properly abstract the interaction, because in reality the
IOMMU implementation is tightly coupled to the CPU implementation.  E.g. passing
in the address of a PID isn't going to work well with an AMD IOMMU, and passing
in the address of a vCPI isn't going to work well with an Intel IOMMU.

And FWIW, the lack of true abstraction isn't limited to x86.  ARM's GIGv4 passes
around "struct its_cmd_info" and PPC uses "struct kvmppc_xive_irq_state".

I mean, we could do what GICv4 does and use irq_set_vcpu_affinity() to pass
different commands to the IOMMU, e.g. by formalizing "enum avic_vcpu_action"
between KVM and IOMMU so that avic_update_iommu_vcpu_affinity() wouldn't need to
_directly_ call AMD IOMMU code.  But IMO that would be a net negative because in
practice all it would do is make it harder to understand what's going on.  And
it would more directly create this potential deadlock.

Huh.  Which begs the question of whether or not ARM is also affected by this
deadlock, without the extra hop through svm->ir_list_lock:

  (a) irq_set_thread_affinity() triggers the scheduler via wake_up_process(),
      while irq_desc->lock is taken

  (b) the scheduler calls into KVM with rq_lock taken, and KVM uses
      irq_set_vcpu_affinity() via vgic_v4_{load,put}()

If ARM is affected, then maybe fixing this in irq_set_thread_affinity() is indeed
better than fudging around the deadlock in avic.c.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ