Message-ID: <aD4lJyqHswt-Mofy@google.com>
Date: Mon, 2 Jun 2025 15:26:47 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Sairaj Kodilkar <sarunkod@....com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Joerg Roedel <joro@...tes.org>,
David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>, kvm@...r.kernel.org,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
Vasant Hegde <vasant.hegde@....com>, Maxim Levitsky <mlevitsk@...hat.com>,
Joao Martins <joao.m.martins@...cle.com>, Francesco Lavra <francescolavra.fl@...il.com>,
David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH v2 41/59] iommu/amd: KVM: SVM: Add IRTE metadata to
affined vCPU's list if AVIC is inhibited
On Fri, May 30, 2025, Sairaj Kodilkar wrote:
> On 5/23/2025 6:29 AM, Sean Christopherson wrote:
>
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index 718bd9604f71..becef69a306d 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -3939,7 +3939,10 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *info)
> > ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
> > ir_data->ga_vector = pi_data->vector;
> > ir_data->ga_tag = pi_data->ga_tag;
> > - ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> > + if (pi_data->is_guest_mode)
> > + ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> > + else
> > + ret = amd_iommu_deactivate_guest_mode(ir_data);
>
> Hi Sean,
> Why the extra nesting here ?
> Its much more cleaner to do..
>
> if (pi_data && pi_data->is_guest_mode) {
> ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
> ir_data->ga_vector = pi_data->vector;
> ir_data->ga_tag = pi_data->ga_tag;
> ret = amd_iommu_activate_guest_mode(ir_data, pi_data->cpu);
> } else {
> ret = amd_iommu_deactivate_guest_mode(ir_data);
> }
Because the intent of the change (and the long-term code) is to affine/bind the
vCPU to the IRTE metadata, while leaving the actual IRTE in remapped mode. I.e.
connect the passed-in pi_data (@info) to the chip data:
pi_data->ir_data = ir_data;
and set the GA root, vector and tag in the chip data:
ir_data->ga_root_ptr = (pi_data->vapic_addr >> 12);
ir_data->ga_vector = pi_data->vector;
ir_data->ga_tag = pi_data->ga_tag;
That way if KVM enables AVIC, KVM can call amd_iommu_activate_guest_mode() to
switch the IRTE to vAPIC mode.
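To make the ordering concrete, here is a rough user-space model of the flow described above. This is only a sketch and not the actual kernel code: the *_model structs, the irte_mode enum and the helper names are hypothetical stand-ins, and the only behavior assumed is that the metadata is always bound, and only the final mode depends on is_guest_mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
enum irte_mode { IRTE_REMAPPED, IRTE_GUEST };

struct ir_data_model {
	uint64_t ga_root_ptr;
	uint32_t ga_vector;
	uint32_t ga_tag;
	enum irte_mode mode;
};

struct pi_data_model {
	uint64_t vapic_addr;
	uint32_t vector;
	uint32_t ga_tag;
	int cpu;
	bool is_guest_mode;
	struct ir_data_model *ir_data;
};

/* Model of switching the IRTE to vAPIC (guest) mode. */
static int activate_guest_mode(struct ir_data_model *ir, int cpu)
{
	(void)cpu;	/* the real code would program the destination CPU */
	ir->mode = IRTE_GUEST;
	return 0;
}

/* Model of putting the IRTE back into remapped mode. */
static int deactivate_guest_mode(struct ir_data_model *ir)
{
	ir->mode = IRTE_REMAPPED;
	return 0;
}

static int set_vcpu_affinity(struct ir_data_model *ir, struct pi_data_model *pi)
{
	assert(ir != NULL);

	/* No vCPU to bind to: just make sure the IRTE is remapped. */
	if (!pi)
		return deactivate_guest_mode(ir);

	/* Always bind the metadata, even if AVIC is currently inhibited. */
	pi->ir_data = ir;
	ir->ga_root_ptr = pi->vapic_addr >> 12;
	ir->ga_vector = pi->vector;
	ir->ga_tag = pi->ga_tag;

	/* Only the mode of the IRTE depends on is_guest_mode. */
	if (pi->is_guest_mode)
		return activate_guest_mode(ir, pi->cpu);

	return deactivate_guest_mode(ir);
}
```

With is_guest_mode == false, the call leaves the modeled IRTE in remapped mode but with the GA fields populated, so a later activate_guest_mode() on the same ir_data needs no further lookup.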
If KVM doesn't bind to the IRTE, KVM would need to track all host IRQs (Linux's
"virtual" IRQ numbers) that can be posted to the vCPU in order to activate vAPIC
mode. It would also require taking VM-wide locks in KVM in order to guarantee
accurate IRQ routing information.
FWIW, I don't love that KVM essentially backdoors into the AMD IOMMU via
amd_iommu_(de)activate_guest_mode(), but I also don't see a better alternative.
E.g. on Intel, KVM just leaves the IRTE in posted mode, and relies on the notification
vector IRQ to kick the vCPU into host mode so that KVM can manually process the
PIR.
But that trick doesn't work as well on AMD, because the "guest isn't running" IRQ
will hit whatever CPU is handling the IOMMU interrupts, not the CPU that's running
the vCPU. I.e. it _could_ functionally be made to work, but it would likely yield
pretty poor performance (and would require a decent amount of new KVM code).