[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fc722315-bc9e-45c6-9aad-3b64b04404bd@oracle.com>
Date: Wed, 23 Apr 2025 11:21:08 +0100
From: Joao Martins <joao.m.martins@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>,
Vasant Hegde <vasant.hegde@....com>
Cc: kvm@...r.kernel.org, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
Maxim Levitsky <mlevitsk@...hat.com>,
David Matlack <dmatlack@...gle.com>,
Alejandro Jimenez <alejandro.j.jimenez@...cle.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Joerg Roedel <joro@...tes.org>, David Woodhouse <dwmw2@...radead.org>,
Lu Baolu <baolu.lu@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH 64/67] iommu/amd: KVM: SVM: Allow KVM to control need for
GA log interrupts
On 18/04/2025 19:48, Sean Christopherson wrote:
> On Fri, Apr 18, 2025, Vasant Hegde wrote:
>> On 4/9/2025 5:26 PM, Joao Martins wrote:
>>> On 04/04/2025 20:39, Sean Christopherson wrote:
>>>> Add plumbing to the AMD IOMMU driver to allow KVM to control whether or
>>>> not an IRTE is configured to generate GA log interrupts. KVM only needs a
>>>> notification if the target vCPU is blocking, so the vCPU can be awakened.
>>>> If a vCPU is preempted or exits to userspace, KVM clears is_run, but will
>>>> set the vCPU back to running when userspace does KVM_RUN and/or the vCPU
>>>> task is scheduled back in, i.e. KVM doesn't need a notification.
>>>>
>>>> Unconditionally pass "true" in all KVM paths to isolate the IOMMU changes
>>>> from the KVM changes insofar as possible.
>>>>
>>>> Opportunistically swap the ordering of parameters for amd_iommu_update_ga()
>>>> so that the match amd_iommu_activate_guest_mode().
>>>
>>> Unfortunately I think this patch and the next one might be riding on the
>>> assumption that amd_iommu_update_ga() is always cheap :( -- see below.
>>>
>>> I didn't spot anything else flawed in the series though, just this one. I would
>>> suggest holding off on this and the next one, while progressing with the rest of
>>> the series.
>>>
>>>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>>>> index 2e016b98fa1b..27b03e718980 100644
>>>> --- a/drivers/iommu/amd/iommu.c
>>>> +++ b/drivers/iommu/amd/iommu.c
>>>> -static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu)
>>>> +static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu,
>>>> + bool ga_log_intr)
>>>> {
>>>> if (cpu >= 0) {
>>>> entry->lo.fields_vapic.destination =
>>>> @@ -3783,12 +3784,14 @@ static void __amd_iommu_update_ga(struct irte_ga *entry, int cpu)
>>>> entry->hi.fields.destination =
>>>> APICID_TO_IRTE_DEST_HI(cpu);
>>>> entry->lo.fields_vapic.is_run = true;
>>>> + entry->lo.fields_vapic.ga_log_intr = false;
>>>> } else {
>>>> entry->lo.fields_vapic.is_run = false;
>>>> + entry->lo.fields_vapic.ga_log_intr = ga_log_intr;
>>>> }
>>>> }
>>>>
>>>
>>> isRun, Destination and GATag are not cached. Quoting the update from a few years
>>> back (page 93 of IOMMU spec dated Feb 2025):
>>>
>>> | When virtual interrupts are enabled by setting MMIO Offset 0018h[GAEn] and
>>> | IRTE[GuestMode=1], IRTE[IsRun], IRTE[Destination], and if present IRTE[GATag],
>>> | are not cached by the IOMMU. Modifications to these fields do not require an
>>> | invalidation of the Interrupt Remapping Table.
>>>
>>> This is the reason we were able to get rid of the IOMMU invalidation in
>>> amd_iommu_update_ga() ... which sped up vmexit/vmenter flow with iommu avic.
>>> Besides the lock contention that was observed at the time, we were seeing stalls
>>> in this path with enough vCPUs IIRC; CCing Alejandro to keep me honest.
>>>
>>> Now this change above is incorrect as is and to make it correct: you will need
>>> xor with the previous content of the IRTE::ga_log_intr and then if it changes
>>> then you re-add back an invalidation command via
>>> iommu_flush_irt_and_complete()). The latter is what I am worried will
>>> reintroduce these above problem :(
>>>
>>> The invalidation command (which has a completion barrier to serialize
>>> invalidation execution) takes some time in h/w, and will make all your vcpus
>>> content on the irq table lock (as is). Even assuming you somehow move the
>>> invalidation outside the lock, you will content on the iommu lock (for the>
>> command queue) or best case assuming no locks (which I am not sure it is
>>> possible) you will need to wait for the command to complete until you can
>>> progress forward with entering/exiting.
>>>
>>> Unless the GALogIntr bit is somehow also not cached too which wouldn't need the
>>> invalidation command (which would be good news!). Adding Suravee/Vasant here.
>>
>> I have checked with HW architects. Its not cached. So we don't need invalidation
>> after updating GALogIntr field in IRTE.
>
> Woot! Better to be lucky than good :-)
Probably worth using this thread Message ID as a Link: tag while the IOMMU
manual isn't yet up to date with this information. That usually takes a while
until formalized.
Powered by blists - more mailing lists