[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <32d2a380-ec66-4087-83f6-339f22745889@amd.com>
Date: Thu, 9 Oct 2025 12:48:27 +0530
From: Sairaj Kodilkar <sarunkod@....com>
To: Jason Gunthorpe <jgg@...dia.com>, "Suthikulpanit, Suravee"
<suravee.suthikulpanit@....com>
CC: <nicolinc@...dia.com>, <linux-kernel@...r.kernel.org>,
<robin.murphy@....com>, <will@...nel.org>, <joro@...tes.org>,
<kevin.tian@...el.com>, <jsnitsel@...hat.com>, <vasant.hegde@....com>,
<iommu@...ts.linux.dev>, <santosh.shukla@....com>,
<sairaj.arunkodilkar@....com>, <jon.grimm@....com>,
<prashanthpra@...gle.com>, <wvw@...gle.com>, <wnliu@...gle.com>,
<gptran@...gle.com>, <kpsingh@...gle.com>, <joao.m.martins@...cle.com>,
<alejandro.j.jimenez@...cle.com>
Subject: Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain
attach/detach
On 10/8/2025 5:13 AM, Jason Gunthorpe wrote:
> On Tue, Oct 07, 2025 at 02:22:30PM -0500, Suthikulpanit, Suravee wrote:
>>
>> On 10/6/2025 9:59 AM, Jason Gunthorpe wrote:
>>> On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
>>>> ....
>>>> + if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
>>>> + return;
>>>> +
>>>> + amd_iommu_make_clear_dte(dev_data, &new);
>>>> +
>>>> + new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
>>>> + new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
>>>> + new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>>>> + new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
>>>> + if (pdom->dirty_tracking)
>>>> + new.data[0] |= DTE_FLAG_HAD;
>>>> +
>>>> + if (dev_data->ats_enabled)
>>>> + new.data[1] |= DTE_FLAG_IOTLB;
>>> This sequence should be in some set_dte_gcr3() ??
>> Not sure what you mean. This logic was in set_dte_entry(), and duplicated
>> here in the set_dte_nested() since we no longer calling set_dte_entry() from
>> the nested path. Also, it's not really related to GCR3 table.
> Sorry some make_ste_v1()
>
> This logic should be the same for any v1 page table.
>>>> + /*
>>>> + * Restore cached persistent DTE bits, which can be set by information
>>>> + * in IVRS table. See set_dev_entry_from_acpi().
>>>> + */
>>>> + initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
>>>> + if (initial_dte) {
>>>> + new.data128[0] |= initial_dte->data128[0];
>>>> + new.data128[1] |= initial_dte->data128[1];
>>>> + }
>>> This should go into amd_iommu_make_clear_dte() I think, and refactor
>>> it out of iommu_update_dte256() ?
>>> Every created DTE needs these bits set, right?
>> Currently, the amd_iommu_make_clear_dte() clears all the DTE bits and set
>> the DTE[V] (valid) bit. This is used when preparing the DTE for programming,
>> detaching domain, and when setting the blocking domain. Putting this logic
>> in the function would change the behavior.
> Yes, but it seems like that is the right behavior?
>
>> These bits affect Interrupt remapping (Lint0/Lint1/NMI/INIT/ExtInt interrupt
>> pass-through) and System management message behavior.
> Because these bits shouldn't be cleared on a blocking domain!
> Interrupts are still expected to work.
>
>
>>>> +
>>>> + /* Guest translation stuff */
>>>> + new.data[0] |= (gdte->data[0] &
>>>> + (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
>>>> +
>>>> + /* GCR3 table */
>>>> + new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
>>>> + new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
>>>> +
>>>> + /* Guest paging mode */
>>>> + new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
>>> I didn't see anything validating gdte has only permitted set bits in
>>> the prior patch?
>> Not sure which on are you referring to.
> You have to validate gdte has only valid bits.
>
>>> If this is goint to decode array item by item then why not use struct
>>> iommu_hwpt_amd_guest in the nested_domain ?
>> The struct dev_table_entry *gdte is basically the same information as in the
>> struct iommu_hwpt_amd_guest.dte that we copied from the userspace into the
>> more appropriate in-kernel data structure type, which is used within the
>> driver.
> The appropriate type is the userspace type if you don't actually ever
> use anything unique to the kernel type. You can avoid the special
> assert/etc since this way of accessing it is not sensitive to layout.
>
>> Here, we just select only what we needed for configuring guest page table
>> specifically to be programmed onto the host DTE.
> Everything you don't pick up should be checked to be 0. VMM needs to
> filter out unsuopported things or generate a bad DTE error.
An alternative can be to introduce a struct which only contains relevant
fields.
Something similar to -->
struct iommu_hwpt_amd_guest {
void *gcr3;
unsigned char guest_paging_mode;
}
VMM can take care of writing to these fields.
Thanks
Sairaj
Powered by blists - more mailing lists