[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251007234313.GE3474167@nvidia.com>
Date: Tue, 7 Oct 2025 20:43:13 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
Cc: nicolinc@...dia.com, linux-kernel@...r.kernel.org, robin.murphy@....com,
will@...nel.org, joro@...tes.org, kevin.tian@...el.com,
jsnitsel@...hat.com, vasant.hegde@....com, iommu@...ts.linux.dev,
santosh.shukla@....com, sairaj.arunkodilkar@....com,
jon.grimm@....com, prashanthpra@...gle.com, wvw@...gle.com,
wnliu@...gle.com, gptran@...gle.com, kpsingh@...gle.com,
joao.m.martins@...cle.com, alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v2 11/12] iommu/amd: Add support for nested domain
attach/detach
On Tue, Oct 07, 2025 at 02:22:30PM -0500, Suthikulpanit, Suravee wrote:
>
>
> On 10/6/2025 9:59 AM, Jason Gunthorpe wrote:
> > On Wed, Oct 01, 2025 at 06:09:53AM +0000, Suravee Suthikulpanit wrote:
> > > ....
> > > + if (WARN_ON(!ndom || !pdom || (pdom->iop.mode == PAGE_MODE_NONE)))
> > > + return;
> > > +
> > > + amd_iommu_make_clear_dte(dev_data, &new);
> > > +
> > > + new.data[0] |= iommu_virt_to_phys(pdom->iop.root);
> > > + new.data[0] |= FIELD_PREP(DTE_MODE_MASK, pdom->iop.mode);
> > > + new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> > > + new.data[0] |= (DTE_FLAG_PPR & gdte->data[0]);
> >
> > > + if (pdom->dirty_tracking)
> > > + new.data[0] |= DTE_FLAG_HAD;
> > > +
> > > + if (dev_data->ats_enabled)
> > > + new.data[1] |= DTE_FLAG_IOTLB;
> >
> > This sequence should be in some set_dte_gcr3() ??
>
> Not sure what you mean. This logic was in set_dte_entry(), and duplicated
> here in the set_dte_nested() since we no longer calling set_dte_entry() from
> the nested path. Also, it's not really related to GCR3 table.
Sorry some make_ste_v1()
This logic should be the same for any v1 page table.
> > > + /*
> > > + * Restore cached persistent DTE bits, which can be set by information
> > > + * in IVRS table. See set_dev_entry_from_acpi().
> > > + */
> > > + initial_dte = amd_iommu_get_ivhd_dte_flags(iommu->pci_seg->id, dev_data->devid);
> > > + if (initial_dte) {
> > > + new.data128[0] |= initial_dte->data128[0];
> > > + new.data128[1] |= initial_dte->data128[1];
> > > + }
> >
> > This should go into amd_iommu_make_clear_dte() I think, and refactor
> > it out of iommu_update_dte256() ?
> > Every created DTE needs these bits set, right?
>
> Currently, the amd_iommu_make_clear_dte() clears all the DTE bits and set
> the DTE[V] (valid) bit. This is used when preparing the DTE for programming,
> detaching domain, and when setting the blocking domain. Putting this logic
> in the function would change the behavior.
Yes, but it seems like that is the right behavior?
> These bits affect Interrupt remapping (Lint0/Lint1/NMI/INIT/ExtInt interrupt
> pass-through) and System management message behavior.
Because these bits shouldn't be cleared on a blocking domain!
Interrupts are still expected to work.
> > > +
> > > + /* Guest translation stuff */
> > > + new.data[0] |= (gdte->data[0] &
> > > + (DTE_GLX | DTE_FLAG_GV | DTE_FLAG_GIOV));
> > > +
> > > + /* GCR3 table */
> > > + new.data[0] |= (gdte->data[0] & DTE_GCR3_14_12);
> > > + new.data[1] |= (gdte->data[1] & (DTE_GCR3_30_15 | DTE_GCR3_51_31));
> > > +
> > > + /* Guest paging mode */
> > > + new.data[2] |= (gdte->data[2] & DTE_GPT_LEVEL_MASK);
> >
> > I didn't see anything validating gdte has only permitted set bits in
> > the prior patch?
>
> Not sure which on are you referring to.
You have to validate gdte has only valid bits.
> > If this is goint to decode array item by item then why not use struct
> > iommu_hwpt_amd_guest in the nested_domain ?
>
> The struct dev_table_entry *gdte is basically the same information as in the
> struct iommu_hwpt_amd_guest.dte that we copied from the userspace into the
> more appropriate in-kernel data structure type, which is used within the
> driver.
The appropriate type is the userspace type if you don't actually ever
use anything unique to the kernel type. You can avoid the special
assert/etc since this way of accessing it is not sensitive to layout.
> Here, we just select only what we needed for configuring guest page table
> specifically to be programmed onto the host DTE.
Everything you don't pick up should be checked to be 0. VMM needs to
filter out unsuopported things or generate a bad DTE error.
Jason
Powered by blists - more mailing lists