[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62ba61a5-d172-400e-beb5-e593297129ef@amd.com>
Date: Sat, 8 Nov 2025 22:56:38 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Jason Gunthorpe <jgg@...dia.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Cc: nicolinc@...dia.com, linux-kernel@...r.kernel.org, robin.murphy@....com,
will@...nel.org, joro@...tes.org, kevin.tian@...el.com, jsnitsel@...hat.com,
iommu@...ts.linux.dev, santosh.shukla@....com, sairaj.arunkodilkar@....com,
jon.grimm@....com, prashanthpra@...gle.com, wvw@...gle.com,
wnliu@...gle.com, gptran@...gle.com, kpsingh@...gle.com,
joao.m.martins@...cle.com, alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host
page table in DTE
Jason,
On 10/23/2025 6:38 PM, Jason Gunthorpe wrote:
> On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
>> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
>> else
>> domid = domain->id;
>>
>> /*
>> * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>> * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>> * do_iommu_domain_alloc().
>> */
>> WARN_ON(amd_iommu_snp_en && (domid == 0));
>>
>> + amd_iommu_make_clear_dte(dev_data, &new);
>>
>> + old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
>
> This old_domid stuff doesn't make any sense. I think the commit that
> added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")
>
> The problem is that the dom ids that are present in the re-used DTE
> table have to be reserved from the domain id alloctor at boot.
> > If the kdump kernel tries to create new DTEs it MUST NOT re-use any
> IDs that are actively being using in DTEs already or you get data
> corruption. Randomly flushing IDs is just getting lucky..
Good catch. Thanks!
Looks like commit 38e5f33ee359 ("iommu/amd: Reuse device table for kdump") broke
domain ID reservation in kdump path :-( We will fix it.
>
> Not for this series, but something to think about.
>
>> + if (gcr3_info && gcr3_info->gcr3_tbl)
>> + set_dte_gcr3_table(dev_data, &new);
>> + else if (domain->iop.mode != PAGE_MODE_NONE)
>> + amd_iommu_set_dte_v1(dev_data, domain, &new);
>>
>> + /* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
>> + new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>> + new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
>>
>> + if (dev_data->ats_enabled)
>> + new.data[1] |= DTE_FLAG_IOTLB;
>
> These three should be merged into the two functions so they stand
> alone. These sets have to be made in the next patch, doesn't make
> sense to open code them in callers.
>
> Like this, it is simple and readable. It directly answers the question
> 'what bits are set when the driver creates this kind of DTE'
>
> static void set_dte_gcr3_table(struct amd_iommu *iommu,
> struct iommu_dev_data *dev_data,
> struct dev_table_entry *target)
> {
> struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
>
> target->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> DTE_FLAG_GV | FIELD_PREP(DTE_GLX, gcr3_info->glx) |
> FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> (pdom_is_v2_pgtbl_mode(dev_data->domain) ?
> target->data[0] |= DTE_FLAG_GIOV :
> 0);
>
> target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
> FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31) |
> FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid);
>
> /* Guest page table can only support 4 and 5 levels */
> if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
> else
> target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> }
>
> void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
> struct protection_domain *domain,
> struct dev_table_entry *new)
> {
> u64 htrp = iommu_virt_to_phys(domain->iop.root);
>
> new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> FIELD_PREP(DTE_MODE_MASK, domain->iop.mode) |
> FIELD_PREP(DTE_HOST_TRP, htrp >> 12) |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> (domain->dirty_tracking ? DTE_FLAG_HAD : 0);
> new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id);
> }
>
> (It is nice to sort the fields by the spec order, I didn't do that)
>
> Looks like an identity one is needed too:
>
> void set_dte_identity(struct dev_table_entry *new)
> {
> new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
> }
>
> Then the whole function is pretty much just:
>
> amd_iommu_make_clear_dte(dev_data, &new);
> if (gcr3_info && gcr3_info->gcr3_tbl)
> set_dte_gcr3_table(dev_data, &new);
> else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> set_dte_identity(&new)
> else if (domain->domain.type == IOMMU_DOMAIN_PAGING && io.mode == V1)
> amd_iommu_set_dte_v1(dev_data, domain, &new);
> else
> WARN_ON(true);
>
> ?
>
> (though how does IDENTITY on a device with a PASID installed work?)
Probably set_dte_identity() should call set_dte_gcr3_table () and update
relevant fields.
-Vasant
Powered by blists - more mailing lists