lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62ba61a5-d172-400e-beb5-e593297129ef@amd.com>
Date: Sat, 8 Nov 2025 22:56:38 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Jason Gunthorpe <jgg@...dia.com>,
 Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Cc: nicolinc@...dia.com, linux-kernel@...r.kernel.org, robin.murphy@....com,
 will@...nel.org, joro@...tes.org, kevin.tian@...el.com, jsnitsel@...hat.com,
 iommu@...ts.linux.dev, santosh.shukla@....com, sairaj.arunkodilkar@....com,
 jon.grimm@....com, prashanthpra@...gle.com, wvw@...gle.com,
 wnliu@...gle.com, gptran@...gle.com, kpsingh@...gle.com,
 joao.m.martins@...cle.com, alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host
 page table in DTE

Jason,


On 10/23/2025 6:38 PM, Jason Gunthorpe wrote:
> On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
>> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
>>  	else
>>  		domid = domain->id;
>>  
>>  	/*
>>  	 * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>>  	 * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>>  	 * do_iommu_domain_alloc().
>>  	 */
>>  	WARN_ON(amd_iommu_snp_en && (domid == 0));
>>  
>> +	amd_iommu_make_clear_dte(dev_data, &new);
>>  
>> +	old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;
> 
> This old_domid stuff doesn't make any sense. I think the commit that
> added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")
> 
> The problem is that the dom ids that are present in the re-used DTE
> table have to be reserved from the domain id alloctor at boot.
> > If the kdump kernel tries to create new DTEs it MUST NOT re-use any
> IDs that are actively being using in DTEs already or you get data
> corruption. Randomly flushing IDs is just getting lucky..

Good catch. Thanks!

Looks like commit 38e5f33ee359 ("iommu/amd: Reuse device table for kdump") broke
domain ID reservation in kdump path :-( We will fix it.


> 
> Not for this series, but something to think about.
> 
>> +	if (gcr3_info && gcr3_info->gcr3_tbl)
>> +		set_dte_gcr3_table(dev_data, &new);
>> +	else if (domain->iop.mode != PAGE_MODE_NONE)
>> +		amd_iommu_set_dte_v1(dev_data, domain, &new);
>>  
>> +	/* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
>> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
>> +	new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
>>  
>> +	if (dev_data->ats_enabled)
>> +		new.data[1] |= DTE_FLAG_IOTLB;
> 
> These three should be merged into the two functions so they stand
> alone. These sets have to be made in the next patch, doesn't make
> sense to open code them in callers.
> 
> Like this, it is simple and readable. It directly answers the question
> 'what bits are set when the driver creates this kind of DTE'
> 
> static void set_dte_gcr3_table(struct amd_iommu *iommu,
> 			       struct iommu_dev_data *dev_data,
> 			       struct dev_table_entry *target)
> {
> 	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
> 	u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);
> 
> 	target->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> 			   DTE_FLAG_GV | FIELD_PREP(DTE_GLX, gcr3_info->glx) |
> 			   FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
> 			   (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> 			   (pdom_is_v2_pgtbl_mode(dev_data->domain) ?
> 				    target->data[0] |= DTE_FLAG_GIOV :
> 				    0);
> 
> 	target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
> 			   FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31) |
> 			   FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid);
> 
> 	/* Guest page table can only support 4 and 5 levels  */
> 	if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
> 		target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
> 	else
> 		target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
> }
> 
> void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
> 			  struct protection_domain *domain,
> 			  struct dev_table_entry *new)
> {
> 	u64 htrp = iommu_virt_to_phys(domain->iop.root);
> 
> 	new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> 			FIELD_PREP(DTE_MODE_MASK, domain->iop.mode) |
> 			FIELD_PREP(DTE_HOST_TRP, htrp >> 12) |
> 			(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
> 			(domain->dirty_tracking ? DTE_FLAG_HAD : 0);
> 	new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id);
> }
> 
> (It is nice to sort the fields by the spec order, I didn't do that)
> 
> Looks like an identity one is needed too:
> 
> void set_dte_identity(struct dev_table_entry *new)
> {
> 	new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
> 			(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
> }
> 
> Then the whole function is pretty much just:
> 
> 	amd_iommu_make_clear_dte(dev_data, &new);
> 	if (gcr3_info && gcr3_info->gcr3_tbl)
> 		set_dte_gcr3_table(dev_data, &new);
> 	else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
> 		set_dte_identity(&new)
> 	else if (domain->domain.type == IOMMU_DOMAIN_PAGING && io.mode == V1)
> 		amd_iommu_set_dte_v1(dev_data, domain, &new);
> 	else
> 		WARN_ON(true);
> 
> ?
> 
> (though how does IDENTITY on a device with a PASID installed work?)

Probably set_dte_identity() should call  set_dte_gcr3_table () and update
relevant fields.

-Vasant


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ