lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251023130833.GF262900@nvidia.com>
Date: Thu, 23 Oct 2025 10:08:33 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
Cc: nicolinc@...dia.com, linux-kernel@...r.kernel.org, robin.murphy@....com,
	will@...nel.org, joro@...tes.org, kevin.tian@...el.com,
	jsnitsel@...hat.com, vasant.hegde@....com, iommu@...ts.linux.dev,
	santosh.shukla@....com, sairaj.arunkodilkar@....com,
	jon.grimm@....com, prashanthpra@...gle.com, wvw@...gle.com,
	wnliu@...gle.com, gptran@...gle.com, kpsingh@...gle.com,
	joao.m.martins@...cle.com, alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v4 15/16] iommu/amd: Refactor logic to program the host
 page table in DTE

On Tue, Oct 21, 2025 at 01:43:23AM +0000, Suravee Suthikulpanit wrote:
> @@ -2088,37 +2104,28 @@ static void set_dte_entry(struct amd_iommu *iommu,
>  	else
>  		domid = domain->id;
>  
>  	/*
>  	 * When SNP is enabled, we can only support TV=1 with non-zero domain ID.
>  	 * This is prevented by the SNP-enable and IOMMU_DOMAIN_IDENTITY check in
>  	 * do_iommu_domain_alloc().
>  	 */
>  	WARN_ON(amd_iommu_snp_en && (domid == 0));
>  
> +	amd_iommu_make_clear_dte(dev_data, &new);
>  
> +	old_domid = READ_ONCE(dte->data[1]) & DTE_DOMID_MASK;

This old_domid stuff doesn't make any sense. I think the commit that
added it is bonkers: 36b7200f67df ("iommu/amd: Flush old domains in kdump kernel")

The problem is that the dom ids that are present in the re-used DTE
table have to be reserved from the domain id alloctor at boot.

If the kdump kernel tries to create new DTEs it MUST NOT re-use any
IDs that are actively being using in DTEs already or you get data
corruption. Randomly flushing IDs is just getting lucky..

Not for this series, but something to think about.

> +	if (gcr3_info && gcr3_info->gcr3_tbl)
> +		set_dte_gcr3_table(dev_data, &new);
> +	else if (domain->iop.mode != PAGE_MODE_NONE)
> +		amd_iommu_set_dte_v1(dev_data, domain, &new);
>  
> +	/* Note: The IR, IW, TV, DOMID are needed for both v1 and gcr3 table */
> +	new.data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV;
> +	new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domid);
>  
> +	if (dev_data->ats_enabled)
> +		new.data[1] |= DTE_FLAG_IOTLB;

These three should be merged into the two functions so they stand
alone. These sets have to be made in the next patch, doesn't make
sense to open code them in callers.

Like this, it is simple and readable. It directly answers the question
'what bits are set when the driver creates this kind of DTE'

static void set_dte_gcr3_table(struct amd_iommu *iommu,
			       struct iommu_dev_data *dev_data,
			       struct dev_table_entry *target)
{
	struct gcr3_tbl_info *gcr3_info = &dev_data->gcr3_info;
	u64 gcr3 = iommu_virt_to_phys(gcr3_info->gcr3_tbl);

	target->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
			   DTE_FLAG_GV | FIELD_PREP(DTE_GLX, gcr3_info->glx) |
			   FIELD_PREP(DTE_GCR3_14_12, gcr3 >> 12) |
			   (dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
			   (pdom_is_v2_pgtbl_mode(dev_data->domain) ?
				    target->data[0] |= DTE_FLAG_GIOV :
				    0);

	target->data[1] |= FIELD_PREP(DTE_GCR3_30_15, gcr3 >> 15) |
			   FIELD_PREP(DTE_GCR3_51_31, gcr3 >> 31) |
			   FIELD_PREP(DTE_DOMID_MASK, dev_data->gcr3_info.domid);

	/* Guest page table can only support 4 and 5 levels  */
	if (amd_iommu_gpt_level == PAGE_MODE_5_LEVEL)
		target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_5_LEVEL);
	else
		target->data[2] |= FIELD_PREP(DTE_GPT_LEVEL_MASK, GUEST_PGTABLE_4_LEVEL);
}

void amd_iommu_set_dte_v1(struct iommu_dev_data *dev_data,
			  struct protection_domain *domain,
			  struct dev_table_entry *new)
{
	u64 htrp = iommu_virt_to_phys(domain->iop.root);

	new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
			FIELD_PREP(DTE_MODE_MASK, domain->iop.mode) |
			FIELD_PREP(DTE_HOST_TRP, htrp >> 12) |
			(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0) |
			(domain->dirty_tracking ? DTE_FLAG_HAD : 0);
	new.data[1] |= FIELD_PREP(DTE_DOMID_MASK, domain->id);
}

(It is nice to sort the fields by the spec order, I didn't do that)

Looks like an identity one is needed too:

void set_dte_identity(struct dev_table_entry *new)
{
	new->data[0] |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_TV |
			(dev_data->ats_enabled ? DTE_FLAG_IOTLB : 0);
}

Then the whole function is pretty much just:

	amd_iommu_make_clear_dte(dev_data, &new);
	if (gcr3_info && gcr3_info->gcr3_tbl)
		set_dte_gcr3_table(dev_data, &new);
	else if (domain->domain.type == IOMMU_DOMAIN_IDENTITY)
		set_dte_identity(&new)
	else if (domain->domain.type == IOMMU_DOMAIN_PAGING && io.mode == V1)
		amd_iommu_set_dte_v1(dev_data, domain, &new);
	else
		WARN_ON(true);

?

(though how does IDENTITY on a device with a PASID installed work?)

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ