Message-ID: <12c5b705-1387-4d50-8706-cf5c35dcb3cc@amd.com>
Date: Fri, 5 Jan 2024 20:38:47 +0700
From: "Suthikulpanit, Suravee" <suravee.suthikulpanit@....com>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: linux-kernel@...r.kernel.org, iommu@...ts.linux.dev, joro@...tes.org,
 yi.l.liu@...el.com, kevin.tian@...el.com, nicolinc@...dia.com,
 eric.auger@...hat.com, vasant.hegde@....com, jon.grimm@....com,
 santosh.shukla@....com, Dhaval.Giani@....com, pandoh@...gle.com,
 loganodell@...gle.com
Subject: Re: [RFC PATCH 6/6] iommu/amd: Introduce nested translation support

Hi Jason,

On 12/13/2023 8:52 PM, Jason Gunthorpe wrote:
> On Tue, Dec 12, 2023 at 10:01:39AM -0600, Suravee Suthikulpanit wrote:
> 
>> -	if ((flags & ~IOMMU_HWPT_ALLOC_DIRTY_TRACKING) || parent || user_data)
>> +		ret = udata_to_iommu_hwpt_amd_v2(user_data, &hwpt);
>> +		if (ret)
>> +			return ERR_PTR(ret);
>> +
>> +		return amd_iommu_nested_domain_alloc(dev, &hwpt);
>> +	}
>> +
>> +	/* Check supported flags */
>> +	if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT |
>> +		       IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
>> +		return ERR_PTR(-EOPNOTSUPP);
>> +
>> +	if (!check_nested_support(flags))
>>   		return ERR_PTR(-EOPNOTSUPP);
>>   
>> -	return do_iommu_domain_alloc(type, dev, flags);
>> +	dom = iommu_domain_alloc(dev->bus);
> 
> Please don't call iommu_domain_alloc, call your internal function and
> force it to allocate the v1 domain..

Okay.
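
For reference, something along these lines (just a sketch; it assumes
do_iommu_domain_alloc() keeps its current signature and that an
unmanaged v1 domain is what you want forced here):

	/* Allocate the stage-2 (v1) domain via the driver-internal
	 * helper instead of the generic iommu_domain_alloc(). */
	dom = do_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED, dev, flags);
	if (IS_ERR(dom))
		return dom;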

>> +static int nested_gcr3_update(struct iommu_hwpt_amd_v2 *hwpt, struct iommu_domain *udom)
>> +{
>> +	int ret;
>> +	u16 hdev_id;
>> +	struct pci_dev *pdev;
>> +	struct amd_iommu *iommu;
>> +
>> +	iommu = get_amd_iommu_from_devid(hwpt->iommu_id);
>> +	hdev_id = get_hdev_id(iommu, hwpt->gid, hwpt->gdev_id);
>> +
>> +	pr_debug("%s: gid=%u, hdev_id=%#x, gcr3=%#llx\n",
>> +		 __func__, hwpt->gid, hdev_id,
>> +		 (unsigned long long) hwpt->gcr3);
>> +
>> +	pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(hdev_id),
>> +					   hdev_id & 0xff);
> 
> Huh? "hdev_id"? This is not OK..
> 
> The device you are allowed to look at is the "struct device *dev" passed
> to alloc. You cannot pass in a struct device and then override it with
> another value.

Good point. I'll fix this to use the dev.
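
Something like this, I suppose (sketch only; the point is keying off
the dev passed to alloc rather than a guest-provided ID):

	/* Derive the PCI device from the dev passed in at alloc time
	 * instead of looking it up by hdev_id. */
	if (!dev_is_pci(dev))
		return -EINVAL;
	pdev = to_pci_dev(dev);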

>> +	if (!pdev)
>> +		return -EINVAL;
>> +
>> +	/* Note: Currently only support GCR3TRPMode with nested translation */
>> +	if (!check_feature2(FEATURE_GCR3TRPMODE))
>> +		return -EOPNOTSUPP;
>> +
>> +	ret = amd_iommu_set_gcr3tbl_trp(iommu, pdev, hwpt->gcr3, hwpt->glx,
>> +					hwpt->guest_paging_mode);
> 
> Waah?
> 
> This is touching the dev table? That is not right, allocation is only
> *ALLOCATION*. The dev table can't be changed until you do attachment.

My understanding is that QEMU should call:

1. iommufd_backend_get_ioas()
    -> ioctl(IOMMU_IOAS_ALLOC)

2. For parent domain
iommufd_backend_alloc_hwpt(IOMMU_HWPT_ALLOC_NEST_PARENT)
   -> ioctl(IOMMU_HWPT_ALLOC)
   --- in Linux ---
   ....
   -> iommufd_hwpt_paging_alloc()
     -> struct iommu_ops.domain_alloc_user(IOMMU_HWPT_ALLOC_NEST_PARENT)

3. For parent domain
iommufd_device_attach_hwpt()
   -> vfio_device_attach_container()
     -> ioctl(VFIO_DEVICE_ATTACH_IOMMUFD_PT)
     --- in Linux ---
     vfio_iommufd_physical_attach_ioas()
     -> iommufd_device_attach()
       -> iommufd_device_do_attach()
         -> iommufd_hw_pagetable_attach()

4. Same as (2) but for child domain w/ stage 1 table.

5. Same as (3) but for child domain w/ stage 1 table.

You want the gcr3 table root pointer in the DTE to be updated at step 5
only, right? If so, this should be okay.
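
In userspace terms, roughly this (untested sketch against the current
iommufd uAPI; error handling omitted, iommufd/device_fd/dev_id assumed
to come from the usual VFIO-iommufd binding, and IOMMU_HWPT_DATA_AMD_V2
plus struct iommu_hwpt_amd_v2 are the types proposed by this RFC, not
upstream):

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/iommufd.h>
	#include <linux/vfio.h>

	/* 1. Allocate an IOAS. */
	struct iommu_ioas_alloc ioas = { .size = sizeof(ioas) };
	ioctl(iommufd, IOMMU_IOAS_ALLOC, &ioas);

	/* 2. Allocate the parent (stage-2/v1) HWPT. */
	struct iommu_hwpt_alloc parent = {
		.size = sizeof(parent),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas.out_ioas_id,
	};
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &parent);

	/* 3. Attach the device to the parent HWPT. */
	struct vfio_device_attach_iommufd_pt at = {
		.argsz = sizeof(at),
		.pt_id = parent.out_hwpt_id,
	};
	ioctl(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &at);

	/* 4. Allocate the child (stage-1/v2) HWPT on top of the parent. */
	struct iommu_hwpt_amd_v2 v2 = { /* gcr3, glx, guest_paging_mode */ };
	struct iommu_hwpt_alloc child = {
		.size = sizeof(child),
		.dev_id = dev_id,
		.pt_id = parent.out_hwpt_id,
		.data_type = IOMMU_HWPT_DATA_AMD_V2,	/* proposed here */
		.data_len = sizeof(v2),
		.data_uptr = (uintptr_t)&v2,
	};
	ioctl(iommufd, IOMMU_HWPT_ALLOC, &child);

	/* 5. Re-attach (replace) to the child HWPT; only at this point
	 * should the DTE's GCR3 root pointer be updated. */
	at.pt_id = child.out_hwpt_id;
	ioctl(device_fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &at);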

> Please look at the smmuv3 patches and try to be structurally
> similar. AMD and SMMUv3 are *very similar* in how their HW works
> excluding the viommu stuff.

Could you please point me to the series? I found one, but it was quite
old; I might have missed the latest version.

> You also can't assume your parent is currently attached to anything.
> 
> The construction of the DTE has to be from-scratch based on the parent
> domain and the provided values in the "hwpt". Again see how smmuv3
> does this where there is one function that builds the entire DTE
> (called STE)

Ok. I'll program the DTE fields related to DMA remapping with the v1
and v2 tables using only the information in the parent and child
domains.

> I'm skeptical you can do this properly without also restructuring the
> DTE logic like I've mentioned before, there is a reason I did that for
> SMMUv3. :)

A device can be attached to a domain, which can be either a one-level
or a nested domain. In the nested case, the parent domain contains the
information for the stage-2 (v1) table, while the child domain contains
the information for the stage-1 (v2) table. For each nested domain, we
need to keep track of its parent domain.

When calling set_dte_entry(), we need to check whether the device is
attached to a domain that has a parent. If so, we need to configure the
DTE using the information from both domains accordingly.

I'll update set_dte_entry() to reflect this logic for programming the DTE.
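
Roughly this shape, for illustration (the parent pointer and the two
helpers are made-up names, not the RFC's):

	static void set_dte_entry(struct amd_iommu *iommu,
				  struct iommu_dev_data *dev_data)
	{
		struct protection_domain *pdom = dev_data->domain;

		if (pdom->parent) {
			/* Nested: stage 2 (v1) comes from the parent
			 * domain, stage 1 (v2/GCR3) from the child. */
			set_dte_stage2(dev_data, pdom->parent);
			set_dte_gcr3(dev_data, pdom);
		} else {
			/* One-level: the domain itself provides the
			 * whole translation. */
			set_dte_stage2(dev_data, pdom);
		}
	}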

>> +struct iommu_domain *amd_iommu_nested_domain_alloc(struct device *dev,
>> +						   struct iommu_hwpt_amd_v2 *hwpt)
>> +{
>> +	int ret;
>> +	struct iommu_domain *dom;
>> +	struct protection_domain *pdom;
>> +
>> +	dom = iommu_domain_alloc(dev->bus);
>> +	if (!dom)
>> +		return ERR_PTR(-ENOMEM);
>
> Also no, do not allocate a normal domain and then 'wreck'
> it into a nesting domain. Refactor the allocation code to be in
> smaller chunks so you can alloc and init the memory directly here.

Good point. I'll take care of this in drivers/iommu/amd/iommu.c:
do_iommu_domain_alloc().
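
I.e. something like this (sketch; it assumes protection_domain_alloc()
is made callable from the nested code and can do a bare allocation):

	struct iommu_domain *
	amd_iommu_nested_domain_alloc(struct device *dev,
				      struct iommu_hwpt_amd_v2 *hwpt)
	{
		struct protection_domain *pdom;

		/* Bare allocation only -- no v1 page table is set up,
		 * and no already-initialized domain gets wrecked. */
		pdom = protection_domain_alloc(IOMMU_DOMAIN_NESTED);
		if (!pdom)
			return ERR_PTR(-ENOMEM);

		/* Initialize directly as a nested (v2) domain from the
		 * validated hwpt (gcr3, glx, guest_paging_mode). */

		return &pdom->domain;
	}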

Thanks,
Suravee
