lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7a4777dd-2359-4bcd-839e-c2d0b5f6be14@intel.com>
Date: Thu, 21 Mar 2024 19:26:41 +0800
From: Yi Liu <yi.l.liu@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: Baolu Lu <baolu.lu@...ux.intel.com>, "Tian, Kevin" <kevin.tian@...el.com>,
	"joro@...tes.org" <joro@...tes.org>, "alex.williamson@...hat.com"
	<alex.williamson@...hat.com>, "robin.murphy@....com" <robin.murphy@....com>,
	"cohuck@...hat.com" <cohuck@...hat.com>, "eric.auger@...hat.com"
	<eric.auger@...hat.com>, "nicolinc@...dia.com" <nicolinc@...dia.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "mjrosato@...ux.ibm.com"
	<mjrosato@...ux.ibm.com>, "chao.p.peng@...ux.intel.com"
	<chao.p.peng@...ux.intel.com>, "yi.y.sun@...ux.intel.com"
	<yi.y.sun@...ux.intel.com>, "peterx@...hat.com" <peterx@...hat.com>,
	"jasowang@...hat.com" <jasowang@...hat.com>,
	"shameerali.kolothum.thodi@...wei.com"
	<shameerali.kolothum.thodi@...wei.com>, "lulu@...hat.com" <lulu@...hat.com>,
	"suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>, "Duan,
 Zhenzhong" <zhenzhong.duan@...el.com>, "joao.m.martins@...cle.com"
	<joao.m.martins@...cle.com>, "Zeng, Xin" <xin.zeng@...el.com>, "Zhao, Yan Y"
	<yan.y.zhao@...el.com>
Subject: Re: [PATCH 1/8] iommu: Introduce a replace API for device pasid

On 2024/3/21 14:16, Yi Liu wrote:
> On 2024/3/20 20:38, Jason Gunthorpe wrote:
>> On Tue, Mar 19, 2024 at 03:29:39PM +0800, Yi Liu wrote:
>>> On 2024/3/19 00:52, Jason Gunthorpe wrote:
>>>> On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
>>>>
>>>>> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
>>>>> and pasid_array update is under the group->lock, so update it should be
>>>>> fine to adjust the order to update pasid_array after set_dev_pasid 
>>>>> returns.
>>>>
>>>> Yes, it makes some sense
>>>>
>>>> But, also I would like it very much if we just have the core pass in
>>>> the actual old domain as a an addition function argument.
>>>
>>> ok, this works too. For normal attach, just pass in a NULL old domain.
>>>
>>>> I think we have some small mistakes in multi-device group error
>>>> unwinding for remove because the global xarray can't isn't actually
>>>> going to be correct in all scenarios.
>>>
>>> do you mean the __iommu_remove_group_pasid() call in the below function?
>>> Currently, it is called when __iommu_set_group_pasid() failed. However,
>>> __iommu_set_group_pasid() may need to do remove itself when error happens,
>>> so the helper can be more self-contained. Or you mean something else?
>>
>> Yes..
>>
>>> int iommu_attach_device_pasid(struct iommu_domain *domain,
>>>                   struct device *dev, ioasid_t pasid)
>>> {
>>>     /* Caller must be a probed driver on dev */
>>>     struct iommu_group *group = dev->iommu_group;
>>>     void *curr;
>>>     int ret;
>>>
>>>     if (!domain->ops->set_dev_pasid)
>>>         return -EOPNOTSUPP;
>>>
>>>     if (!group)
>>>         return -ENODEV;
>>>
>>>     if (!dev_has_iommu(dev) || dev_iommu_ops(dev) != domain->owner)
>>>         return -EINVAL;
>>>
>>>     mutex_lock(&group->mutex);
>>>     curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, 
>>> GFP_KERNEL);
>>>     if (curr) {
>>>         ret = xa_err(curr) ? : -EBUSY;
>>>         goto out_unlock;
>>>     }
>>>
>>>     ret = __iommu_set_group_pasid(domain, group, pasid);
>>
>> So here we have the xa set to the new domain
> 
> aha, yes. The underlying driver won't be able to get the correct domain
> in the remove_dev_pasid callback.
> 
>>>     if (ret) {
>>>         __iommu_remove_group_pasid(group, pasid);
>>
>> And here we still have it set to the new domain even though some of
>> the devices within the group failed to attach. The logic needs to be
>> more like the main domain attach path where iterate and then undo only
>> what we did
> 
> yes, the correct way is to undo what have been done before the fail
> device. However, I somehow remember that pasid capability is only
> available when the group is singleton. So iterate all devices of the
> devices just means one device in fact. If this is true, then the
> current code is fine although a bit confusing.
> 
>> And the whole thing is easier to reason about if an input argument
>> specifies the current attached domain instead of having the driver
>> read it from the xarray.
> 
> yep, will correct it as a fix patch.

Hi Jason,

It appears there are two solutions here.

First, only undo the devices that have set_dev_pasid successfully in
the __iommu_set_group_pasid(), so the problematic
__iommu_remove_group_pasid() call at line 3378 [1] would go away.
This also makes the helper more self-contained. Draft patch in [2]

Second, pass in the domain to remove_dev_pasid(). Draft patch in [3]

Either of the above two should be able to solve the mistake you mentioned.
BTW. They are orthogonal, so it's also possible to apply both of them.
Which one is your preference then?

[1] 
https://github.com/torvalds/linux/blob/b3603fcb79b1036acae10602bffc4855a4b9af80/drivers/iommu/iommu.c#L3378
[2] 
https://github.com/yiliu1765/iommufd/commit/bb83270767ab460978a3452750f5e032b43e59bf
[3] 
https://github.com/yiliu1765/iommufd/commit/bd270b7b6619b8ba35dfaf9870df8728e370163f

Regards,
Yi Liu


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ