linux-kernel - Re: [PATCH v2 4/8] iommu: Factor out some helpers

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5498680f-df9e-287c-03f1-020848ba9b37@arm.com>
Date:   Mon, 30 Jan 2023 23:33:54 +0000
From:   Robin Murphy <robin.murphy@....com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     joro@...tes.org, will@...nel.org, iommu@...ts.linux.dev,
        linux-kernel@...r.kernel.org, hch@....de, baolu.lu@...ux.intel.com
Subject: Re: [PATCH v2 4/8] iommu: Factor out some helpers

On 2023-01-30 18:20, Jason Gunthorpe wrote:
> On Mon, Jan 30, 2023 at 06:05:12PM +0000, Robin Murphy wrote:
>>>     * Use a function instead of an array here because the domain-type is a
>>>     * bit-field, so an array would waste memory.
>>> @@ -338,7 +352,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
>>>    	dev->iommu->iommu_dev = iommu_dev;
>>>    	dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);
>>> -	group = iommu_group_get_for_dev(dev);
>>> +	group = iommu_group_get_for_dev(dev, ops);
>>
>> Why? We've literally just assigned dev->iommu->iommu_dev 2 lines above; it
>> is not allowed to have invalid ops. Plus in future they may potentially be a
>> different set of device ops from the ones we initially found to provide
>> ->probe_device - in that case we definitely want group_get_for_dev to use
>> the proper instance ops. This is the only place it is (and should be)
>> called, so I don't see any problem.
> 
> Sure, but IMHO it was clearer to pass the ops down than to obtain it
> again. But maybe this could be tidied in a different way.

I disagree. In what we have now, every operation which calls into a 
driver consistently uses the instance ops (or domain ops), except for 
->probe_device for obvious reasons. Making ->device_group look special 
for no technical reason is less consistent, and as such less clear. It 
would be the only place in the kernel where ops are called from a 
function argument, and to anyone unfamiliar looking at the code and 
wondering why that is, the answer "because Jason thinks it looks better" 
is not going to be obvious at all.

>>>    	if (IS_ERR(group)) {
>>>    		ret = PTR_ERR(group);
>>>    		goto out_release;
>>> @@ -414,7 +428,8 @@ int iommu_probe_device(struct device *dev)
>>>    	mutex_unlock(&group->mutex);
>>>    	iommu_group_put(group);
>>> -	ops = dev_iommu_ops(dev);
>>> +	/* __iommu_probe_device() set the ops */
>>> +	ops = dev_iommu_ops_safe(dev);
>>>    	if (ops->probe_finalize)
>>>    		ops->probe_finalize(dev);
>>> @@ -430,14 +445,13 @@ int iommu_probe_device(struct device *dev)
>>>    void iommu_release_device(struct device *dev)
>>>    {
>>> -	const struct iommu_ops *ops;
>>> +	const struct iommu_ops *ops = dev_iommu_ops(dev);
>>
>> This is just moving an existing check around.
> 
> The point is to have one check. If you need to get the ops and you
> don't know the state of dev then you calll dev_iommu_ops() and check
> for NULL. Simple and consistent.
> 
>>> -     if (!dev->iommu)
>>> +     if (!ops)
>>>                return;
> 
> As you pointed out below this isn't even coded right since iommu can
> be non-NULL.

Oh, indeed that is technically a bug, although it's pretty much 
impossible to hit in practice because there's no real device hotplug on 
DT-based systems using fwspec - "dynamic" buses like PCI SR-IOV or 
fsl-mc aren't going to be discovered at all until the relevant IOMMU 
associated with the main controller device is ready, thus no removable 
child device would ever be in the "half-probed" state. I missed this one 
since I was looking for the dev->iommu->iommu_dev pattern - since it 
looks like this series is headed for a v3 next cycle, I've added this 
site to $SUBJECT locally.

>>> @@ -2620,7 +2626,11 @@ void iommu_get_resv_regions(struct device *dev, struct list_head *list)
>>>    {
>>>    	const struct iommu_ops *ops = dev_iommu_ops(dev);
>>> -	if (ops->get_resv_regions)
>>> +	/*
>>> +	 * Non-API groups still expose reserved_regions in sysfs, so filter out
>>> +	 * calls that get here that way.
>>> +	 */
>>> +	if (ops && ops->get_resv_regions)
>>
>> This is just moving the existing check from the public interface to
>> pointlessly impose it on the internal call path as well.
> 
> Who cares? We don't need to micro-optimize this stuff. The fact there
> are a few bugs already is proof enough of that.

"a few"? So far there's only one, and it's not even the class of bug 
you're trying to address, because there _is_ an explicit validity check 
already, it just has an oversight in it.

This isn't micro-optimisation, it's consistency: we have validity checks 
close to the entrypoints of public interfaces where they are required. 
On internal paths where they are not required, we do not sometimes have 
checks and sometimes not, leaving people to wonder what the requirements 
actually are - in fact someone went so far as to call such patterns 
"confusing" and "overzealous" back when dev_iommu_ops() was reviewed. It 
was agreed that the lack of a check where ops are retrieved clearly 
documents where they are expected to be valid.

>>>    		ops->get_resv_regions(dev, list);
>>>    }
>>> @@ -2979,6 +2989,11 @@ static ssize_t iommu_group_store_type(struct iommu_group *group,
>>>    	/* Since group has only one device */
>>>    	grp_dev = list_first_entry(&group->devices, struct group_device, list);
>>>    	dev = grp_dev->dev;
>>> +	if (!dev_iommu_ops(dev)) {
>>> +		/* The group doesn't have an iommu driver attached */
>>> +		mutex_unlock(&group->mutex);
>>> +		return -EINVAL;
>>> +	}
>>
>> Withot any ops, group->default_domain will be NULL so we could never even
>> get here.
> 
> Fair enough, deserves a comment

Great big function specifically for changing the default domain of a 
group... right at the top, literally the second thing it does on entry 
is check that the group and default domain are valid, and return if not. 
If that isn't sufficiently clear, I'm not sure what to say :/

>>>    	get_device(dev);
>>>    	/*
>>> @@ -3301,7 +3316,7 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
>>>    	const struct iommu_ops *ops;
>>>    	list_for_each_entry(device, &group->devices, list) {
>>> -		ops = dev_iommu_ops(device->dev);
>>> +		ops = dev_iommu_ops_safe(device->dev);
>>>    		ops->remove_dev_pasid(device->dev, pasid);
>>>    	}
>>>    }
>>> @@ -3413,6 +3428,9 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
>>>    	const struct iommu_ops *ops = dev_iommu_ops(dev);
>>>    	struct iommu_domain *domain;
>>> +	if (!ops)
>>> +		return NULL;
>>
>> Anyone who incorrectly calls sva_domain_alloc() directly without having
>> successfully called iommu_dev_enable_feature() first deserves to crash.
> 
> Lets not build assumptions like this into the code please. That
> iommu_dev_enable_feature() thing is on my hitlist too :(

Huh? The whole public API is full of assumptions that callers aren't 
doing nonsensical things. Pass a bogus group or domain pointer to 
iommu_attach_group()? Boom! Pass a bogus domain pointer to iommu_map()? 
Boom! AFAICT the only thing that should be calling 
iommu_sva_domain_alloc() at all at the moment is 
iommu_sva_bind_device(), so it's effectively an internal interface where 
bogus device pointers really shouldn't be expected at all.

Sure, if you change the interface so that random drivers are free to 
allocate SVA domains directly and feed them to iommu_attach_group(), and 
checks are warranted in different places, then add them then.

>> (Incidentally, you've missed enable/disable_feature here.)
> 
> Yes, because they don't call dev_iommu_ops for some reason. It should
> be fixed too.

It is, in the patch you're replying to.

>>> +/* May return NULL if the device has no iommu driver */
>>>    static inline const struct iommu_ops *dev_iommu_ops(struct device *dev)
>>>    {
>>> -	/*
>>> -	 * Assume that valid ops must be installed if iommu_probe_device()
>>> -	 * has succeeded. The device ops are essentially for internal use
>>> -	 * within the IOMMU subsystem itself, so we should be able to trust
>>> -	 * ourselves not to misuse the helper.
>>> -	 */
>>> +	if (!dev->iommu)
>>> +		return NULL;
>>>    	return dev->iommu->iommu_dev->ops;
>>
>> This is actively broken, since dev->iommu may be non-NULL before
>> dev->iommu->iommu_dev is set (a fwspec will always be set before the
>> instyance is registered, and a device may potentially hang around in that
>> state for evertt if the relevant IOMMU instance never appears).
> 
> Sure, I missed a NULL
> 
>> Sorry, I don't think any of this makes sense :/
> 
> The point is to be more consistent and not just blindly add more
> helpers. If you need ops then ask for the ops. If you aren't sure the
> dev has ops then check ops for NULL. It is pretty simple.

I'm not "blindly" adding more helpers, I'm ratifying clear common 
elements of the code and logic that we already have, to make them that 
much easier to replicate further. And I'm still no less puzzled by this 
thread... adding a new helper to consolidate having one thing in some 
places and another in others so offends your sensibilities that what 
you'd much rather do instead is add a new helper to have one thing in 
some places and another thing in others? There is a logical consistency 
to the current code already, which I assume is sufficiently clear to the 
majority of us because it's what made it through community review and 
what we've all been working on top of for the last year. "If you need 
ops then ask for the ops. If you aren't sure the dev has ops then check 
for valid ops first." Just as simple.

FWIW, personally I find up-front availability checks perfectly intuitive 
and natural, because they can easily be imagined as a real-life interaction:

"Do you have any coffee?"
"No, sorry."

vs.

"Please give me a cup of coffee."
<hands over empty cup>
"Is this coffee?"
"No."

One could possibly even argue that a separate up-front check also 
visibly implies that there are no concurrency or TOCTOU considerations 
expected, which for dev_iommu_ops() there should certainly not be.

Please understand that I'm not going to this length just to be 
objectionable; this is me genuinely being unable to rationalise your 
argument and spending my entire evening on a deep dive into my own 
thought process to lay it out and check for any obvious errors.

Thanks,
Robin.