Date:   Wed, 4 Oct 2023 18:23:05 +0100
From:   Robin Murphy <robin.murphy@....com>
To:     Jason Gunthorpe <jgg@...dia.com>
Cc:     joro@...tes.org, will@...nel.org, iommu@...ts.linux.dev,
        baolu.lu@...ux.intel.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 4/7] iommu: Switch __iommu_domain_alloc() to device ops

On 02/10/2023 8:36 pm, Jason Gunthorpe wrote:
> On Mon, Oct 02, 2023 at 08:02:23PM +0100, Robin Murphy wrote:
>> On 02/10/2023 3:16 pm, Jason Gunthorpe wrote:
>>> On Mon, Oct 02, 2023 at 02:49:12PM +0100, Robin Murphy wrote:
>>>> @@ -2120,20 +2120,30 @@ static struct iommu_domain *__iommu_domain_alloc(const struct iommu_ops *ops,
>>>>    	return domain;
>>>>    }
>>>> -static struct iommu_domain *
>>>> -__iommu_group_domain_alloc(struct iommu_group *group, unsigned int type)
>>>> +static int __iommu_domain_alloc_dev(struct device *dev, void *data)
>>>>    {
>>>
>>> Why?
>>
>> Because out of 3 callsites, two were in a context which now needed to
>> make the iommu_group_first_dev() call itself anyway,
> 
> I don't see it. Why not just do this?
> 
> static int __iommu_domain_alloc_dev(struct device *dev, void *data)
> {
> 	/* FIXME: This is not correctly locked */
> 	struct iommu_group *group = iommu_group_get(dev);
> 	struct iommu_group **alloc_group = data;
> 
> 	if (!group)
> 		return 0;
> 
> 	mutex_lock(&group->mutex);
> 	/* Theoretically we could have raced against release */
> 	if (list_empty(&group->devices)) {
> 		mutex_unlock(&group->mutex);
> 		iommu_group_put(group);
> 		return 0;
> 	}
> 
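> 	/* Success: return with group->mutex held and a group reference
> 	 * taken, both to be released by the caller */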
> 	*alloc_group = group;
> 	return 1;
> }
> 
> struct iommu_domain *iommu_domain_alloc(const struct bus_type *bus)
> {
> 	struct iommu_group *alloc_group = NULL; /* stays NULL if no device found */
> 	struct iommu_domain *dom;
> 
> 	/* Only one driver is supported with this interface */
> 	if (WARN_ON(iommu_num_drivers > 1))
> 		return NULL;
> 
> 	bus_for_each_dev(bus, NULL, &alloc_group, __iommu_domain_alloc_dev);
> 	if (!alloc_group)
> 		return NULL;
> 	dom = __iommu_group_domain_alloc(alloc_group, IOMMU_DOMAIN_UNMANAGED);
> 	mutex_unlock(&alloc_group->mutex);
> 	iommu_group_put(alloc_group);
> 	return dom;
> }
> EXPORT_SYMBOL_GPL(iommu_domain_alloc);
> 
> (and ++/-- iommu_num_drivers in iommu_device_register)
> 
> One patch, it's pretty easy???

Sure, that would then leave two callsites for 
__iommu_group_domain_alloc(), which might in turn justify leaving it 
factored out. But the endgame is to resolve dev_iommu_ops() from a 
device, so if we're starting with a suitable device in hand, using its 
group only to immediately re-derive a device again, when we *could* 
trivially pass the device straight through, is silly and overcomplicated.
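
To illustrate (just a sketch, and __iommu_domain_alloc_for_dev() is a 
made-up name for the sake of the example, not necessarily what the 
respin will use), once the allocation path takes a device the ops 
lookup collapses to a one-liner, with no group round-trip:

static struct iommu_domain *
__iommu_domain_alloc_for_dev(struct device *dev, unsigned int type)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	/* ...then proceed exactly as __iommu_domain_alloc() does today... */
}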

However it turns out that we can't easily touch the group here at all 
without then needing dependent changes in VFIO, so rather than invite 
any more scope creep at this point I'm going to respin this as a purely 
sideways step that sticks to resolving ops from a bus, and save any 
further device/instance niceties for step 2 when I have to touch all the 
external callers anyway.

>> Um... Good? I mean in 3/4 cases it's literally the exact same code just
>> factored out again, while the one I've added picks some arbitrary device
>> in a different way.
> 
> Sure, but the whole point was to make it obvious that there was no
> direct linkage from the various dev parameters we have in places and
> what dev will be passed by the driver.

But that's the argument that makes no sense - it happens to be the case 
in the current code that all default domain allocation sites are already 
buried in several layers worth of group-based machinery, but that in 
itself holds no significance to the allocation process. I maintain that 
it is simply misleading to pretend that (taking a little artistic 
license with types) a round trip through 
dev->iommu_group->devices->next->dev->iommu holds any significance over 
just dev->iommu. There's hardly anything "obvious" about it either - 
it's buried in core code that driver authors have little reason to even 
look at.
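
Spelled out with the same artistic license (none of this is meant to 
compile as-is), the comparison is:

	/* the round trip: */
	struct device *other = dev->iommu_group->devices->next->dev;
	const struct iommu_ops *ops = dev_iommu_ops(other);

	/* vs. simply: */
	const struct iommu_ops *ops = dev_iommu_ops(dev);

Both must resolve to the same ops, so the detour adds nothing.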

If I'm implementing a driver, I'm going to see the signature of the op I 
need to implement, which tells me to allocate a domain and gives me a 
device to allocate it for, and for reference I'm going to look at how 
other drivers implement that op. I would not start by picking through 
the core code for the callers of the caller of that op, to see some 
weird code that looks redundant when in practice I observe it resolving 
a device back to itself, and think "ah yes, now I see the undocumented 
significance of an idea that only existed in Jason's head". If you don't 
want an IOMMU driver to be able to think the particular device is 
important, don't pass a bloody device! If the only intended purpose is 
for the driver to resolve dev->iommu instance data, then have the core 
do that and just pass the dev_iommu or iommu_dev directly; ambiguity solved.
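
Concretely (a hypothetical signature, purely to illustrate the 
alternative), that would mean something like:

	struct iommu_domain *(*domain_alloc_paging)(struct iommu_device *iommu);

instead of the device-taking op under discussion:

	struct iommu_domain *(*domain_alloc_paging)(struct device *dev);

and there would be nothing left for a driver author to misinterpret.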

If one is expected to look at subtleties 2 or 3 levels down the 
callchain of an API in order to understand how to implement that API 
correctly, that is *a badly designed API*, and piling on more 
"correctness theatre" code to attempt to highlight the bad design does 
not address the real problem there.

> Everything passes through
> __iommu_group_domain_alloc() and at the end of the day that should be
> the only allocation API.

But *why* should it? What's inherently more special about a group vs. a 
device or an IOMMU instance (especially with those being the things 
drivers actually deal with in relation to domains)?

As you may have noticed, between here and patch #3 I already end up 
effectively enforcing that devices in the same group must have identical 
device ops - that's arguably an even stronger requirement than we need, 
but it fits the current drivers so we may as well not relax it until 
proven necessary. So AFAICS the problem you're desperate to address is a 
theoretical one of a driver author going out of their way to implement a 
single .domain_alloc_paging implementation badly, in a way which would 
only affect the behaviour of that driver, and should be easy to spot in 
patch review anyway. Any complexity in core code in an attempt to make 
any difference to that seems about as worthwhile as trying to hold back 
the tide.

Thanks,
Robin.
