[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7ef259b2-121e-643e-49c2-0b65923d392d@arm.com>
Date: Thu, 8 Sep 2022 11:25:46 +0100
From: Robin Murphy <robin.murphy@....com>
To: Jason Gunthorpe <jgg@...dia.com>, Joerg Roedel <joro@...tes.org>
Cc: Nicolin Chen <nicolinc@...dia.com>, will@...nel.org,
alex.williamson@...hat.com, suravee.suthikulpanit@....com,
marcan@...can.st, sven@...npeter.dev, alyssa@...enzweig.io,
robdclark@...il.com, dwmw2@...radead.org, baolu.lu@...ux.intel.com,
mjrosato@...ux.ibm.com, gerald.schaefer@...ux.ibm.com,
orsonzhai@...il.com, baolin.wang@...ux.alibaba.com,
zhang.lyra@...il.com, thierry.reding@...il.com, vdumpa@...dia.com,
jonathanh@...dia.com, jean-philippe@...aro.org, cohuck@...hat.com,
tglx@...utronix.de, shameerali.kolothum.thodi@...wei.com,
thunder.leizhen@...wei.com, christophe.jaillet@...adoo.fr,
yangyingliang@...wei.com, jon@...id-run.com, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, asahi@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org,
linux-arm-msm@...r.kernel.org, linux-s390@...r.kernel.org,
linux-tegra@...r.kernel.org,
virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
kevin.tian@...el.com
Subject: Re: [PATCH v6 1/5] iommu: Return -EMEDIUMTYPE for incompatible domain
and device/group
On 2022-09-08 01:43, Jason Gunthorpe wrote:
> On Wed, Sep 07, 2022 at 08:41:13PM +0100, Robin Murphy wrote:
>
>>>> FWIW, we're now very close to being able to validate dev->iommu against
>>>> where the domain came from in core code, and so short-circuit ->attach_dev
>>>> entirely if they don't match.
>>>
>>> I don't think this is a long term direction. We have systems now with
>>> a number of SMMU blocks and we really are going to see a need that
>>> they share the iommu_domains so we don't have unncessary overheads
>>> from duplicated io page table memory.
>>>
>>> So ultimately I'd expect to pass the iommu_domain to the driver and
>>> the driver will decide if the page table memory it represents is
>>> compatible or not. Restricting to only the same iommu instance isn't
>>> good..
>>
>> Who said IOMMU instance?
>
> Ah, I completely misunderstood what 'dev->iommu' was referring too, OK
> I see.
>
>> Again, not what I was suggesting. In fact the nature of iommu_attach_group()
>> already rules out bogus devices getting this far, so all a driver currently
>> has to worry about is compatibility of a device that it definitely probed
>> with a domain that it definitely allocated. Therefore, from a caller's point
>> of view, if attaching to an existing domain returns -EINVAL, try another
>> domain; multiple different existing domains can be tried, and may also
>> return -EINVAL for the same or different reasons; the final attempt is to
>> allocate a fresh domain and attach to that, which should always be nominally
>> valid and *never* return -EINVAL. If any attempt returns any other error,
>> bail out down the usual "this should have worked but something went wrong"
>> path. Even if any driver did have a nonsensical "nothing went wrong, I just
>> can't attach my device to any of my domains" case, I don't think it would
>> really need distinguishing from any other general error anyway.
>
> The algorithm you described is exactly what this series does, it just
> used EMEDIUMTYPE instead of EINVAL. Changing it to EINVAL is not a
> fundamental problem, just a bit more work.
>
> Looking at Nicolin's series there is a bunch of existing errnos that
> would still need converting, ie EXDEV, EBUSY, EOPNOTSUPP, EFAULT, and
> ENXIO are all returned as codes for 'domain incompatible with device'
> in various drivers. So the patch would still look much the same, just
> changing them to EINVAL instead of EMEDIUMTYPE.
>
> That leaves the question of the remaining EINVAL's that Nicolin did
> not convert to EMEDIUMTYPE.
>
> eg in the AMD driver:
>
> if (!check_device(dev))
> return -EINVAL;
>
> iommu = rlookup_amd_iommu(dev);
> if (!iommu)
> return -EINVAL;
>
> These are all cases of 'something is really wrong with the device or
> iommu, everything will fail'. Other drivers are using ENODEV for this
> already, so we'd probably have an additional patch changing various
> places like that to ENODEV.
>
> This mixture of error codes is the basic reason why a new code was
> used, because none of the existing codes are used with any
> consistency.
>
> But OK, I'm on board, lets use more common errnos with specific
> meaning, that can be documented in a comment someplace:
> ENOMEM - out of memory
> ENODEV - no domain can attach, device or iommu is messed up
> EINVAL - the domain is incompatible with the device
> <others> - Same behavior as ENODEV, use is discouraged.
>
> I think achieving consistency of error codes is a generally desirable
> goal, it makes the error code actually useful.
>
> Joerg this is a good bit of work, will you be OK with it?
>
>> Thus as long as we can maintain that basic guarantee that attaching
>> a group to a newly allocated domain can only ever fail for resource
>> allocation reasons and not some spurious "incompatibility", then we
>> don't need any obscure trickery, and a single, clear, error code is
>> in fact enough to say all that needs to be said.
>
> As above, this is not the case, drivers do seem to have error paths
> that are unconditional on the domain. Perhaps they are just protective
> assertions and never happen.
Right, that's the gist of what I was getting at - I think it's worth
putting in the effort to audit and fix the drivers so that that *can* be
the case, then we can have a meaningful error API with standard codes
effectively for free, rather than just sighing at the existing mess and
building a slightly esoteric special case on top.
Case in point, the AMD checks quoted above are pointless, since it
checks the same things in ->probe_device, and if that fails then the
device won't get a group so there's no way for it to even reach
->attach_dev any more. I'm sure there's a *lot* of cruft that can be
cleared out now that per-device and per-domain ops give us this kind of
inherent robustness.
Cheers,
Robin.
> Regardless, it doesn't matter. If they return ENODEV or EINVAL the
> VFIO side algorithm will continue to work fine, it just does alot more
> work if EINVAL is permanently returned.
>
> Thanks,
> Jason
Powered by blists - more mailing lists