[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3cf97e3c-c8d9-4282-a8ca-e4c1ea383dbd@arm.com>
Date: Tue, 23 Apr 2024 12:26:41 +0100
From: Robin Murphy <robin.murphy@....com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: joro@...tes.org, will@...nel.org, ewagner12@...il.com,
suravee.suthikulpanit@....com, vashegde@....com, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted
devices
On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
>
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.
So why is that not enforced then? Simply hoping that calling
ops->release_device() or returning an error from iommu_device_register()
might do the right thing is no guarantee. Furthermore I'm pretty sure
we're still letting an untrusted device be hotplugged into an existing
group without any checks at all.
Meanwhile if we go back to letting untrusted take priority and attach
the device to an empty DMA domain, oh hey look it's blocked, and what's
more we also retain control of the IOMMU to keep it that way. Problem
solved. You intentionally changed the code to be less effective at what
you say it should be doing, so once again I'm left a little puzzled as
to what point you're trying to argue here.
>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
>
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
>
> - DART
> * system page size is too small, can't map IOPTEs, force identity
Not a hardware limitation at all, it supports paging just fine, and
iommu-dma *could* in principle work with larger pages with a bit of
effort (some proof-of-concept patches were posted some time ago). This
is entirely the driver hacking around the legacy of iommu-dma and core
code not expecting the situation and thus not detecting it or handling
it gracefully. In fact I should be able to sort that out relatively
soon, once my other stuff lands and I'm able to start pulling
iommu_dma_init_domain() apart.
> * iommu does not support IDENTITY at all, force paging
> - tegra: Device quirks mean paging and DMA API doesn't work
Hmm, I don't recall any specific device concerns. What I do know is that
historically this driver has always wanted to prevent the ARM DMA domain
or older versions of iommu-dma (prior to iommu_get_dma_domain()) from
getting in the way of what the host1x drivers expect, so if there is any
actual hardware policy here, it's very much implicitly hanging on the
coat-tails of software policy.
> - amd: The driver can't support PAGING when in SNP mode
> - SMMU: The driver can't support paging when in legacy binding mode and
> paging domain allocation fails as well
Quite obviously nothing to do with hardware. That's there solely because
using the old binding prevents us from reasonably being able to order
binding the IOMMU driver against the client drivers, so we can't risk
having iommu-dma come up and change the DMA ops underneath an
already-active device.
> - qcom: Looks like these devices have some iommu bypass bus in their
> HW and paging doesn't work
I believe that is true for some of the modem stuff, however again
there's also plenty of software policy in there too; some of it the same
thing about default domains getting in the way of how the GPU/display
drivers want to use dma-direct for cache maintenance underneath their
own unmanaged domains, one is completely abusing the domain type in a
devious hack which just happens to make some behaviour fall out of the
rest of the driver to satisfy an unrelated hardware/firmware constraint
(which is still *supposed* to be only a temporary fix pending a proper
solution, ha...)
> - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
>
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
>
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.
>
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver.
Not really, it's shown up because the core code took an illogical step
backwards. Without significant surgery to IOMMU and/or driver core code,
the best and most reasonable thing to do at the moment is still the one
which happens to work out OK for the current AMD behaviour. Yes, if we
*had* done all the work to make the core code super-robust then the
issue could have been the GPU driver gracefully failing rather than
wedging the whole system, and the remaining blame could then be fairly
laid on amd_iommu_def_domain_type(), but that is not the case and cannot
be claimed to be.
Per our API expectations, a driver's def_domain_type can go one of three
ways; do nothing, request IOMMU_DOMAIN_DMA, or request
IOMMU_DOMAIN_IDENTITY. For the first two it's clear that untrusted
forcing IOMMU_DOMAIN_DMA without even asking is perfectly fine. In the
third case, we still definitely want untrusted to take precedence, since
our policy is that the device not having access to things it shouldn't
is more important that it having to access things it should. Sure, in
the almost-impossible case that we did have an external device which
genuinely couldn't handle translation, then the user may allow a driver
to bind, which ends up not working and causing IOMMU faults, however
that would be equally true of any kind of blocking notion we could
implement right now, so there's no appreciable loss of functionality.
However, the overwhelming fact is that the few devices genuinely
requiring IDENTITY overrides are on-chip integrated things, any of which
suddenly turning up on external ports would be highly suspect, so that's
actually a big argument in *favour* of not believing def_domain_type
either in such cases.
In fact the more I think about it, I could swear I've had this
discussion and worked through this reasoning before, possibly back when
we first introduced it?
> When the AMD driver is
> completed its def_domain_type should be removed entirely.
>
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.
>
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
>
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.
OK, so after all that you do in fact agree? In that case, why are we
still mucking about proposing hacks on top of hacks in the AMD driver
rather than just fixing the regression sensibly?
Thanks,
Robin.
Powered by blists - more mailing lists