linux-kernel - Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted devices

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3cf97e3c-c8d9-4282-a8ca-e4c1ea383dbd@arm.com>
Date: Tue, 23 Apr 2024 12:26:41 +0100
From: Robin Murphy <robin.murphy@....com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: joro@...tes.org, will@...nel.org, ewagner12@...il.com,
 suravee.suthikulpanit@....com, vashegde@....com, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted
 devices

On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
> 
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.

So why is that not enforced then? Simply hoping that calling 
ops->release_device() or returning an error from iommu_device_register() 
might do the right thing is no guarantee. Furthermore I'm pretty sure 
we're still letting an untrusted device be hotplugged into an existing 
group without any checks at all.

Meanwhile if we go back to letting untrusted take priority and attach 
the device to an empty DMA domain, oh hey look it's blocked, and what's 
more we also retain control of the IOMMU to keep it that way. Problem 
solved. You intentionally changed the code to be less effective at what 
you say it should be doing, so once again I'm left a little puzzled as 
to what point you're trying to argue here.

>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
> 
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
> 
>   - DART
>     * system page size is too small, can't map IOPTEs, force identity

Not a hardware limitation at all, it supports paging just fine, and 
iommu-dma *could* in principle work with larger pages with a bit of 
effort (some proof-of-concept patches were posted some time ago). This 
is entirely the driver hacking around the legacy of iommu-dma and core 
code not expecting the situation and thus not detecting it or handling 
it gracefully. In fact I should be able to sort that out relatively 
soon, once my other stuff lands and I'm able to start pulling 
iommu_dma_init_domain() apart.

>     * iommu does not support IDENTITY at all, force paging
>   - tegra: Device quirks mean paging and DMA API doesn't work

Hmm, I don't recall any specific device concerns. What I do know is that 
historically this driver has always wanted to prevent the ARM DMA domain 
or older versions of iommu-dma (prior to iommu_get_dma_domain()) from 
getting in the way of what the host1x drivers expect, so if there is any 
actual hardware policy here, it's very much implicitly hanging on the 
coat-tails of software policy.

>   - amd: The driver can't support PAGING when in SNP mode
>   - SMMU: The driver can't support paging when in legacy binding mode and
>           paging domain allocation fails as well

Quite obviously nothing to do with hardware. That's there solely because 
using the old binding prevents us from reasonably being able to order 
binding the IOMMU driver against the client drivers, so we can't risk 
having iommu-dma come up and change the DMA ops underneath an 
already-active device.

>   - qcom: Looks like these devices have some iommu bypass bus in their
>           HW and paging doesn't work

I believe that is true for some of the modem stuff, however again 
there's also plenty of software policy in there too; some of it the same 
thing about default domains getting in the way of how the GPU/display 
drivers want to use dma-direct for cache maintenance underneath their 
own unmanaged domains, one is completely abusing the domain type in a 
devious hack which just happens to make some behaviour fall out of the 
rest of the driver to satisfy an unrelated hardware/firmware constraint 
(which is still *supposed* to be only a temporary fix pending a proper 
solution, ha...)

>   - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
> 
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
> 
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.
> 
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver.

Not really, it's shown up because the core code took an illogical step 
backwards. Without significant surgery to IOMMU and/or driver core code, 
the best and most reasonable thing to do at the moment is still the one 
which happens to work out OK for the current AMD behaviour. Yes, if we 
*had* done all the work to make the core code super-robust then the 
issue could have been the GPU driver gracefully failing rather than 
wedging the whole system, and the remaining blame could then be fairly 
laid on amd_iommu_def_domain_type(), but that is not the case and cannot 
be claimed to be.

Per our API expectations, a driver's def_domain_type can go one of three 
ways; do nothing, request IOMMU_DOMAIN_DMA, or request 
IOMMU_DOMAIN_IDENTITY. For the first two it's clear that untrusted 
forcing IOMMU_DOMAIN_DMA without even asking is perfectly fine. In the 
third case, we still definitely want untrusted to take precedence, since 
our policy is that the device not having access to things it shouldn't 
is more important that it having to access things it should. Sure, in 
the almost-impossible case that we did have an external device which 
genuinely couldn't handle translation, then the user may allow a driver 
to bind, which ends up not working and causing IOMMU faults, however 
that would be equally true of any kind of blocking notion we could 
implement right now, so there's no appreciable loss of functionality. 
However, the overwhelming fact is that the few devices genuinely 
requiring IDENTITY overrides are on-chip integrated things, any of which 
suddenly turning up on external ports would be highly suspect, so that's 
actually a big argument in *favour* of not believing def_domain_type 
either in such cases.

In fact the more I think about it, I could swear I've had this 
discussion and worked through this reasoning before, possibly back when 
we first introduced it?

> When the AMD driver is
> completed its def_domain_type should be removed entirely.
> 
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.
> 
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
> 
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.

OK, so after all that you do in fact agree? In that case, why are we 
still mucking about proposing hacks on top of hacks in the AMD driver 
rather than just fixing the regression sensibly?

Thanks,
Robin.