lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240424130457.GF231144@ziepe.ca>
Date: Wed, 24 Apr 2024 10:04:57 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Robin Murphy <robin.murphy@....com>
Cc: joro@...tes.org, will@...nel.org, ewagner12@...il.com,
	suravee.suthikulpanit@....com, vashegde@....com,
	iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
	regressions@...ts.linux.dev
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted
 devices

On Tue, Apr 23, 2024 at 12:26:41PM +0100, Robin Murphy wrote:
> On 16/04/2024 4:29 pm, Jason Gunthorpe wrote:
> > On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
> > > Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
> > > precedence over whatever a driver's def_domain_type may have wanted to
> > > say. This was intentionally handled in core code since 3 years prior,
> > > to avoid drivers poking at the details of what is essentially a policy
> > > between the PCI core and the IOMMU core. Now, though, we go to the
> > > length of evaluating both constraints to check for any conflict, and if
> > > so throw our toys out of the pram and refuse to handle the device at
> > > all. Regardless of any intent, in practice this leaves the device, and
> > > potentially the rest of its group or even the whole IOMMU, in a largely
> > > undetermined state, which at worst may render the whole system
> > > unusable.
> > 
> > For systems supporting untrusted device security the translation must
> > be BLOCKED at this point.
> 
> So why is that not enforced then? 

I'm not sure what you mean? At this point the core code has no idea
what translation the iommu driver setup when it booted the HW..

I mean if a iommu driver is going to support untrusted then at no
point can it be doing IDENTITY while the untrusted link is up.

> Simply hoping that calling ops->release_device() or returning an
> error from iommu_device_register() might do the right thing is no
> guarantee.

The systems that support untrusted must put things back to BLOCKED in
their release_device() to be ready for the next hotplug of an
untrusted device.

> Furthermore I'm pretty sure we're still letting an
> untrusted device be hotplugged into an existing group without any
> checks at all.

Yes, combining trusted and untrusted in the same group, regardless of
domain type, is fundamentally wrong and we don't check it.

> > > Unfortunately it turns out that this is a realistic situation to run
> > > into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
> > > laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
> > > an identity default domain for PASIDs to be usable, and thus sets a
> > > def_domain_type override based on PASID capability.
> > 
> > The majority of places implementing def_domain_type are using it as a
> > statement of HW capability that should not be ignored by the core code:
> > 
> >   - DART
> >     * system page size is too small, can't map IOPTEs, force identity
> 
> Not a hardware limitation at all, it supports paging just fine, and
> iommu-dma *could* in principle work with larger pages with a bit of effort
> (some proof-of-concept patches were posted some time ago). 

I was not clear, I mean a HW/SW/Driver/system issue. The net result is
that PAGING or IDENTITY does not work in the Linux kernel for a whole
bunch of good and bad reasons.

> >     * iommu does not support IDENTITY at all, force paging

Like this one is a good reason. No HW support. It is checked during
attach domain and fails it.

> >   - tegra: Device quirks mean paging and DMA API doesn't work
> 
> Hmm, I don't recall any specific device concerns. 

Thierry told me once, it is another ugly GPU driver hack IIRC. It is a
bad reason, some hacky GPU driver issue and is overkill since it
should of_match like qcom.

> >   - amd: The driver can't support PAGING when in SNP mode
> >   - SMMU: The driver can't support paging when in legacy binding mode and
> >           paging domain allocation fails as well
> 
> Quite obviously nothing to do with hardware.

But attach_domain still fails. From a core perspective PAGING doesn't
work.

> OK, so after all that you do in fact agree? In that case, why are we still
> mucking about proposing hacks on top of hacks in the AMD driver rather than
> just fixing the regression sensibly?

It is because your proposal is regressing the meaning of
def_domain_type back to a policy knob when I've spent a bunch of work
emptying out def_domain type implementations to get it into a
capability report.

def_domain_type is now about *capability*. Does the
HW/SW/Driver/system support PAGING/IDENTITY or not.

Meaning if def_domain_type says it is not supported then the core code
should not use it. This is how everything was working until AMD
changed their driver to lie about what their attach_domain would
accept.

I do not want to see def_domain_type regress back to being confused
where some drivers are policy advice and some drivers are capability!

AMD should hack their driver for the rc fix and then go and fix it
properly to remove the PASID logic entirely from def_domain_type. I
will also point again out that in v6.9-rc AMD doesn't even support
PASID yet so this abuse of def_domain_type isn't even needed. :(

The core code should contiue to treat def_domain_type as capability.

This has obviously become confusing. I think we should rename
def_domain_type to have a clearer name and clearer purpose. Call it
supported_domain_types() returning an enum of:

 SUPPORTS_ALL
 SUPPORTS_IDENTITY
 SUPPORTS_PAGING
 SUPPORTS_IDENTITY_PASID

Which is much more clearly about some full system support, not a way
for drivers to inject arbitrary policy.

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ