[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b0db9b48-e9c6-4e40-9d07-c353ab14e4ce@amd.com>
Date: Wed, 17 Apr 2024 10:53:00 +0530
From: Vasant Hegde <vashegde@....com>
To: Jason Gunthorpe <jgg@...pe.ca>, Robin Murphy <robin.murphy@....com>
Cc: joro@...tes.org, will@...nel.org, ewagner12@...il.com,
suravee.suthikulpanit@....com, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted
devices
Hi Jason, Robin,
On 4/16/2024 8:59 PM, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
>
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.
>
>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
>
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
>
> - DART
> * system page size is too small, can't map IOPTEs, force identity
> * iommu does not support IDENTITY at all, force paging
> - tegra: Device quirks mean paging and DMA API doesn't work
> - amd: The driver can't support PAGING when in SNP mode
Actually When SNP (Secure Nested Paging) is enabled in host, AMD driver forces
DMA translation mode with AMD V1 page table.
> - SMMU: The driver can't support paging when in legacy binding mode and
> paging domain allocation fails as well
> - qcom: Looks like these devices have some iommu bypass bus in their
> HW and paging doesn't work
> - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
>
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
>
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.
As I mentioned in other thread, AMD driver will be fixed to support paging mode
with V2 page table for PASID. I will look into it soon.
>
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver. When the AMD driver is
> completed its def_domain_type should be removed entirely.
Not related to this topic, but for completeness.. AMD driver has many condition
to deal. like :
- Memory Encryption support - by default enforce paging mode
- SNP - Enforce paging mode with AMD V1 page table
- GPUs - Identity mapping
>
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.
Keeping PASID support aside, largely the question is who should handle/decide
domain type for untrusted device? Is it core IOMMU layer -OR- HW driver?
- If its core layer, then this patch looks good to me.
- If its individual driver, then I can try to add fix in AMD driver. But then
what is the expectation? Driver is expected to return IOMMU_DOMAIN_DMA -OR- core
IOMMU layer is expected to adhere to whatever driver returned?
-Vasant
>
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
>
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.
>
> Jason
Powered by blists - more mailing lists