Message-ID: <b0db9b48-e9c6-4e40-9d07-c353ab14e4ce@amd.com>
Date: Wed, 17 Apr 2024 10:53:00 +0530
From: Vasant Hegde <vashegde@....com>
To: Jason Gunthorpe <jgg@...pe.ca>, Robin Murphy <robin.murphy@....com>
Cc: joro@...tes.org, will@...nel.org, ewagner12@...il.com,
 suravee.suthikulpanit@....com, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org, regressions@...ts.linux.dev
Subject: Re: [PATCH] iommu: Fix def_domain_type interaction with untrusted
 devices

Hi Jason, Robin,


On 4/16/2024 8:59 PM, Jason Gunthorpe wrote:
> On Tue, Apr 16, 2024 at 02:00:43PM +0100, Robin Murphy wrote:
>> Previously, an untrusted device forcing IOMMU_DOMAIN_DMA always took
>> precedence over whatever a driver's def_domain_type may have wanted to
>> say. This was intentionally handled in core code since 3 years prior,
>> to avoid drivers poking at the details of what is essentially a policy
>> between the PCI core and the IOMMU core. Now, though, we go to the
>> length of evaluating both constraints to check for any conflict, and if
>> so throw our toys out of the pram and refuse to handle the device at
>> all. Regardless of any intent, in practice this leaves the device, and
>> potentially the rest of its group or even the whole IOMMU, in a largely
>> undetermined state, which at worst may render the whole system
>> unusable.
> 
> For systems supporting untrusted device security the translation must
> be BLOCKED at this point.
> 
>> Unfortunately it turns out that this is a realistic situation to run
>> into by connecting a PASID-capable device (e.g. a GPU) to an AMD-based
>> laptop via a Thunderbolt expansion box, since the AMD IOMMU driver needs
>> an identity default domain for PASIDs to be usable, and thus sets a
>> def_domain_type override based on PASID capability.
> 
> The majority of places implementing def_domain_type are using it as a
> statement of HW capability that should not be ignored by the core code:
> 
>  - DART
>    * system page size is too small, can't map IOPTEs, force identity
>    * iommu does not support IDENTITY at all, force paging
>  - tegra: Device quirks mean paging and DMA API doesn't work
>  - amd: The driver can't support PAGING when in SNP mode

Actually, when SNP (Secure Nested Paging) is enabled on the host, the AMD driver
forces DMA translation mode with the AMD V1 page table.


>  - SMMU: The driver can't support paging when in legacy binding mode and
>          paging domain allocation fails as well
>  - qcom: Looks like these devices have some iommu bypass bus in their
>          HW and paging doesn't work
>  - SMMUv3: The comment says HISI devices cannot support paging due to a HW quirk
> 
> For these force overriding the driver knowledge will either result in
> domain allocate/attach failure or a broken DMA environment anyhow.
> 
> The AMD PASID thing is actually unique because the driver can still
> fully support PAGING, despite it wrongly telling the core code that it
> can't.

As I mentioned in the other thread, the AMD driver will be fixed to support
paging mode with the V2 page table for PASID. I will look into it soon.


> 
> This is happening because it is all just a hack to work around the
> incomplete SW implementation in the AMD driver. When the AMD driver is
> completed its def_domain_type should be removed entirely.

Not related to this topic, but for completeness: the AMD driver has many
conditions to deal with, such as:
  - Memory Encryption support - by default enforce paging mode
  - SNP - Enforce paging mode with AMD V1 page table
  - GPUs - Identity mapping
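
As a rough illustration of how the conditions above could combine, here is a
minimal standalone sketch. It is not the AMD driver's code: every type and
function name is hypothetical, and the precedence (SNP first, then memory
encryption, then the GPU preference) is an assumption made only for this
example, since the list does not state one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not kernel or AMD driver types. */
enum mode { MODE_NO_PREFERENCE, MODE_PAGING, MODE_IDENTITY };
enum pgtable { PGTABLE_DEFAULT, PGTABLE_V1 };

struct dev_conditions {
    bool mem_encrypt;   /* memory encryption support is active */
    bool snp;           /* SNP is enabled on the host */
    bool is_gpu;        /* device is a GPU */
};

struct domain_choice {
    enum mode mode;
    enum pgtable pgtable;
};

/*
 * Map the three listed conditions to a default domain choice.
 * ASSUMPTION: SNP beats memory encryption, which beats the GPU preference.
 */
struct domain_choice pick_domain(struct dev_conditions c)
{
    struct domain_choice out = { MODE_NO_PREFERENCE, PGTABLE_DEFAULT };

    if (c.snp) {
        /* SNP: enforce paging mode with the V1 page table. */
        out.mode = MODE_PAGING;
        out.pgtable = PGTABLE_V1;
        return out;
    }
    if (c.mem_encrypt) {
        /* Memory encryption: enforce paging mode by default. */
        out.mode = MODE_PAGING;
        return out;
    }
    if (c.is_gpu) {
        /* GPUs: identity mapping. */
        out.mode = MODE_IDENTITY;
    }
    return out;
}

/* Small self-check showing the assumed precedence. */
void check_pick_domain(void)
{
    struct dev_conditions gpu_only = { false, false, true };
    struct dev_conditions gpu_with_snp = { false, true, true };

    assert(pick_domain(gpu_only).mode == MODE_IDENTITY);
    assert(pick_domain(gpu_with_snp).mode == MODE_PAGING);
    assert(pick_domain(gpu_with_snp).pgtable == PGTABLE_V1);
}
```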

> 
> Since actual PASID AMD attach isn't implemented yet we could just
> remove that check from def_domain_type as an RC fix. Vasant can sort
> it out properly by disabling pasid support on untrusted devices until
> the DTE logic is fully completed.

Keeping PASID support aside, the larger question is: who should decide the
domain type for an untrusted device? Is it the core IOMMU layer or the HW driver?
  - If it's the core layer, then this patch looks good to me.
  - If it's the individual driver, then I can try to add a fix in the AMD driver.
But then what is the expectation? Is the driver expected to return
IOMMU_DOMAIN_DMA, or is the core IOMMU layer expected to adhere to whatever the
driver returned?
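
To make the two policies under discussion concrete, here is a minimal
standalone sketch. It is not the kernel's implementation: the enum, the error
value, and both function names are hypothetical stand-ins. The first function
models the core layer forcing DMA translation for untrusted devices regardless
of the driver's request; the second models evaluating both constraints and
refusing the device on conflict.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's IOMMU_DOMAIN_* definitions. */
enum domain_type {
    DOMAIN_NO_PREFERENCE = 0,   /* driver states no requirement */
    DOMAIN_IDENTITY,
    DOMAIN_DMA,
};

#define ERR_CONFLICT (-1)       /* hypothetical "refuse the device" result */

/* Policy A: the core decides. An untrusted device always gets DMA. */
int resolve_core_decides(bool untrusted, enum domain_type driver_req)
{
    if (untrusted)
        return DOMAIN_DMA;
    return driver_req;
}

/* Policy B: check both constraints and fail if they disagree. */
int resolve_reject_conflict(bool untrusted, enum domain_type driver_req)
{
    if (!untrusted)
        return driver_req;
    if (driver_req != DOMAIN_NO_PREFERENCE && driver_req != DOMAIN_DMA)
        return ERR_CONFLICT;
    return DOMAIN_DMA;
}

/* The eGPU case from the thread: untrusted device, driver asks for identity. */
void check_egpu_case(void)
{
    assert(resolve_core_decides(true, DOMAIN_IDENTITY) == DOMAIN_DMA);
    assert(resolve_reject_conflict(true, DOMAIN_IDENTITY) == ERR_CONFLICT);
}
```

Under policy A the driver's identity request is simply overridden, so the
device is still handled; under policy B the same input yields an error, which
corresponds to the probe failure described in the quoted patch text.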


-Vasant


> 
>> In general, restoring the old behaviour of forcing translation will not
>> make that device's operation any more broken than leaving it potentially
>> blocked or subject to the rest of a group's translations would, nor will
>> it be any less safe than leaving it potentially bypassed or subject to
>> the rest of a group's translations would, so do that, and let eGPUs work
>> again.
> 
> Well, this is true, since we can't handle the probe error it doesn't
> matter what we do.
> 
> Jason
