linux-kernel - Re: [PATCH 1/4] iommu/amd: Introduce Protection-domain flag VFIO

Message-ID: <Y8q9ocj2IZB2r6Np@ziepe.ca>
Date:   Fri, 20 Jan 2023 12:13:21 -0400
From:   Jason Gunthorpe <jgg@...pe.ca>
To:     "Kalra, Ashish" <ashish.kalra@....com>
Cc:     Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
        linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
        joro@...tes.org, robin.murphy@....com, thomas.lendacky@....com,
        vasant.hegde@....com, jon.grimm@....com
Subject: Re: [PATCH 1/4] iommu/amd: Introduce Protection-domain flag VFIO

On Fri, Jan 20, 2023 at 09:12:26AM -0600, Kalra, Ashish wrote:
> On 1/19/2023 11:44 AM, Jason Gunthorpe wrote:
> > On Thu, Jan 19, 2023 at 02:54:43AM -0600, Kalra, Ashish wrote:
> > > Hello Jason,
> > > 
> > > On 1/13/2023 9:33 AM, Jason Gunthorpe wrote:
> > > > On Tue, Jan 10, 2023 at 08:31:34AM -0600, Suravee Suthikulpanit wrote:
> > > > > Currently, to detect if a domain is enabled with VFIO support, the driver
> > > > > checks whether the domain has devices attached and whether the domain type
> > > > > is IOMMU_DOMAIN_UNMANAGED.
> > > > 
> > > > NAK
> > > > 
> > > > If you need weird HW specific stuff like this then please implement it
> > > > properly in iommufd, not try and randomly guess what things need from
> > > > the domain type.
> > > > 
> > > > All this confidential computing stuff needs a comprehensive solution,
> > > > not some piecemeal mess. How can you even use a CC guest with VFIO in
> > > > the upstream kernel? Hmm?
> > > > 
> > > 
> > > Currently all guest devices are untrusted - whether they are emulated,
> > > virtio or passthrough. In the current use case of VFIO device-passthrough to
> > > an SNP guest, the pass-through device will perform DMA to un-encrypted or
> > > shared guest memory, in the same way as virtio or emulated devices.
> > > 
> > > This fix is prompted by an issue reported by Nvidia, who are trying to do
> > > PCIe device passthrough to an SNP guest. The memory for DMA is allocated
> > > through dma_alloc_coherent() in the SNP guest, and during DMA I/O an
> > > RMP_PAGE_FAULT is observed on the host.
> > > 
> > > These dma_alloc_coherent() calls translate into page state change hypercalls
> > > to the host, which change the guest page state from encrypted to shared in
> > > the RMP table.
> > > 
> > > Following is a link to the issue discussed above:
> > > https://github.com/AMDESE/AMDSEV/issues/109
> > 
> > Wow, you should really write all of this in the commit message
> > 
> > > Now, to set individual 4K entries within a large page entry to different
> > > shared/private mappings in the NPT or host page tables, the RMP and
> > > NPT/host page table large page entries are split into 4K PTEs.
> > 
> > Why are mappings to private pages even in the iommu in the first
> > place - and how did they even get there?
> > 
> 
> You seem to be confusing host/NPT page tables with IOMMU page tables.

No, I haven't. I'm repeating what was said:

 during DMA I/O an RMP_PAGE_FAULT is observed on the host.

So I'm interested to hear how you can get an RMP_PAGE_FAULT from the
IOMMU if the IOMMU is only programmed with shared pages that, by (my)
definition, are accessible to the CPU and should not generate an
RMP_PAGE_FAULT.

I think you are confusing my use of the word private with some AMD
architecture details. When I say private I mean that the host CPU will
generate a violation if it tries to access the memory.

I think the conclusion is logical: if the IOMMU is experiencing a
protection violation, it is because the IOMMU was programmed with PFNs
it is not allowed to access, so why was that even done in the
first place?
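A toy model may make this argument concrete. It is a hypothetical sketch, not AMD or Linux code: `rmp[]` stands in for the RMP table as a per-4K shared/private flag, `iopte[]` is a one-level IOMMU page table, and `model_reset()` and `dma_access()` are invented helpers. Under these assumptions, an RMP fault can only arise when an IOPTE was programmed with a PFN that is currently private.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model only: not the AMD IOMMU driver or the real RMP layout. */
#define NR_PFNS 16

enum page_state { PAGE_SHARED, PAGE_PRIVATE };
enum dma_result { DMA_OK, DMA_IO_PAGE_FAULT, DMA_RMP_FAULT };

/* Per-4K-page ownership, standing in for the RMP table. */
static enum page_state rmp[NR_PFNS];

/* One-level IOMMU page table: IOVA page number -> PFN, -1 if unmapped. */
static long iopte[NR_PFNS];

/* Start with every page private and nothing mapped in the IOMMU. */
static void model_reset(void)
{
	for (size_t i = 0; i < NR_PFNS; i++) {
		rmp[i] = PAGE_PRIVATE;
		iopte[i] = -1;
	}
}

/* Simulate one device DMA to an IOVA page and classify the outcome. */
static enum dma_result dma_access(size_t iova_pfn)
{
	long pfn;

	assert(iova_pfn < NR_PFNS);
	pfn = iopte[iova_pfn];
	if (pfn < 0)
		return DMA_IO_PAGE_FAULT; /* no translation: ordinary IOMMU fault */
	assert(pfn < NR_PFNS);
	if (rmp[pfn] == PAGE_PRIVATE)
		return DMA_RMP_FAULT;     /* translated to a CPU-inaccessible PFN */
	return DMA_OK;
}
```

With `rmp[3]` shared, mapping IOVA page 0 to PFN 3 succeeds, an unmapped IOVA page gives an ordinary IOMMU fault, and mapping IOVA page 2 to private PFN 5 is the only case that produces the RMP fault.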

I suppose what is going on is that you program the IOPTEs with PFNs of
unknown state, and when a PFN changes access protection the IOMMU can
simply use it without needing to synchronize with the access
protection change. And your problem is that the granularity of access
protection changes does not match the IOPTE granularity in the IOMMU.
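The granularity mismatch can be sketched as follows. This is a simplified, hypothetical model that follows the description above rather than the actual RMP checks: one 2M IOPTE covers 512 4K pages, a single page state change at 4K leaves that IOPTE spanning pages in different states, and the invented helpers `large_iopte_mixed()` and `split_large_iopte()` detect that case and replace the large entry with 4K entries covering the same PFNs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one 2 MiB IOPTE spans 512 pages of 4 KiB. */
#define PAGES_PER_LARGE 512

enum page_state { PAGE_SHARED, PAGE_PRIVATE };

/* True if the 4 KiB pages under one large IOPTE are not all in one state. */
static bool large_iopte_mixed(const enum page_state st[PAGES_PER_LARGE])
{
	for (size_t i = 1; i < PAGES_PER_LARGE; i++)
		if (st[i] != st[0])
			return true;
	return false;
}

/*
 * Replace one large IOPTE by 512 4 KiB IOPTEs that translate to the same
 * PFNs, so each entry now matches the granularity of a page state change.
 */
static void split_large_iopte(long base_pfn, long small[PAGES_PER_LARGE])
{
	assert(base_pfn % PAGES_PER_LARGE == 0); /* large pages are aligned */
	for (size_t i = 0; i < PAGES_PER_LARGE; i++)
		small[i] = base_pfn + (long)i;
}
```

Converting only page 7 to private is enough to make the whole 2M entry mixed; after the split, entry 0 maps PFN 1024 and entry 511 maps PFN 1535.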

But this seems very wasteful, as the IOMMU will be using IOPTEs and
will also pin the memory when the system *knows* this memory cannot
be accessed through the IOMMU. It seems much better to dynamically
establish IOMMU mappings only when you learn that the memory is
actually accessible to the IOMMU.
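The alternative suggested here, mapping only once memory is known to be accessible, might look roughly like this. It is a hypothetical sketch: `on_page_state_change()` is an invented hook standing in for whatever notification the host receives on a page state change, identity IOVA == guest PFN is assumed, and a real implementation would also need IOTLB invalidation before a page becomes private.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: IOMMU mappings follow guest page state changes. */
#define NR_PFNS 16

enum page_state { PAGE_SHARED, PAGE_PRIVATE };

static long iopte[NR_PFNS]; /* IOVA page -> PFN, -1 when unmapped */
static size_t nr_pinned;    /* pages pinned on behalf of the IOMMU */

/* Nothing is mapped or pinned until the guest shares something. */
static void iommu_model_init(void)
{
	for (size_t i = 0; i < NR_PFNS; i++)
		iopte[i] = -1;
	nr_pinned = 0;
}

/* Invented hook, assumed to run on every guest page state change. */
static void on_page_state_change(size_t pfn, enum page_state new_state)
{
	assert(pfn < NR_PFNS);
	if (new_state == PAGE_SHARED && iopte[pfn] < 0) {
		iopte[pfn] = (long)pfn; /* pin and map only now-accessible memory */
		nr_pinned++;
	} else if (new_state == PAGE_PRIVATE && iopte[pfn] >= 0) {
		iopte[pfn] = -1;        /* unmap and unpin as it becomes private */
		nr_pinned--;
	}
}
```

Sharing page 4 twice maps and pins it once; converting it back to private removes the IOPTE and drops the pin count to zero, so the IOMMU never holds a translation to a private PFN.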

Also, I thought the leading plan for CC was to use the memfd approach here:

https://lore.kernel.org/kvm/20220915142913.2213336-1-chao.p.peng@linux.intel.com/

That approach prevents mmapping the memory into userspace, so how did it
get into the IOMMU in the first place?

Jason