Message-ID: <20250731164420.GW26511@ziepe.ca>
Date: Thu, 31 Jul 2025 13:44:20 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Suzuki K Poulose <suzuki.poulose@....com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@...nel.org>,
	linux-coco@...ts.linux.dev, kvmarm@...ts.linux.dev,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	aik@....com, lukas@...ner.de, Samuel Ortiz <sameo@...osinc.com>,
	Xu Yilun <yilun.xu@...ux.intel.com>,
	Steven Price <steven.price@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
	Oliver Upton <oliver.upton@...ux.dev>
Subject: Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private
 memory

On Thu, Jul 31, 2025 at 02:48:23PM +0100, Suzuki K Poulose wrote:
> On 31/07/2025 13:17, Jason Gunthorpe wrote:
> > On Wed, Jul 30, 2025 at 11:09:35AM +0100, Suzuki K Poulose wrote:
> > > > > It is unclear whether devices would need to perform DMA to shared
> > > > > (unencrypted) memory while operating in this mode, as TLPs with T=1
> > > > > are generally expected to target private memory.
> > > > 
> > > > PCI SIG supports it, kernel should support it.
> > > 
> > > ACK. On Arm CCA, the device can access shared IPA with T=1 transactions,
> > > as long as the mapping is active in the Stage2 managed by the RMM.
> > 
> > Right, I expect the T=0 SMMU S2 translation to be a subset of the
> > T=1 RMM S2 translation. At most, pages that are not available to T=0
> > should be removed when making the subset.
> 
> Yes, this is what the VMM is supposed to do today, see [0] & [1].

Okay great!

> > I'm not sure what the plan is here on ARM though, do you expect to
> > pre-load the entire T=0 SMMU S2 with the shared IPA aliases and rely
> > on the GPT for protection or will the hypervisor dynamically change
> > the T=0 SMMU S2 after each shared/private change? Same question for
> 
> Yes, share/private transitions do go all the way back to the VMM and
> it is supposed to make the necessary changes to the SMMU S2 (as in [1]).

Okay, it works, but also why?

From a hypervisor perspective, when using VFIO I'd like the guestmemfd
to pin all the physical memory immediately, so that the entire physical
map is fixed and known. Backed by 1G huge pages, most likely.

Is there a reason not to just dump that into the T=0 SMMU using 1G
huge pages and never touch it again? The GPT provides protection?

Sure sounds appealing..
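
To make that concrete, a minimal sketch of the "populate once, never
touch again" idea (every name below is made up for illustration, not
an existing kernel API):

#include <stddef.h>

#define SZ_1G		(1024UL * 1024 * 1024)
#define S2_PROT_RW	0x3	/* hypothetical read/write permission bits */

struct pinned_range {
	unsigned long ipa;	/* guest IPA of the range */
	unsigned long pa;	/* pinned host PA backing it, 1G aligned */
	unsigned long size;	/* multiple of 1G */
};

/* Hypothetical helper: install one block mapping in the T=0 SMMU S2. */
extern int t0_smmu_s2_map(unsigned long ipa, unsigned long pa,
			  unsigned long size, unsigned int prot);

static int t0_smmu_preload(const struct pinned_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (unsigned long off = 0; off < r[i].size; off += SZ_1G) {
			int ret = t0_smmu_s2_map(r[i].ipa + off,
						 r[i].pa + off,
						 SZ_1G, S2_PROT_RW);
			if (ret)
				return ret;
		}
	}

	/*
	 * Nothing to do on shared<->private transitions after this:
	 * the GPT flips ownership of the granules, the S2 entries
	 * stay untouched.
	 */
	return 0;
}

Run once at VM creation, the T=0 SMMU S2 never needs to change again,
and the GPT alone arbitrates shared vs private.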

> As for the RMM S2, the current plan is to re-use the CPU S2 managed
> by RMM.

Yes, but my question is whether the CPU S2 will be prepopulated.
 
> Actually it is. But it might solve the problem for confidential VMs,
> where the S2 mapping is kind of pinned.

Not kind of pinned, it is pinned in the hypervisor..
 
> Population of S2 is a bit tricky for CVMs, as there are restrictions
> due to:
>   1) Pre-boot measurements
>   2) Restrictions on modifying the S2 (at least on CCA).

I haven't dug into any of this, but I'd challenge you to try to make
it run fast if the guestmemfd has a full fixed address map in 1G pages
and could just dump them into the RMM efficiently once during boot.

Perhaps there are ways to optimize the measurements for huge amounts
of zero'd memory.
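
For instance, the RMM spec already has both RMI_DATA_CREATE (contents
measured into the initial measurement) and RMI_DATA_CREATE_UNKNOWN
(contents not measured), so plain zeroed RAM could in principle skip
the hashing. A rough sketch, with made-up wrapper names:

#include <stdbool.h>

#define GRANULE_SIZE	4096UL

/* Hypothetical wrappers around the two RMI data-create calls. */
extern int rmi_data_create(unsigned long ipa);		/* measured */
extern int rmi_data_create_unknown(unsigned long ipa);	/* unmeasured */

static int populate_realm_range(unsigned long ipa, unsigned long npages,
				bool carries_image)
{
	for (unsigned long i = 0; i < npages; i++) {
		unsigned long a = ipa + i * GRANULE_SIZE;
		/* Only granules holding the initial image need hashing. */
		int ret = carries_image ? rmi_data_create(a)
					: rmi_data_create_unknown(a);

		if (ret)
			return ret;
	}
	return 0;
}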

> Filling in the S2 with an already populated S2 is complicated for CCA
> (costly, but not impossible). But the easier way is for the Realm to
> fault in the pages before they are used for DMA (and S2 mappings can
> be pinned by the hyp by default). Hence that suggestion.

I guess, but it's weird, kinda slow, and the RMM can never unfault them..

How will you reconstruct the 1G huge pages in the S2 if you are only
populating on faults? Can you really fault the entire 1G page? If so
why can't it be prepopulated?
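
Put differently (made-up helper names again): if the fault path can
already do

#define SZ_1G	(1024UL * 1024 * 1024)

/* Hypothetical lookups into the pinned guestmemfd and the RMM S2. */
extern unsigned long guest_memfd_lookup_pa(unsigned long ipa);
extern int s2_map_block(unsigned long ipa, unsigned long pa,
			unsigned long size);

static int s2_handle_fault(unsigned long fault_ipa)
{
	/* Round down to the 1G block the fault landed in. */
	unsigned long block = fault_ipa & ~(SZ_1G - 1);

	return s2_map_block(block, guest_memfd_lookup_pa(block), SZ_1G);
}

then exactly the same s2_map_block() call could run in a loop at VM
creation instead, with the fault trigger removed.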

Jason
