Message-ID: <20250905162258.GA483339@ziepe.ca>
Date: Fri, 5 Sep 2025 13:22:58 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: Catalin Marinas <catalin.marinas@....com>
Cc: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@...nel.org>,
linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
linux-coco@...ts.linux.dev, will@...nel.org, maz@...nel.org,
tglx@...utronix.de, robin.murphy@....com, suzuki.poulose@....com,
akpm@...ux-foundation.org, steven.price@....com
Subject: Re: [RFC PATCH] arm64: swiotlb: dma: its: Ensure shared buffers are
properly aligned
On Fri, Sep 05, 2025 at 02:13:34PM +0100, Catalin Marinas wrote:
> Hi Aneesh,
>
> On Fri, Sep 05, 2025 at 11:24:41AM +0530, Aneesh Kumar K.V (Arm) wrote:
> > When running with private memory guests, the guest kernel must allocate
> > memory with specific constraints when sharing it with the hypervisor.
> >
> > These shared memory buffers are also accessed by the host kernel, which
> > means they must be aligned to the host kernel's page size.
>
> So this is the case where the guest page size is smaller than the host
> one. Just trying to understand what would go wrong if we don't do
> anything here. Let's say the guest uses 4K pages and the host a 64K
> pages. Within a 64K range, only a 4K is shared/decrypted. If the host
> does not explicitly access the other 60K around the shared 4K, can
> anything still go wrong? Is the hardware ok with speculative loads from
> non-shared ranges?
+1 I'm also confused by this description.
I thought the issue here was in the RMM. The GPT or S2 min granule
could be > 4k and in this case an unaligned set_memory_decrypted()
from the guest would have to fail inside the RMM as impossible to
execute?
Though I'm a little unclear on when and why the S2 needs to be
manipulated. Can't the S2 fully map both the protected and unprotected
IPA space and rely on the GPT for protection?
I do remember having a discussion that set_memory_decrypted() has
nothing to do with the VM's S1 granule size, and it is a mistake to
have linked these together. The VM needs to understand what
granularity the RMM will support set_memory_decrypted() for and follow
that.
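As a rough illustration of "follow the RMM's granularity", here is a minimal userspace sketch of the check a guest could run before asking for a range to be shared. The granule is a plain parameter: how the guest would learn it from the RMM is not stated in this thread, and range_is_granule_aligned() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ...; false for 0 and for non-powers of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical helper: true if [addr, addr + size) starts and ends on a
 * granule boundary, i.e. the range covers only whole sharing granules.
 * 'granule' is an assumed input (for example 64K when the guest uses 4K pages).
 */
static bool range_is_granule_aligned(uint64_t addr, uint64_t size,
				     uint64_t granule)
{
	assert(is_pow2(granule));
	return size != 0 && ((addr | size) & (granule - 1)) == 0;
}
```

With a 64K granule, a single 4K page at 0x11000 fails this check, while the same page passes when the granule is 4K. The point is that the check depends only on the granule, not on the guest's S1 page size.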
I don't recall there also being an issue on the hypervisor side? I thought
GPT faults on ARM were going to work well, i.e. we could cleanly segfault
the VMM process if it touches any protected memory that may have been
mapped into it, and speculation was safe?
> > @@ -213,16 +213,20 @@ static gfp_t gfp_flags_quirk;
> > static struct page *its_alloc_pages_node(int node, gfp_t gfp,
> > unsigned int order)
> > {
> > + long new_order;
> > struct page *page;
> > int ret = 0;
> >
> > - page = alloc_pages_node(node, gfp | gfp_flags_quirk, order);
> > + /* align things to hypervisor page size */
> > + new_order = get_order(ALIGN((PAGE_SIZE << order), arch_shared_mem_alignment()));
> > +
> > + page = alloc_pages_node(node, gfp | gfp_flags_quirk, new_order);
> >
> > if (!page)
> > return NULL;
> >
> > ret = set_memory_decrypted((unsigned long)page_address(page),
> > - 1 << order);
> > + 1 << new_order);
>
> At some point this could move to the DMA API.
I don't think we should be open coding these patterns.
Especially given the above, it makes no sense to 'alloc page' and then
'decrypt page' on ARM CCA. Decryption is not really an OS page-level
operation. I suggest coming up with a series to clean these up into a
more sensible API.
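For reference, the order arithmetic that the quoted hunk open-codes can be reproduced in userspace. This is only a sketch: align_up() and order_for_size() are stand-ins for the kernel's ALIGN() and get_order(), and the granule is passed in as a parameter, where the patch calls arch_shared_mem_alignment().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's ALIGN(); 'align' must be a power of two. */
static size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}

/* Stand-in for get_order(): smallest order with (page_size << order) >= size. */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < size)
		order++;
	return order;
}

/*
 * Order to request so that the allocation covers whole sharing granules.
 * 'granule' is an assumed input standing in for arch_shared_mem_alignment().
 */
static unsigned int shared_order(unsigned int order, size_t page_size,
				 size_t granule)
{
	return order_for_size(align_up(page_size << order, granule), page_size);
}
```

With 4K guest pages and a 64K granule, an order-0 request becomes order 4, so a single shared page costs 64K. That overhead on every small allocation is one reason the pattern is a poor fit for open coding at each call site.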
Everything wanting decrypted memory should be going through some more
general API that has some opportunity to use pools.
DMA API may be one choice, but I know we will need more options in
RDMA land :|
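To show what "some opportunity to use pools" could buy, here is a toy userspace sketch, not a proposed kernel interface. Every name in it (struct shared_pool, pool_alloc_slot(), pool_free_slot(), pool_make_shared()) is hypothetical, and the 4K slot and 64K granule sizes are assumptions. The pool converts one granule to shared once and then hands out page-sized slots from it.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_SIZE    4096u                      /* assumed guest page size */
#define GRANULE_SIZE 65536u                     /* assumed sharing granule */
#define SLOTS        (GRANULE_SIZE / SLOT_SIZE) /* 16 slots per granule */

/* Hypothetical pool: one granule, made shared once, carved into slots. */
struct shared_pool {
	uint32_t used;            /* bitmap, one bit per slot */
	unsigned int conversions; /* times the granule was converted */
	int is_shared;
};

/* Stand-in for the real conversion of the whole granule to shared. */
static void pool_make_shared(struct shared_pool *p)
{
	p->is_shared = 1;
	p->conversions++;
}

/* Returns a slot index in [0, SLOTS), or -1 if the pool is full. */
static int pool_alloc_slot(struct shared_pool *p)
{
	if (!p->is_shared)
		pool_make_shared(p); /* convert lazily, once per granule */

	for (unsigned int i = 0; i < SLOTS; i++) {
		if (!(p->used & (1u << i))) {
			p->used |= 1u << i;
			return (int)i;
		}
	}
	return -1;
}

/* Frees a slot; the granule stays shared so later allocations reuse it. */
static void pool_free_slot(struct shared_pool *p, int slot)
{
	assert(slot >= 0 && (unsigned int)slot < SLOTS);
	p->used &= ~(1u << slot);
}
```

In this toy model sixteen page-sized allocations cost a single conversion and no memory is wasted, whereas rounding each allocation up to the granule would cost sixteen conversions and sixteen granules.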
Jason