lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <diqzqzyeg3j2.fsf@ackerleytng-ctop.c.googlers.com>
Date: Thu, 17 Jul 2025 09:56:01 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: Xu Yilun <yilun.xu@...ux.intel.com>
Cc: Yan Zhao <yan.y.zhao@...el.com>, Vishal Annapurve <vannapurve@...gle.com>, 
	Jason Gunthorpe <jgg@...pe.ca>, Alexey Kardashevskiy <aik@....com>, Fuad Tabba <tabba@...gle.com>, kvm@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, x86@...nel.org, 
	linux-fsdevel@...r.kernel.org, ajones@...tanamicro.com, 
	akpm@...ux-foundation.org, amoorthy@...gle.com, anthony.yznaga@...cle.com, 
	anup@...infault.org, aou@...s.berkeley.edu, bfoster@...hat.com, 
	binbin.wu@...ux.intel.com, brauner@...nel.org, catalin.marinas@....com, 
	chao.p.peng@...el.com, chenhuacai@...nel.org, dave.hansen@...el.com, 
	david@...hat.com, dmatlack@...gle.com, dwmw@...zon.co.uk, 
	erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com, graf@...zon.com, 
	haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com, ira.weiny@...el.com, 
	isaku.yamahata@...el.com, jack@...e.cz, james.morse@....com, 
	jarkko@...nel.org, jgowans@...zon.com, jhubbard@...dia.com, jroedel@...e.de, 
	jthoughton@...gle.com, jun.miao@...el.com, kai.huang@...el.com, 
	keirf@...gle.com, kent.overstreet@...ux.dev, kirill.shutemov@...el.com, 
	liam.merwick@...cle.com, maciej.wieczor-retman@...el.com, 
	mail@...iej.szmigiero.name, maz@...nel.org, mic@...ikod.net, 
	michael.roth@....com, mpe@...erman.id.au, muchun.song@...ux.dev, 
	nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev, palmer@...belt.com, 
	pankaj.gupta@....com, paul.walmsley@...ive.com, pbonzini@...hat.com, 
	pdurrant@...zon.co.uk, peterx@...hat.com, pgonda@...gle.com, pvorel@...e.cz, 
	qperret@...gle.com, quic_cvanscha@...cinc.com, quic_eberman@...cinc.com, 
	quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com, quic_pheragu@...cinc.com, 
	quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com, richard.weiyang@...il.com, 
	rick.p.edgecombe@...el.com, rientjes@...gle.com, roypat@...zon.co.uk, 
	rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org, steven.price@....com, 
	steven.sistare@...cle.com, suzuki.poulose@....com, thomas.lendacky@....com, 
	usama.arif@...edance.com, vbabka@...e.cz, viro@...iv.linux.org.uk, 
	vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org, 
	willy@...radead.org, xiaoyao.li@...el.com, yilun.xu@...el.com, 
	yuzenghui@...wei.com, zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce
 KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls

Xu Yilun <yilun.xu@...ux.intel.com> writes:

> On Wed, Jul 16, 2025 at 03:22:06PM -0700, Ackerley Tng wrote:
>> Yan Zhao <yan.y.zhao@...el.com> writes:
>> 
>> > On Tue, Jun 24, 2025 at 07:10:38AM -0700, Vishal Annapurve wrote:
>> >> On Tue, Jun 24, 2025 at 6:08 AM Jason Gunthorpe <jgg@...pe.ca> wrote:
>> >> >
>> >> > On Tue, Jun 24, 2025 at 06:23:54PM +1000, Alexey Kardashevskiy wrote:
>> >> >
>> >> > > Now, I am rebasing my RFC on top of this patchset and it fails in
>> >> > > kvm_gmem_has_safe_refcount() as IOMMU holds references to all these
>> >> > > folios in my RFC.
>> >> > >
>> >> > > So what is the expected sequence here? The userspace unmaps a DMA
>> >> > > page and maps it back right away, all from the userspace? The end
>> >> > > result will be the exactly same which seems useless. And IOMMU TLB
>> >> 
>> >>  As Jason described, ideally IOMMU just like KVM, should just:
>> >> 1) Directly rely on guest_memfd for pinning -> no page refcounts taken
>> >> by IOMMU stack
>> > In TDX connect, TDX module and TDs do not trust VMM. So, it's the TDs to inform
>> > TDX module about which pages are used by it for DMAs purposes.
>> > So, if a page is regarded as pinned by TDs for DMA, the TDX module will fail the
>> > unmap of the pages from S-EPT.
>> >
>> > If IOMMU side does not increase refcount, IMHO, some way to indicate that
>> > certain PFNs are used by TDs for DMA is still required, so guest_memfd can
>> > reject the request before attempting the actual unmap.
>> > Otherwise, the unmap of TD-DMA-pinned pages will fail.
>> >
>> > Upon this kind of unmapping failure, it also doesn't help for host to retry
>> > unmapping without unpinning from TD.
>> >
>> >
>> 
>> Yan, Yilun, would it work if, on conversion,
>> 
>> 1. guest_memfd notifies IOMMU that a conversion is about to happen for a
>>    PFN range
>
> It is the Guest fw call to release the pinning.

I see, thanks for explaining.

> By the time VMM get the
> conversion requirement, the page is already physically unpinned. So I
> agree with Jason the pinning doesn't have to reach to iommu from SW POV.
>

If by the time KVM gets the conversion request, the page is unpinned,
then we're all good, right?

When guest_memfd gets the conversion request, as part of conversion
handling it will request to zap the page from stage-2 page tables. TDX
module would see that the page is unpinned and the unmapping will
proceed fine. Is that understanding correct?

>> 2. IOMMU forwards the notification to TDX code in the kernel
>> 3. TDX code in kernel tells TDX module to stop thinking of any PFNs in
>>    the range as pinned for DMA?
>
> TDX host can't stop the pinning. Actually this mechanism is to prevent
> host from unpin/unmap the DMA out of Guest expectation.
>

On this note, I'd also like to check something else. Putting TDX connect
and IOMMUs aside, if the host unmaps a guest private page today without
the guest requesting it, the unmapping will work and the guest will be
broken, right?

> Thanks,
> Yilun
>
>> 
>> If the above is possible then by the time we get to unmapping from
>> S-EPTs, TDX module would already consider the PFNs in the range "not
>> pinned for DMA".


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ