linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages


Message-ID: <aFJdYqN3QHQzMrVM@yzhao56-desk.sh.intel.com>
Date: Wed, 18 Jun 2025 14:32:02 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Vishal Annapurve <vannapurve@...gle.com>
CC: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
	"Li, Xiaoyao" <xiaoyao.li@...el.com>, "Shutemov, Kirill"
	<kirill.shutemov@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
	"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com"
	<thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "Du, Fan" <fan.du@...el.com>,
	"michael.roth@....com" <michael.roth@....com>, "seanjc@...gle.com"
	<seanjc@...gle.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Peng, Chao P"
	<chao.p.peng@...el.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Tue, Jun 17, 2025 at 11:21:41PM -0700, Vishal Annapurve wrote:
> On Tue, Jun 17, 2025 at 11:15 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> >
> > On Tue, Jun 17, 2025 at 09:33:02PM -0700, Vishal Annapurve wrote:
> > > On Tue, Jun 17, 2025 at 5:49 PM Yan Zhao <yan.y.zhao@...el.com> wrote:
> > > >
> > > > On Wed, Jun 18, 2025 at 08:34:24AM +0800, Edgecombe, Rick P wrote:
> > > > > On Tue, 2025-06-17 at 01:09 -0700, Vishal Annapurve wrote:
> > > > > > Sorry I quoted Ackerley's response wrongly. Here is the correct reference [1].
> > > > >
> > > > > I'm confused...
> > > > >
> > > > > >
> > > > > > > Speculative/transient refcounts came up a few times in the context of
> > > > > > guest_memfd discussions, some examples include: pagetable walkers,
> > > > > > page migration, speculative pagecache lookups, GUP-fast etc. David H
> > > > > > can provide more context here as needed.
> > > > > >
> > > > > > Effectively some core-mm features that are present today or might land
> > > > > > in the future can cause folio refcounts to be grabbed for short
> > > > > > durations without actual access to underlying physical memory. These
> > > > > > scenarios are unlikely to happen for private memory but can't be
> > > > > > discounted completely.
> > > > >
> > > > > > This means the refcount could be increased for other reasons, and so guestmemfd
> > > > > > shouldn't rely on refcounts for its purposes? So, it is not a problem for other
> > > > > > components handling the page to elevate the refcount?
> > > > Besides that, in [3], when kvm_gmem_convert_should_proceed() determines whether
> > > > to convert to private, why is it allowed to just invoke
> > > > kvm_gmem_has_safe_refcount() without taking speculative/transient refcounts into
> > > > > account? Isn't it easier for shared pages to have speculative/transient
> > > > > refcounts?
> > >
> > > These speculative refcounts are taken into account: in case of unsafe
> > > refcounts, the conversion operation immediately exits to userspace with
> > > EAGAIN, and userspace is supposed to retry the conversion.
> > Hmm, so why can't private-to-shared conversion also exit to userspace with
> > EAGAIN?
> 
> How would userspace/guest_memfd differentiate between
> speculative/transient refcounts and extra refcounts due to TDX unmap
> failures?
Hmm, it also can't differentiate between speculative/transient refcounts and
extra refcounts on shared folios due to other reasons.

> 
> >
> > In the POC
> > https://lore.kernel.org/lkml/aE%2Fq9VKkmaCcuwpU@yzhao56-desk.sh.intel.com,
> > kvm_gmem_convert_should_proceed() just returns EFAULT (can be modified to
> > EAGAIN) to userspace instead.
> >
> > >
> > > > Yes, it's easier for shared pages to have speculative/transient refcounts.
> > >
> > > >
> > > > [3] https://lore.kernel.org/lkml/d3832fd95a03aad562705872cbda5b3d248ca321.1747264138.git.ackerleytng@google.com/
> > > >
> > > > > >
> > > > > > Another reason to avoid relying on refcounts is to not block usage of
> > > > > > raw physical memory unmanaged by kernel (without page structs) to back
> > > > > > guest private memory as we had discussed previously. This will help
> > > > > > simplify merge/split operations during conversions and help usecases
> > > > > > like guest memory persistence [2] and non-confidential VMs.
> > > > >
> > > > > If this becomes a thing for private memory (which it isn't yet), then couldn't
> > > > > we just change things at that point?
> > > > >
> > > > > Is the only issue with TDX taking refcounts that it won't work with future code
> > > > > changes?
> > > > >
> > > > > >
> > > > > > [1] https://lore.kernel.org/lkml/diqz7c2lr6wg.fsf@ackerleytng-ctop.c.googlers.com/
> > > > > > [2] https://lore.kernel.org/lkml/20240805093245.889357-1-jgowans@amazon.com/
> > > > >