Message-ID: <aFulSNMRd9kA9X+V@yzhao56-desk.sh.intel.com>
Date: Wed, 25 Jun 2025 15:29:12 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "ackerleytng@...gle.com" <ackerleytng@...gle.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com"
	<david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
	"tabba@...gle.com" <tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"Du, Fan" <fan.du@...el.com>, "michael.roth@....com" <michael.roth@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "Peng, Chao P"
	<chao.p.peng@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Weiny, Ira" <ira.weiny@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
	<jun.miao@...el.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Wed, Jun 25, 2025 at 08:01:41AM +0800, Edgecombe, Rick P wrote:
> > > > So I think we're all in support of indicating unmapping/splitting issues
> > > > without returning anything from unmap(), and the discussed options are
> > > > 
> > > > a. Refcounts: won't work - mostly discussed in this (sub-)thread
> > > >    [3]. Using refcounts makes it impossible to distinguish between
> > > >    transient refcounts and refcounts due to errors.
> > > > 
> > > > b. Page flags: won't work with/can't benefit from HVO.
> > > 
> > > As above, this was for the purpose of catching bugs, not for guestmemfd to
> > > logically depend on it.
> > > 
> > > > 
> > > > Suggestions still in the running:
> > > > 
> > > > c. Folio flags are not precise enough to indicate which page actually
> > > >    had an error, but this could be sufficient if we're willing to just
> > > >    waste the rest of the huge page on unmapping error.
> > > 
> > > For a scenario of TDX module bug, it seems ok to me.
> > > 
> > > > 
> > > > d. Folio flags with folio splitting on error. This means that on
> > > >    unmapping/Secure EPT PTE splitting error, we have to split the
> > > >    (larger than 4K) folio to 4K, and then set a flag on the split folio.
> > > > 
> > > >    The issue I see with this is that splitting pages with HVO applied
> > > >    means doing allocations, and in an error scenario there may not be
> > > >    memory left to split the pages.
> > > > 
> > > > e. Some other data structure in guest_memfd, say, a linked list, and a
> > > >    function like kvm_gmem_add_error_pfn(struct page *page) that would
> > > >    look up the guest_memfd inode from the page and add the page's pfn to
> > > >    the linked list.
> > > > 
> > > >    Everywhere in guest_memfd that does unmapping/splitting would then
> > > >    check this linked list to see if the unmapping/splitting
> > > >    succeeded.
> > > > 
> > > >    Everywhere in guest_memfd that allocates pages will also check this
> > > >    linked list to make sure the pages are functional.
> > > > 
> > > >    When guest_memfd truncates, if the page being truncated is on the
> > > >    list, retain the refcount on the page and leak that page.
> > > 
> > > I think this is a fine option.
> > > 
> > > > 
> > > > f. Combination of c and e, something similar to HugeTLB's
> > > >    folio_set_hugetlb_hwpoison(), which sets a flag AND adds the pages in
> > > >    trouble to a linked list on the folio.
> > > > 
> > > > g. Like f, but basically treat an unmapping error as hardware poisoning.
> > > > 
> > > > I'm kind of inclined towards g, to just treat unmapping errors as
> > > > HWPOISON and buying into all the HWPOISON handling requirements. What do
> > > > yall think? Can a TDX unmapping error be considered as memory poisoning?
> > > 
> > > What does HWPOISON bring over refcounting the page/folio so that it never
> > > returns to the page allocator?
... 
> I do think that these threads have gone on far too long. It's probably about
> time to move forward with something even if it's just to have something to
> discuss that doesn't require footnoting so many lore links. So how about we move
> forward with option e as a next step. Does that sound good Yan?

I'm OK with e, provided that allocating memory for the linked list is not a
problem. Otherwise, I think a simpler solution would be to set a folio flag
when an unmapping error occurs.
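
To show where the allocation concern with e comes from, here is a rough
userspace sketch (not kernel code). The names mock_gmem_inode, mock_error_pfn,
mock_gmem_add_error_pfn() and mock_gmem_pfn_has_error() are all made up for
illustration, loosely modeled on the kvm_gmem_add_error_pfn() suggested above;
malloc() stands in for whatever allocator the real code would use.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace mock only: one list node per pfn that failed to unmap/split. */
struct mock_error_pfn {
	unsigned long pfn;
	struct mock_error_pfn *next;
};

/* Hypothetical stand-in for per-inode guest_memfd state. */
struct mock_gmem_inode {
	struct mock_error_pfn *error_pfns;
};

/*
 * Record a pfn as broken. Note that this runs on an error path and still
 * has to allocate; if the allocation fails, the error cannot be recorded.
 */
static int mock_gmem_add_error_pfn(struct mock_gmem_inode *inode,
				   unsigned long pfn)
{
	struct mock_error_pfn *node;

	assert(inode);
	node = malloc(sizeof(*node));
	if (!node)
		return -ENOMEM;

	node->pfn = pfn;
	node->next = inode->error_pfns;
	inode->error_pfns = node;
	return 0;
}

/* Lookup used by the unmap/split, allocation and truncation paths. */
static bool mock_gmem_pfn_has_error(const struct mock_gmem_inode *inode,
				    unsigned long pfn)
{
	const struct mock_error_pfn *node;

	assert(inode);
	for (node = inode->error_pfns; node; node = node->next)
		if (node->pfn == pfn)
			return true;
	return false;
}
```

The -ENOMEM return in the add path is the case I'm worried about: the caller
is already handling an unmapping error and has no good way to react to a
second failure.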

guest_memfd would then need to check this folio flag before performing the
actual conversion and in kvm_gmem_free_folio().
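
A rough userspace sketch of the folio flag approach (again not kernel code).
Everything prefixed with mock_/MOCK_ is hypothetical, including the flag bit
and the -EIO return value; only the idea of checking the flag before
conversion and in the free path is from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit; the real folio flag is not decided here. */
#define MOCK_FOLIO_UNMAP_ERR (1UL << 0)

/* Userspace mock only; not the kernel's struct folio. */
struct mock_folio {
	unsigned long flags;
	int refcount;
	bool returned_to_allocator;
};

/* Set by the unmapping side when unmapping/splitting fails. No allocation. */
static void mock_folio_set_unmap_error(struct mock_folio *folio)
{
	assert(folio);
	folio->flags |= MOCK_FOLIO_UNMAP_ERR;
}

static bool mock_folio_test_unmap_error(const struct mock_folio *folio)
{
	assert(folio);
	return (folio->flags & MOCK_FOLIO_UNMAP_ERR) != 0;
}

/* Check 1: refuse the conversion if a previous unmap failed. */
static int mock_gmem_convert(struct mock_folio *folio)
{
	if (mock_folio_test_unmap_error(folio))
		return -EIO;	/* error code is an assumption */
	return 0;
}

/* Check 2: in the free path, leak the folio instead of freeing it. */
static void mock_gmem_free_folio(struct mock_folio *folio)
{
	if (mock_folio_test_unmap_error(folio)) {
		folio->refcount++;	/* extra reference: intentional leak */
		return;
	}
	folio->returned_to_allocator = true;
}
```

Since the flag is per folio, an error on one 4K page marks the whole huge
folio, which matches the trade-off described in option c.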

