Message-ID: <8e783fa6ee3997567c661e5c10b05b5d456382fb.camel@intel.com>
Date: Fri, 16 May 2025 19:14:46 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "palmer@...belt.com" <palmer@...belt.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "catalin.marinas@....com" <catalin.marinas@....com>,
"Miao, Jun" <jun.miao@...el.com>, "nsaenz@...zon.es" <nsaenz@...zon.es>,
"pdurrant@...zon.co.uk" <pdurrant@...zon.co.uk>, "vbabka@...e.cz"
<vbabka@...e.cz>, "peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org"
<x86@...nel.org>, "jack@...e.cz" <jack@...e.cz>, "tabba@...gle.com"
<tabba@...gle.com>, "quic_svaddagi@...cinc.com" <quic_svaddagi@...cinc.com>,
"amoorthy@...gle.com" <amoorthy@...gle.com>, "pvorel@...e.cz"
<pvorel@...e.cz>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
"mail@...iej.szmigiero.name" <mail@...iej.szmigiero.name>, "Annapurve,
Vishal" <vannapurve@...gle.com>, "anthony.yznaga@...cle.com"
<anthony.yznaga@...cle.com>, "Wang, Wei W" <wei.w.wang@...el.com>,
"keirf@...gle.com" <keirf@...gle.com>, "Wieczor-Retman, Maciej"
<maciej.wieczor-retman@...el.com>, "Zhao, Yan Y" <yan.y.zhao@...el.com>,
"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "Hansen, Dave"
<dave.hansen@...el.com>, "rppt@...nel.org" <rppt@...nel.org>,
"quic_mnalajal@...cinc.com" <quic_mnalajal@...cinc.com>, "aik@....com"
<aik@....com>, "usama.arif@...edance.com" <usama.arif@...edance.com>,
"fvdl@...gle.com" <fvdl@...gle.com>, "paul.walmsley@...ive.com"
<paul.walmsley@...ive.com>, "bfoster@...hat.com" <bfoster@...hat.com>,
"quic_cvanscha@...cinc.com" <quic_cvanscha@...cinc.com>,
"willy@...radead.org" <willy@...radead.org>, "Du, Fan" <fan.du@...el.com>,
"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
"thomas.lendacky@....com" <thomas.lendacky@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mic@...ikod.net" <mic@...ikod.net>, "oliver.upton@...ux.dev"
<oliver.upton@...ux.dev>, "akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>, "steven.price@....com" <steven.price@....com>,
"muchun.song@...ux.dev" <muchun.song@...ux.dev>, "binbin.wu@...ux.intel.com"
<binbin.wu@...ux.intel.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
"rientjes@...gle.com" <rientjes@...gle.com>, "Aktas, Erdem"
<erdemaktas@...gle.com>, "mpe@...erman.id.au" <mpe@...erman.id.au>,
"david@...hat.com" <david@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>,
"hughd@...gle.com" <hughd@...gle.com>, "Xu, Haibo1" <haibo1.xu@...el.com>,
"jhubbard@...dia.com" <jhubbard@...dia.com>, "anup@...infault.org"
<anup@...infault.org>, "maz@...nel.org" <maz@...nel.org>, "Yamahata, Isaku"
<isaku.yamahata@...el.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>,
"steven.sistare@...cle.com" <steven.sistare@...cle.com>,
"quic_pheragu@...cinc.com" <quic_pheragu@...cinc.com>, "jarkko@...nel.org"
<jarkko@...nel.org>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
"chenhuacai@...nel.org" <chenhuacai@...nel.org>, "Huang, Kai"
<kai.huang@...el.com>, "shuah@...nel.org" <shuah@...nel.org>,
"dwmw@...zon.co.uk" <dwmw@...zon.co.uk>, "pankaj.gupta@....com"
<pankaj.gupta@....com>, "Peng, Chao P" <chao.p.peng@...el.com>,
"nikunj@....com" <nikunj@....com>, "Graf, Alexander" <graf@...zon.com>,
"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "pbonzini@...hat.com"
<pbonzini@...hat.com>, "yuzenghui@...wei.com" <yuzenghui@...wei.com>,
"jroedel@...e.de" <jroedel@...e.de>, "suzuki.poulose@....com"
<suzuki.poulose@....com>, "jgowans@...zon.com" <jgowans@...zon.com>, "Xu,
Yilun" <yilun.xu@...el.com>, "liam.merwick@...cle.com"
<liam.merwick@...cle.com>, "michael.roth@....com" <michael.roth@....com>,
"quic_tsoni@...cinc.com" <quic_tsoni@...cinc.com>,
"richard.weiyang@...il.com" <richard.weiyang@...il.com>, "Weiny, Ira"
<ira.weiny@...el.com>, "aou@...s.berkeley.edu" <aou@...s.berkeley.edu>, "Li,
Xiaoyao" <xiaoyao.li@...el.com>, "qperret@...gle.com" <qperret@...gle.com>,
"kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>,
"dmatlack@...gle.com" <dmatlack@...gle.com>, "james.morse@....com"
<james.morse@....com>, "brauner@...nel.org" <brauner@...nel.org>,
"hch@...radead.org" <hch@...radead.org>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "linux-fsdevel@...r.kernel.org"
<linux-fsdevel@...r.kernel.org>, "pgonda@...gle.com" <pgonda@...gle.com>,
"quic_pderrin@...cinc.com" <quic_pderrin@...cinc.com>, "roypat@...zon.co.uk"
<roypat@...zon.co.uk>, "will@...nel.org" <will@...nel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
On Fri, 2025-05-16 at 10:51 -0700, Sean Christopherson wrote:
> From my perspective, 1GiB hugepage support in guest_memfd isn't about improving
> CoCo performance, it's about achieving feature parity on guest_memfd with respect
> to existing backing stores so that it's possible to use guest_memfd to back all
> VM shapes in a fleet.
>
> Let's assume there is significant value in backing non-CoCo VMs with 1GiB pages,
> unless you want to re-litigate the existence of 1GiB support in HugeTLBFS.
I didn't expect to go in that direction when I first asked. But everyone says
the benefit is huge, yet no one has the numbers. That can be a sign of something.
Meanwhile I'm watching patches to make 5-level paging walks unconditional fly by
because people couldn't find a cost to the extra level of walk. So re-litigate,
no. But I'll probably remain quietly suspicious of the exact cost/value, at
least on the CPU side. I totally missed the IOTLB side at first, sorry.
>
> If we assume 1GiB support is mandatory for non-CoCo VMs, then it becomes mandatory
> for CoCo VMs as well, because it's the only realistic way to run CoCo VMs and
> non-CoCo VMs on a single host. Mixing 1GiB HugeTLBFS with any other backing store
> for VMs simply isn't tenable due to the nature of 1GiB allocations. E.g. grabbing
> sub-1GiB chunks of memory for CoCo VMs quickly fragments memory to the point where
> HugeTLBFS can't allocate memory for non-CoCo VMs.
It makes sense that there would be a difference in how many huge pages the
non-CoCo guests would get. Where you start to lose me is when you guys talk about
"mandatory" or similar. If you want upstream review, it would help to have more
numbers on the "why" question. At least for us folks outside the hyperscalers,
where such things are not as obvious.
>
> Teaching HugeTLBFS to play nice with TDX and SNP isn't happening, which leaves
> adding 1GiB support to guest_memfd as the only way forward.
>
> Any boost to TDX (or SNP) performance is purely a bonus.
Most of the bullets in the talk were about mapping sizes AFAICT, so this is the
kind of reasoning I was hoping for. Thanks for elaborating on it, even though
no one has any numbers yet besides the vmemmap savings.