Message-ID: <aCdVXn3ZqFXzQ0e4@google.com>
Date: Fri, 16 May 2025 10:45:07 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@...el.com>, "pvorel@...e.cz" <pvorel@...e.cz>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "catalin.marinas@....com" <catalin.marinas@....com>, 
	Jun Miao <jun.miao@...el.com>, Kirill Shutemov <kirill.shutemov@...el.com>, 
	"pdurrant@...zon.co.uk" <pdurrant@...zon.co.uk>, "steven.price@....com" <steven.price@....com>, 
	"peterx@...hat.com" <peterx@...hat.com>, "x86@...nel.org" <x86@...nel.org>, 
	"amoorthy@...gle.com" <amoorthy@...gle.com>, "tabba@...gle.com" <tabba@...gle.com>, 
	"quic_svaddagi@...cinc.com" <quic_svaddagi@...cinc.com>, "maz@...nel.org" <maz@...nel.org>, 
	"vkuznets@...hat.com" <vkuznets@...hat.com>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, 
	"keirf@...gle.com" <keirf@...gle.com>, "hughd@...gle.com" <hughd@...gle.com>, 
	"mail@...iej.szmigiero.name" <mail@...iej.szmigiero.name>, "palmer@...belt.com" <palmer@...belt.com>, 
	Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>, Yan Y Zhao <yan.y.zhao@...el.com>, 
	"ajones@...tanamicro.com" <ajones@...tanamicro.com>, "willy@...radead.org" <willy@...radead.org>, 
	"jack@...e.cz" <jack@...e.cz>, "paul.walmsley@...ive.com" <paul.walmsley@...ive.com>, "aik@....com" <aik@....com>, 
	"usama.arif@...edance.com" <usama.arif@...edance.com>, 
	"quic_mnalajal@...cinc.com" <quic_mnalajal@...cinc.com>, "fvdl@...gle.com" <fvdl@...gle.com>, 
	"rppt@...nel.org" <rppt@...nel.org>, "quic_cvanscha@...cinc.com" <quic_cvanscha@...cinc.com>, 
	"nsaenz@...zon.es" <nsaenz@...zon.es>, "vbabka@...e.cz" <vbabka@...e.cz>, Fan Du <fan.du@...el.com>, 
	"anthony.yznaga@...cle.com" <anthony.yznaga@...cle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "mic@...ikod.net" <mic@...ikod.net>, 
	"oliver.upton@...ux.dev" <oliver.upton@...ux.dev>, 
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "bfoster@...hat.com" <bfoster@...hat.com>, 
	"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "muchun.song@...ux.dev" <muchun.song@...ux.dev>, 
	Zhiquan1 Li <zhiquan1.li@...el.com>, "rientjes@...gle.com" <rientjes@...gle.com>, 
	"mpe@...erman.id.au" <mpe@...erman.id.au>, Erdem Aktas <erdemaktas@...gle.com>, 
	"david@...hat.com" <david@...hat.com>, "jgg@...pe.ca" <jgg@...pe.ca>, 
	"jhubbard@...dia.com" <jhubbard@...dia.com>, Haibo1 Xu <haibo1.xu@...el.com>, 
	"anup@...infault.org" <anup@...infault.org>, Dave Hansen <dave.hansen@...el.com>, 
	Isaku Yamahata <isaku.yamahata@...el.com>, "jthoughton@...gle.com" <jthoughton@...gle.com>, 
	Wei W Wang <wei.w.wang@...el.com>, 
	"steven.sistare@...cle.com" <steven.sistare@...cle.com>, "jarkko@...nel.org" <jarkko@...nel.org>, 
	"quic_pheragu@...cinc.com" <quic_pheragu@...cinc.com>, "chenhuacai@...nel.org" <chenhuacai@...nel.org>, 
	Kai Huang <kai.huang@...el.com>, "shuah@...nel.org" <shuah@...nel.org>, 
	"dwmw@...zon.co.uk" <dwmw@...zon.co.uk>, "pankaj.gupta@....com" <pankaj.gupta@....com>, 
	Chao P Peng <chao.p.peng@...el.com>, "nikunj@....com" <nikunj@....com>, Alexander Graf <graf@...zon.com>, 
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	"yuzenghui@...wei.com" <yuzenghui@...wei.com>, "jroedel@...e.de" <jroedel@...e.de>, 
	"suzuki.poulose@....com" <suzuki.poulose@....com>, "jgowans@...zon.com" <jgowans@...zon.com>, 
	Yilun Xu <yilun.xu@...el.com>, "liam.merwick@...cle.com" <liam.merwick@...cle.com>, 
	"michael.roth@....com" <michael.roth@....com>, "quic_tsoni@...cinc.com" <quic_tsoni@...cinc.com>, 
	"richard.weiyang@...il.com" <richard.weiyang@...il.com>, Ira Weiny <ira.weiny@...el.com>, 
	"aou@...s.berkeley.edu" <aou@...s.berkeley.edu>, Xiaoyao Li <xiaoyao.li@...el.com>, 
	"qperret@...gle.com" <qperret@...gle.com>, 
	"kent.overstreet@...ux.dev" <kent.overstreet@...ux.dev>, "dmatlack@...gle.com" <dmatlack@...gle.com>, 
	"james.morse@....com" <james.morse@....com>, "brauner@...nel.org" <brauner@...nel.org>, 
	"ackerleytng@...gle.com" <ackerleytng@...gle.com>, 
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>, "pgonda@...gle.com" <pgonda@...gle.com>, 
	"quic_pderrin@...cinc.com" <quic_pderrin@...cinc.com>, "roypat@...zon.co.uk" <roypat@...zon.co.uk>, 
	"linux-mm@...ck.org" <linux-mm@...ck.org>, "will@...nel.org" <will@...nel.org>, 
	"hch@...radead.org" <hch@...radead.org>
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

On Fri, May 16, 2025, Vishal Annapurve wrote:
> On Thu, May 15, 2025 at 7:12 PM Edgecombe, Rick P <rick.p.edgecombe@...el.com> wrote:
> > On Thu, 2025-05-15 at 17:57 -0700, Sean Christopherson wrote:
> > > You're conflating two different things.  guest_memfd allocating and managing
> > > 1GiB physical pages, and KVM mapping memory into the guest at 1GiB/2MiB
> > > granularity.  Allocating memory in 1GiB chunks is useful even if KVM can only
> > > map memory into the guest using 4KiB pages.
> >
> > I'm aware of the 1.6% vmemmap benefits from the LPC talk. Is there more? The
> > list quoted there was more about guest performance. Or maybe the clever page
> > table walkers that find contiguous small mappings could benefit guest
> > performance too? It's the kind of thing I'd like to see at least broadly called
> > out.
> 
> The crux of this series really is hugetlb backing support for guest_memfd and
> handling CoCo VMs irrespective of the page size as I suggested earlier, so 2M
> page sizes will need to handle similar complexity of in-place conversion.
> 
> Google internally uses 1G hugetlb pages to achieve high bandwidth IO,

E.g. hitting target networking line rates is only possible with 1GiB mappings,
otherwise TLB pressure gets in the way.

> lower memory footprint using HVO and lower MMU/IOMMU page table memory
> footprint among other improvements. These percentages carry a substantial
> impact when working at the scale of large fleets of hosts each carrying
> significant memory capacity.

Yeah, 1.6% might sound small, but over however many bytes of RAM there are in
the fleet, it's a huge (lol) amount of memory saved.
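
For reference, the ~1.6% figure is consistent with simple arithmetic if one
assumes a 64-byte struct page per 4KiB base page (typical for x86-64, but an
assumption here, not something stated in this thread).  A minimal sketch, with
hypothetical helper names, and assuming HVO keeps a single vmemmap page per
hugetlb page:

```python
# Back-of-the-envelope vmemmap arithmetic. Illustrative sketch only.
STRUCT_PAGE_BYTES = 64   # assumption: typical sizeof(struct page) on x86-64
BASE_PAGE_BYTES = 4096   # 4KiB base pages
GIB = 2**30
TIB = 2**40


def vmemmap_fraction(struct_page_bytes=STRUCT_PAGE_BYTES,
                     base_page_bytes=BASE_PAGE_BYTES):
    """Fraction of RAM consumed by struct page metadata."""
    return struct_page_bytes / base_page_bytes


def vmemmap_bytes(ram_bytes):
    """Bytes of struct page metadata needed to describe ram_bytes of RAM."""
    return (ram_bytes // BASE_PAGE_BYTES) * STRUCT_PAGE_BYTES


def hvo_saved_bytes(hugepage_bytes, retained_vmemmap_pages=1):
    """Metadata bytes freed per hugetlb page.

    Assumption: the optimization keeps `retained_vmemmap_pages` base pages
    of vmemmap per hugetlb page and frees the rest.
    """
    total = vmemmap_bytes(hugepage_bytes)
    return total - retained_vmemmap_pages * BASE_PAGE_BYTES


if __name__ == "__main__":
    print(f"vmemmap fraction: {vmemmap_fraction():.4%}")
    print(f"vmemmap for 1TiB: {vmemmap_bytes(TIB) / GIB} GiB")
    print(f"saved per 1GiB page: {hvo_saved_bytes(GIB)} bytes")
```

Under those assumptions the metadata is 64/4096 = 1.5625% of RAM, i.e. 16GiB
per 1TiB host, nearly all of which becomes reclaimable when the memory is
backed by 1GiB hugetlb pages.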

> > > Yes, some of this is useful for TDX, but we (and others) want to use
> > > guest_memfd for far more than just CoCo VMs.  And for non-CoCo VMs, 1GiB
> > > hugepages are mandatory for various workloads.
> >
> > I've heard this a lot. It must be true, but I've never seen the actual numbers.
> > For a long time people believed 1GB huge pages on the direct map were critical,
> > but then benchmarking on a contemporary CPU couldn't find much difference
> > between 2MB and 1GB. I'd expect TDP huge pages to be different than that because
> > the combined walks are huge, iTLB, etc, but I'd love to see a real number.

The direct map is very, very different than userspace and thus guest mappings.
Software (hopefully) isn't using the direct map to index multi-TiB databases,
or to transfer GiBs of data over the network.  The amount of memory the kernel
is regularly accessing is an order of magnitude or two smaller than in
single-process use cases.
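
To put rough numbers on the TLB pressure argument, the sketch below counts how
many distinct translations are needed to cover a working set at each mapping
size.  It only counts mappings; it does not model any real CPU's TLB, and the
function and constant names are hypothetical.

```python
# Count the leaf translations needed to map a working set at a given
# page size. Illustrative arithmetic only, not a TLB model.
PAGE_SIZES = {
    "4KiB": 4 * 2**10,
    "2MiB": 2 * 2**20,
    "1GiB": 2**30,
}
TIB = 2**40


def translations_needed(working_set_bytes, page_bytes):
    """Ceiling division: mappings required to cover the working set."""
    return -(-working_set_bytes // page_bytes)


if __name__ == "__main__":
    # Example: a 1TiB working set, e.g. a large in-memory database.
    for name, size in PAGE_SIZES.items():
        print(f"{name}: {translations_needed(TIB, size):,} translations")
```

For a 1TiB working set that is 268,435,456 translations at 4KiB, 524,288 at
2MiB, and 1,024 at 1GiB, which is why workloads that touch that much memory
are so much more sensitive to mapping size than the kernel's own accesses
through the direct map.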

A few examples from a quick search:

http://pvk.ca/Blog/2014/02/18/how-bad-can-1gb-pages-be
https://www.percona.com/blog/benchmark-postgresql-with-linux-hugepages/
