Message-ID: <cover.1728684491.git.ackerleytng@google.com>
Date: Fri, 11 Oct 2024 23:22:35 +0000
From: Ackerley Tng <ackerleytng@...gle.com>
To: muchun.song@...ux.dev, peterx@...hat.com, akpm@...ux-foundation.org, 
	rientjes@...gle.com, fvdl@...gle.com, jthoughton@...gle.com, david@...hat.com
Cc: isaku.yamahata@...el.com, zhiquan1.li@...el.com, fan.du@...el.com, 
	jun.miao@...el.com, tabba@...gle.com, quic_eberman@...cinc.com, 
	roypat@...zon.co.uk, jgg@...dia.com, jhubbard@...dia.com, seanjc@...gle.com, 
	pbonzini@...hat.com, erdemaktas@...gle.com, vannapurve@...gle.com, 
	ackerleytng@...gle.com, pgonda@...gle.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org
Subject: [RFC PATCH 0/3] Reduce dependence on vmas deep in hugetlb allocation code

I hope to use these 3 patches to start a discussion on eventually
removing the need to pass a struct vma pointer when taking a folio
from the global pool (i.e. dequeue_hugetlb_folio_vma()).

Why eliminate passing the struct vma pointer?

VMAs are more related to mapping into userspace, and it would be cleaner if the
HugeTLB folio allocation process could just focus on returning a folio.

Currently, the vma is a convenient container for pieces of information
required in the allocation process, but dequeuing should not depend on the VMA
concept.

When the vma is needed deep in the allocation process, allocation becomes
awkward wherever no vma exists (yet): HugeTLBfs's fallocate, for example, has
to create a pseudo-vma.

Separation will also help with HugeTLB unification. The buddy allocator is a
useful reference here: __alloc_pages_noprof() is conceptually separate from VMAs.
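
As a rough illustration of the kind of separation meant here, the following is
a minimal userspace sketch, not kernel code and not what the patches in this
series do. All names in it (struct pool, struct vma_model, dequeue_folio(),
dequeue_folio_vma()) are hypothetical, and the reservation accounting is
heavily simplified. The point is only that the core dequeue takes its inputs
as explicit arguments, and a thin wrapper serves callers that do have a vma:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
struct folio {
	int nid;		/* NUMA node this folio belongs to */
};

struct pool {
	struct folio *free[4];	/* free folios in the global pool */
	size_t nr_free;
	size_t nr_reserved;	/* free folios promised to reservations */
};

struct vma_model {
	int preferred_nid;
	bool has_reservation;
};

/* VMA-free core: everything the dequeue needs is an explicit argument. */
static struct folio *dequeue_folio(struct pool *p, int nid, bool use_reserve)
{
	/* Without a reservation, only unreserved folios may be taken. */
	if (!use_reserve && p->nr_free <= p->nr_reserved)
		return NULL;

	for (size_t i = 0; i < p->nr_free; i++) {
		struct folio *f = p->free[i];

		if (f->nid != nid)
			continue;

		/* Remove the folio by moving the last entry into its slot. */
		p->free[i] = p->free[--p->nr_free];
		if (use_reserve && p->nr_reserved > 0)
			p->nr_reserved--;
		return f;
	}
	return NULL;
}

/* Thin wrapper for callers that do have a vma. */
static struct folio *dequeue_folio_vma(struct pool *p,
				       const struct vma_model *vma)
{
	return dequeue_folio(p, vma->preferred_nid, vma->has_reservation);
}
```

In this model, a caller without a vma (fallocate or guest_memfd, say) would
call dequeue_folio() directly instead of constructing a pseudo-vma.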

I started looking into this because we want to use HugeTLB folios in guest_memfd
[1], and then I found that the HugeTLB folio allocation process is tightly
coupled with VMAs. This makes it hard to use HugeTLB folios in guest_memfd,
which does not have VMAs for private pages.

Then, I watched Peter Xu's talk at LSFMM [2] about HugeTLB unification and
thought that these patches could also contribute to the unification effort.

As discussed at LPC 2024 [3], the general preference is for guest_memfd to use
HugeTLB folios. While that is being worked out, I hope these patches can be
considered and merged separately. I believe the patches are still useful in
making the resv_map/subpool/hstate reservation system in HugeTLB easier to
understand, and no functional changes are intended.

---

Why use HugeTLB folios in guest_memfd?

HugeTLB is *the* source of 1G pages in the kernel today and it would be best for
all 1G page users (HugeTLB, HugeTLBfs, or guest_memfd) on a host to draw from
the same pool of 1G pages.

This allows central tracking of all 1G pages, a precious resource on a machine.

Having a separate 1G page allocator would not only require rebuilding
features that HugeTLB already has, but would also split the 1G pool. If
both allocators were used on a machine, it would be complicated to

(a) predetermine how many pages to put in each allocator's pool or
(b) transfer pages between the pools at runtime.

---

[1] https://lore.kernel.org/all/cover.1726009989.git.ackerleytng@google.com/T/
[2] https://youtu.be/7k-m2gTDu2k?si=ghWZ6qa1GAdaHOFP
[3] https://youtu.be/PVTjLLEpozE?si=HvdDlUc_4ElVXu5R

Ackerley Tng (3):
  mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  mm: hugetlb: Remove unnecessary check for avoid_reserve

 mm/hugetlb.c | 57 +++++++++++++++++++++-------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)

--
2.47.0.rc1.288.g06298d1525-goog
