Message-ID: <CAPTztWYD09A5rJBPNtjDa07uMswxFHutYGwBR54ByfMchd6YKA@mail.gmail.com>
Date: Mon, 10 Feb 2025 10:56:50 -0800
From: Frank van der Linden <fvdl@...gle.com>
To: Oscar Salvador <osalvador@...e.de>
Cc: akpm@...ux-foundation.org, muchun.song@...ux.dev, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, yuzhao@...gle.com, usamaarif642@...il.com, 
	joao.m.martins@...cle.com, roman.gushchin@...ux.dev
Subject: Re: [PATCH v3 00/28] hugetlb/CMA improvements for large systems

On Mon, Feb 10, 2025 at 10:40 AM Oscar Salvador <osalvador@...e.de> wrote:
>
> On Thu, Feb 06, 2025 at 06:50:40PM +0000, Frank van der Linden wrote:
> > v3:
> > * Fix SPDX comment include file format.
> > * Add new hugetlb_cma.* files to MAINTAINERS
> > * Document new ranges/ subdir in CMA debugfs.
> > * Fix powerpc compilation for config without HAVE_BOOTMEM_INFO_NODE
> > * Fix various other nits found by kernel test robot.
> > * Use a PFN value of -1 to indicate a non-mirrored mapping
> >   in sparse-vmemmap.c, not 0.
> > * Fix incorrect if() statement that got mangled in cma.c
> >
> > v2:
> > * Add missing CMA debugfs code.
> > * Minor cleanups in hugetlb_cma changes.
> > * Move hugetlb_cma code to its own file to further clean
> >   things up.
> >
> > On large systems, we observed some issues with hugetlb and CMA:
> >
> > 1) When specifying a large number of hugetlb boot pages (hugepages=
> >    on the command line), the kernel may run out of memory before it
> >    even gets to HVO. For example, if you have a 3072G system, and
> >    want to use 3024 1G hugetlb pages for VMs, that should leave
> >    you plenty of space for the hypervisor, provided you have the
> >    hugetlb vmemmap optimization (HVO) enabled. However, since
> >    the vmemmap pages are always allocated first, and then later
> >    in boot freed, you will actually run yourself out of memory
> >    before you can do HVO. This means not getting all the hugetlb
> >    pages you want, and worse, failure to boot if there is an
> >    allocation failure in the system from which it can't recover.
> >
> > 2) There is a system setup where you might want to use hugetlb_cma
> >    with a large value (say, again, 3024 out of 3072G like above),
> >    and then lower that if system usage allows it, to make room
> >    for non-hugetlb processes. For this, a variation of the problem
> >    above applies: the kernel runs out of unmovable space to allocate
> >    from before you finish boot, since your CMA area takes up all
> >    the space.
> >
> > 3) CMA wants to use one big contiguous area for allocations, which
> >    fails if you have the aforementioned 3T system with a gap in the
> >    middle of physical memory (like the < 40bits BIOS DMA area seen on
> >    some AMD systems). You then won't be able to set up a CMA area for
> >    one of the NUMA nodes, leading to loss of half of your hugetlb
> >    CMA area.
> >
> > 4) Under the scenario mentioned in 2), when trying to grow the
> >    number of hugetlb pages after dropping it for a while, new
> >    CMA allocations may fail occasionally. This is not unexpected:
> >    some transient references on pages may prevent cma_alloc
> >    from succeeding under memory pressure. However, the hugetlb
> >    code then falls back to a normal contiguous alloc, which may
> >    end up succeeding. This is not always desired behavior. If
> >    you have a large CMA area, then the kernel has a restricted
> >    amount of memory it can do unmovable allocations from (a
> >    well-known issue). A normal contiguous alloc may eat further into
> >    this space.
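
For reference, the arithmetic behind point 1) can be sketched as
follows. This is only a rough estimate, not taken from the patches:
it assumes 4 KiB base pages and a 64-byte struct page (typical for
x86-64), and that HVO keeps a single vmemmap page per hugetlb page.
The function and constant names are made up for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20

# Assumptions (not stated in the cover letter): typical x86-64 values.
BASE_PAGE_SIZE = 4096      # bytes per base page
STRUCT_PAGE_SIZE = 64      # bytes of vmemmap per base page


def vmemmap_bytes(region_bytes):
    """vmemmap needed to describe region_bytes of memory, without HVO."""
    return (region_bytes // BASE_PAGE_SIZE) * STRUCT_PAGE_SIZE


def boot_budget(total_bytes, nr_hugepages, hugepage_bytes):
    """Estimate vmemmap cost of the hugetlb pages before and after HVO."""
    hugetlb_bytes = nr_hugepages * hugepage_bytes
    return {
        # Memory left over once the hugetlb pages are set aside.
        "non_hugetlb": total_bytes - hugetlb_bytes,
        # vmemmap that has to exist for the hugetlb pages before HVO runs.
        "vmemmap_before_hvo": vmemmap_bytes(hugetlb_bytes),
        # Assumption: HVO retains one base page of vmemmap per hugetlb page.
        "vmemmap_after_hvo": nr_hugepages * BASE_PAGE_SIZE,
    }


budget = boot_budget(3072 * GIB, 3024, 1 * GIB)
for name, value in budget.items():
    print(f"{name}: {value / GIB:.4f} GiB")
```

Under these assumptions, each 1G page needs 16 MiB of vmemmap, so the
3024 pages need about 47.25 GiB up front, while only 48 GiB is left
outside the hugetlb pages. After HVO the same pages need only a few
MiB, which is why doing HVO earlier matters.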
>
> Hi Frank,
>
> While I plan to keep reviewing the series, I think it would make sense
> to split this patchset into two smaller ones.
> The way I see it, we are trying to deal with two different problems and their
> solutions.
>
> 1) pre-hvo at boot time
> 2) multi-range support of CMA (only used for hugetlb)
>
> I have not gone through the entire patchset yet, so I do not know whether the
> respective patches to tackle these two problems are really dependent on
> each other, but I think it would be worth considering a
> patchset per solution if that is not the case.
>
> IMHO, it would ease review quite a lot.

Hi Oscar,

Thanks a lot for reviewing this series.

I certainly could split it up, but here are the dependencies (it's
actually 3 parts):

1. Multi-range CMA (used by hugetlb) (patches 1-4)
2. Pre-HVO for hugetlb bootmem pages (patches 5-22)
3. Enable hugepages= (and pre-HVO) for CMA (patches 23-28)

1 and 2 are independent. 3 depends on 1 and 2.

So, I could post 1) and 2) simultaneously, and 3) would have to wait
until 1) and 2) are resolved.
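
To make the ordering explicit, here is a minimal sketch that derives
the posting order from the dependencies listed above. It uses Python's
standard graphlib module (3.9+); the part numbers and patch ranges are
from this mail, while the dictionary layout and the posting_waves()
helper are just illustrative.

```python
from graphlib import TopologicalSorter

# Part number -> description, as listed in this mail.
PARTS = {
    1: "Multi-range CMA (patches 1-4)",
    2: "Pre-HVO for hugetlb bootmem pages (patches 5-22)",
    3: "Enable hugepages= (and pre-HVO) for CMA (patches 23-28)",
}

# Part number -> set of parts it depends on.
DEPS = {1: set(), 2: set(), 3: {1, 2}}


def posting_waves(deps):
    """Group parts into waves; every part in a wave can be posted at once."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Everything whose dependencies are already done is ready now.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(posting_waves(DEPS), start=1):
    print(f"wave {number}: " + "; ".join(PARTS[p] for p in wave))
```

This gives two waves: parts 1 and 2 together, then part 3.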

Andrew, do you have any thoughts on splitting it up?

- Frank
