Message-ID: <CAA1CXcA0WX+qGKvL4VcTM_bazFxRyDmp5DK60ycoS4OCDUnH-Q@mail.gmail.com>
Date: Wed, 21 May 2025 04:19:14 -0600
From: Nico Pache <npache@...hat.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: linux-mm@...ck.org, linux-doc@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	rientjes@...gle.com, hannes@...xchg.org, lorenzo.stoakes@...cle.com, 
	rdunlap@...radead.org, mhocko@...e.com, Liam.Howlett@...cle.com, 
	zokeefe@...gle.com, surenb@...gle.com, jglisse@...gle.com, cl@...two.org, 
	jack@...e.cz, dave.hansen@...ux.intel.com, will@...nel.org, tiwai@...e.de, 
	catalin.marinas@....com, anshuman.khandual@....com, dev.jain@....com, 
	raquini@...hat.com, aarcange@...hat.com, kirill.shutemov@...ux.intel.com, 
	yang@...amperecomputing.com, thomas.hellstrom@...ux.intel.com, 
	vishal.moola@...il.com, sunnanyong@...wei.com, usamaarif642@...il.com, 
	wangkefeng.wang@...wei.com, ziy@...dia.com, shuah@...nel.org, 
	peterx@...hat.com, willy@...radead.org, ryan.roberts@....com, 
	baolin.wang@...ux.alibaba.com, baohua@...nel.org, david@...hat.com, 
	mathieu.desnoyers@...icios.com, mhiramat@...nel.org, rostedt@...dmis.org, 
	corbet@....net, akpm@...ux-foundation.org
Subject: Re: [PATCH v6 0/4] mm: introduce THP deferred setting

On Tue, May 20, 2025 at 3:25 AM Yafang Shao <laoar.shao@...il.com> wrote:
>
> On Thu, May 15, 2025 at 11:41 AM Nico Pache <npache@...hat.com> wrote:
> >
> > This series is a follow-up to [1], which adds mTHP support to khugepaged.
> > mTHP khugepaged support is a "loose" dependency for the sysfs/sysctl
> > configs to make sense. Without it, the global="defer" and mTHP="inherit"
> > case is "undefined" behavior.
> >
> > We've seen cases where customers switching from RHEL7 to RHEL8 see a
> > significant increase in the memory footprint for the same workloads.
> >
> > Through our investigations we found that a large contributing factor to
> > the increase in RSS was an increase in THP usage.
> >
> > For workloads like MySQL, or when using allocators like jemalloc, it is
> > often recommended to set /transparent_hugepage/enabled=never. This is
> > in part due to performance degradations and increased memory waste.
> >
> > This series introduces enabled=defer. This setting acts as a middle
> > ground between always and madvise. If the mapping is MADV_HUGEPAGE, the
> > page fault handler will act normally, making a hugepage if possible. If
> > the allocation is not MADV_HUGEPAGE, then the page fault handler will
> > default to the base size allocation. The caveat is that khugepaged can
> > still operate on pages that are not MADV_HUGEPAGE.
> >
> > This allows for three things. One, applications specifically designed to
> > use hugepages will get them. Two, applications that don't use
> > hugepages can still benefit from them without aggressively inserting
> > THPs at every possible chance. This curbs the memory waste, and defers
> > the use of hugepages to khugepaged. Khugepaged can then scan memory
> > for ranges eligible for collapse. Lastly, there is the added benefit for
> > those who want THPs but experience higher latency PFs. Now you can get
> > base page performance at the PF handler and hugepage performance for
> > those mappings after they collapse.
> >
> > Admins may want to lower max_ptes_none; otherwise, khugepaged may
> > aggressively collapse single allocations into hugepages.
> >
> > TESTING:
> > - Built for x86_64, aarch64, ppc64le, and s390x
> > - selftests mm
> > - In [1] I provided a script [2] that has multiple access patterns
> > - lots of general use.
> > - redis testing. This test was my original case for the defer mode. What I
> >    was able to prove was that THP=always leads to increased max_latency
> >    cases, which is why it is recommended to disable THPs for redis servers.
> >    However, with 'defer' we don't have the max_latency spikes and can still
> >    get the system to utilize THPs. I further tested this with the mTHP
> >    defer setting and found that redis (and probably other jemalloc users)
> >    can utilize THPs via defer (+mTHP defer) without a large latency
> >    penalty and with some potential gains. I uploaded some mmtests results
> >    here [3], which compare:
> >        stock+thp=never
> >        stock+(m)thp=always
> >        khugepaged-mthp + defer (max_ptes_none=64)
> >
> >   The results show that (m)THPs can cause some throughput regression in
> >   some cases, but also have gains in other cases. The mTHP+defer results
> >   have more gains and fewer losses than the (m)THP=always case.
> >
> > V6 Changes:
> > - nits
> > - rebased dependent series and added review tags
> >
> > V5 Changes:
> > - rebased dependent series
> > - added reviewed-by tag on 2/4
> >
> > V4 Changes:
> > - Minor Documentation fixes
> > - rebased the dependent series [1] onto mm-unstable
> >     commit 0e68b850b1d3 ("vmalloc: use atomic_long_add_return_relaxed()")
> >
> > V3 Changes:
> > - Combined the documentation commits into one, and moved a section to the
> >   khugepaged mthp patchset
> >
> > V2 Changes:
> > - base changes on mTHP khugepaged support
> > - Fix selftests parsing issue
> > - add mTHP defer option
> > - add mTHP defer Documentation
> >
> > [1] - https://lore.kernel.org/all/20250515032226.128900-1-npache@redhat.com/
> > [2] - https://gitlab.com/npache/khugepaged_mthp_test
> > [3] - https://people.redhat.com/npache/mthp_khugepaged_defer/testoutput2/output.html
> >
> > Nico Pache (4):
> >   mm: defer THP insertion to khugepaged
> >   mm: document (m)THP defer usage
> >   khugepaged: add defer option to mTHP options
> >   selftests: mm: add defer to thp setting parser
> >
> >  Documentation/admin-guide/mm/transhuge.rst | 31 +++++++---
> >  include/linux/huge_mm.h                    | 18 +++++-
> >  mm/huge_memory.c                           | 69 +++++++++++++++++++---
> >  mm/khugepaged.c                            |  8 +--
> >  tools/testing/selftests/mm/thp_settings.c  |  1 +
> >  tools/testing/selftests/mm/thp_settings.h  |  1 +
> >  6 files changed, 106 insertions(+), 22 deletions(-)
> >
> > --
> > 2.49.0
> >
> >
>
> Hello Nico,
>
> Upon reviewing the series, it occurred to me that BPF could solve this
> more cleanly. Adding a 'tva_flags' parameter to the BPF hook would
> handle this case and future scenarios without requiring new modes. The
> BPF mode could then serve as a unified solution.
Hi Yafang,

I don't see how this is the case. This would require users to
modify/add functionality rather than simply configuring the system in
this manner. What if BPF is not configured or not being used? Having to
use an additional technology that requires precise configuration doesn't
seem cleaner.
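
To make the semantics under discussion concrete, here is a minimal
sketch of the policy the cover letter describes for each global
"enabled" value. It is illustrative userspace C, not code from the
series: the enum and both function names are hypothetical, and it
ignores per-size mTHP settings such as "inherit" as well as all other
eligibility checks (alignment, VMA flags, and so on).

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not identifiers from the series. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_DEFER, THP_ALWAYS };

/*
 * May the page fault handler attempt a huge page for this mapping?
 * With "defer", only MADV_HUGEPAGE mappings get one at fault time;
 * everything else falls back to the base page size.
 */
static bool fault_may_use_thp(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
	case THP_DEFER:
		return madv_hugepage;
	default:
		return false;
	}
}

/*
 * May khugepaged later collapse this mapping into a huge page?
 * With "defer", khugepaged can still operate on mappings that are
 * not MADV_HUGEPAGE, which is what separates it from "madvise".
 */
static bool khugepaged_may_collapse(enum thp_mode mode, bool madv_hugepage)
{
	switch (mode) {
	case THP_ALWAYS:
	case THP_DEFER:
		return true;
	case THP_MADVISE:
		return madv_hugepage;
	default:
		return false;
	}
}
```

For a mapping without MADV_HUGEPAGE under "defer",
fault_may_use_thp() returns false while khugepaged_may_collapse()
returns true. That is the whole mode, selected by one sysfs value.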

Either way, thank you for taking a look at the series!

-- Nico
>
> --
> Regards
> Yafang
>

