Message-ID: <aMkOCmGBhZKhKPrI@hpe.com>
Date: Tue, 16 Sep 2025 02:14:17 -0500
From: Kyle Meyer <kyle.meyer@....com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: corbet@....net, david@...hat.com, linmiaohe@...wei.com, shuah@...nel.org,
        tony.luck@...el.com, jane.chu@...cle.com, jiaqiyan@...gle.com,
        Liam.Howlett@...cle.com, bp@...en8.de, hannes@...xchg.org,
        jack@...e.cz, joel.granados@...nel.org, laoar.shao@...il.com,
        lorenzo.stoakes@...cle.com, mclapinski@...gle.com, mhocko@...e.com,
        nao.horiguchi@...il.com, osalvador@...e.de, rafael.j.wysocki@...el.com,
        rppt@...nel.org, russ.anderson@....com, shawn.fan@...el.com,
        surenb@...gle.com, vbabka@...e.cz, linux-acpi@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-kselftest@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2] mm/memory-failure: Support disabling soft offline for
 HugeTLB pages

On Mon, Sep 15, 2025 at 08:16:18PM -0700, Andrew Morton wrote:
> On Mon, 15 Sep 2025 19:27:41 -0500 Kyle Meyer <kyle.meyer@....com> wrote:
> 
> > Soft offlining a HugeTLB page reduces the HugeTLB page pool.
> > 
> > Commit 56374430c5dfc ("mm/memory-failure: userspace controls soft-offlining pages")
> > introduced the following sysctl interface to control soft offline:
> > 
> > /proc/sys/vm/enable_soft_offline
> > 
> > The interface does not distinguish between page types:
> > 
> >     0 - Soft offline is disabled
> >     1 - Soft offline is enabled
> > 
> > Convert enable_soft_offline to a bitmask and support disabling soft
> > offline for HugeTLB pages:
> > 
> > Bits:
> > 
> >     0 - Enable soft offline
> >     1 - Disable soft offline for HugeTLB pages
> > 
> > Supported values:
> > 
> >     0 - Soft offline is disabled
> >     1 - Soft offline is enabled
> >     3 - Soft offline is enabled (disabled for HugeTLB pages)
> > 
> > Existing behavior is preserved.
> 
> um, why?  What benefit does this patch provide to our users? 
> Use-cases, before-and-after scenarios, etc?

Thank you for the feedback.

Some BIOSes suppress ("cloak") corrected memory errors until a threshold
is reached. Once that threshold is reached, the BIOS reports a CPER with
the "error threshold exceeded" bit set via GHES, and the corresponding
page is soft offlined.

BIOS does not know the page type of the corresponding page. If the
corresponding page happens to be a HugeTLB page, it will be dissolved,
permanently reducing the HugeTLB page pool. This can be problematic for
workloads that depend on a fixed number of HugeTLB pages.

Currently, the only way to prevent HugeTLB pages from being soft offlined
is to disable soft offline entirely, for all page types.

This patch provides a middle ground. Soft offline can be disabled for
HugeTLB pages while remaining enabled for non-HugeTLB pages, preserving
the benefits of soft offline without the risk of BIOS soft offlining
HugeTLB pages.
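
As a rough illustration only (this is not the code from the patch), the
intended bitmask semantics could be sketched in C as below. The macro
names and the helper soft_offline_allowed() are hypothetical and were
chosen for this example; only the bit positions and the supported values
0, 1, and 3 come from the patch description.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; the patch may name these differently. */
#define SOFT_OFFLINE_ENABLE        (1u << 0) /* bit 0: enable soft offline */
#define SOFT_OFFLINE_SKIP_HUGETLB  (1u << 1) /* bit 1: disable for HugeTLB pages */

/*
 * Decide whether a page may be soft offlined, given the value written to
 * /proc/sys/vm/enable_soft_offline and whether the page is a HugeTLB page.
 */
static inline bool soft_offline_allowed(unsigned int mask, bool is_hugetlb)
{
	/* Bit 0 clear: soft offline is disabled for every page type. */
	if (!(mask & SOFT_OFFLINE_ENABLE))
		return false;

	/* Bit 1 set: soft offline stays enabled, except for HugeTLB pages. */
	if (is_hugetlb && (mask & SOFT_OFFLINE_SKIP_HUGETLB))
		return false;

	return true;
}
```

With this sketch, values 0 and 1 behave the same for HugeTLB and
non-HugeTLB pages, matching the existing interface, while 3 rejects only
HugeTLB pages.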

> > Update documentation and HugeTLB soft offline self tests.
> > 
> > Reported-by: Shawn Fan <shawn.fan@...el.com>
> 
> Interesting.  What did Shawn report? (Closes:!).

Tony or Shawn, could you please point me to the original report? Thanks!

> > Suggested-by: Tony Luck <tony.luck@...el.com>
> > Signed-off-by: Kyle Meyer <kyle.meyer@....com>
> >
> > ...
> >
> >  .../ABI/testing/sysfs-memory-page-offline     |  3 ++
> >  Documentation/admin-guide/sysctl/vm.rst       | 28 ++++++++++++++++---
> >  mm/memory-failure.c                           | 17 +++++++++--
> >  .../selftests/mm/hugetlb-soft-offline.c       | 19 ++++++++++---
> >  4 files changed, 56 insertions(+), 11 deletions(-)
> 
> I'll add it because testing, but please do explain why I added it?

Thanks,
Kyle Meyer
