Message-ID: <aMMNVA9EXXHYvmKH@agluck-desk3>
Date: Thu, 11 Sep 2025 10:56:36 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: David Hildenbrand <david@...hat.com>
CC: Kyle Meyer <kyle.meyer@....com>, <akpm@...ux-foundation.org>,
<corbet@....net>, <linmiaohe@...wei.com>, <shuah@...nel.org>,
<Liam.Howlett@...cle.com>, <bp@...en8.de>, <hannes@...xchg.org>,
<jack@...e.cz>, <jane.chu@...cle.com>, <jiaqiyan@...gle.com>,
<joel.granados@...nel.org>, <laoar.shao@...il.com>,
<lorenzo.stoakes@...cle.com>, <mclapinski@...gle.com>, <mhocko@...e.com>,
<nao.horiguchi@...il.com>, <osalvador@...e.de>, <rafael.j.wysocki@...el.com>,
<rppt@...nel.org>, <russ.anderson@....com>, <shawn.fan@...el.com>,
<surenb@...gle.com>, <vbabka@...e.cz>, <linux-acpi@...r.kernel.org>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Re: [PATCH] mm/memory-failure: Disable soft offline for HugeTLB pages by default
On Thu, Sep 11, 2025 at 10:46:10AM +0200, David Hildenbrand wrote:
> On 10.09.25 18:15, Kyle Meyer wrote:
> > Soft offlining a HugeTLB page reduces the available HugeTLB page pool.
> > Since HugeTLB pages are preallocated, reducing the available HugeTLB
> > page pool can cause allocation failures.
> >
> > /proc/sys/vm/enable_soft_offline provides a sysctl interface to
> > disable/enable soft offline:
> >
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled.
> >
> > The current sysctl interface does not distinguish between HugeTLB pages
> > and other page types.
> >
> > Disable soft offline for HugeTLB pages by default (1) and extend the
> > sysctl interface to preserve existing behavior (2):
> >
> > 0 - Soft offline is disabled.
> > 1 - Soft offline is enabled (excluding HugeTLB pages).
> > 2 - Soft offline is enabled (including HugeTLB pages).
> >
> > Update documentation for the sysctl interface, reference the sysctl
> > interface in the sysfs ABI documentation, and update HugeTLB soft
> > offline selftests.
>
> I'm sure you spotted that the documentation for
> "/sys/devices/system/memory/soft_offline_pag" resides under "testing".
But that is only one of several places in the kernel that
feed into the page offline code.
This patch was motivated by the GHES path where BIOS indicates
a corrected error threshold was exceeded. There's also the
drivers/ras/cec.c path where Linux does its own threshold
counting.
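For illustration only (this is not code from the patch): the three
sysctl values proposed above amount to one policy check that would
presumably apply no matter which path requested the soft offline.
A minimal userspace sketch of that decision follows. The enum and
function names are hypothetical, and treating out-of-range values
as "disabled" is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical names. The numeric values mirror the proposed
 * /proc/sys/vm/enable_soft_offline settings quoted above.
 */
enum soft_offline_mode {
	SOFT_OFFLINE_DISABLED = 0,             /* never soft offline */
	SOFT_OFFLINE_ENABLED_SKIP_HUGETLB = 1, /* proposed default */
	SOFT_OFFLINE_ENABLED = 2,              /* existing behavior */
};

/*
 * Return true if a page may be soft offlined under the given mode.
 * is_hugetlb says whether the target page belongs to a HugeTLB page.
 * Unknown modes are treated as disabled (an assumption, not something
 * stated in the patch description).
 */
static bool soft_offline_allowed(int mode, bool is_hugetlb)
{
	switch (mode) {
	case SOFT_OFFLINE_DISABLED:
		return false;
	case SOFT_OFFLINE_ENABLED_SKIP_HUGETLB:
		return !is_hugetlb;
	case SOFT_OFFLINE_ENABLED:
		return true;
	default:
		return false;
	}
}
```

With the proposed default of 1, a corrected error threshold event on
a regular page would still be acted on, while the same event on a
HugeTLB page would be skipped unless the admin sets the value to 2.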
>
> If you read about MADV_SOFT_OFFLINE in the man page it clearly says:
>
> "This feature is intended for testing of memory error-handling code; it is
> available only if the kernel was configured with CONFIG_MEMORY_FAILURE."
Agreed that this all depends on CONFIG_MEMORY_FAILURE ... so if any
part of the flow is still compiled in when that option is "=n", then
some changes are needed to fix that.
>
> So I'm sorry to say: I miss why we should add all this complexity to make a
> feature used for testing soft-offlining work differently for hugetlb folios
> -- with a testing interface.
>
> --
> Cheers
>
> David / dhildenb
-Tony