Message-ID: <20250730201908.2395933-1-joshua.hahnjy@gmail.com>
Date: Wed, 30 Jul 2025 13:19:07 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: "Huang, Ying" <ying.huang@...ux.alibaba.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Zi Yan <ziy@...dia.com>,
Matthew Brost <matthew.brost@...el.com>,
Rakie Kim <rakie.kim@...com>,
Byungchul Park <byungchul@...com>,
Gregory Price <gourry@...rry.net>,
Alistair Popple <apopple@...dia.com>,
linux-kernel@...r.kernel.org,
linux-mm@...ck.org,
kernel-team@...a.com,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH] mempolicy: Clarify what RECLAIM_ZONE means

On Tue, 29 Jul 2025 08:58:49 +0800 "Huang, Ying" <ying.huang@...ux.alibaba.com> wrote:
> Joshua Hahn <joshua.hahnjy@...il.com> writes:
>
> > On Mon, 28 Jul 2025 09:44:06 +0800 "Huang, Ying" <ying.huang@...ux.alibaba.com> wrote:
> >
> >> Hi, Joshua,
> >>
> >> Joshua Hahn <joshua.hahnjy@...il.com> writes:
> >>
> >> > The zone_reclaim_mode API controls reclaim behavior when a node runs out of
> >> > memory. Contrary to its user-facing name, it is internally referred to as
> >> > "node_reclaim_mode". This is slightly confusing but there is not much we can
> >> > do given that it has already been exposed to userspace (since at least 2.6).
> >> >
> >> > However, what we can do is to make sure the internal description of what the
> >> > bits inside zone_reclaim_mode aligns with what it does in practice.
> >> > Setting RECLAIM_ZONE does indeed run shrink_inactive_list, but a more holistic
> >> > description would be to explain that zone reclaim modulates whether page
> >> > allocation (and khugepaged collapsing) prefers reclaiming & attempting to
> >> > allocate locally or should fall back to the next node in the zonelist.
> >> >
> >> > Change the description to clarify what zone reclaim entails.
> >> >
> >> > Signed-off-by: Joshua Hahn <joshua.hahnjy@...il.com>
> >> > ---
> >> > include/uapi/linux/mempolicy.h | 2 +-
> >> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
> >> > index 1f9bb10d1a47..24083809d920 100644
> >> > --- a/include/uapi/linux/mempolicy.h
> >> > +++ b/include/uapi/linux/mempolicy.h
> >> > @@ -69,7 +69,7 @@ enum {
> >> > * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> >> > * ABI. New bits are OK, but existing bits can never change.
> >> > */
> >> > -#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
> >> > +#define RECLAIM_ZONE (1<<0) /* Prefer reclaiming & allocating locally */
> >> > #define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
> >> > #define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
> >> >
> >> >
> >> > base-commit: 25fae0b93d1d7ddb25958bcb90c3c0e5e0e202bd
> >
> > Hi Ying, thanks for your review, as always!
> >
> >> Please consider the document of zone_reclaim_mode in
> >> Documentation/admin-guide/sysctl/vm.rst too.
> >
> > Yes, will do. Along with SJ's comment, I think that the information in the
> > admin-guide should be sufficient enough to explain what these bits do, so
> > I think my patch is not very necessary.
> >
> >> And, IIUC, RECLAIM_ZONE doesn't mean "locally" exactly. It's legal to
> >> bind to some node other than "local node".
> >
> > You are correct, it seems you can also reclaim on non-local nodes once you
> > go further down in the zonelist. I think my intent with the new comment was just
> > to indicate a preference to reclaim and allocate on the *current* node, as
> > opposed to falling back to the next node in the zonelist.
> >
> > With that said, I think your comment along with SJ's feedback have gotten me
> > to understand that we probably don't need this change :-)
>
> TBH, I think that it's good to make some change to the comments.
> Because IMHO, the original comments are bound to some specific
> implementation details. Some more general words may be better for the
> user space API description.

Hi Ying, sorry for the late reply.

I think that is a good point. In that case, maybe we can take SJ's comment
and keep both the implementation detail (i.e. that it will run
shrink_inactive_list on the zone) and a general description of what happens
(i.e. that it prefers this over allocating on the next node)?

On that note, one thing that I felt was slightly undercaptured in
Documentation/admin-guide is what "zone reclaim" actually means. What it does
is of course well captured by its name, but the documentation misses the
nuance of preferring reclaim over fallback allocation.

Actually, the whole motivation behind this conversation is that I saw zone
reclaim preventing allocation on the second node of a 2-NUMA-node system, and
I was a bit confused until I understood the implications of enabling zone
reclaim.
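
A minimal userspace sketch of how the three bits decode, for checking what a
given vm.zone_reclaim_mode value enables. The bit values are copied from the
include/uapi/linux/mempolicy.h hunk quoted above, and
/proc/sys/vm/zone_reclaim_mode is the procfs path of the sysctl. The helper
names describe_zone_reclaim_mode() and read_zone_reclaim_mode() are
hypothetical and exist only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Bit values as in include/uapi/linux/mempolicy.h (see the quoted diff). */
#define RECLAIM_ZONE  (1 << 0)
#define RECLAIM_WRITE (1 << 1)
#define RECLAIM_UNMAP (1 << 2)

/*
 * Hypothetical helper: write the names of the bits set in @mode into @buf,
 * separated by '|', or "off" if none of the three known bits is set.
 */
void describe_zone_reclaim_mode(unsigned int mode, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} bits[] = {
		{ RECLAIM_ZONE,  "RECLAIM_ZONE"  },
		{ RECLAIM_WRITE, "RECLAIM_WRITE" },
		{ RECLAIM_UNMAP, "RECLAIM_UNMAP" },
	};
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		if (!(mode & bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, bits[i].name, len - strlen(buf) - 1);
	}
	if (buf[0] == '\0')
		strncat(buf, "off", len - 1);
}

/*
 * Hypothetical helper: read the current sysctl value.
 * Returns -1 if the file cannot be opened or parsed.
 */
int read_zone_reclaim_mode(void)
{
	FILE *f = fopen("/proc/sys/vm/zone_reclaim_mode", "r");
	int mode = -1;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

Passing the result of read_zone_reclaim_mode() (when it is not -1) to
describe_zone_reclaim_mode() shows which bits are active. A value with
RECLAIM_ZONE set is the case described above, where reclaim on the current
node is preferred over falling back to the next node in the zonelist.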

Anyway, I can probably respin the patch to include information about what
zone reclaim is in the comment block above the bits.

But please feel free to correct me if you feel that the descriptions available
in the mempolicy.h uapi file and in Documentation/admin-guide are already
enough.

Thanks for the review as always, Ying. Have a great day!
Joshua
> ---
> Best Regards,
> Huang, Ying
>
Sent using hkml (https://github.com/sjp38/hackermail)