[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250108165632.76746-1-joshuahahn@meta.com>
Date: Wed, 8 Jan 2025 11:56:32 -0500
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Hyeonggon Yoo <hyeonggon.yoo@...com>
Cc: Joshua Hahn <joshuahahn@...a.com>,
"Huang, Ying" <ying.huang@...ux.alibaba.com>,
Gregory Price <gourry@...rry.net>,
Joshua Hahn <joshua.hahnjy@...il.com>,
kernel_team@...ynix.com,
42.hyeyoo@...il.com,
"rafael@...nel.org" <rafael@...nel.org>,
"lenb@...nel.org" <lenb@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
김홍규(KIM HONGGYU) System SW <honggyu.kim@...com>,
김락기(KIM RAKIE) System SW <rakie.kim@...com>,
"dan.j.williams@...el.com" <dan.j.williams@...el.com>,
"Jonathan.Cameron@...wei.com" <Jonathan.Cameron@...wei.com>,
"dave.jiang@...el.com" <dave.jiang@...el.com>,
"horen.chuang@...ux.dev" <horen.chuang@...ux.dev>,
"hannes@...xchg.org" <hannes@...xchg.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"kernel-team@...a.com" <kernel-team@...a.com>
Subject: Re: [External Mail] Re: [External Mail] [RFC PATCH] mm/mempolicy: Weighted interleave auto-tuning
On Wed, 8 Jan 2025 10:19:19 +0900 Hyeonggon Yoo <hyeonggon.yoo@...com> wrote:
>
> On 2024-12-30 3:48 AM, Huang, Ying wrote:
> > Gregory Price <gourry@...rry.net> writes:
> >
> >> On Fri, Dec 27, 2024 at 09:59:30AM +0800, Huang, Ying wrote:
> >>> Gregory Price <gourry@...rry.net> writes:
> >>>> This still allows 0 to be a manual "reset specific node to default"
> >>>> mechanism for a specific node, and gives us a clean override.
> >>>
> >>> The difficulty is that users don't know the default value when they
> >>> reset a node's weight. We don't have an interface to show them. So, I
> >>> suggest to disable the functionality: "reset specific node to default".
> >>> They can still use "echo 1 > use_defaults" to reset all nodes to
> >>> default.
> >>>
> >>
> >> Good point, and agree. So lets just ditch 0. Since that "feature"
> >> wasn't even functional in the current code (it just reset it to 1 at
> >> this point), it's probably safe to just ditch it. Worst case scenario
> >> if someone takes issues, we can just have it revert the weight to 1.
> >
> > Before implementing the new version, it's better to summarize the user
> > space interface design first. So, we can check whether we have some
> > flaws.
>
> Hi, hope you all had a nice year-end holiday :)
Hi Hyeonggon, thank you for the review! I hope you have had a great
start to 2025 as well : -)
> Let me summarize the points we've discussed:
>
> - A new knob 'weightiness' is unnecessary until it's proven useful.
> Just using an internal default weightiness value will be enough.
Yup, and as Gregory mentioned, we're pretty confident that 32 gives
a good balance of reduction aggression & error minimization.
> - It will be counter-intuitive to update the value previously set by
> user, even if the value will no longer be valid (e.g. due to CXL
> memory hot-plug). User should update the weights accordingly in that
> case, instead of the kernel updating automatically overwriting them.
Agreed. I think we should lean on the side of keeping user-set values
at the highest priority, and avoid overwriting them.
> - Ditch the way of using 0 as 'system default' value because the user
> won't know what will be the default when setting it anyway. 0 value
> now means the kernel won't weight-interleave the node.
Yes, and I think this is going to be important to convey to users,
especially since the current expected behavior is that hotplugged nodes
are expected to automatically be included in interleaving.
> - Setting a node weight to default value (e.g. via the previous
> semantic of '0') could be problematic because it's not atomic -
> the system may be updating default values while the user's
> trying to set a node weight to default value.
>
> To deal with that, Huang suggested 'use_defaults' to atomically update
> all the weights to system default.
One concern that I have here is that even for the use_defaults interface,
the users don't know what the system default values are. From the user
perspective, when they make the judgement between using user values and
system defaults, one side of the choice is always going to be opaque,
as long as we only have a "one-layer" interface.
With that said, I think that in this new design where the choice is
either to only use system defaults vs. manually set everything, I think
the problem is much smaller, since there is no hybrid of using both.
I think this will bucket users into two categories: those who expect
weighted interleave to work out of the box with no configurations, and
those who will want to manually tune node weights. I think this is a
reasonable categorization, especially given Huang's earlier comment
on how "setting one node means the user thinks all other node weights
are reasonable". As long as we support these two user types, I think
we will cover most use cases.
Like mentioned previously by others, I think we don't have to support use
cases where some nodes are fixed to user values and others are fixed to
defaults, until an explicit use case comes up.
> Please let me know if there's any point we discussed that I am missing.
>
> Additionally I would like to mention that within an internal discussion
> my colleague Honggyu suggested introducing 'mode' parameter which can be
> either 'manual' or 'auto' instead of 'use_defaults' to be provide more
> intuitive interface.
>
> With Honggyu's suggestion and the points we've discussed,
> I think the interface could be:
>
> # At booting, the mode is 'auto' where the kernel can automatically
> # update any weights.
>
> mode auto # User hasn't specified any weight yet.
> effective [2, 1, -, -] # Using system defaults for node 0-1,
> # and node 2-3 not populated yet.
>
> # When a new NUMA node is added (e.g. via hotplug) in the 'auto' mode,
> # all weights are re-calculated based on ACPI HMAT table, including the
> # weight of the new node.
>
> mode auto # User hasn't specified weights yet.
> effective [2, 1, 1, -] # Using system defaults for node 0-2,
> # and node 3 not populated yet.
>
> # When user set at least one weight value, change the mode to 'manual'
> # where the kernel does not update any weights automatically without
> # user's consent.
>
> mode manual # User changed the weight of node 0 to 4,
> # changing the mode to manual config mode.
> effective [4, 1, 1, -]
>
>
> # When a new NUMA node is added (e.g. via hotplug) in the manual mode,
> # the new node's weight is zero because it's in manual mode and user
> # did not specify the weight for the new node yet.
>
> mode manual
> effective [4, 1, 1, 0]
>
> # When user changes the mode to 'auto', all weights are changed to
> # system defaults based on the ACPI HMAT table.
>
> mode auto
> effective [2, 1, 1, 1] # system defaults
>
> In the example I did not distinguish 'default weights' and 'user
> weights' because it's not important where the weight values came from --
> but it's important to know 1) what's the effective weights now and 2) if
> the kernel can update them.
>
> Any thoughts?
Please let me know if I am missing anything, but the way I understand the
"mode" interface, it follows the blanket use_defaults-like semantics of
auto / manual, but uses the one-layer interface like the v2 of this patch.
Personally, I think this makes a lot of sesne. I have a few thoughts:
First, it seems like re-weighting is something that only happens when
mode == auto. I think this makes sense, since the point of re-weighting
is to reduce ugly bandwidth values into interleave weight values. However,
I want to gather some thoughts on whether there are any scenarios that I am
missing, for when a user would want to be in manual mode but also trigger
a re-weighting.
One lingering thought I have is what happens when a hotplug event happens
in manual mode. The way I see it, there are 3 options on what to set the
value as:
- 0, since the user is in manual mode and has not set any value.
- 1, since we still want the node to be in use, even though the user has
not explicitly stated that they want it to be in use. 1 as a default
value makes sense, since it is the minimum weight a node can have,
and can only help by relieving bandwidth pressure from the other nodes.
- A value >= 1, determined by what the default value should have been
based on the bandwidths of the other nodes.
I am personally leaning towards the second option (1), but I also want to
hear everyone's thoughts on what makes the most sense.
> ---
> Best,
> Hyeonggon
Happy 2025 everyone! Thank you for all of your continued interest
and feedback on this patch. I am confident that we will come up with
a solution that makes sense for everyone, thanks to your time & efforts.
Have a great day!
Joshua
Powered by blists - more mailing lists