Message-ID: <ZagSW5TXzZeKErlW@memverge.com>
Date: Wed, 17 Jan 2024 12:46:03 -0500
From: Gregory Price <gregory.price@...verge.com>
To: "Huang, Ying" <ying.huang@...el.com>
Cc: Gregory Price <gourry.memverge@...il.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-api@...r.kernel.org,
corbet@....net, akpm@...ux-foundation.org, honggyu.kim@...com,
rakie.kim@...com, hyeongtak.ji@...com, mhocko@...nel.org,
vtavarespetr@...ron.com, jgroves@...ron.com,
ravis.opensrc@...ron.com, sthanneeru@...ron.com,
emirakhur@...ron.com, Hasan.Maruf@....com, seungjun.ha@...sung.com,
hannes@...xchg.org, dan.j.williams@...el.com
Subject: Re: [PATCH 1/3] mm/mempolicy: implement the sysfs-based
weighted_interleave interface
On Wed, Jan 17, 2024 at 02:58:08PM +0800, Huang, Ying wrote:
> Gregory Price <gregory.price@...verge.com> writes:
>
> > We haven't had the discussion on how/when this should happen yet,
> > though, and there's some research to be done. (i.e. when should DRAM
> > weights be set? should the entire table be reweighted on hotplug? etc)
>
> Before that, I'm OK to remove default_iw_table and use hard coded "1" as
> default weight for now.
>
Can't quite do that. default_iw_table is a static structure because we
need a reliable default structure not subject to module initialization
failure. Otherwise we can end up in a situation where iw_table is NULL
during some allocation path if the sysfs structure fails to set up fully.
There's no good reason to fail allocations just because sysfs failed to
initialize for some reason. I'll leave default_iw_table with a size
of MAX_NUMNODES for now (nr_node_ids is set up at runtime per your
reference to `setup_nr_node_ids` below, so we can't use it for this).
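For illustration only, a minimal userspace sketch of the fallback
described above (this is not the patch code). lookup_weight() is a
hypothetical helper, the MAX_NUMNODES value is a placeholder, RCU
protection is left out, and treating a 0 entry as "unset" is an
assumption made for the sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: placeholder for the kernel's compile-time node limit. */
#define MAX_NUMNODES 1024

/*
 * Static fallback table. It exists from program start, so it cannot
 * "fail to allocate" the way a table set up during sysfs init could.
 * Entries are zero-initialized; the sketch reads 0 as "unset".
 */
static uint8_t default_iw_table[MAX_NUMNODES];

/* Stays NULL if the sysfs side never managed to install a table. */
static uint8_t *iw_table;

/* Hypothetical helper: the weight an allocation path would use. */
static uint8_t lookup_weight(unsigned int node)
{
	const uint8_t *table = iw_table ? iw_table : default_iw_table;
	uint8_t weight;

	if (node >= MAX_NUMNODES)
		return 1;

	weight = table[node];
	/* Assumption: an unset (0) entry falls back to a weight of 1. */
	return weight ? weight : 1;
}
```

With iw_table left NULL, every lookup still returns a usable weight,
so a sysfs setup failure never reaches the allocation path. The sketch
sizes both tables at MAX_NUMNODES, which sidesteps the nr_node_ids
sizing question discussed further down.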
> >
> >> u8 __rcu *iw_table;
> >>
> >> Then, we only need to allocate nr_node_ids elements now.
> >>
> >
> > We need nr_possible_nodes to handle hotplug correctly.
>
> nr_node_ids >= num_possible_nodes(). It's larger than any possible node
> ID.
>
nr_node_ids gets set up at runtime, while the default_iw_table needs
to be a static structure (see above). I can make default_iw_table
MAX_NUMNODES and subsequent allocations of iw_table be nr_node_ids,
but then the size of iw_table depends on which table is installed at
any given time.
This *will* break if "true hotplug" ever shows up and possible_nodes !=
MAX_NUMNODES. But I can write it up if it's a sticking point for you.
Ultimately we're squabbling over, at most, ~3kb of memory, just
keep that in mind. (I guess if you spawn 3000 threads and each tries a
concurrent write to sysfs/node1, you'd eat 3MB briefly, but that
is a truly degenerate case and I can think of more degenerate things).
>
> When "true node hotplug" becomes reality, we can make nr_node_ids ==
> MAX_NUMNODES. So, it's safe to use it. Please take a look at
> setup_nr_node_ids().
>
~Gregory