Message-ID: <67ed66ef7c070_9dac294e0@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 2 Apr 2025 09:33:51 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Rakie Kim <rakie.kim@...com>, <gourry@...rry.net>
CC: <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
<joshua.hahnjy@...il.com>, <dan.j.williams@...el.com>,
<ying.huang@...ux.alibaba.com>, <david@...hat.com>,
<Jonathan.Cameron@...wei.com>, <kernel_team@...ynix.com>,
<honggyu.kim@...com>, <yunjeong.mun@...com>, <rakie.kim@...com>
Subject: Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for
weighted interleave
Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed, preventing dynamic updates when nodes were added or removed.
>
> This patch restructures the weighted interleave sysfs to support
> dynamic insertion and deletion. The sysfs that was part of
> the 'weighted_interleave_group' is now globally accessible,
> allowing external access to that sysfs.
>
> With this change, sysfs management for weighted interleave is
> more flexible, supporting hotplug events and runtime updates
> more effectively.
I understand the urge to try to make a general case for a patch, but it
is better to state the explicit reason especially when someone is later
reading the history and may not realize that this is part of a series.
So instead of making claims like "this is more flexible / more effective
for runtime updates", state that motivation explicitly. Something like:
"In preparation for enabling weighted-interleave sysfs attributes to
react to node-online/offline events, introduce sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers to dynamically manage the
weighted-interleave attributes.
A follow-on patch registers a memory-hotplug notifier to use these
helpers. For now, just refactor the current "publish all possible nodes"
approach to use sysfs_wi_node_{add,delete}()."
>
> Signed-off-by: Rakie Kim <rakie.kim@...com>
> ---
> mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
> 1 file changed, 30 insertions(+), 40 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 5950d5d5b85e..6c8843114afd 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3388,6 +3388,13 @@ struct iw_node_attr {
> int nid;
> };
>
> +struct sysfs_wi_group {
> + struct kobject wi_kobj;
> + struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *sgrp;
> +
> static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
> char *buf)
> {
> @@ -3430,27 +3437,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
> return count;
> }
>
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> - struct kobject *parent)
> +static void sysfs_wi_node_release(int nid)
I called this sysfs_wi_node_delete() above because _release() is
typically the callback invoked on the last put of a kobject.
> {
> - if (!node_attr)
> + if (!sgrp->nattrs[nid])
> return;
> - sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> - kfree(node_attr->kobj_attr.attr.name);
> - kfree(node_attr);
> +
> + sysfs_remove_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
> + kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
> + kfree(sgrp->nattrs[nid]);
> }
>
> static void sysfs_wi_release(struct kobject *wi_kobj)
> {
> - int i;
> -
> - for (i = 0; i < nr_node_ids; i++)
> - sysfs_wi_node_release(node_attrs[i], wi_kobj);
> + int nid;
>
> - kfree(node_attrs);
> - kfree(wi_kobj);
> + for (nid = 0; nid < nr_node_ids; nid++)
> + sysfs_wi_node_release(nid);
> + kfree(sgrp);
This looks broken. Are you sure that a kobject with a zero reference
count can still host child attributes?
The teardown flow I would expect is:
    sysfs_remove_file() for each node_attrs[i]
    kobject_del(wi_kobj)
    ...that does final kobject_put()...
    kfree(container_of(wi_kobj))
However, now I do not think patch1 is actually fixing anything because
there is never a kobject_del() of the mempolicy_kobj. Just like there is
never a kobject_del() of the mm_kobj.
So patch1 seems to potentially be addressing a bug introduced by this
dynamic work which is caused by the original code being confused about
the kobject shutdown path.
The original problems are that sysfs_wi_release() has a kobject_put()
which, yes, is broken, but equally problematic is that there is no
kobject_del() in sight for either of these kobjects, even with the new
changes. mempolicy_kobj_release() seems to confuse the activities that I
would expect to be near a kobject_del() call with the minimal kfree() on
final put.
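To make the expected ordering concrete, here is a toy userspace model of
that teardown flow. It is only a sketch: every toy_* name is hypothetical,
nothing here is a kernel API, and the rule that attributes can only be added
while the object is still published is a modelling assumption rather than a
statement about sysfs internals. The point it illustrates is the split of
responsibilities: attribute removal and the "del" step happen explicitly at
teardown time, while the release callback does nothing but free memory on
the final put.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_NODES 4

/* Toy stand-in for a kobject: a refcount plus a "published" flag. */
struct toy_kobj {
	int refcount;
	bool in_sysfs;
	void (*release)(struct toy_kobj *kobj);
};

struct toy_wi_group {
	struct toy_kobj kobj;
	bool attr_present[TOY_MAX_NODES];
};

static int toy_groups_freed; /* counts release() calls, for demonstration */

/* Final-put callback: only frees the containing structure. */
static void toy_wi_release(struct toy_kobj *kobj)
{
	struct toy_wi_group *grp = (struct toy_wi_group *)
		((char *)kobj - offsetof(struct toy_wi_group, kobj));

	free(grp);
	toy_groups_freed++;
}

static struct toy_wi_group *toy_wi_group_create(void)
{
	struct toy_wi_group *grp = calloc(1, sizeof(*grp));

	if (!grp)
		return NULL;
	grp->kobj.refcount = 1;
	grp->kobj.in_sysfs = true;
	grp->kobj.release = toy_wi_release;
	return grp;
}

/* Assumption of this model: attributes need a still-published parent. */
static int toy_attr_add(struct toy_wi_group *grp, int nid)
{
	if (nid < 0 || nid >= TOY_MAX_NODES || !grp->kobj.in_sysfs)
		return -1;
	grp->attr_present[nid] = true;
	return 0;
}

static void toy_attr_remove(struct toy_wi_group *grp, int nid)
{
	if (nid >= 0 && nid < TOY_MAX_NODES)
		grp->attr_present[nid] = false;
}

/* Unpublish the object; does not touch the refcount. */
static void toy_kobj_del(struct toy_kobj *kobj)
{
	kobj->in_sysfs = false;
}

/* Drop one reference; the last put triggers release(). */
static void toy_kobj_put(struct toy_kobj *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}

/* Teardown order: remove attributes, del, then drop the reference. */
static void toy_wi_group_teardown(struct toy_wi_group *grp)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		toy_attr_remove(grp, nid);
	toy_kobj_del(&grp->kobj);
	toy_kobj_put(&grp->kobj);
}
```

In this model a caller that creates a group, adds a few attributes, and then
calls toy_wi_group_teardown() sees the release callback run exactly once, and
only after the attributes are gone and the object has been unpublished.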
> }
>
> static const struct kobj_type wi_ktype = {
> @@ -3458,7 +3461,7 @@ static const struct kobj_type wi_ktype = {
> .release = sysfs_wi_release,
> };
>
> -static int add_weight_node(int nid, struct kobject *wi_kobj)
> +static int sysfs_wi_node_add(int nid)
> {
> struct iw_node_attr *node_attr;
> char *name;
> @@ -3480,57 +3483,44 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
> node_attr->kobj_attr.store = node_store;
> node_attr->nid = nid;
>
> - if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> + if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> kfree(node_attr->kobj_attr.attr.name);
> kfree(node_attr);
> pr_err("failed to add attribute to weighted_interleave\n");
> return -ENOMEM;
> }
>
> - node_attrs[nid] = node_attr;
> + sgrp->nattrs[nid] = node_attr;
> return 0;
> }
>
> -static int add_weighted_interleave_group(struct kobject *root_kobj)
> +static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
> {
> - struct kobject *wi_kobj;
> int nid, err;
>
> - node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> - GFP_KERNEL);
> - if (!node_attrs)
> + sgrp = kzalloc(sizeof(struct sysfs_wi_group) + \
> + nr_node_ids * sizeof(struct iw_node_attr *), \
> + GFP_KERNEL);
The recommended way to allocate a struct with a flexible array is using
the struct_size() helper.
kzalloc(struct_size(sgrp, nattrs, nr_node_ids), GFP_KERNEL)
...but overall I think the original code needs a cleanup. To be clear,
I think there is no memory leak risk exposed to existing users, given
that the shutdown path is never invoked.
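For readers unfamiliar with the struct_size() suggestion above, the following
is a userspace sketch of the idea, not the kernel macro itself. The struct
layout is a simplified stand-in (the kobject is replaced by a placeholder
field), and wi_group_size() and wi_group_alloc() are hypothetical names. What
it shows is the size computation for a structure with a trailing flexible
array, with the result saturating to SIZE_MAX instead of wrapping when the
multiplication or addition would overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins; the field layout is illustrative only. */
struct iw_node_attr {
	int nid;
};

struct sysfs_wi_group {
	long wi_kobj;                  /* placeholder for struct kobject */
	struct iw_node_attr *nattrs[]; /* flexible array, one slot per node id */
};

/*
 * Hypothetical analogue of struct_size(sgrp, nattrs, count): header size
 * plus count elements, saturating to SIZE_MAX on overflow.
 */
static size_t wi_group_size(size_t count)
{
	size_t head = sizeof(struct sysfs_wi_group);
	size_t elem = sizeof(struct iw_node_attr *);

	if (count > (SIZE_MAX - head) / elem)
		return SIZE_MAX;
	return head + count * elem;
}

/* Zeroed allocation sized for nr_ids slots; NULL on overflow or failure. */
static struct sysfs_wi_group *wi_group_alloc(size_t nr_ids)
{
	size_t bytes = wi_group_size(nr_ids);

	if (bytes == SIZE_MAX)
		return NULL;
	return calloc(1, bytes);
}
```

The open-coded sizeof() arithmetic in the quoted hunk computes the same value
for sane inputs; the helper form mainly removes the chance of a silent
wraparound and keeps the element type tied to the array member.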