linux-kernel - Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for weighted interleave

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67ed66ef7c070_9dac294e0@dwillia2-xfh.jf.intel.com.notmuch>
Date: Wed, 2 Apr 2025 09:33:51 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Rakie Kim <rakie.kim@...com>, <gourry@...rry.net>
CC: <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
	<linux-kernel@...r.kernel.org>, <linux-cxl@...r.kernel.org>,
	<joshua.hahnjy@...il.com>, <dan.j.williams@...el.com>,
	<ying.huang@...ux.alibaba.com>, <david@...hat.com>,
	<Jonathan.Cameron@...wei.com>, <kernel_team@...ynix.com>,
	<honggyu.kim@...com>, <yunjeong.mun@...com>, <rakie.kim@...com>
Subject: Re: [PATCH v3 2/3] mm/mempolicy: Support dynamic sysfs updates for
 weighted interleave

Rakie Kim wrote:
> Previously, the weighted interleave sysfs structure was statically
> managed, preventing dynamic updates when nodes were added or removed.
> 
> This patch restructures the weighted interleave sysfs to support
> dynamic insertion and deletion. The sysfs that was part of
> the 'weighted_interleave_group' is now globally accessible,
> allowing external access to that sysfs.
> 
> With this change, sysfs management for weighted interleave is
> more flexible, supporting hotplug events and runtime updates
> more effectively.

I understand the urge to try to make a general case for a patch, but it
is better to state the explicit reason especially when someone is later
reading the history and may not realize that this is part of a series.

So instead of making claims like "this is more flexible / more effective
for runtime updates", state that motivation explicitly.  Something like:

"In preparation for enabling weighted-interleave sysfs attributes to
react to node-online/offline events, introduce sysfs_wi_node_add() and
sysfs_wi_node_delete() helpers to dynamically manage the
weighted-interleave attributes.

A follow-on patch registers a memory-hotplug notifier to use these
helpers, for now just refactor the current "publish all possible node"
approach to use sysfs_wi_node_{add,delete}()."

> 
> Signed-off-by: Rakie Kim <rakie.kim@...com>
> ---
>  mm/mempolicy.c | 70 ++++++++++++++++++++++----------------------------
>  1 file changed, 30 insertions(+), 40 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 5950d5d5b85e..6c8843114afd 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -3388,6 +3388,13 @@ struct iw_node_attr {
>  	int nid;
>  };
>  
> +struct sysfs_wi_group {
> +	struct kobject wi_kobj;
> +	struct iw_node_attr *nattrs[];
> +};
> +
> +static struct sysfs_wi_group *sgrp;
> +
>  static ssize_t node_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			 char *buf)
>  {
> @@ -3430,27 +3437,23 @@ static ssize_t node_store(struct kobject *kobj, struct kobj_attribute *attr,
>  	return count;
>  }
>  
> -static struct iw_node_attr **node_attrs;
> -
> -static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
> -				  struct kobject *parent)
> +static void sysfs_wi_node_release(int nid)

I called this sysfs_wi_node_delete() above because _release() is
typically callback invoked on last put of a kobject.

>  {
> -	if (!node_attr)
> +	if (!sgrp->nattrs[nid])
>  		return;
> -	sysfs_remove_file(parent, &node_attr->kobj_attr.attr);
> -	kfree(node_attr->kobj_attr.attr.name);
> -	kfree(node_attr);
> +
> +	sysfs_remove_file(&sgrp->wi_kobj, &sgrp->nattrs[nid]->kobj_attr.attr);
> +	kfree(sgrp->nattrs[nid]->kobj_attr.attr.name);
> +	kfree(sgrp->nattrs[nid]);
>  }
>  
>  static void sysfs_wi_release(struct kobject *wi_kobj)
>  {
> -	int i;
> -
> -	for (i = 0; i < nr_node_ids; i++)
> -		sysfs_wi_node_release(node_attrs[i], wi_kobj);
> +	int nid;
>  
> -	kfree(node_attrs);
> -	kfree(wi_kobj);
> +	for (nid = 0; nid < nr_node_ids; nid++)
> +		sysfs_wi_node_release(nid);
> +	kfree(sgrp);

This looks broken, are you sure that a kobject with a zero reference can
still host child attributes?

The teardown flow I would expect is:

sysfs_remove_file(node_attrs[i],
kobject_del(wi_kobj)
...that does final kobject_put()...
kfree(container_of(wi_kobj))

However, now I do not think patch1 is actually fixing anything because
there is never a kobject_del() of the mempolicy_kobj. Just like there is
never a kobject_del() of the mm_kobj.

So patch1 seems to potentially be addressing a bug introduced by this
dynamic work which is caused by the original code being confused about
the kobject shutdown path.

The original problems are that sysfs_wi_release() has a kobject_put()
which, yes, is broken, but equally problematic is that there is no
kobject_del() in sight for either of these kobjects(), even with the new
changes. mempolicy_kobj_release() seems to confuse the activities that I
would expect to be near a kobject_del() call with the minimal kfree() on
final put.

>  }
>  
>  static const struct kobj_type wi_ktype = {
> @@ -3458,7 +3461,7 @@ static const struct kobj_type wi_ktype = {
>  	.release = sysfs_wi_release,
>  };
>  
> -static int add_weight_node(int nid, struct kobject *wi_kobj)
> +static int sysfs_wi_node_add(int nid)
>  {
>  	struct iw_node_attr *node_attr;
>  	char *name;
> @@ -3480,57 +3483,44 @@ static int add_weight_node(int nid, struct kobject *wi_kobj)
>  	node_attr->kobj_attr.store = node_store;
>  	node_attr->nid = nid;
>  
> -	if (sysfs_create_file(wi_kobj, &node_attr->kobj_attr.attr)) {
> +	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
>  		kfree(node_attr->kobj_attr.attr.name);
>  		kfree(node_attr);
>  		pr_err("failed to add attribute to weighted_interleave\n");
>  		return -ENOMEM;
>  	}
>  
> -	node_attrs[nid] = node_attr;
> +	sgrp->nattrs[nid] = node_attr;
>  	return 0;
>  }
>  
> -static int add_weighted_interleave_group(struct kobject *root_kobj)
> +static int add_weighted_interleave_group(struct kobject *mempolicy_kobj)
>  {
> -	struct kobject *wi_kobj;
>  	int nid, err;
>  
> -	node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
> -			     GFP_KERNEL);
> -	if (!node_attrs)
> +	sgrp = kzalloc(sizeof(struct sysfs_wi_group) + 			\
> +		       nr_node_ids * sizeof(struct iw_node_attr *),	\
> +		       GFP_KERNEL);

The recommended way to allocate a struct with a flexible array is using
the struct_size() helper.

    kzalloc(struct_size(sgrp, nattrs, nr_node_ids), GFP_KERNEL)

...but overall I think the original code needs a cleanup and to be clear
that I think there is no memory leak risk exposed to existing users
given the shutdown path is never invoked.