Message-ID: <20250324085433.998-1-rakie.kim@sk.com>
Date: Mon, 24 Mar 2025 17:54:27 +0900
From: Rakie Kim <rakie.kim@...com>
To: Rakie Kim <rakie.kim@...com>
Cc: akpm@...ux-foundation.org,
linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
linux-cxl@...r.kernel.org,
joshua.hahnjy@...il.com,
dan.j.williams@...el.com,
ying.huang@...ux.alibaba.com,
david@...hat.com,
Jonathan.Cameron@...wei.com,
kernel_team@...ynix.com,
honggyu.kim@...com,
yunjeong.mun@...com,
Gregory Price <gourry@...rry.net>
Subject: Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted interleave
On Mon, 24 Mar 2025 17:48:39 +0900 Rakie Kim <rakie.kim@...com> wrote:
> On Fri, 21 Mar 2025 10:24:46 -0400 Gregory Price <gourry@...rry.net> wrote:
> > On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
> > ... snip ...
> > > + mutex_lock(&sgrp->kobj_lock);
> > > + if (sgrp->nattrs[nid]) {
> > > + mutex_unlock(&sgrp->kobj_lock);
> > > + pr_info("Node [%d] already exists\n", nid);
> > > + kfree(new_attr);
> > > + kfree(name);
> > > + return 0;
> > > + }
> > >
> > > - if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> > > - kfree(node_attr->kobj_attr.attr.name);
> > > - kfree(node_attr);
> > > - pr_err("failed to add attribute to weighted_interleave\n");
> > > - return -ENOMEM;
> > > + sgrp->nattrs[nid] = new_attr;
> > > + mutex_unlock(&sgrp->kobj_lock);
> > > +
> > > + sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> > > + sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> > > + sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> > > + sgrp->nattrs[nid]->kobj_attr.show = node_show;
> > > + sgrp->nattrs[nid]->kobj_attr.store = node_store;
> > > + sgrp->nattrs[nid]->nid = nid;
> >
> > These accesses need to be inside the lock as well. Probably we can't
> > get here concurrently, but I can't say so definitively enough that I'm
> > comfortable blind-accessing it outside the lock.
>
> You're right, and I appreciate your point. It's not difficult to apply your
> suggestion, so I plan to update the code as follows:
>
> 	sgrp->nattrs[nid] = new_attr;
>
> 	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> 	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> 	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> 	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> 	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> 	sgrp->nattrs[nid]->nid = nid;
>
> 	ret = sysfs_create_file(&sgrp->wi_kobj,
> 				&sgrp->nattrs[nid]->kobj_attr.attr);
> 	if (ret) {
> 		mutex_unlock(&sgrp->kobj_lock);
> 		...
> 	}
> 	mutex_unlock(&sgrp->kobj_lock);
>
> >
> > > +static int wi_node_notifier(struct notifier_block *nb,
> > > + unsigned long action, void *data)
> > > +{
> > ... snip ...
> > > + case MEM_OFFLINE:
> > > + sysfs_wi_node_release(nid);
> >
> > I'm still not convinced this is correct. `offline_pages()` says this:
> >
> > /*
> > * {on,off}lining is constrained to full memory sections (or more
> > * precisely to memory blocks from the user space POV).
> > */
> >
> > And that is the function calling:
> > memory_notify(MEM_OFFLINE, &arg);
> >
> > David pointed out that this should be called when offlining each memory
> > block. This is not the same as simply doing `echo 0 > online`; you need
> > to remove the dax device associated with the memory.
> >
> > For example:
> >
> >            node1
> >           /     \
> >      dax0.0    dax1.0
> >        |          |
> >       mb1        mb2
> >
> >
> > With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
> > remove the first memory block, causing the MEM_OFFLINE event to fire and
> > the node to be removed, despite the fact that dax1.0 is still present.
> >
> > This matters for systems with memory holes in CXL hotplug memory and
> > also for systems with Dynamic Capacity Devices surfacing capacity as
> > separate dax devices.
> >
> > ~Gregory
>
> If all memory blocks belonging to a node are offlined, the node will lose its
> `N_MEMORY` state before the notifier callback is invoked. This should help avoid
> the issue you mentioned.
> Please let me know your thoughts on this approach.
>
> Rakie
>
I'm sorry, the code was missing from my previous message.
I may not fully understand the scenario you described, but I think your concern
can be addressed by adding a simple check like the following:
	case MEM_OFFLINE:
		if (!node_state(nid, N_MEMORY))	/* <-- this point */
			sysfs_wi_node_release(nid);
If all memory blocks belonging to a node are offlined, the node will lose its
`N_MEMORY` state before the notifier callback is invoked. This should help avoid
the issue you mentioned.
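For context, here is a minimal sketch of how that check might sit inside the
notifier. This is assembled from the snippets in this thread and is not the
actual patch; the MEM_ONLINE case, the sysfs_wi_node_add() call, and the use
of status_change_nid are assumptions about the surrounding structure:

```c
/* Sketch only -- surrounding structure is assumed, not from the patch. */
static int wi_node_notifier(struct notifier_block *nb,
			    unsigned long action, void *data)
{
	struct memory_notify *arg = data;
	int nid = arg->status_change_nid;

	if (nid < 0)
		return NOTIFY_OK;

	switch (action) {
	case MEM_ONLINE:
		sysfs_wi_node_add(nid);
		break;
	case MEM_OFFLINE:
		/*
		 * Only drop the sysfs node once the whole node is
		 * memoryless, so offlining a single memory block (e.g.
		 * one of several dax devices backing the node) does not
		 * remove it prematurely.
		 */
		if (!node_state(nid, N_MEMORY))
			sysfs_wi_node_release(nid);
		break;
	}
	return NOTIFY_OK;
}
```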
Please let me know your thoughts on this approach.
Rakie.