lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z912rrV4xTOBwEP7@gourry-fedora-PF4VCD3F>
Date: Fri, 21 Mar 2025 10:24:46 -0400
From: Gregory Price <gourry@...rry.net>
To: Rakie Kim <rakie.kim@...com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org,
	joshua.hahnjy@...il.com, dan.j.williams@...el.com,
	ying.huang@...ux.alibaba.com, david@...hat.com,
	Jonathan.Cameron@...wei.com, kernel_team@...ynix.com,
	honggyu.kim@...com, yunjeong.mun@...com
Subject: Re: [PATCH v3 3/3] mm/mempolicy: Support memory hotplug in weighted
 interleave

On Thu, Mar 20, 2025 at 01:17:48PM +0900, Rakie Kim wrote:
... snip ...
> +	mutex_lock(&sgrp->kobj_lock);
> +	if (sgrp->nattrs[nid]) {
> +		mutex_unlock(&sgrp->kobj_lock);
> +		pr_info("Node [%d] already exists\n", nid);
> +		kfree(new_attr);
> +		kfree(name);
> +		return 0;
> +	}
>  
> -	if (sysfs_create_file(&sgrp->wi_kobj, &node_attr->kobj_attr.attr)) {
> -		kfree(node_attr->kobj_attr.attr.name);
> -		kfree(node_attr);
> -		pr_err("failed to add attribute to weighted_interleave\n");
> -		return -ENOMEM;
> +	sgrp->nattrs[nid] = new_attr;
> +	mutex_unlock(&sgrp->kobj_lock);
> +
> +	sysfs_attr_init(&sgrp->nattrs[nid]->kobj_attr.attr);
> +	sgrp->nattrs[nid]->kobj_attr.attr.name = name;
> +	sgrp->nattrs[nid]->kobj_attr.attr.mode = 0644;
> +	sgrp->nattrs[nid]->kobj_attr.show = node_show;
> +	sgrp->nattrs[nid]->kobj_attr.store = node_store;
> +	sgrp->nattrs[nid]->nid = nid;

These accesses need to be inside the lock as well.  Probably we can't
get here concurrently, but I can't so so definitively that I'm
comfortable blind-accessing it outside the lock.

> +static int wi_node_notifier(struct notifier_block *nb,
> +			       unsigned long action, void *data)
> +{
... snip ...
> +	case MEM_OFFLINE:
> +		sysfs_wi_node_release(nid);

I'm still not convinced this is correct.  `offline_pages()` says this:

/*
 * {on,off}lining is constrained to full memory sections (or more
 * precisely to memory blocks from the user space POV).
 */

And that is the function calling:
	memory_notify(MEM_OFFLINE, &arg);

David pointed out that this should be called when offlining each memory
block.  This is not the same as simply doing `echo 0 > online`, you need
to remove the dax device associated with the memory.

For example:

      node1
    /       \
 dax0.0    dax1.0
   |          |
  mb1        mb2


With this code, if I `daxctl reconfigure-device devmem dax0.0` it will
remove the first memory block, causing MEM_OFFLINE event to fire and
removing the node - despite the fact that dax1.0 is still present.

This matters for systems with memory holes in CXL hotplug memory and
also for systems with Dynamic Capacity Devices surfacing capacity as
separate dax devices.

~Gregory

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ