Message-ID: <6xb7feds424kfld4udwmbtccftwnnx6vmbpvmjcwlionfdlmuj@vz4uzh6tog5g>
Date: Tue, 30 Sep 2025 21:20:02 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>, "hannes@...xchg.org" <hannes@...xchg.org>,
"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev" <chengming.zhou@...ux.dev>,
"usamaarif642@...il.com" <usamaarif642@...il.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"21cnbao@...il.com" <21cnbao@...il.com>, "ying.huang@...ux.alibaba.com" <ying.huang@...ux.alibaba.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "senozhatsky@...omium.org" <senozhatsky@...omium.org>,
"sj@...nel.org" <sj@...nel.org>, "kasong@...cent.com" <kasong@...cent.com>,
"linux-crypto@...r.kernel.org" <linux-crypto@...r.kernel.org>, "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"davem@...emloft.net" <davem@...emloft.net>, "clabbe@...libre.com" <clabbe@...libre.com>,
"ardb@...nel.org" <ardb@...nel.org>, "ebiggers@...gle.com" <ebiggers@...gle.com>,
"surenb@...gle.com" <surenb@...gle.com>, "Accardi, Kristen C" <kristen.c.accardi@...el.com>,
"Gomes, Vinicius" <vinicius.gomes@...el.com>, "Feghali, Wajdi K" <wajdi.k.feghali@...el.com>,
"Gopal, Vinodh" <vinodh.gopal@...el.com>
Subject: Re: [PATCH v12 20/23] mm: zswap: Per-CPU acomp_ctx resources exist
from pool creation to deletion.
> > > > >  static struct zswap_pool *zswap_pool_create(char *compressor)
> > > > >  {
> > > > >  	struct zswap_pool *pool;
> > > > > @@ -263,19 +287,43 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
> > > > >
> > > > >  	strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
> > > > >
> > > > > -	pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
> > > > > +	/* Many things rely on the zero-initialization. */
> > > > > +	pool->acomp_ctx = alloc_percpu_gfp(*pool->acomp_ctx,
> > > > > +					   GFP_KERNEL | __GFP_ZERO);
> > > > >  	if (!pool->acomp_ctx) {
> > > > >  		pr_err("percpu alloc failed\n");
> > > > >  		goto error;
> > > > >  	}
> > > > >
> > > > > -	for_each_possible_cpu(cpu)
> > > > > -		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> > > > > -
> > > > > +	/*
> > > > > +	 * This is serialized against CPU hotplug operations. Hence, cores
> > > > > +	 * cannot be offlined until this finishes.
> > > > > +	 * In case of errors, we need to goto "ref_fail" instead of "error"
> > > > > +	 * because there is no teardown callback registered anymore, for
> > > > > +	 * cpuhp_state_add_instance() to de-allocate resources as it rolls back
> > > > > +	 * state on cores before the CPU on which error was encountered.
> > > > > +	 */
> > > > >  	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
> > > > > +
> > > > > +	/*
> > > > > +	 * We only needed the multi state instance add operation to invoke the
> > > > > +	 * startup callback for all cores without cores getting offlined. Since
> > > > > +	 * the acomp_ctx resources will now only be de-allocated when the pool
> > > > > +	 * is destroyed, we can safely remove the multi state instance. This
> > > > > +	 * minimizes (but does not eliminate) the possibility of
> > > > > +	 * zswap_cpu_comp_prepare() being invoked again due to a CPU
> > > > > +	 * offline-online transition. Removing the instance also prevents race
> > > > > +	 * conditions between CPU onlining after initial pool creation, and
> > > > > +	 * acomp_ctx_dealloc() freeing the acomp_ctx resources.
> > > > > +	 * Note that we delete the instance before checking the error status of
> > > > > +	 * the node list add operation because we want the instance removal even
> > > > > +	 * in case of errors in the former.
> > > > > +	 */
> > > > > +	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
> > > > > +
> > > >
> > > > I don't understand what's wrong with the current flow? We call
> > > > cpuhp_state_remove_instance() in pool deletion before freeing up the
> > > > per-CPU resources. Why is this not enough?
> > >
> > > This is because with the changes proposed in this commit, the multi state
> > > add instance is used during pool creation as a way to create acomp_ctx
> > > resources correctly with just the offline/online state transitions guaranteed
> > > by CPU hotplug, without needing additional mutex locking as in the
> > > mainline.
> > > In other words, the consistency wrt safely creating/deleting acomp_ctx
> > > resources with the changes being proposed is accomplished by the hotplug
> > > state transitions guarantee. Stated differently, the hotplug framework
> > > helps enforce the new design during pool creation without relying on the
> > > mutex and subsequent simplifications during zswap_[de]compress()
> > > proposed in this commit.
> > >
> > > Once this is done, deleting the CPU hotplug state seems cleaner, and
> > > reflects the change in policy of the resources' lifetime. It also prevents
> > > race conditions between zswap_cpu_comp_prepare() and acomp_ctx_dealloc()
> > > called from zswap_pool_destroy().
> >
> > How is a race with zswap_cpu_comp_prepare() possible if we call
> > cpuhp_state_remove_instance() before acomp_ctx_dealloc() in the pool
> > deletion path?
>
> Good point. I agree, calling cpuhp_state_remove_instance() before
> acomp_ctx_dealloc() will not cause a race. However, if we consider the
> time from pool creation to deletion: if there is an online-offline-online
> transition, can zswap_cpu_comp_prepare() race with the call to
> cpuhp_state_remove_instance()? If so, wouldn't this cause unpredictable
> behavior?
How will this race happen?
cpuhp_state_remove_instance() is called while a pool is being destroyed,
whereas zswap_cpu_comp_prepare() is called while the pool is being created
or during CPU onlining.
The former cannot race, and the latter should be synchronized by hotplug
code.
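
For illustration, a rough sketch of the destruction-path ordering being
discussed (simplified, not the actual patch; the per-CPU loop and the exact
acomp_ctx_dealloc() signature here are assumptions):

/*
 * Sketch only: once cpuhp_state_remove_instance() returns, CPU onlining
 * can no longer invoke zswap_cpu_comp_prepare() for this pool, so freeing
 * the per-CPU acomp_ctx resources afterwards cannot race with it.
 */
static void zswap_pool_destroy(struct zswap_pool *pool)
{
	int cpu;

	/* After this, onlining a CPU no longer calls the prepare callback. */
	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);

	/* Now it is safe to free the per-CPU resources. */
	for_each_possible_cpu(cpu)
		acomp_ctx_dealloc(per_cpu_ptr(pool->acomp_ctx, cpu));
	free_percpu(pool->acomp_ctx);

	/* ... remaining pool teardown ... */
}
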
>
> I agree, this can occur even with the code in this commit, but there is
> less risk of things going wrong because we remove the CPU hotplug
> instance before the pool is added to zswap_pools.
>
> Further, removing the CPU hotplug instance directly codifies the
> intent of this commit, i.e., to use this as a facilitator and manage memory
> allotted to acomp_ctx, but not to manage those resources' lifetime
> thereafter.
>
> Do you see any advantage to keeping the call to cpuhp_state_remove_instance()
> before acomp_ctx_dealloc() in zswap_pool_destroy()? Please let me know
> if I am missing something.
What about more CPUs going online? Without the hotplug instance we don't
get per-CPU resources for those. We are not using the hotplug mechanism
just to facilitate per-CPU resource allocation; we use it to
automatically allocate resources for newly onlined CPUs without having
to preallocate for all possible CPUs.
Also, this makes the code more difficult to reason about, and is an
unnecessary change from the current behavior.
The only change needed is to drop the teardown callback and do the
freeing in the pool destruction path instead.
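
Roughly something like this at registration time (a sketch, assuming the
existing cpuhp_setup_state_multi() call in zswap's init path; the only
intended change is passing NULL for the teardown callback), with the actual
freeing done in the pool destruction path as sketched above:

	/* Keep the startup callback; drop the teardown callback. */
	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
				      "mm/zswap_pool:prepare",
				      zswap_cpu_comp_prepare,
				      NULL);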