[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJD7tkaxS1wjn+swugt8QCvQ-rVF5RZnjxwPGX17k8x9zSManA@mail.gmail.com>
Date: Wed, 8 Jan 2025 16:25:56 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>,
Nhat Pham <nphamcs@...il.com>, Chengming Zhou <chengming.zhou@...ux.dev>,
Vitaly Wool <vitalywool@...il.com>, Barry Song <baohua@...nel.org>,
Sam Sun <samsun1006219@...il.com>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: Re: [PATCH v2] mm: zswap: properly synchronize freeing resources
during CPU hotunplug
On Wed, Jan 8, 2025 at 4:12 PM Sridhar, Kanchana P
<kanchana.p.sridhar@...el.com> wrote:
>
>
> > -----Original Message-----
> > From: Yosry Ahmed <yosryahmed@...gle.com>
> > Sent: Wednesday, January 8, 2025 2:25 PM
> > To: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Johannes Weiner <hannes@...xchg.org>; Nhat Pham
> > <nphamcs@...il.com>; Chengming Zhou <chengming.zhou@...ux.dev>;
> > Vitaly Wool <vitalywool@...il.com>; Barry Song <baohua@...nel.org>; Sam
> > Sun <samsun1006219@...il.com>; Sridhar, Kanchana P
> > <kanchana.p.sridhar@...el.com>; linux-mm@...ck.org; linux-
> > kernel@...r.kernel.org; Yosry Ahmed <yosryahmed@...gle.com>;
> > stable@...r.kernel.org
> > Subject: [PATCH v2] mm: zswap: properly synchronize freeing resources
> > during CPU hotunplug
> >
> > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of
> > the
> > current CPU at the beginning of the operation is retrieved and used
> > throughout. However, since neither preemption nor migration are
> > disabled, it is possible that the operation continues on a different
> > CPU.
> >
> > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > we run into a UAF bug as some of the resources attached to the acomp_ctx
> > are freed during hotunplug in zswap_cpu_comp_dead() (i.e.
> > acomp_ctx.buffer, acomp_ctx.req, or acomp_ctx.acomp).
> >
> > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to
> > use crypto_acomp API for hardware acceleration") when the switch to the
> > crypto_acomp API was made. Prior to that, the per-CPU crypto_comp was
> > retrieved using get_cpu_ptr() which disables preemption and makes sure
> > the CPU cannot go away from under us. Preemption cannot be disabled
> > with the crypto_acomp API as a sleepable context is needed.
> >
> > Use the acomp_ctx.mutex to synchronize CPU hotplug callbacks allocating
> > and freeing resources with compression/decompression paths. Make sure
> > that acomp_ctx.req is NULL when the resources are freed. In the
> > compression/decompression paths, check if acomp_ctx.req is NULL after
> > acquiring the mutex (meaning the CPU was offlined) and retry on the new
> > CPU.
> >
> > The initialization of acomp_ctx.mutex is moved from the CPU hotplug
> > callback to the pool initialization where it belongs (where the mutex is
> > allocated). In addition to adding clarity, this makes sure that CPU
> > hotplug cannot reinitialize a mutex that is already locked by
> > compression/decompression.
> >
> > Previously a fix was attempted by holding cpus_read_lock() [1]. This
> > would have caused a potential deadlock as it is possible for code
> > already holding the lock to fall into reclaim and enter zswap (causing a
> > deadlock). A fix was also attempted using SRCU for synchronization, but
> > Johannes pointed out that synchronize_srcu() cannot be used in CPU
> > hotplug notifiers [2].
> >
> > Alternative fixes that were considered/attempted and could have worked:
> > - Refcounting the per-CPU acomp_ctx. This involves complexity in
> > handling the race between the refcount dropping to zero in
> > zswap_[de]compress() and the refcount being re-initialized when the
> > CPU is onlined.
> > - Disabling migration before getting the per-CPU acomp_ctx [3], but
> > that's discouraged and is a much bigger hammer than needed, and could
> > result in subtle performance issues.
> >
> > [1]https://lkml.kernel.org/20241219212437.2714151-1-
> > yosryahmed@...gle.com/
> > [2]https://lkml.kernel.org/20250107074724.1756696-2-
> > yosryahmed@...gle.com/
> > [3]https://lkml.kernel.org/20250107222236.2715883-2-
> > yosryahmed@...gle.com/
> >
> > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for
> > hardware acceleration")
> > Cc: <stable@...r.kernel.org>
> > Signed-off-by: Yosry Ahmed <yosryahmed@...gle.com>
> > Reported-by: Johannes Weiner <hannes@...xchg.org>
> > Closes:
> > https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
> > Reported-by: Sam Sun <samsun1006219@...il.com>
> > Closes:
> > https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4O
> > curuL4tPg6OaQ@...l.gmail.com/
> > ---
> >
> > This applies on top of the latest mm-hotfixes-unstable on top of 'Revert
> > "mm: zswap: fix race between [de]compression and CPU hotunplug"' and
> > after 'mm: zswap: disable migration while using per-CPU acomp_ctx' was
> > dropped.
> >
> > v1 -> v2:
> > - Move the initialization of the mutex to pool initialization.
> > - Use the mutex to also synchronize with the CPU hotplug callback (i.e.
> > zswap_cpu_comp_prep()).
> > - Naming cleanups.
> >
> > ---
> > mm/zswap.c | 60 +++++++++++++++++++++++++++++++++++++++++---------
> > ----
> > 1 file changed, 46 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb236..4d7e564732267 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -251,7 +251,7 @@ static struct zswap_pool *zswap_pool_create(char
> > *type, char *compressor)
> > struct zswap_pool *pool;
> > char name[38]; /* 'zswap' + 32 char (max) num + \0 */
> > gfp_t gfp = __GFP_NORETRY | __GFP_NOWARN |
> > __GFP_KSWAPD_RECLAIM;
> > - int ret;
> > + int ret, cpu;
> >
> > if (!zswap_has_pool) {
> > /* if either are unset, pool initialization failed, and we
> > @@ -285,6 +285,9 @@ static struct zswap_pool *zswap_pool_create(char
> > *type, char *compressor)
> > goto error;
> > }
> >
> > + for_each_possible_cpu(cpu)
> > + mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
> > +
> > ret =
> > cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
> > &pool->node);
> > if (ret)
> > @@ -821,11 +824,12 @@ static int zswap_cpu_comp_prepare(unsigned int
> > cpu, struct hlist_node *node)
> > struct acomp_req *req;
> > int ret;
> >
> > - mutex_init(&acomp_ctx->mutex);
> > -
> > + mutex_lock(&acomp_ctx->mutex);
> > acomp_ctx->buffer = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL,
> > cpu_to_node(cpu));
> > - if (!acomp_ctx->buffer)
> > - return -ENOMEM;
> > + if (!acomp_ctx->buffer) {
> > + ret = -ENOMEM;
> > + goto buffer_fail;
> > + }
> >
> > acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0,
> > cpu_to_node(cpu));
> > if (IS_ERR(acomp)) {
> > @@ -844,6 +848,8 @@ static int zswap_cpu_comp_prepare(unsigned int
> > cpu, struct hlist_node *node)
> > ret = -ENOMEM;
> > goto req_fail;
> > }
> > +
> > + /* acomp_ctx->req must be NULL if the acomp_ctx is not fully
> > initialized */
> > acomp_ctx->req = req;
>
> For this to happen, shouldn't we directly assign:
> acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
> if (!acomp_ctx->req) { ...}
For the initial zswap_cpu_comp_prepare() call on a CPU, it doesn't
matter because a failure will cause a failure of initialization or CPU
onlining. For calls after a CPU is offlined, zswap_cpu_comp_dead()
will have set acomp_ctx->req to NULL, so leaving it unmodified keeps
it as NULL.
Although I suppose the comment is not really clear because
zswap_cpu_comp_prepare() could fail before acomp_request_alloc() is
ever called and this would not be a problem (as the onlining will
fail).
Maybe it's best to just drop the comment.
Andrew, could you please fold this in?
diff --git a/mm/zswap.c b/mm/zswap.c
index 4d7e564732267..30f5a27a68620 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -848,8 +848,6 @@ static int zswap_cpu_comp_prepare(unsigned int
cpu, struct hlist_node *node)
ret = -ENOMEM;
goto req_fail;
}
-
- /* acomp_ctx->req must be NULL if the acomp_ctx is not fully
initialized */
acomp_ctx->req = req;
crypto_init_wait(&acomp_ctx->wait);
>
> I was wondering how error conditions encountered in zswap_cpu_comp_prepare()
> will impact zswap_[de]compress(). This is probably unrelated to this patch itself,
> but is my understanding correct that an error in this procedure will cause
> zswap_enabled to be set to false, which will cause any zswap_stores() to fail early?
Yeah that is my understanding, for the initial call. For later calls
when a CPU gets onlined I think the CPU onlining fails and the
acomp_ctx is not used.
Powered by blists - more mailing lists