linux-kernel - Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp

Message-ID: <CAJD7tkbnYaZFYw0ieor81e--e6qJVgb3045x86c0EKV546TyWw@mail.gmail.com>
Date: Tue, 7 Jan 2025 15:25:54 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Barry Song <baohua@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>, 
	Nhat Pham <nphamcs@...il.com>, Chengming Zhou <chengming.zhou@...ux.dev>, 
	Vitaly Wool <vitalywool@...il.com>, Sam Sun <samsun1006219@...il.com>, 
	Kanchana P Sridhar <kanchana.p.sridhar@...el.com>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 2/2] mm: zswap: disable migration while using per-CPU acomp_ctx

On Tue, Jan 7, 2025 at 2:47 PM Barry Song <baohua@...nel.org> wrote:
>
> On Wed, Jan 8, 2025 at 11:22 AM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> >
> > In zswap_compress() and zswap_decompress(), the per-CPU acomp_ctx of the
> > current CPU at the beginning of the operation is retrieved and used
> > throughout.  However, since neither preemption nor migration is disabled,
> > it is possible that the operation continues on a different CPU.
> >
> > If the original CPU is hotunplugged while the acomp_ctx is still in use,
> > we run into a UAF bug as the resources attached to the acomp_ctx are freed
> > during hotunplug in zswap_cpu_comp_dead().
> >
> > The problem was introduced in commit 1ec3b5fe6eec ("mm/zswap: move to use
> > crypto_acomp API for hardware acceleration") when the switch to the
> > crypto_acomp API was made.  Prior to that, the per-CPU crypto_comp was
> > retrieved using get_cpu_ptr() which disables preemption and makes sure the
> > CPU cannot go away from under us.  Preemption cannot be disabled with the
> > crypto_acomp API as a sleepable context is needed.
> >
> > Commit 8ba2f844f050 ("mm/zswap: change per-cpu mutex and buffer to
> > per-acomp_ctx") increased the UAF surface area by making the per-CPU
> > buffers dynamic, adding yet another resource that can be freed from under
> > zswap compression/decompression by CPU hotunplug.
> >
> > This cannot be fixed by holding cpus_read_lock(), as it is possible for
> > code already holding the lock to fall into reclaim and enter zswap
> > (causing a deadlock). It also cannot be fixed by wrapping the usage of
> > acomp_ctx in an SRCU critical section and using synchronize_srcu() in
> > zswap_cpu_comp_dead(), because synchronize_srcu() is not allowed in
> > CPU-hotplug notifiers (see
> > Documentation/RCU/Design/Requirements/Requirements.rst).
> >
> > This can be fixed by refcounting the acomp_ctx, but it involves
> > complexity in handling the race between the refcount dropping to zero in
> > zswap_[de]compress() and the refcount being re-initialized when the CPU
> > is onlined.
> >
> > Keep things simple for now and just disable migration while using the
> > per-CPU acomp_ctx to block CPU hotunplug until the usage is over.
> >
> > Fixes: 1ec3b5fe6eec ("mm/zswap: move to use crypto_acomp API for hardware acceleration")
> > Cc: <stable@...r.kernel.org>
> > Signed-off-by: Yosry Ahmed <yosryahmed@...gle.com>
> > Reported-by: Johannes Weiner <hannes@...xchg.org>
> > Closes: https://lore.kernel.org/lkml/20241113213007.GB1564047@cmpxchg.org/
> > Reported-by: Sam Sun <samsun1006219@...il.com>
> > Closes: https://lore.kernel.org/lkml/CAEkJfYMtSdM5HceNsXUDf5haghD5+o2e7Qv4OcuruL4tPg6OaQ@mail.gmail.com/
> > ---
> >  mm/zswap.c | 19 ++++++++++++++++---
> >  1 file changed, 16 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb236..ecd86153e8a32 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -880,6 +880,18 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> >         return 0;
> >  }
> >
> > +/* Remain on the CPU while using its acomp_ctx to stop it from going offline */
> > +static struct crypto_acomp_ctx *acomp_ctx_get_cpu(struct crypto_acomp_ctx __percpu *acomp_ctx)
> > +{
> > +       migrate_disable();
>
> I'm not entirely sure, but I feel it is quite unsafe. Allowing sleep
> during migrate_disable() and migrate_enable() would require the entire
> scheduler, runqueue, waitqueue, and CPU hotplug mechanisms to be aware
> that a task is pinned to a specific CPU.

My understanding is that sleeping is already allowed when migration is
disabled (unlike preemption). See delete_all_elements() in
kernel/bpf/hashtab.c for example, or __bpf_prog_enter_sleepable() in
kernel/bpf/trampoline.c. I am not sure exactly what you mean.
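
To make the distinction concrete, here is a toy userspace model, not
kernel code. Every model_* name is made up for illustration. It assumes
the behavior described above: sleeping is forbidden only while
preemption is disabled, a migration-disabled task comes back on the same
CPU after sleeping, and hotunplug of that CPU has to wait until the task
is unpinned.

```c
/* Userspace model only; none of the model_* names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_task {
	int cpu;           /* CPU the task last ran on */
	int preempt_depth; /* models preempt_disable() nesting */
	int migrate_depth; /* models migrate_disable() nesting */
};

static void model_preempt_disable(struct model_task *t)
{
	t->preempt_depth++;
}

static void model_preempt_enable(struct model_task *t)
{
	assert(t->preempt_depth > 0);
	t->preempt_depth--;
}

static void model_migrate_disable(struct model_task *t)
{
	t->migrate_depth++;
}

static void model_migrate_enable(struct model_task *t)
{
	assert(t->migrate_depth > 0);
	t->migrate_depth--;
}

/* Assumption: only disabled preemption forbids sleeping, pinning does not. */
static bool model_may_sleep(const struct model_task *t)
{
	return t->preempt_depth == 0;
}

/* After a sleep the task moves to candidate_cpu unless it is pinned. */
static int model_wake_on(struct model_task *t, int candidate_cpu)
{
	if (t->migrate_depth == 0)
		t->cpu = candidate_cpu;
	return t->cpu;
}

/* Assumption: a CPU cannot go offline while a pinned task belongs to it. */
static bool model_cpu_can_go_offline(const struct model_task *t, int cpu)
{
	return !(t->migrate_depth > 0 && t->cpu == cpu);
}
```

In this model, a task that called model_migrate_disable() on CPU 2 may
still sleep, wakes up on CPU 2 again, and keeps CPU 2 from going offline
until model_migrate_enable() is called.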

>
> If there is no sleep during this period, it seems to be only a
> runqueue issue: CPU hotplug can wait for the task to be unpinned while
> it is always in runqueue. However, if sleep is involved, the situation
> becomes significantly more complex.
>
> If static data doesn't consume much memory, it could be the simplest solution.

Do you mean allocating the buffers and requests for all possible CPUs
instead of allocating them dynamically in CPU hotplug notifiers? I am
not sure how much more memory this would be. It seems to depend on the
CONFIG options and the firmware, which together determine how many CPUs
are possible.
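
As a rough way to reason about that cost, here is a small sketch in
plain C. The helper name and its parameters are hypothetical, and the
per-CPU byte count is an input because the real size of the buffers and
requests is not something I have measured. It only computes the memory
that would sit allocated for CPUs that are possible but not online.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes of always-allocated per-CPU state that
 * would be held for CPUs that are possible but currently not online.
 * bytes_per_cpu is an assumed input, not a measured kernel value.
 */
static size_t model_static_overhead(unsigned int nr_possible,
				    unsigned int nr_online,
				    size_t bytes_per_cpu)
{
	assert(nr_online <= nr_possible);
	return (size_t)(nr_possible - nr_online) * bytes_per_cpu;
}
```

With made-up numbers, model_static_overhead(64, 8, 8704) covers 56 idle
slots of 8704 bytes each. When all possible CPUs are online the overhead
is zero, so the answer is driven entirely by the gap between possible
and online CPUs.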