Message-ID: <aXbwM1wiPKqmC94v@gondor.apana.org.au>
Date: Mon, 26 Jan 2026 12:40:19 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Lianjie Wang <karin0.zst@...il.com>
Cc: Olivia Mackall <olivia@...enic.com>,
David Laight <david.laight.linux@...il.com>,
Jonathan McDowell <noodles@...a.com>, linux-crypto@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race condition

On Sun, Jan 25, 2026 at 04:55:55AM +0900, Lianjie Wang wrote:
> Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread
> exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex
> lock, a concurrent hwrng_unregister() may call kthread_stop() again on
> the same task.
>
> Additionally, if the hwrng_unregister() call happens immediately after a
> hwrng_register() before, the stopped thread may have never been running,
> and thus hwrng_fill remains dirty even after the hwrng_unregister() call
> returns. In this case, further calls to hwrng_register() may not start
> new threads, and hwrng_unregister() will also call kthread_stop() on the
> same task, causing use-after-free and sometimes lockups:
>
> refcount_t: addition on 0; use-after-free.
> WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0
> Call Trace:
> kthread_stop+0x181/0x360
> hwrng_unregister+0x288/0x380
> virtrng_remove+0xe3/0x200
>
> This patch fixes the race by protecting the global hwrng_fill pointer
> inside the rng_mutex lock, so that hwrng_fillfn() thread is stopped only
> once, and calls to kthread_create() and kthread_stop() are serialized
> with the lock held.
>
> To avoid deadlock in hwrng_fillfn() while being stopped,
> get_current_rng() and put_rng() no longer hold the rng_mutex lock now.
> Instead, we convert current_rng to RCU.
>
> With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no
> longer clear hwrng_fill itself. Therefore, the kthread_stop() call is
> moved from hwrng_unregister() to drop_current_rng(), where the lock is
> already held. This ensures the task is joined via kthread_stop() on all
> possible paths (whether kthread_should_stop() is set, or
> get_current_rng() starts returning NULL).
>
> Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR()
> checks are removed from its callers. The NULL check is also moved from
> put_rng() to its caller rng_current_show(), since all the other callers
> of put_rng() already check for NULL.
>
> Fixes: be4000bc4644 ("hwrng: create filler thread")
> Suggested-by: Herbert Xu <herbert@...dor.apana.org.au>
> Signed-off-by: Lianjie Wang <karin0.zst@...il.com>
> ---
> v2:
> - Convert the lock for get_current_rng() to RCU to break the deadlock, as
> suggested by Herbert Xu.
> - Remove rng_mutex from put_rng() and move NULL check to rng_current_show().
> - Move kthread_stop() to drop_current_rng() inside the lock to join the task
> on all paths, avoiding modifying hwrng_fill inside hwrng_fillfn().
> - Revert changes to rng_fillbuf.
>
> v1: https://lore.kernel.org/linux-crypto/20251221122448.246531-1-karin0.zst@gmail.com/
>
> drivers/char/hw_random/core.c | 145 +++++++++++++++++++---------------
> 1 file changed, 81 insertions(+), 64 deletions(-)

Thanks, this looks pretty good!

>  static struct hwrng *get_current_rng(void)
>  {
>  	struct hwrng *rng;
>
> -	if (mutex_lock_interruptible(&rng_mutex))
> -		return ERR_PTR(-ERESTARTSYS);
> +	rcu_read_lock();
> +	rng = rcu_dereference(current_rng);
> +	if (rng && !kref_get_unless_zero(&rng->ref))
> +		rng = NULL;

rng->ref should never be zero here, as the final kref_put() is delayed
by RCU. So this should be a plain kref_get().

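
To illustrate the distinction, here is a small userspace model. It is not
the kernel's kref code: model_ref and its helpers are hypothetical names,
and C11 atomics stand in for refcount_t. The unless-zero variant only
matters when a reader can observe a count that has already dropped to
zero. If the final put is deferred until every reader is done, the plain
get is sufficient.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kref; not the kernel implementation. */
struct model_ref {
	atomic_int count;
};

/* Plain get: the caller guarantees the count is already non-zero. */
static void model_get(struct model_ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* taking a reference from zero would be a bug */
}

/* Conditional get: fails instead of resurrecting a dead object. */
static bool model_get_unless_zero(struct model_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when the caller dropped the last reference. */
static bool model_put(struct model_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

In this model, model_get() is the right choice whenever something outside
the counter (here, by assumption, the deferred final put) guarantees that
the count cannot reach zero while the reader holds the pointer.
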
>  static void put_rng(struct hwrng *rng)
>  {
> -	/*
> -	 * Hold rng_mutex here so we serialize in case they set_current_rng
> -	 * on rng again immediately.
> -	 */
> -	mutex_lock(&rng_mutex);
> -	if (rng)
> -		kref_put(&rng->ref, cleanup_rng);
> -	mutex_unlock(&rng_mutex);
> +	kref_put(&rng->ref, cleanup_rng);
>  }

I think the mutex needs to be kept here; otherwise there is a risk of
a slow cleanup_rng() racing against a subsequent hwrng_init() on the
same RNG.

> @@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
>  	struct hwrng *rng;
>
>  	rng = get_current_rng();
> -	if (IS_ERR(rng))
> -		return PTR_ERR(rng);
>
>  	ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
> -	put_rng(rng);
> +	if (rng)
> +		put_rng(rng);

I don't think this NULL check is necessary, as put_rng() can handle
rng == NULL.

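
A userspace sketch of that shape, assuming put_rng() keeps both the lock
and the NULL check so that callers can pass NULL directly. This is a model
only: model_rng, model_mutex and model_put_rng are made-up names, a
pthread mutex stands in for rng_mutex, and a plain counter modified under
the lock stands in for the kref.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct hwrng; only what the sketch needs. */
struct model_rng {
	int ref;	/* reference count, protected by model_mutex here */
	int cleaned;	/* number of times cleanup ran */
};

static pthread_mutex_t model_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Drop one reference. The lock is held across the final cleanup so that
 * it cannot overlap with another user of the same lock, and a NULL
 * argument is accepted so callers need no check of their own.
 */
static void model_put_rng(struct model_rng *rng)
{
	pthread_mutex_lock(&model_mutex);
	if (rng && --rng->ref == 0)
		rng->cleaned++;	/* stand-in for cleanup_rng() */
	pthread_mutex_unlock(&model_mutex);
}
```

With this form, a caller such as rng_current_show() could call the put
helper unconditionally after printing the name.
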
> @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
>  		struct hwrng *rng;
>
>  		rng = get_current_rng();
> -		if (IS_ERR(rng) || !rng)
> +		if (!rng) {
> +			/* This is only possible within drop_current_rng(),
> +			 * so just wait until we are stopped.
> +			 */
> +			while (!kthread_should_stop()) {
> +				set_current_state(TASK_INTERRUPTIBLE);
> +				schedule();
> +			}
>  			break;
> +		}
> +

Is the schedule() loop necessary? Shouldn't the break just work as it
did before?

Cheers,
--
Email: Herbert Xu <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt