Message-ID: <aXbwM1wiPKqmC94v@gondor.apana.org.au>
Date: Mon, 26 Jan 2026 12:40:19 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Lianjie Wang <karin0.zst@...il.com>
Cc: Olivia Mackall <olivia@...enic.com>,
	David Laight <david.laight.linux@...il.com>,
	Jonathan McDowell <noodles@...a.com>, linux-crypto@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race
 condition

On Sun, Jan 25, 2026 at 04:55:55AM +0900, Lianjie Wang wrote:
> Currently, hwrng_fill is not cleared until the hwrng_fillfn() thread
> exits. Since hwrng_unregister() reads hwrng_fill outside the rng_mutex
> lock, a concurrent hwrng_unregister() may call kthread_stop() again on
> the same task.
> 
> Additionally, if the hwrng_unregister() call happens immediately after a
> hwrng_register() before, the stopped thread may have never been running,
> and thus hwrng_fill remains dirty even after the hwrng_unregister() call
> returns. In this case, further calls to hwrng_register() may not start
> new threads, and hwrng_unregister() will also call kthread_stop() on the
> same task, causing use-after-free and sometimes lockups:
> 
> refcount_t: addition on 0; use-after-free.
> WARNING: ... at lib/refcount.c:25 refcount_warn_saturate+0xec/0x1c0
> Call Trace:
>  kthread_stop+0x181/0x360
>  hwrng_unregister+0x288/0x380
>  virtrng_remove+0xe3/0x200
> 
> This patch fixes the race by protecting the global hwrng_fill pointer
> inside the rng_mutex lock, so that hwrng_fillfn() thread is stopped only
> once, and calls to kthread_create() and kthread_stop() are serialized
> with the lock held.
> 
> To avoid deadlock in hwrng_fillfn() while being stopped,
> get_current_rng() and put_rng() no longer hold the rng_mutex lock now.
> Instead, we convert current_rng to RCU.
> 
> With hwrng_fill protected by the rng_mutex lock, hwrng_fillfn() can no
> longer clear hwrng_fill itself. Therefore, the kthread_stop() call is
> moved from hwrng_unregister() to drop_current_rng(), where the lock is
> already held. This ensures the task is joined via kthread_stop() on all
> possible paths (whether kthread_should_stop() is set, or
> get_current_rng() starts returning NULL).
> 
> Since get_current_rng() no longer returns ERR_PTR values, the IS_ERR()
> checks are removed from its callers. The NULL check is also moved from
> put_rng() to its caller rng_current_show(), since all the other callers
> of put_rng() already check for NULL.
> 
> Fixes: be4000bc4644 ("hwrng: create filler thread")
> Suggested-by: Herbert Xu <herbert@...dor.apana.org.au>
> Signed-off-by: Lianjie Wang <karin0.zst@...il.com>
> ---
> v2:
>  - Convert the lock for get_current_rng() to RCU to break the deadlock, as
>    suggested by Herbert Xu.
>  - Remove rng_mutex from put_rng() and move NULL check to rng_current_show().
>  - Move kthread_stop() to drop_current_rng() inside the lock to join the task
>    on all paths, avoiding modifying hwrng_fill inside hwrng_fillfn().
>  - Revert changes to rng_fillbuf.
> 
> v1: https://lore.kernel.org/linux-crypto/20251221122448.246531-1-karin0.zst@gmail.com/
> 
>  drivers/char/hw_random/core.c | 145 +++++++++++++++++++---------------
>  1 file changed, 81 insertions(+), 64 deletions(-)

Thanks, this looks pretty good!

>  static struct hwrng *get_current_rng(void)
>  {
>  	struct hwrng *rng;
> 
> -	if (mutex_lock_interruptible(&rng_mutex))
> -		return ERR_PTR(-ERESTARTSYS);
> +	rcu_read_lock();
> +	rng = rcu_dereference(current_rng);
> +	if (rng && !kref_get_unless_zero(&rng->ref))
> +		rng = NULL;

rng->ref should never be zero here as the final kref_put is delayed
by RCU.  So this should be a plain kref_get.

>  static void put_rng(struct hwrng *rng)
>  {
> -	/*
> -	 * Hold rng_mutex here so we serialize in case they set_current_rng
> -	 * on rng again immediately.
> -	 */
> -	mutex_lock(&rng_mutex);
> -	if (rng)
> -		kref_put(&rng->ref, cleanup_rng);
> -	mutex_unlock(&rng_mutex);
> +	kref_put(&rng->ref, cleanup_rng);
>  }

I think the mutex needs to be kept here as otherwise there is
a risk of a slow cleanup_rng racing against a subsequent hwrng_init
on the same RNG.
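
A minimal userspace sketch of the put_rng() shape described above, with
the mutex kept and the NULL check from the removed lines retained. All
model_* names are hypothetical stand-ins (a pthread mutex for rng_mutex,
a plain counter for struct kref), not the kernel code itself:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct hwrng. */
struct model_rng {
	int ref;        /* stands in for struct kref */
	int cleaned_up; /* set once the release callback has run */
};

/* Stand-in for rng_mutex. */
static pthread_mutex_t model_rng_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for cleanup_rng(); always called with the mutex held. */
static void model_cleanup_rng(struct model_rng *rng)
{
	rng->cleaned_up = 1;
}

/*
 * The final put and the cleanup both run under the mutex, so a slow
 * cleanup cannot overlap with an init path that takes the same mutex.
 * A NULL rng is accepted and ignored.
 */
static void model_put_rng(struct model_rng *rng)
{
	pthread_mutex_lock(&model_rng_mutex);
	if (rng && --rng->ref == 0)
		model_cleanup_rng(rng);
	pthread_mutex_unlock(&model_rng_mutex);
}
```

Because the cleanup runs inside the critical section, any init path that
takes the same mutex has to wait for it to finish.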

> @@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
>  	struct hwrng *rng;
> 
>  	rng = get_current_rng();
> -	if (IS_ERR(rng))
> -		return PTR_ERR(rng);
> 
>  	ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
> -	put_rng(rng);
> +	if (rng)
> +		put_rng(rng);

I don't think this NULL check is necessary as put_rng can handle
rng == NULL.

> @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
>  		struct hwrng *rng;
> 
>  		rng = get_current_rng();
> -		if (IS_ERR(rng) || !rng)
> +		if (!rng) {
> +			/* This is only possible within drop_current_rng(),
> +			 * so just wait until we are stopped.
> +			 */
> +			while (!kthread_should_stop()) {
> +				set_current_state(TASK_INTERRUPTIBLE);
> +				schedule();
> +			}
>  			break;
> +		}
> +

Is the schedule necessary? Shouldn't the break just work as it
did before?

Cheers,
-- 
Email: Herbert Xu <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
