[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aXirTffsg1GWk2Yw@kator>
Date: Tue, 27 Jan 2026 22:32:30 +0900
From: Lianjie Wang <karin0.zst@...il.com>
To: Herbert Xu <herbert@...dor.apana.org.au>
Cc: Olivia Mackall <olivia@...enic.com>,
David Laight <david.laight.linux@...il.com>,
Jonathan McDowell <noodles@...a.com>,
linux-crypto@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race condition
On Mon, Jan 26, 2026 at 12:40:19PM +0800, Herbert Xu wrote:
> > static struct hwrng *get_current_rng(void)
> > {
> > struct hwrng *rng;
> >
> > - if (mutex_lock_interruptible(&rng_mutex))
> > - return ERR_PTR(-ERESTARTSYS);
> > + rcu_read_lock();
> > + rng = rcu_dereference(current_rng);
> > + if (rng && !kref_get_unless_zero(&rng->ref))
> > + rng = NULL;
>
> rng->ref should never be zero here as the final kref_put is delayed
> by RCU. So this should be a plain kref_get.
Thanks for the review! I will change this to kref_get() in v3.
> > static void put_rng(struct hwrng *rng)
> > {
> > - /*
> > - * Hold rng_mutex here so we serialize in case they set_current_rng
> > - * on rng again immediately.
> > - */
> > - mutex_lock(&rng_mutex);
> > - if (rng)
> > - kref_put(&rng->ref, cleanup_rng);
> > - mutex_unlock(&rng_mutex);
> > + kref_put(&rng->ref, cleanup_rng);
> > }
>
> I think the mutex needs to be kept here as otherwise there is
> a risk of a slow cleanup_rng racing against a subsequent hwrng_init
> on the same RNG.
It is true that cleanup_rng() can race with hwrng_init(). However,
currently put_rng() is also called in hwrng_fillfn(), where we want to
avoid holding rng_mutex to prevent deadlock in hwrng_unregister().
To solve this race, should we introduce a separate lock (e.g.,
init_mutex) to serialize only hwrng_init() and cleanup_rng()?
Alternatively, I think we could stop the thread also in
set_current_rng() before switching current_rng, so that each lifetime of
hwrng_fillfn() thread strictly holds a single RNG instance, avoiding the
need to call get_current_rng() or put_rng() inside hwrng_fillfn().
> > @@ -371,11 +385,10 @@ static ssize_t rng_current_show(struct device *dev,
> > struct hwrng *rng;
> >
> > rng = get_current_rng();
> > - if (IS_ERR(rng))
> > - return PTR_ERR(rng);
> >
> > ret = sysfs_emit(buf, "%s\n", rng ? rng->name : "none");
> > - put_rng(rng);
> > + if (rng)
> > + put_rng(rng);
>
> I don't think this NULL check is necessary as put_rng can handle
> rng == NULL.
I removed the NULL check in put_rng() and moved it to rng_current_show()
in v2, since all the other callers of put_rng() already check for NULL
before calling put_rng(). I can restore the NULL check in put_rng() in
v3 if preferred.
> > @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
> > struct hwrng *rng;
> >
> > rng = get_current_rng();
> > - if (IS_ERR(rng) || !rng)
> > + if (!rng) {
> > + /* This is only possible within drop_current_rng(),
> > + * so just wait until we are stopped.
> > + */
> > + while (!kthread_should_stop()) {
> > + set_current_state(TASK_INTERRUPTIBLE);
> > + schedule();
> > + }
> > break;
> > + }
> > +
>
> Is the schedule necessary? Shouldn't the break just work as it
> did before?
With the break alone, the task_struct might get freed before
kthread_stop() is called, which can still cause use-after-free
sometimes:
refcount_t: addition on 0; use-after-free.
WARNING: lib/refcount.c:25 at refcount_warn_saturate+0x103/0x130
...
WARNING: kernel/fork.c:778 at __put_task_struct+0x2ea/0x510
...
Call Trace:
<TASK>
kthread_stop+0x347/0x3b0
hwrng_unregister+0x2d4/0x360
remove_common+0x1d3/0x230
virtio_dev_remove+0xb6/0x200
...
</TASK>
As in the description of kthread_create_on_node() from kernel/kthread.c,
it seems we cannot return directly if we plan to call kthread_stop():
* ... @threadfn() can either return directly if it is a
* standalone thread for which no one will call kthread_stop(), or
* return when 'kthread_should_stop()' is true (which means
* kthread_stop() has been called). ...
And in kthread_stop():
* If threadfn() may call kthread_exit() itself, the caller must ensure
* task_struct can't go away.
So I intended to wait until kthread_should_stop() is set. Otherwise, we
might need to call get_task_struct() and put_task_struct() manually to
hold the task_struct and ensure kthread_stop() works.
Alternatively, we could stop the thread *before* clearing current_rng in
drop_current_rng(), so that get_current_rng() will never return NULL
inside hwrng_fillfn(). What do you think?
Best regards,
--
Lianjie Wang
Powered by blists - more mailing lists