lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aXlpzqh9K6bqBh4T@gondor.apana.org.au>
Date: Wed, 28 Jan 2026 09:43:42 +0800
From: Herbert Xu <herbert@...dor.apana.org.au>
To: Lianjie Wang <karin0.zst@...il.com>
Cc: Olivia Mackall <olivia@...enic.com>,
	David Laight <david.laight.linux@...il.com>,
	Jonathan McDowell <noodles@...a.com>, linux-crypto@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] hwrng: core - use RCU for current_rng to fix race
 condition

On Tue, Jan 27, 2026 at 10:32:30PM +0900, Lianjie Wang wrote:
>
> It is true that cleanup_rng() can race with hwrng_init(). However,
> currently put_rng() is also called in hwrng_fillfn(), where we want to
> avoid holding rng_mutex to prevent deadlock in hwrng_unregister().
> 
> To solve this race, should we introduce a separate lock (e.g.,
> init_mutex) to serialize only hwrng_init() and cleanup_rng()?
> 
> Alternatively, I think we could stop the thread also in
> set_current_rng() before switching current_rng, so that each lifetime of
> hwrng_fillfn() thread strictly holds a single RNG instance, avoiding the
> need to call get_current_rng() or put_rng() inside hwrng_fillfn().

Yes I missed the dead-lock.

I suggest that we delay the work in cleanup_rng into a work_struct,
IOW cleanup_work will simply schedule a work_struct to perform the
actual cleanup.

Then the work_struct can take the mutex safely, immediately check
the reference count on the rng, and if it is non-zero it should return
because the rng has been re-used in the mean time.

If it is zero then it can proceed with the clean-up while holding the
mutex, thus preventing any re-use from occurring.

> I removed the NULL check in put_rng() and moved it to rng_current_show()
> in v2, since all the other callers of put_rng() already check for NULL
> before calling put_rng(). I can restore the NULL check in put_rng() in
> v3 if preferred.

Let's keep this patch simpler by keeping things as they are.  If
it turns out that moving the NULL check makes the code more readable,
you can do that in a follow-up patch.

> > > @@ -489,8 +502,17 @@ static int hwrng_fillfn(void *unused)
> > >  		struct hwrng *rng;
> > > 
> > >  		rng = get_current_rng();
> > > -		if (IS_ERR(rng) || !rng)
> > > +		if (!rng) {
> > > +			/* This is only possible within drop_current_rng(),
> > > +			 * so just wait until we are stopped.
> > > +			 */
> > > +			while (!kthread_should_stop()) {
> > > +				set_current_state(TASK_INTERRUPTIBLE);
> > > +				schedule();
> > > +			}
> > >  			break;
> > > +		}
> > > +
> > 
> > Is the schedule necessary? Shouldn't the break just work as it
> > did before?
> 
> With the break alone, the task_struct might get freed before
> kthread_stop() is called, which can still cause use-after-free
> sometimes:

Please change the comment above to say that this loop is needed
to keep the task_struct alive for kthread_stop as it isn't obvious.

I wonder why nobody has added a helper for this since this seems
to be a common pattern in other kthread users? For example,
fs/ext4/mmp.c also does this, and they also set the task state
to TASK_RUNNING, perhaps we should do that too?

Thanks,
-- 
Email: Herbert Xu <herbert@...dor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ