linux-kernel - Re: [PATCH] random: use raw spinlocks for use on RT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Yw3i2N8J7yz3jnyt@linutronix.de>
Date:   Tue, 30 Aug 2022 12:13:44 +0200
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     linux-kernel@...r.kernel.org, tglx@...utronix.de
Subject: Re: [PATCH] random: use raw spinlocks for use on RT

On 2022-08-29 15:56:06 [-0400], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi,

> On Mon, Aug 29, 2022 at 09:45:10PM +0200, Sebastian Andrzej Siewior wrote:
> > > So why don't we actually fix this, so we don't have to keep coming up
> > > with hacks? The question is: does using raw spinlocks over this code
> > > result in any real issue for RT latency? If so, I'd like to know where,
> > > and maybe I can do something about that (or maybe I can't). If not, then
> > > this is a non problem and I'll apply this patch with your blessing.
> > 
> > It depends on what you do define as hacks. I suggested an explicit init
> > during boot for everyone. The only "hacky" thing might be the reschedule
> > of the worker every two secs in case random-core isn't ready yet.
> 
> The worker solution you proposed before was problematic in that it
> changes RNG semantics by making jitter entropy run early on at boot
> before even attempting to get entropy from later. Maybe that's an okay
> change, or maybe it's not, but either way it isn't one that should be
> forced by wacky vnsprintf changes.

The first patch did so yes. The second simply retried in two secs and
this shouldn't be problematic.

> Okay but this on-demand aspect of vnsprintf() is clearly a place where
> it makes sense to do it from the occasional irq context.
> 
> > So that local_lock_t is still breaking things since it can not be
> > acquired from blocking context. So in order to continue this needs to be
> > replaced somehow and checked again…
> > Assuming this has been done, round #2:
> > 
> > get_random_bytes()
> > -> _get_random_bytes()
> >   -> crng_make_state()
> >     -> crng_reseed()
> >       -> extract_entropy()
> >         -> blake2s_final()
> > 	  -> blake2s_compress()
> > 	    -> kernel_fpu_begin()…
> 
> kernel_fpu_begin() is no longer used from IRQ context, since there's no
> longer SIMD in IRQ context. So this callgraph isn't representative.

hard-IRQ context yes. But it is still used in preemptible context under
a raw_spinlock_t or with disabled interrupts/ preemption.
In the vsprintf/printk case it is invoked from preemptible context with
disabled interrupts.

> > This blake2s_compress() can be called again within this callchain (via
> > blake2s()). The problem here is that kernel_fpu_begin() disables
> > preemption and the following SIMD operation can be expensive (not to
> > mention the onetime register store) and so it is attempted to have a
> > scheduling point on a regular basis.
> > Invoking this call chain from an already preempt-disabled section would
> > not allow any scheduling at this point (and so build up the max. latency
> > worst case).
> 
> Irrelevant, since kernel_fpu_begin() shouldn't be called in this context
> any more, right?

wrong, see above. It only excludes the in-hardirq users. 

> > After looking at this after a break, while writing this and paging
> > everything in, I still think that initialising the random number at boot
> > up for vsprintf's sake is the easiest thing. One init for RT and non-RT
> > from an initcall. No hack, just one plain and simple init with no need
> > to perform anything later on demand. 
> 
> The "once at boot time" thing does not work here, as I've said over and
> over, if what we're talking about is the workqueued get_random_bytes_wait()
> call. The much smarter thing to do is let entropy be collected for as
> long as possible, and when the RNG is initialized, initialize the
> siphash secret, which is exactly what the current code does. So I think
> the current vnsprintf code can stay the same. What needs fixing, rather,
> are the lack of raw spinlocks in random.c...

Not get_random_bytes_wait() but get_random_bytes() + reschedule, see
	https://lkml.kernel.org/r/YueeIgPGUJgsnsAh@linutronix.de

> In light of my note on kernel_fpu_begin() not being used from IRQ
> context, can you now consider this raw spinlock patch?

No because it gives the wrong motivation to use this without fear from
section with disabled interrupts/ preemption as it is the case in the
current printk example. So it extends the runtime without the need for
it since it could have been upfront at a lower price.

I intend to resend the previously mentioned patch where there is _no_
get_random_bytes_wait() so I don't see how this can be a problem. Do you
see still one?

> Jason

Sebastian