linux-kernel - Re: [PATCH v4 2/2] random: defer fast pool mixing to worker

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YgVTpI/sYLecyWa3@linutronix.de>
Date:   Thu, 10 Feb 2022 19:04:20 +0100
From:   Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     linux-kernel@...r.kernel.org, linux-crypto@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Theodore Ts'o <tytso@....edu>,
        Sultan Alsawaf <sultan@...neltoast.com>,
        Jonathan Neuschäfer <j.neuschaefer@....net>,
        Eric Biggers <ebiggers@...nel.org>
Subject: Re: [PATCH v4 2/2] random: defer fast pool mixing to worker

On 2022-02-09 13:56:44 [+0100], Jason A. Donenfeld wrote:
> +static void mix_interrupt_randomness(struct work_struct *work)
> +{
> +	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
> +	u8 pool[sizeof(fast_pool->pool)];

So.
- CPU1 schedules a worker
- CPU1 goes offline before the gets on the CPU.
- The worker runs CPU2
- CPU2 is back online
- and now
   CPU1						CPU2
   new_count = ++fast_pool->count;
    reg = fast_pool->count (FAST_POOL_MIX_INFLIGHT | 64)
    incl reg (FAST_POOL_MIX_INFLIGHT | 65)
    						WRITE_ONCE(fast_pool->count, 0);
    fast_pool->count = reg ((FAST_POOL_MIX_INFLIGHT | 65)

So we lost the WRITE_ONCE(, 0), FAST_POOL_MIX_INFLIGHT is still set and
worker is not scheduled. Not easy to trigger, not by an ordinary user.
Just wanted to mention…

…
> @@ -999,9 +1016,10 @@ void add_interrupt_randomness(int irq)
>  
>  	fast_mix(fast_pool);
>  	add_interrupt_bench(cycles);
> +	new_count = ++fast_pool->count;
>  
>  	if (unlikely(crng_init == 0)) {
> -		if ((fast_pool->count >= 64) &&
> +		if (new_count >= 64 &&
>  		    crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) {

crng_fast_load() does spin_trylock_irqsave() in hardirq context. It does
not produce any warning on RT but is still wrong IMHO:
- lockdep will see a random task and I remember in the past it produced
  strange lock chains based on this.

- Should another task attempt to acquire this lock then it will PI-boost the
  wrong task.

If we just could move this, too.

I don't know how timing critical this is but the first backtrace from
crng_fast_load() came (to my surprise) from hwrng_fillfn() (a kthread)
and added 64bytes in one go.

I did move that crng_fast_load() into the worker and did made some
numbers:
           <idle>-0       [000] d..h1..     2.069924: add_interrupt_randomness: Tick

first interrupt
…
        swapper/0-1       [000] d..h.11     2.341938: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h.11     2.341938: add_interrupt_randomness: work

the 64th interrupt, scheduling the worker.

        swapper/0-1       [000] d..h.11     2.345937: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h111     2.349938: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h.11     2.353939: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h.11     2.357940: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h111     2.361939: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h111     2.365939: add_interrupt_randomness: Tick
        swapper/0-1       [000] d..h.11     2.369941: add_interrupt_randomness: Tick
     kworker/0:0H-6       [000] .......     2.384714: mix_interrupt_randomness: load
     kworker/0:0H-6       [000] .......     2.384715: crng_fast_load: 16
           <idle>-0       [001] dn.h1..     3.205766: add_interrupt_randomness: Tick
           <idle>-0       [019] dn.h1..     6.771047: add_interrupt_randomness: Tick

7 interrupts got lost before the worker could run & load first 16 bytes.
The workqueue core gets initialized at that point and spawns first
worker. After that the interrupts took a break.
And then the work-to-load delay was quite low:

           <idle>-0       [019] dn.h1..     7.586234: add_interrupt_randomness: Tick
           <idle>-0       [019] dn.h1..     7.586234: add_interrupt_randomness: work
    kworker/19:0H-175     [019] .......     7.586504: mix_interrupt_randomness: load
    kworker/19:0H-175     [019] .......     7.586507: crng_fast_load: 16
           <idle>-0       [020] dn.h1..     7.614649: add_interrupt_randomness: Tick
           <idle>-0       [020] dn.h1..     7.614651: add_interrupt_randomness: work
           <idle>-0       [020] dn.h1..     7.614736: add_interrupt_randomness: Tick
    kworker/20:0H-183     [020] dn.h...     7.614859: add_interrupt_randomness: Tick
    kworker/20:0H-183     [020] .......     7.614871: mix_interrupt_randomness: load
    kworker/20:0H-183     [020] .......     7.614872: crng_fast_load: 16
           <idle>-0       [018] dn.h1..     8.352423: add_interrupt_randomness: Tick
           <idle>-0       [018] dn.h1..     8.352423: add_interrupt_randomness: work
    kworker/18:0H-167     [018] dn.h1..     8.352438: add_interrupt_randomness: Tick
    kworker/18:0H-167     [018] dn.h1..     8.352448: add_interrupt_randomness: Tick
    kworker/18:0H-167     [018] dn.h1..     8.352459: add_interrupt_randomness: Tick
    kworker/18:0H-167     [018] dn.h1..     8.352491: add_interrupt_randomness: Tick
    kworker/18:0H-167     [018] .......     8.352505: mix_interrupt_randomness: load
    kworker/18:0H-167     [018] .......     8.352506: crng_fast_load: 16

In total we lost 13 ticks.
I did the same test on PREEMPT_VOLUNTARY and lost 2 ticks only.

Sebastian