Date:   Mon, 28 Feb 2022 19:58:05 +0100
From:   "Jason A. Donenfeld" <Jason@...c4.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Eric Biggers <ebiggers@...nel.org>,
        "Theodore Ts'o" <tytso@....edu>,
        Dominik Brodowski <linux@...inikbrodowski.net>
Subject: Re: RFC: Intervals to schedule the worker for mix_interrupt_randomness().

Hi Sebastian,

I'm actually trying quite hard not to change the details of entropy
gathering for 5.18. There are lots of little arguments for why each
little thing is the way it is, and these have built up over a long
period of time; reworking /that/ requires some anthropological work to
understand all the original intents, which I consider a bit of a
different "project" from what I'm working on now with random.c. So I'd
like to minimize changes to the semantics. Right now, those semantics
are:

A) crng_init==0: pre_init_inject after 64 interrupts.
B) crng_init!=0: mix_pool_bytes after 64 interrupts OR after 1 second
has elapsed.
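
Sketched very roughly in C (the function and parameter names below are
just illustrative assumptions, not the actual fields; the real checks
live around add_interrupt_randomness()/mix_interrupt_randomness() in
random.c), that decision amounts to something like:

#include <stdbool.h>

/*
 * Illustrative sketch only: 'count' is interrupts accumulated in the
 * per-CPU fast pool, 'now'/'last' are jiffies-style timestamps, and
 * 'hz' is ticks per second.
 */
static bool should_flush_fast_pool(int crng_init, unsigned int count,
				   unsigned long now, unsigned long last,
				   unsigned long hz)
{
	if (crng_init == 0)
		/* A) pre-init: crng_pre_init_inject only after 64 interrupts */
		return count >= 64;

	/* B) initialized: mix_pool_bytes after 64 interrupts OR 1 second */
	return count >= 64 || (now - last) >= hz;
}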

In trying to "reverse engineer" what the rationales for (A) and (B)
might be, I imagine the reasoning breaks down to:

A) Since crng_pre_init_inject will increase crng_init_cnt by 16, we
want to make sure it's decently entropic [maybe? kind of a weak
argument perhaps].
B) We're crediting only 1 bit in this case, so let's consider that bit
to have accumulated either after 64 fairly fast interrupts, which may
be "regular", or after a smaller number of interrupts that occur within
one second, since those might be irregular and so perhaps more entropic
apiece. That sounds mostly sensible to me, and rather conservative too,
so it seems okay to me.

Maybe we can revisit these for 5.19, but I'd like not to tinker too
much with that for now. I think I can actually argue both for and
against the points I tried to synthesize in (A) and (B) above. It's
also related to what we collect, which you alluded to in your message.
Right now we're collecting:

1) random_get_entropy() (i.e. RDTSC): this is the big huge important
thing that's captured.
2) jiffies: does this matter if we're already gathering RDTSC? I can
see an argument both ways. More analysis and thought are required.
3) the irq number: maybe it matters, maybe it doesn't.
4) the return address: maybe it matters, maybe it doesn't.
5) the value of some register: maybe it matters, maybe it doesn't.
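
For concreteness, a rough sketch of that per-interrupt capture in C
(the struct and field names are my own, made up for illustration, not
random.c's actual layout):

/*
 * Hypothetical container for the five inputs listed above; in the
 * kernel these values are mixed straight into the per-CPU fast pool
 * via fast_mix() rather than stored in a struct like this.
 */
struct irq_sample {
	unsigned long cycles;      /* (1) random_get_entropy(), e.g. RDTSC */
	unsigned long jiffies_now; /* (2) jiffies at interrupt time */
	unsigned int irq;          /* (3) the irq number */
	unsigned long ret_ip;      /* (4) return address from the irq regs */
	unsigned long reg_val;     /* (5) the value of some register */
};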

Maybe this could be reduced to only capturing (1); maybe we benefit
from having all of (1)-(5). Again, I can see the argument both ways,
and I think that needs a lot more investigation and thought, and if
that's to happen, it really seems like a 5.19 thing rather than a 5.18
thing at this point.

But all this brings me to what I'm really wondering when reading your
email: do your observations matter? Are you observing a performance or
reliability issue or something like that with those workqueues
pending? Is this whole workqueue approach a mistake and we should
revert it? Or is it still okay, but you were just idly wondering about
that time limit? As you can tell, I'm mostly concerned with not
breaking something by accident.

Regards,
Jason
