linux-kernel - Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Message-ID: <87czgtjlfq.ffs@tglx>
Date:   Wed, 04 May 2022 18:45:45 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Filipe Manana <fdmanana@...e.com>, linux-crypto@...r.kernel.org
Subject: Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Jason,

On Wed, May 04 2022 at 17:55, Jason A. Donenfeld wrote:
> On Wed, May 04, 2022 at 05:36:38PM +0200, Thomas Gleixner wrote:
>> But the only use case which utilizes FPU from hard interrupt context is
>> the random generator via add_randomness_...().
>> 
>> I did a benchmark of these functions, which invoke blake2s_update()
>> three times in a row, on a SKL-X and a ZEN3. The generic code and the
>> FPU accelerated code are pretty much on par vs. execution time of the
>> algorithm itself plus/minus noise.
>>
>> IOW, using the FPU blindly for this kind of computations is not
>> necessarily a good plan. I have no idea how these things are analyzed
>> and evaluated if at all. Maybe the crypto people can shed some light on
>> this.
>
> drivers/net/wireguard/{noise,cookie}.c makes pretty heavy use of BLAKE2s
> in hot paths where the FPU is already being used for other algorithms,
> and so there the save/restore is worth it (assuming restore finally
> works lazily). In benchmarks, the SIMD code made a real difference.

I'm sure there are very valid use cases, but just the two things I
looked at turned out to be questionable at least.

> But this presumably regards mix_pool_bytes() in the RNG. If it turns out
> that supporting the FPU in hard IRQ context is a major PITA, and the
> RNG

Supporting FPU in hard interrupt context is possible if required and the
preexisting bug which survived 10+ years has been fixed.

I just started to look into this because of that bug and due to the
inconsistency between the FPU protections we have. The inconsistency
comes from the hardirq requirement.

> is the only thing making use of it, then sure, drop hard IRQ context
> support for it. However... This may be unearthing a larger bug.
> Sebastian and I put in a decent amount of work during 5.18 to remove all
> calls to mix_pool_bytes() (and hence to blake2s_compress()) from
> add_interrupt_randomness(). Have a look:

I know.

> It now accumulates in some per-CPU buffer, and then every 64 interrupts
> a worker runs that does the actual mix_pool_bytes() from kthread
> context.

That's add_interrupt_randomness() and not affected by this.
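
For readers following along, the batching scheme quoted above can be
sketched roughly as below. This is a userspace illustration only, not
the code in drivers/char/random.c: struct fast_pool_sketch,
struct input_pool_sketch, expensive_mix(), on_interrupt() and
flush_worker() are all made-up names, and the mixing arithmetic is a
placeholder for the real hash. Only the threshold of 64 comes from the
description above.

```c
#include <assert.h>
#include <stdint.h>

/* Flush threshold, taken from the "every 64 interrupts" description. */
#define BATCH_THRESHOLD 64

/* Hypothetical stand-in for a per-CPU accumulation buffer. */
struct fast_pool_sketch {
	uint32_t acc;		/* cheap running accumulator */
	unsigned int count;	/* events since the last flush */
	int flush_pending;	/* set when the deferred worker should run */
};

/* Hypothetical stand-in for the shared pool fed by the expensive hash. */
struct input_pool_sketch {
	uint32_t state;
	unsigned int mixes;	/* how often the expensive step ran */
};

/* Placeholder for the expensive mixing step (BLAKE2s in the real RNG). */
static void expensive_mix(struct input_pool_sketch *pool, uint32_t data)
{
	pool->state = (pool->state ^ data) * 2654435761u;
	pool->mixes++;
}

/* "Hard interrupt" path: cheap arithmetic only, never the expensive mix. */
static void on_interrupt(struct fast_pool_sketch *fp, uint32_t sample)
{
	fp->acc = ((fp->acc << 7) | (fp->acc >> 25)) ^ sample;
	if (++fp->count >= BATCH_THRESHOLD)
		fp->flush_pending = 1;	/* real code would schedule a worker */
}

/* "Worker" path: runs outside interrupt context and does the real mixing. */
static void flush_worker(struct fast_pool_sketch *fp,
			 struct input_pool_sketch *pool)
{
	if (!fp->flush_pending)
		return;
	/* A flush is only ever requested once the threshold was reached. */
	assert(fp->count >= BATCH_THRESHOLD);
	expensive_mix(pool, fp->acc);
	fp->acc = 0;
	fp->count = 0;
	fp->flush_pending = 0;
}
```

The point of the split is that the interrupt path never reaches the
expensive (and possibly FPU-using) step; that cost is paid once per
batch from a context where it is unproblematic.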

> So the question is: what is still hitting mix_pool_bytes() from hard IRQ
> context? I'll investigate a bit and see.

add_disk_randomness() on !RT kernels. That's what made me look into this
in the first place as it unearthed the long-standing FPU protection
bug. See the first patch in this thread.

Possibly add_device_randomness() too, but I haven't seen evidence so far.

Thanks,

        tglx