linux-kernel - Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Date:   Thu, 05 May 2022 03:21:43 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Filipe Manana <fdmanana@...e.com>, linux-crypto@...r.kernel.org
Subject: Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Jason,

On Thu, May 05 2022 at 03:11, Jason A. Donenfeld wrote:
> On Thu, May 05, 2022 at 02:55:58AM +0200, Thomas Gleixner wrote:
>> > So if truly the only user of this is random.c as of 5.18 (is it? I'm
>> > assuming from a not very thorough survey...), and if the performance
>> > boost doesn't even exist, then yeah, I think it'd make sense to just get
>> > rid of it, and have kernel_fpu_usable() return false in those cases.
>> >
>> > I'll run some benchmarks on a little bit more hardware in representative
>> > cases and see.
>> 
>> Find below a combo patch which makes use of strict softirq serialization
>> for the price of not supporting the hardirq FPU usage. 
>
> Thanks, I'll give it a shot in the morning (3am) when trying to do a
> more realistic benchmark. But just as a synthetic thing, I ran the
> numbers in kBench900 and am getting:
>
>      generic:    430 cycles per call
>        ssse3:    315 cycles per call
>       avx512:    277 cycles per call
>
> for a single call to the compression function, which is the most any of
> those mix_pool_bytes() calls do from add_{input,disk}_randomness(), on
> Tiger Lake, using RDPMC from kernel space.

I'm well aware of the difference between synthetic benchmarks and
real-world scenarios, and with the more in-depth instrumentation of these
things I'm even more concerned that the difference is underestimated.

> This _doesn't_ take into account the price of calling kernel_fpu_begin().
> That's a little hard to bench synthetically by running it in a loop and
> taking medians because of the lazy restoration. But that's an indication
> anyway that I should be looking at the cost of the actual function as
> it's running in random.c, rather than the synthetic test. Will keep this
> thread updated.

Appreciated.

Thanks,

        tglx