lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 05 May 2022 03:21:43 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Borislav Petkov <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Filipe Manana <fdmanana@...e.com>, linux-crypto@...r.kernel.org
Subject: Re: [patch 3/3] x86/fpu: Make FPU protection more robust

Jason,

On Thu, May 05 2022 at 03:11, Jason A. Donenfeld wrote:
> On Thu, May 05, 2022 at 02:55:58AM +0200, Thomas Gleixner wrote:
>> > So if truly the only user of this is random.c as of 5.18 (is it? I'm
>> > assuming from a not very thorough survey...), and if the performance
>> > boost doesn't even exist, then yeah, I think it'd make sense to just get
>> > rid of it, and have kernel_fpu_usable() return false in those cases.
>> >
>> > I'll run some benchmarks on a little bit more hardware in representative
>> > cases and see.
>> 
>> Find below a combo patch which makes use of strict softirq serialization
>> for the price of not supporting the hardirq FPU usage. 
>
> Thanks, I'll give it a shot in the morning (3am) when trying to do a
> more realistic benchmark. But just as a synthetic thing, I ran the
> numbers in kBench900 and am getting:
>
>      generic:    430 cycles per call
>        ssse3:    315 cycles per call
>       avx512:    277 cycles per call
>
> for a single call to the compression function, which is the most any of
> those mix_pool_bytes() calls do from add_{input,disk}_randomness(), on
> Tiger Lake, using RDPMC from kernel space.

I'm well aware of the difference between synthetic benchmarks and real
world scenarios and with the more in depth instrumentation of these
things I'm even more concerned that the difference is underestimated.

> This _doesn't_ take into account the price of calling kernel_fpu_begin().
> That's a little hard to bench synthetically by running it in a loop and
> taking medians because of the lazy restoration. But that's an indication
> anyway that I should be looking at the cost of the actual function as
> its running in random.c, rather than the synthetic test. Will keep this
> thread updated.

Appreciated.

Thanks,

        tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ