linux-kernel - Re: [PATCH v2 01/17] asm: simd context helper API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <01BF319B-D6F3-432F-AE1A-1B8B4E3A36A4@amacapital.net>
Date:   Sun, 26 Aug 2018 07:25:01 -0700
From:   Andy Lutomirski <luto@...capital.net>
To:     "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>,
        David Miller <davem@...emloft.net>,
        Andrew Lutomirski <luto@...nel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Samuel Neves <sneves@....uc.pt>, linux-arch@...r.kernel.org,
        Rik van Riel <riel@...riel.com>
Subject: Re: [PATCH v2 01/17] asm: simd context helper API




> On Aug 26, 2018, at 7:18 AM, Jason A. Donenfeld <Jason@...c4.com> wrote:
> 
> On Sun, Aug 26, 2018 at 8:06 AM Thomas Gleixner <tglx@...utronix.de> wrote:
>>> Do you mean to say you intend to make kernel_fpu_end() and
>>> kernel_neon_end() only actually do something upon context switch, but
>>> not when it's actually called? So that multiple calls to
>>> kernel_fpu_begin() and kernel_neon_begin() can be made without
>>> penalty?
>> 
>> On context switch and exit to user. That allows to keep those code pathes
>> fully preemptible. Still twisting my brain around the details.
> 
> Just to make sure we're on the same page, the goal is so that this code:
> 
> kernel_fpu_begin();
> kernel_fpu_end();
> kernel_fpu_begin();
> kernel_fpu_end();
> kernel_fpu_begin();
> kernel_fpu_end();
> kernel_fpu_begin();
> kernel_fpu_end();
> kernel_fpu_begin();
> kernel_fpu_end();
> kernel_fpu_begin();
> kernel_fpu_end();
> ...
> 
> has the same performance as this code:
> 
> kernel_fpu_begin();
> kernel_fpu_end();
> 
> (Unless of course the process is preempted or the like.)
> 
> Currently the present situation makes the performance of the above
> wildly different, since kernel_fpu_end() does something immediately.
> 
> What about something like this:
> - Add a tristate flag connected to task_struct (or in the global fpu
> struct in the case that this happens in irq and there isn't a valid
> current).
> - On kernel_fpu_begin(), if the flag is 0, do the usual expensive
> XSAVE stuff, and set the flag to 1.
> - On kernel_fpu_begin(), if the flag is non-0, just set the flag to 1
> and return.
> - On kernel_fpu_end(), if the flag is non-0, set the flag to 2.
> (Otherwise WARN() or BUG() or something.)
> - On context switch / preemption / etc away from the task, if the flag
> is non-0, XRSTOR and such.

It’s not that simple. First, these states need names, at least for thinking about. 0 is “user state in regs”. 1 is “kernel state active”. 2 is “nothing active”.

The actual encoding will be something like TIF_XSTATE_UNLOADED: user state is not in regs.  TIF_KERNEL_XSTATE: kernel is using FPU. And this fundamentally doubles the size of struct fpu.

Tglx, that doubling-the-size-of-fpu makes me question the idea of letting the kernel use the fpu while preemptible.

> - On context switch / preemption / etc back to the task, if the flag
> is 1, XSAVE and such. If the flag is 2, set it to 0.
> 



> Jason