linux-kernel - Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJvTdK=mfc3giXCy_fteyR4UiZfnN5f0hvREN4TjXc5KxtiP+w@mail.gmail.com>
Date:   Mon, 29 Mar 2021 18:16:28 -0400
From:   Len Brown <lenb@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     "Chang S. Bae" <chang.seok.bae@...el.com>,
        Borislav Petkov <bp@...e.de>,
        Andy Lutomirski <luto@...nel.org>,
        Ingo Molnar <mingo@...nel.org>, X86 ML <x86@...nel.org>,
        "Brown, Len" <len.brown@...el.com>,
        Dave Hansen <dave.hansen@...el.com>,
        "Liu, Jing2" <jing2.liu@...el.com>,
        "Ravi V. Shankar" <ravi.v.shankar@...el.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the
 first use of dynamic user state

On Mon, Mar 29, 2021 at 2:49 PM Thomas Gleixner <tglx@...utronix.de> wrote:

> According to documentation it is irrelevant whether AMX usage is
> disabled via XCR0, CR4.OSXSAVE or XFD[18]. In any case the effect of
> AMX INIT=0 will prevent C6.
>
> As I explained in great length there are enough ways to get into a
> situation where this can happen and a CPU goes idle with AMX INIT=0.
>
> So what are we supposed to do?

Let me know if this problem description is fair:

Many-core Xeon servers will support AMX, and when I run an AMX application
on one, when I take an interrupt with AMX INIT=0, Linux may go idle on my CPU.
If Linux cpuidle requests C6, the hardware will demote to C1E.

The concern is that a core in C1E will negatively impact power of
self, or performance
of a neighboring core.

This is what we are talking about, right?

First, I should mention that if I threw a dart at a map of Xeons
deployed across the universe, the chances are "significant" that I'd
hit one that is configured with C6 disabled, and this discussion would be moot.

Second, I should mention that Linux cpuidle demotes from deep C-states
to shallow ones all day long.  This is typically due to expected timer
expiration,
and other heuristics.

Third, I should mention that the processor itself demotes from C6 to C1E
for a number of reasons -- basically like what Linux is doing, but in HW.

Albeit, the hardware does have the capability to "un-demote" when it demotes
and recognizes it made a mistake, and that "un-demote" capability would
not be present if the reason for demotion was AVX INIT=0.

Okay, that said, let's assume we have found a system where this problem
could happen, and we use it in a way that makes it happen.  Would we notice?

If your system were profoundly idle, and one or more cores were in C1E,
then it would prevent the SOC from entering Package C6 (if enabled).
Yes, there is a measurable idle power difference between Package C1E
and Package C6.  (indeed, this is why Package C6 exists).

I'm delighted that there are Xeon customers, who care about this power savings.
Unfortunately, they are the exception, not the rule.

If you were to provoke this scenario on many cores simultaneously, then
I expect you could detect a power difference between C1E and CC6.
However, that difference would be smaller than the difference
in power due to the frequency choice of the running cores,
because it is basically just the L2-leakage vs L2-off difference.

Regarding frequency credits for a core being in C1E vs C6.
Yes, this is factored into the frequency credits for turbo mode.
How much impact, I can't say, because that information is not yet available.
However, this is mitigated by the fact that Xeon single core turbo
is deployed differently than client.  Xeon's are deployed
more with multi-core turbo in mind, and so how much you'll
notice C1E vs C6 may not be significant, unless perhaps it happened
on all the cores across the system.

>    - Use TILERELEASE on context switch after XSAVES?

Yes, that would be perfectly reasonable.

>    - Any other mechanism on context switch

XRESTOR of a context with INIT=1 would also do it.

>    - Clear XFD[18] when going idle and issue TILERELEASE depending
>      on the last state

I think you mean to *set* XFD.
When the task touched AMX, he took a #NM, and we cleared XFD for that task.
So when we get here, XFD is already clear (unarmed).
Nevertheless, the setting of XFD is moot here.

>    - Use any other means to set the thing back into INIT=1 state when
>      going idle

TILERELEASE and XRESTOR are the tools in the toolbox, if necessary.

> There is no option 'shrug and ignore' unfortunately.

I'm not going to say it is impossible that this path will matter.
If some terrible things go wrong with the hardware, and the hardware
is configured and used in a very specific way, yes, this could matter.

In the grand scheme of things, this is a pretty small issue,
say, compared to the API discussion.

thanks,
Len Brown, Intel Open Source Technology Center

-Len