[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1817232.MPthNTNLIG@tjmaciei-mobl5>
Date: Tue, 13 Jul 2021 12:13:16 -0700
From: Thiago Macieira <thiago.macieira@...el.com>
To: "Chang S. Bae" <chang.seok.bae@...el.com>
CC: <bp@...e.de>, <luto@...nel.org>, <tglx@...utronix.de>,
<mingo@...nel.org>, <x86@...nel.org>, <len.brown@...el.com>,
<dave.hansen@...el.com>, <jing2.liu@...el.com>,
<ravi.v.shankar@...el.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v7 12/26] x86/fpu/xstate: Use feature disable (XFD) to protect dynamic user state
On Saturday, 10 July 2021 06:02:59 PDT Chang S. Bae wrote:
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index a58800973aed..f45b2cefd6cf 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -1112,6 +1112,44 @@ DEFINE_IDTENTRY(exc_device_not_available)
[cut]
> + /* Raise a signal when it failed to handle.
> */ + if (err)
> + force_sig(SIGSEGV);
> + }
> + return;
Hello Chang
Can I make a suggestion that you send a different signal than SIGSEGV for the
failure of unauthorised instructions? I would recommend SIGILL. Additionally,
please consider a new ILL_* constant for the si_code field.
I have multiple reasons for that:
1) the XFD failure is not a memory issue, so SIGSEGV is not really
appropriate, despite coming from an #NM interrupt
2) SIGILL is sent for the AMX instructions in other circumstances, due to CPU
#UD, notably:
- running on a CPU without AMX support
- running under an OS that did not enable the AMX state in XCR0 (like Linux
before this patch series)
When a developer is debugging code and sees a SIGILL on a valid instruction
stream in disassembly, they know they've got to code they should never have
got to, bypassing CPU checks. Forgetting to ask for permission is now a
variant of that case.
3) the very first AMX instruction to cause the #NM is likely going to be an
LDTILECFG or TILELOADD, which are memory-related instructions, so may #GP for
using bad pointers (and LDTILECFG can #GP for bad tile configurations).
Knowing that the issue was the instruction itself instead of the pointer or
data being loaded is going to come in handy.
4) SIGSEGV will also be sent for another reason by the kernel. Your cover
message had:
> 4. Applications touching AMX without permission results in process exit.
>
> Armed XFD results in #NM, results in SIGSEGV, typically resulting in
> process exit.
> 6. NM handler allocation failure results in process exit.
>
> If the #NM handler can not allocate the 8KB buffer, the task will
> receive a SIGSEGV at the instruction that took the #NM fault, typically
> resulting in process exit.
Knowing that it was caused by reaching code that shouldn't have been reached,
instead of an OOM issue, is handy.
Do note that this SIGSEGV for allocation is unlikely to happen. If the kernel
is under memory pressure, the OOM killer will probably kick in and may kill
(SIGKILL) this process instead. But at least #6 is a legitimate memory issue.
On the same topic, is there a way to save this state in a core dump? The FS
and GS bases would also be very handy.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel DPG Cloud Engineering
Powered by blists - more mailing lists