Message-ID: <YzZlT7sO56TzXgNc@debian.me>
Date: Fri, 30 Sep 2022 10:41:03 +0700
From: Bagas Sanjaya <bagasdotme@...il.com>
To: Rick Edgecombe <rick.p.edgecombe@...el.com>
Cc: x86@...nel.org, "H . Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>,
Andy Lutomirski <luto@...nel.org>,
Balbir Singh <bsingharora@...il.com>,
Borislav Petkov <bp@...en8.de>,
Cyrill Gorcunov <gorcunov@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Eugene Syromiatnikov <esyr@...hat.com>,
Florian Weimer <fweimer@...hat.com>,
"H . J . Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
Jonathan Corbet <corbet@....net>,
Kees Cook <keescook@...omium.org>,
Mike Kravetz <mike.kravetz@...cle.com>,
Nadav Amit <nadav.amit@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
Peter Zijlstra <peterz@...radead.org>,
Randy Dunlap <rdunlap@...radead.org>,
"Ravi V . Shankar" <ravi.v.shankar@...el.com>,
Weijiang Yang <weijiang.yang@...el.com>,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
joao.moreira@...el.com, John Allen <john.allen@....com>,
kcc@...gle.com, eranian@...gle.com, rppt@...nel.org,
jamorris@...ux.microsoft.com, dethoma@...rosoft.com,
Yu-cheng Yu <yu-cheng.yu@...el.com>
Subject: Re: [PATCH v2 01/39] Documentation/x86: Add CET description
On Thu, Sep 29, 2022 at 03:28:58PM -0700, Rick Edgecombe wrote:
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=========================================
> +Control-flow Enforcement Technology (CET)
> +=========================================
> +
> +Overview
> +========
> +
> +Control-flow Enforcement Technology (CET) is term referring to several
> +related x86 processor features that provides protection against control
> +flow hijacking attacks. The HW feature itself can be set up to protect
> +both applications and the kernel. Only user-mode protection is implemented
> +in the 64-bit kernel.
> +
> +CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is
> +a secondary stack allocated from memory and cannot be directly modified by
> +applications. When executing a CALL instruction, the processor pushes the
> +return address to both the normal stack and the shadow stack. Upon
> +function return, the processor pops the shadow stack copy and compares it
> +to the normal stack copy. If the two differ, the processor raises a
> +control-protection fault. Indirect branch tracking verifies indirect
> +CALL/JMP targets are intended as marked by the compiler with 'ENDBR'
> +opcodes. Not all CPU's have both Shadow Stack and Indirect Branch Tracking
> +and only Shadow Stack is currently supported in the kernel.
> +
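The CALL/RET behaviour described in this paragraph can be pictured with a toy model in C. This is only an illustration under stated assumptions: the real push and compare are done by the processor, and struct toy_cpu, toy_call() and toy_ret() are made-up names, not kernel or hardware interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEPTH 16

/* Toy model only: two parallel stacks holding return addresses. */
struct toy_cpu {
	uint64_t normal[TOY_DEPTH];	/* writable by the "application" */
	uint64_t shadow[TOY_DEPTH];	/* only written by toy_call() */
	size_t depth;
};

/* CALL: push the return address to both the normal and the shadow stack. */
void toy_call(struct toy_cpu *cpu, uint64_t ret_addr)
{
	assert(cpu->depth < TOY_DEPTH);
	cpu->normal[cpu->depth] = ret_addr;
	cpu->shadow[cpu->depth] = ret_addr;
	cpu->depth++;
}

/*
 * RET: pop both copies and compare them. Returns false on a mismatch,
 * which is where hardware would raise a control-protection fault.
 */
bool toy_ret(struct toy_cpu *cpu, uint64_t *ret_addr)
{
	assert(cpu->depth > 0);
	cpu->depth--;
	*ret_addr = cpu->normal[cpu->depth];
	return cpu->normal[cpu->depth] == cpu->shadow[cpu->depth];
}
```

Overwriting an entry in normal[] between toy_call() and toy_ret(), as a stack smash would, makes the matching toy_ret() return false while the untouched frames still return true.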
> +The Kconfig options is X86_SHADOW_STACK, and it can be disabled with
> +the kernel parameter clearcpuid, like this: "clearcpuid=shstk".
> +
> +To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
> +or later are required. To build a CET-enabled application, GLIBC v2.28 or
> +later is also required.
> +
> +At run time, /proc/cpuinfo shows CET features if the processor supports
> +CET.
> +
> +Application Enabling
> +====================
> +
> +An application's CET capability is marked in its ELF header and can be
> +verified from readelf/llvm-readelf output:
> +
> + readelf -n <application> | grep -a SHSTK
> + properties: x86 feature: SHSTK
> +
> +The kernel does not process these applications directly. Applications must
> +enable them using the interface descriped in section 4. Typically this
> +would be done in dynamic loader or static runtime objects, as is the case
> +in glibc.
> +
> +Backward Compatibility
> +======================
> +
> +GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
> +variable:
> +
> +GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS
> + Turn off SHSTK/WRSS.
> +
> +GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
> + This controls how dlopen() handles SHSTK legacy libraries::
> +
> + on - continue with SHSTK enabled;
> + permissive - continue with SHSTK off.
> +
> +Details can be found in the GLIBC manual pages.
> +
> +CET arch_prctl()'s
> +==================
> +
> +Elf features should be enabled by the loader using the below arch_prctl's.
> +
> +arch_prctl(ARCH_CET_ENABLE, unsigned int feature)
> + Enable a single feature specified in 'feature'. Can only operate on
> + one feature at a time.
> +
> +arch_prctl(ARCH_CET_DISABLE, unsigned int feature)
> + Disable features specified in 'feature'. Can only operate on
> + one feature at a time.
> +
> +arch_prctl(ARCH_CET_LOCK, unsigned int features)
> + Lock in features at their current enabled or disabled status.
> +
> +The return values are as following:
> + On success, return 0. On error, errno can be::
> +
> + -EPERM if any of the passed feature are locked.
> + -EOPNOTSUPP if the feature is not supported by the hardware or
> + disabled by kernel parameter.
> + -EINVAL arguments (non existing feature, etc)
> +
> +Currently shadow stack and WRSS are supported via this interface. WRSS
> +can only be enabled with shadow stack, and is automatically disabled
> +if shadow stack is disabled.
> +
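For illustration, a loader-side wrapper around this interface might look like the sketch below. The numeric values of ARCH_CET_ENABLE and the feature bits are not given in this document, so the caller has to pass them in from the patched kernel headers. cet_arch_prctl() and cet_strerror() are hypothetical helper names, and the raw syscall(2) is used so the sketch does not depend on a libc prototype for arch_prctl().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Map the error codes listed above to short descriptions. */
const char *cet_strerror(int err)
{
	switch (err) {
	case 0:
		return "success";
	case EPERM:
		return "feature is locked";
	case EOPNOTSUPP:
		return "feature not supported by hardware or disabled by kernel parameter";
	case EINVAL:
		return "invalid arguments";
	default:
		return "unexpected error";
	}
}

/*
 * Issue one CET arch_prctl() request. 'code' is the subfunction
 * (e.g. ARCH_CET_ENABLE) and 'features' the feature argument, both
 * taken from the kernel headers of a tree carrying this series.
 * Returns 0 on success or the errno value on failure.
 */
int cet_arch_prctl(int code, unsigned long features)
{
#ifdef SYS_arch_prctl
	if (syscall(SYS_arch_prctl, code, features) == 0)
		return 0;
	return errno;
#else
	/* arch_prctl() is x86-specific; report it as unavailable elsewhere. */
	(void)code;
	(void)features;
	return ENOSYS;
#endif
}
```

A caller would typically enable shadow stack first, then optionally WRSS, and only lock once both are in their final state, printing cet_strerror() on any non-zero return.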
> +Proc status
> +===========
> +To check if an application is actually running with shadow stack, the
> +user can read the /proc/$PID/arch_status. It will report "wrss" or
> +"shstk" depending on what is enabled.
> +
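A check along these lines could be scripted from C as sketched below. The exact line format of arch_status is not spelled out here, so the sketch assumes only that the feature name appears as a separate word somewhere in the file. arch_status_has() and pid_has_cet_feature() are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if 'feature' occurs in 'text' as a whole word. */
bool arch_status_has(const char *text, const char *feature)
{
	size_t len = strlen(feature);
	const char *p = text;

	if (len == 0)
		return false;

	while ((p = strstr(p, feature)) != NULL) {
		bool start_ok = (p == text) || !isalnum((unsigned char)p[-1]);
		bool end_ok = !isalnum((unsigned char)p[len]);

		if (start_ok && end_ok)
			return true;
		p++;	/* keep scanning after a partial-word match */
	}
	return false;
}

/* Read /proc/<pid>/arch_status and look for 'feature' ("shstk" or "wrss"). */
bool pid_has_cet_feature(long pid, const char *feature)
{
	char path[64];
	char buf[4096];
	size_t n;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%ld/arch_status", pid);
	f = fopen(path, "r");
	if (!f)
		return false;	/* no such process, or file not provided */

	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';

	return arch_status_has(buf, feature);
}
```

The whole-word match keeps an unrelated field that merely contains "shstk" as a substring from being reported as enabled.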
> +The implementation of the Shadow Stack
> +======================================
> +
> +Shadow Stack size
> +-----------------
> +
> +A task's shadow stack is allocated from memory to a fixed size of
> +MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
> +the maximum size of the normal stack, but capped to 4 GB. However,
> +a compat-mode application's address space is smaller, each of its thread's
> +shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
> +
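The sizing rule reads more easily as arithmetic. The sketch below is not the kernel code. It assumes that "1/4 RLIMIT_STACK" means the limit divided by four and that the soft limit (rlim_cur) is the value used. shstk_size() and shstk_size_for_current_limit() are hypothetical names, and SZ_4G is defined locally.

```c
#include <stdint.h>
#include <sys/resource.h>

#define SZ_4G (4ULL << 30)

/*
 * MIN(RLIMIT_STACK, 4 GB) for 64-bit tasks,
 * MIN(RLIMIT_STACK / 4, 4 GB) for compat-mode tasks.
 */
uint64_t shstk_size(uint64_t rlimit_stack, int compat)
{
	uint64_t size = compat ? rlimit_stack / 4 : rlimit_stack;

	return size < SZ_4G ? size : SZ_4G;
}

/* Apply the rule to the calling process's RLIMIT_STACK; 0 on error. */
uint64_t shstk_size_for_current_limit(int compat)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return 0;

	/* RLIM_INFINITY is a very large value, so it is capped to 4 GB. */
	return shstk_size((uint64_t)rl.rlim_cur, compat);
}
```

With the common 8 MiB stack limit this gives an 8 MiB shadow stack for a 64-bit task and 2 MiB per thread for a compat-mode task. An unlimited stack is capped at 4 GB in both cases.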
> +Signal
> +------
> +
> +By default, the main program and its signal handlers use the same shadow
> +stack. Because the shadow stack stores only return addresses, a large
> +shadow stack covers the condition that both the program stack and the
> +signal alternate stack run out.
> +
> +The kernel creates a restore token for the shadow stack and pushes the
> +restorer address to the shadow stack. Then verifies that token when
> +restoring from the signal handler.
> +
> +Fork
> +----
> +
> +The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
> +to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
> +shadow access triggers a page fault with the shadow stack access bit set
> +in the page fault error code.
> +
> +When a task forks a child, its shadow stack PTEs are copied and both the
> +parent's and the child's shadow stack PTEs are cleared of the dirty bit.
> +Upon the next shadow stack access, the resulting shadow stack page fault
> +is handled by page copy/re-use.
> +
> +When a pthread child is created, the kernel allocates a new shadow stack
> +for the new thread.
The documentation above can be improved (both grammar and formatting):
---- >8 ----
diff --git a/Documentation/x86/cet.rst b/Documentation/x86/cet.rst
index 6b270a24ebc3a2..f691f7995cf088 100644
--- a/Documentation/x86/cet.rst
+++ b/Documentation/x86/cet.rst
@@ -15,92 +15,101 @@ in the 64-bit kernel.
CET introduces Shadow Stack and Indirect Branch Tracking. Shadow stack is
a secondary stack allocated from memory and cannot be directly modified by
-applications. When executing a CALL instruction, the processor pushes the
+applications. When executing a ``CALL`` instruction, the processor pushes the
return address to both the normal stack and the shadow stack. Upon
function return, the processor pops the shadow stack copy and compares it
to the normal stack copy. If the two differ, the processor raises a
control-protection fault. Indirect branch tracking verifies indirect
-CALL/JMP targets are intended as marked by the compiler with 'ENDBR'
-opcodes. Not all CPU's have both Shadow Stack and Indirect Branch Tracking
-and only Shadow Stack is currently supported in the kernel.
+``CALL``/``JMP`` targets are intended as marked by the compiler with ``ENDBR``
+opcodes. Not all CPUs have both Shadow Stack and Indirect Branch Tracking
+and only Shadow Stack is currently supported by the kernel.
-The Kconfig options is X86_SHADOW_STACK, and it can be disabled with
-the kernel parameter clearcpuid, like this: "clearcpuid=shstk".
+The Kconfig option is ``X86_SHADOW_STACK``, and it can be overridden with
+the kernel command-line parameter ``clearcpuid`` (for example
+``clearcpuid=shstk``).
To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or LLVM v10.0.1
-or later are required. To build a CET-enabled application, GLIBC v2.28 or
+or later are required. To build a CET-enabled application, glibc v2.28 or
later is also required.
-At run time, /proc/cpuinfo shows CET features if the processor supports
-CET.
+At run time, ``/proc/cpuinfo`` shows CET features if the processor supports
+them.
-Application Enabling
-====================
+Enabling CET in applications
+============================
-An application's CET capability is marked in its ELF header and can be
-verified from readelf/llvm-readelf output:
+The CET capability of an application is marked in its ELF header and can be
+verified from ``readelf``/``llvm-readelf`` output::
readelf -n <application> | grep -a SHSTK
properties: x86 feature: SHSTK
The kernel does not process these applications directly. Applications must
-enable them using the interface descriped in section 4. Typically this
+enable them using :ref:`cet-arch_prctl`. Typically this
would be done in dynamic loader or static runtime objects, as is the case
in glibc.
Backward Compatibility
======================
-GLIBC provides a few CET tunables via the GLIBC_TUNABLES environment
+glibc provides a few CET tunables via the ``GLIBC_TUNABLES`` environment
variable:
-GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS
+ * ``GLIBC_TUNABLES=glibc.tune.hwcaps=-SHSTK,-WRSS``
+
Turn off SHSTK/WRSS.
-GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>
- This controls how dlopen() handles SHSTK legacy libraries::
+ * ``GLIBC_TUNABLES=glibc.tune.x86_shstk=<on, permissive>``
- on - continue with SHSTK enabled;
- permissive - continue with SHSTK off.
+ This controls how :manpage:`dlopen(3)` handles SHSTK legacy libraries.
+ Possible values are:
-Details can be found in the GLIBC manual pages.
+ * ``on`` - continue with SHSTK enabled;
+ * ``permissive`` - continue with SHSTK off.
-CET arch_prctl()'s
-==================
+.. _cet-arch_prctl:
-Elf features should be enabled by the loader using the below arch_prctl's.
+CET arch_prctl() interface
+==========================
-arch_prctl(ARCH_CET_ENABLE, unsigned int feature)
- Enable a single feature specified in 'feature'. Can only operate on
+ELF features should be enabled by the loader using the following
+:manpage:`arch_prctl(2)` subfunctions:
+
+ * ``arch_prctl(ARCH_CET_ENABLE, unsigned int feature)``
+
+ Enable a single feature specified in ``feature``. Can only operate on
one feature at a time.
-arch_prctl(ARCH_CET_DISABLE, unsigned int feature)
- Disable features specified in 'feature'. Can only operate on
+ * ``arch_prctl(ARCH_CET_DISABLE, unsigned int feature)``
+
+ Disable a single feature specified in ``feature``. Can only operate on
one feature at a time.
-arch_prctl(ARCH_CET_LOCK, unsigned int features)
- Lock in features at their current enabled or disabled status.
+ * ``arch_prctl(ARCH_CET_LOCK, unsigned int features)``
+
+ Lock in features at their current status.
+
+ * ``arch_prctl(ARCH_CET_UNLOCK, unsigned int features)``
-arch_prctl(ARCH_CET_UNLOCK, unsigned int features)
Unlock features.
-The return values are as following:
- On success, return 0. On error, errno can be::
+On success, :manpage:`arch_prctl(2)` returns 0; otherwise, errno
+can be:
- -EPERM if any of the passed feature are locked.
- -EOPNOTSUPP if the feature is not supported by the hardware or
- disabled by kernel parameter.
- -EINVAL arguments (non existing feature, etc)
+ - ``EPERM`` if any of the passed features are locked.
+ - ``EOPNOTSUPP`` if the feature is not supported by the hardware or
+ disabled by the kernel command-line parameter.
+ - ``EINVAL`` if the arguments are invalid (nonexistent feature, etc).
Currently shadow stack and WRSS are supported via this interface. WRSS
can only be enabled with shadow stack, and is automatically disabled
if shadow stack is disabled.
-Proc status
+proc status
===========
-To check if an application is actually running with shadow stack, the
-user can read the /proc/$PID/arch_status. It will report "wrss" or
-"shstk" depending on what is enabled.
+To check if an application is actually running with shadow stack, users can
+read ``/proc/$PID/arch_status``. It will report ``wrss`` or
+``shstk`` depending on what is enabled.
The implementation of the Shadow Stack
======================================
@@ -108,11 +117,11 @@ The implementation of the Shadow Stack
Shadow Stack size
-----------------
-A task's shadow stack is allocated from memory to a fixed size of
-MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
+The shadow stack of a task is allocated from memory to a fixed size of
+``MIN(RLIMIT_STACK, 4 GB)``. In other words, the shadow stack is allocated to
the maximum size of the normal stack, but capped to 4 GB. However,
-a compat-mode application's address space is smaller, each of its thread's
-shadow stack size is MIN(1/4 RLIMIT_STACK, 4 GB).
+the address space of a compat-mode application is smaller; the shadow stack
+size of each of its threads is ``MIN(1/4 RLIMIT_STACK, 4 GB)``.
Signal
------
@@ -123,19 +132,19 @@ shadow stack covers the condition that both the program stack and the
signal alternate stack run out.
The kernel creates a restore token for the shadow stack and pushes the
-restorer address to the shadow stack. Then verifies that token when
-restoring from the signal handler.
+restorer address to it. Then the kernel verifies that token when restoring
+from the signal handler.
Fork
----
-The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
-to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
+The shadow stack vma has the ``VM_SHADOW_STACK`` flag set; its PTEs are required
+to be read-only and dirty. When a shadow stack PTE is not read-only and dirty, a
shadow access triggers a page fault with the shadow stack access bit set
in the page fault error code.
When a task forks a child, its shadow stack PTEs are copied and both the
-parent's and the child's shadow stack PTEs are cleared of the dirty bit.
+shadow stack PTEs of the parent and the child are cleared of the dirty bit.
Upon the next shadow stack access, the resulting shadow stack page fault
is handled by page copy/re-use.
Thanks.
--
An old man doll... just what I always wanted! - Clara