[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202205091304.434A9B45@keescook>
Date: Mon, 9 May 2022 14:38:38 -0700
From: Kees Cook <keescook@...omium.org>
To: ira.weiny@...el.com
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>,
Dan Williams <dan.j.williams@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Rick Edgecombe <rick.p.edgecombe@...el.com>,
"Shankar, Ravi V" <ravi.v.shankar@...el.com>,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH V10 14/44] mm/pkeys: Introduce pks_set_readwrite()
On Tue, Apr 19, 2022 at 10:06:19AM -0700, ira.weiny@...el.com wrote:
> From: Ira Weiny <ira.weiny@...el.com>
>
> When kernel code needs access to a PKS protected page they will need to
> change the protections for the pkey to Read/Write.
I'm excited to have this infrastructure available! It'll finally give us
the "write rarely" infrastructure we've needed:
https://github.com/KSPP/linux/issues/130
>
> Define pks_set_readwrite() to update the specified pkey. Define
> pks_update_protection() as a helper to do the heavy lifting and allow
> for subsequent pks_set_*() calls.
>
> Define PKEY_READ_WRITE rather than use a magic value of '0' in
> pks_update_protection().
>
> Finally, ensure preemption is disabled for pks_write_pkrs() because the
> context of this call can not generally be predicted.
>
> pks.h is created to avoid conflicts and header dependencies with the
> user space pkey code.
>
> Add documentation.
>
> Signed-off-by: Ira Weiny <ira.weiny@...el.com>
>
> ---
> changes for v9
> Move MSR documentation note to this patch
> move declarations to incline/linux/pks.h
> from rick edgecombe
> change pkey type to u8
> validate pkey range in pks_update_protection
> from 0day
> fix documentation link
> from dave hansen
> s/pks_mk_*/pks_set_*/
> use pkey
> s/pks_saved_pkrs/pkrs/
>
> changes for v8
> define pkey_read_write
> make the call inline
> clean up the names
> use pks_write_pkrs() with preemption disabled
> split this out from 'add pks kernel api'
> include documentation in this patch
> ---
> Documentation/core-api/protection-keys.rst | 15 +++++++++++
> arch/x86/mm/pkeys.c | 31 ++++++++++++++++++++++
> include/linux/pks.h | 31 ++++++++++++++++++++++
> include/uapi/asm-generic/mman-common.h | 1 +
> 4 files changed, 78 insertions(+)
> create mode 100644 include/linux/pks.h
>
> diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
> index fe63acf5abbe..3af92e1cbffd 100644
> --- a/Documentation/core-api/protection-keys.rst
> +++ b/Documentation/core-api/protection-keys.rst
> @@ -142,3 +142,18 @@ Adding pages to a pkey protected domain
>
> .. kernel-doc:: arch/x86/include/asm/pgtable_types.h
> :doc: PKS_KEY_ASSIGNMENT
> +
> +Changing permissions of individual keys
> +---------------------------------------
> +
> +.. kernel-doc:: include/linux/pks.h
> + :identifiers: pks_set_readwrite
> +
> +MSR details
> +~~~~~~~~~~~
> +
> +WRMSR is typically an architecturally serializing instruction. However,
> +WRMSR(MSR_IA32_PKRS) is an exception. It is not a serializing instruction and
> +instead maintains ordering properties similar to WRPKRU. Thus it is safe to
> +immediately use a mapping when the pks_set*() functions returns. Check the
> +latest SDM for details.
> diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
> index 39e4c2cbc279..e4cbc79686ea 100644
> --- a/arch/x86/mm/pkeys.c
> +++ b/arch/x86/mm/pkeys.c
> @@ -6,6 +6,7 @@
> #include <linux/debugfs.h> /* debugfs_create_u32() */
> #include <linux/mm_types.h> /* mm_struct, vma, etc... */
> #include <linux/pkeys.h> /* PKEY_* */
> +#include <linux/pks.h>
> #include <linux/pks-keys.h>
> #include <uapi/asm-generic/mman-common.h>
>
> @@ -275,4 +276,34 @@ void pks_setup(void)
> cr4_set_bits(X86_CR4_PKS);
> }
>
> +/*
> + * Do not call this directly, see pks_set*().
> + *
> + * @pkey: Key for the domain to change
> + * @protection: protection bits to be used
> + *
> + * Protection utilizes the same protection bits specified for User pkeys
> + * PKEY_DISABLE_ACCESS
> + * PKEY_DISABLE_WRITE
> + *
> + */
> +void pks_update_protection(u8 pkey, u8 protection)
For better security, I think this should be a static inline, not a
callable (i.e. as a non-inline it could be the target of a control
flow attack).
> +{
> + u32 pkrs;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_PKS))
> + return;
> +
> + if (WARN_ON_ONCE(pkey >= PKS_KEY_MAX))
> + return;
I think this should enforce arguments being __builtin_constant_p(). i.e.
making sure that all callers of pks_update_protection() are using a
compile-time constant value. That makes it so that the caller location
and key become hard-coded (i.e. further reduction in risk to becoming a
control-flow gadget: the inlining of a const value means no arguments
any more). For example:
BUILD_BUG_ON(!__builtin_constant_p(pkey));
BUILD_BUG_ON(!__builtin_constant_p(protection));
(I think the test code will need some tweaking, but it should be
possible to adjust it.)
> +
> + pkrs = current->thread.pkrs;
> + current->thread.pkrs = pkey_update_pkval(pkrs, pkey,
> + protection);
> + preempt_disable();
> + pks_write_pkrs(current->thread.pkrs);
> + preempt_enable();
To resist cross-thread attacks, please:
- make pkey_update_pkval() also an inline
- use the pkrs variable directly and store it back only after the write
For example:
preempt_disable();
pkrs = pkey_update_pkval(current->thread.pkrs,
pkey, protection);
pks_write_pkrs(pkrs);
current->thread.pkrs = pkrs;
preempt_enable();
This means that the pkey/protection relationship always lives in a
CPU-local register and cannot be manipulated by another CPU before the
msr write completes. Better yet would be:
preempt_disable();
rdmsrl(MSR_IA32_PKRS, pkrs);
pkrs = pkey_update_pkval(pkrs, pkey, protection);
pks_write_pkrs(pkrs);
current->thread.pkrs = pkrs;
preempt_enable();
Then cross-thread attacks cannot corrupt the _other_ PKS keys (i.e.
write the desired changes to target's current->thread.kprs and trigger
an update to a different pkey, resulting in flushing the attacker's
changes to that CPU's pkey state.
> +/**
> + * pks_set_readwrite() - Make the domain Read/Write
> + * @pkey: the pkey for which the access should change.
> + *
> + * Allow all access, read and write, to the domain specified by pkey. This is
> + * not a global update and only affects the current running thread.
> + */
> +static inline void pks_set_readwrite(u8 pkey)
> +{
> + pks_update_protection(pkey, PKEY_READ_WRITE);
> +}
While adding these, can you please also add pks_set_nowrite()? This
will be needed for protecting writes to memory that should be otherwise
readable.
With these changes it should be possible to protect the kernel's page
table entries from "stray" writes. :)
-Kees
> +
> +#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
> +
> +static inline void pks_set_readwrite(u8 pkey) {}
> +
> +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
> +
> +#endif /* _LINUX_PKS_H */
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 6c1aa92a92e4..f179544bd33a 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -80,6 +80,7 @@
> /* compatibility flags */
> #define MAP_FILE 0
>
> +#define PKEY_READ_WRITE 0x0
> #define PKEY_DISABLE_ACCESS 0x1
> #define PKEY_DISABLE_WRITE 0x2
> #define PKEY_ACCESS_MASK (PKEY_DISABLE_ACCESS |\
> --
> 2.35.1
>
--
Kees Cook
Powered by blists - more mailing lists