[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202002251214.8B2063AA87@keescook>
Date: Tue, 25 Feb 2020 12:14:34 -0800
From: Kees Cook <keescook@...omium.org>
To: Yu-cheng Yu <yu-cheng.yu@...el.com>
Cc: x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
linux-doc@...r.kernel.org, linux-mm@...ck.org,
linux-arch@...r.kernel.org, linux-api@...r.kernel.org,
Arnd Bergmann <arnd@...db.de>,
Andy Lutomirski <luto@...nel.org>,
Balbir Singh <bsingharora@...il.com>,
Borislav Petkov <bp@...en8.de>,
Cyrill Gorcunov <gorcunov@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Eugene Syromiatnikov <esyr@...hat.com>,
Florian Weimer <fweimer@...hat.com>,
"H.J. Lu" <hjl.tools@...il.com>, Jann Horn <jannh@...gle.com>,
Jonathan Corbet <corbet@....net>,
Mike Kravetz <mike.kravetz@...cle.com>,
Nadav Amit <nadav.amit@...il.com>,
Oleg Nesterov <oleg@...hat.com>, Pavel Machek <pavel@....cz>,
Peter Zijlstra <peterz@...radead.org>,
Randy Dunlap <rdunlap@...radead.org>,
"Ravi V. Shankar" <ravi.v.shankar@...el.com>,
Vedvyas Shanbhogue <vedvyas.shanbhogue@...el.com>,
Dave Martin <Dave.Martin@....com>, x86-patch-review@...el.com
Subject: Re: [RFC PATCH v9 12/27] x86/mm: Modify ptep_set_wrprotect and
pmdp_set_wrprotect for _PAGE_DIRTY_SW
On Wed, Feb 05, 2020 at 10:19:20AM -0800, Yu-cheng Yu wrote:
> When Shadow Stack (SHSTK) is enabled, the [R/O + PAGE_DIRTY_HW] setting is
> reserved only for SHSTK. Non-Shadow Stack R/O PTEs are
> [R/O + PAGE_DIRTY_SW].
>
> When a PTE goes from [R/W + PAGE_DIRTY_HW] to [R/O + PAGE_DIRTY_SW], it
> could become a transient SHSTK PTE in two cases.
>
> The first case is that some processors can start a write but end up seeing
> a read-only PTE by the time they get to the Dirty bit, creating a transient
> SHSTK PTE. However, this will not occur on processors supporting SHSTK
> therefore we don't need a TLB flush here.
>
> The second case is that when the software, without atomic, tests & replaces
> PAGE_DIRTY_HW with PAGE_DIRTY_SW, a transient SHSTK PTE can exist. This is
> prevented with cmpxchg.
>
> Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
> insights to the issue. Jann Horn provided the cmpxchg solution.
>
> v9:
> - Change compile-time conditionals to runtime checks.
> - Fix parameters of try_cmpxchg(): change pte_t/pmd_t to
> pte_t.pte/pmd_t.pmd.
>
> v4:
> - Implement try_cmpxchg().
>
> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@...el.com>
Reviewed-by: Kees Cook <keescook@...omium.org>
-Kees
> ---
> arch/x86/include/asm/pgtable.h | 66 ++++++++++++++++++++++++++++++++++
> 1 file changed, 66 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 2733e7ec16b3..43cb27379208 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -1253,6 +1253,39 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
> static inline void ptep_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep)
> {
> + /*
> + * Some processors can start a write, but end up seeing a read-only
> + * PTE by the time they get to the Dirty bit. In this case, they
> + * will set the Dirty bit, leaving a read-only, Dirty PTE which
> + * looks like a Shadow Stack PTE.
> + *
> + * However, this behavior has been improved and will not occur on
> + * processors supporting Shadow Stack. Without this guarantee, a
> + * transition to a non-present PTE and flush the TLB would be
> + * needed.
> + *
> + * When changing a writable PTE to read-only and if the PTE has
> + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so that
> + * the PTE is not a valid Shadow Stack PTE.
> + */
> +#ifdef CONFIG_X86_64
> + if (static_cpu_has(X86_FEATURE_SHSTK)) {
> + pte_t new_pte, pte = READ_ONCE(*ptep);
> +
> + do {
> + /*
> + * This is the same as moving _PAGE_DIRTY_HW
> + * to _PAGE_DIRTY_SW.
> + */
> + new_pte = pte_wrprotect(pte);
> + new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
> + new_pte.pte &= ~_PAGE_DIRTY_HW;
> + } while (!try_cmpxchg(&ptep->pte, &pte.pte, new_pte.pte));
> +
> + return;
> + }
> +#endif
> clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
> }
>
> @@ -1303,6 +1336,39 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
> static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> unsigned long addr, pmd_t *pmdp)
> {
> + /*
> + * Some processors can start a write, but end up seeing a read-only
> + * PMD by the time they get to the Dirty bit. In this case, they
> + * will set the Dirty bit, leaving a read-only, Dirty PMD which
> + * looks like a Shadow Stack PMD.
> + *
> + * However, this behavior has been improved and will not occur on
> + * processors supporting Shadow Stack. Without this guarantee, a
> + * transition to a non-present PMD and flush the TLB would be
> + * needed.
> + *
> + * When changing a writable PMD to read-only and if the PMD has
> + * _PAGE_DIRTY_HW set, we move that bit to _PAGE_DIRTY_SW so that
> + * the PMD is not a valid Shadow Stack PMD.
> + */
> +#ifdef CONFIG_X86_64
> + if (static_cpu_has(X86_FEATURE_SHSTK)) {
> + pmd_t new_pmd, pmd = READ_ONCE(*pmdp);
> +
> + do {
> + /*
> + * This is the same as moving _PAGE_DIRTY_HW
> + * to _PAGE_DIRTY_SW.
> + */
> + new_pmd = pmd_wrprotect(pmd);
> + new_pmd.pmd |= (new_pmd.pmd & _PAGE_DIRTY_HW) >>
> + _PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
> + new_pmd.pmd &= ~_PAGE_DIRTY_HW;
> + } while (!try_cmpxchg(&pmdp->pmd, &pmd.pmd, new_pmd.pmd));
> +
> + return;
> + }
> +#endif
> clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp);
> }
>
> --
> 2.21.0
>
--
Kees Cook
Powered by blists - more mailing lists