[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87c2a595a79b1421a36a5db6ae43960bf6cece64.camel@infradead.org>
Date: Tue, 26 Nov 2024 16:09:58 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Dave Hansen <dave.hansen@...el.com>, kexec@...ts.infradead.org,
Schönherr, "Jan H."
<jschoenh@...zon.de>, Rik van Riel <riel@...riel.com>
Cc: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>, "Kirill A. Shutemov"
<kirill.shutemov@...ux.intel.com>, Kai Huang <kai.huang@...el.com>, Nikolay
Borisov <nik.borisov@...e.com>, linux-kernel@...r.kernel.org, Simon Horman
<horms@...nel.org>, Dave Young <dyoung@...hat.com>, Peter Zijlstra
<peterz@...radead.org>, jpoimboe@...nel.org, bsz@...zon.de
Subject: Re: [RFC PATCH] x86/mm: Disable PTI for kernel_ident_mapping_init()
On Tue, 2024-11-26 at 07:49 -0800, Dave Hansen wrote:
> On 11/26/24 03:42, David Woodhouse wrote:
> > I threw this version together and it didn't immediately explode...
>
> It's better than playing #define games. The damage is also pretty
> limited and it helps us avoid plumbing a bit through the page table
> handling function arguments.
Thanks. I guess I should write it a commit message then... and add
Cc:stable since AFAICT the other users of this *are* already broken.
From b474ea925a2ea566061552d8a6ca9e4a2945f45d Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@...zon.co.uk>
Date: Tue, 26 Nov 2024 15:55:13 +0000
Subject: [PATCH 05/20] x86/mm: Add _PAGE_NOPTISHADOW bit to avoid updating
userspace page tables
The set_p4d() and set_pgd() functions (in 4-level or 5-level page table setups
respectively) assume that the root page table is actually a 8KiB allocation,
with the userspace root immediately after the kernel root page table (so that
the former can enforce NX on on all the subordinate pages, which are actually
shared).
However, users of the kernel_ident_mapping_init() code do not give it an 8KiB
allocation for its PGD. Both swsusp_arch_resume() and acpi_mp_setup_reset()
allocate only a single 4KiB page. The kexec code on x86_64 currently gets
away with it purely by chance, because it allocates 8KiB for its "control
code page" and then actually uses the first half for the PGD, then copies the
actual trampoline code into the second half only after the identmap code has
finished scribbling over it.
Fix this by defining a _PAGE_NOPTISHADOW bit (which can use the same bit as
_PAGE_SAVED_DIRTY since one is only for the PGD/P4D root and the other is
exclusively for leaf PTEs.). This instructs __pti_set_user_pgtbl() not to
write to the userspace 'shadow' PGD.
Strictly, the _PAGE_NOPTISHADOW bit doesn't need to be written out to the
actual page tables; since __pti_set_user_pgtbl() returns the value to be
written to the kernel page table, it could be filtered out. But there seems
to be no benefit to actually doing so.
Cc: stable@...nel.org
Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
---
arch/x86/include/asm/pgtable_types.h | 8 ++++++--
arch/x86/mm/ident_map.c | 6 +++---
arch/x86/mm/pti.c | 2 +-
3 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 6f82e75b6149..4b804531b03c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -36,10 +36,12 @@
#define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
#ifdef CONFIG_X86_64
-#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit */
+#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW5 /* Saved Dirty bit (leaf) */
+#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW5 /* No PTI shadow (root PGD) */
#else
/* Shared with _PAGE_BIT_UFFD_WP which is not supported on 32 bit */
-#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit */
+#define _PAGE_BIT_SAVED_DIRTY _PAGE_BIT_SOFTW2 /* Saved Dirty bit (leaf) */
+#define _PAGE_BIT_NOPTISHADOW _PAGE_BIT_SOFTW2 /* No PTI shadow (root PGD) */
#endif
/* If _PAGE_BIT_PRESENT is clear, we use these: */
@@ -139,6 +141,8 @@
#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)
+#define _PAGE_NOPTISHADOW (_AT(pteval_t, 1) << _PAGE_BIT_NOPTISHADOW)
+
/*
* Set of bits not changed in pte_modify. The pte's
* protection key is treated like _PAGE_RW, for
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 437e96fb4977..5ab7bd2f1983 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -174,7 +174,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page,
if (result)
return result;
- set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag));
+ set_p4d(p4d, __p4d(__pa(pud) | info->kernpg_flag | _PAGE_NOPTISHADOW));
}
return 0;
@@ -218,14 +218,14 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page,
if (result)
return result;
if (pgtable_l5_enabled()) {
- set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag));
+ set_pgd(pgd, __pgd(__pa(p4d) | info->kernpg_flag | _PAGE_NOPTISHADOW));
} else {
/*
* With p4d folded, pgd is equal to p4d.
* The pgd entry has to point to the pud page table in this case.
*/
pud_t *pud = pud_offset(p4d, 0);
- set_pgd(pgd, __pgd(__pa(pud) | info->kernpg_flag));
+ set_pgd(pgd, __pgd(__pa(pud) | info->kernpg_flag | _PAGE_NOPTISHADOW));
}
}
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 851ec8f1363a..5f0d579932c6 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -132,7 +132,7 @@ pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
* Top-level entries added to init_mm's usermode pgd after boot
* will not be automatically propagated to other mms.
*/
- if (!pgdp_maps_userspace(pgdp))
+ if (!pgdp_maps_userspace(pgdp) || (pgd.pgd & _PAGE_NOPTISHADOW))
return pgd;
/*
--
2.43.0
Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)
Powered by blists - more mailing lists