[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250110-asi-rfc-v2-v2-16-8419288bc805@google.com>
Date: Fri, 10 Jan 2025 18:40:42 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Richard Henderson <richard.henderson@...aro.org>, Matt Turner <mattst88@...il.com>,
Vineet Gupta <vgupta@...nel.org>, Russell King <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, Guo Ren <guoren@...nel.org>,
Brian Cain <bcain@...cinc.com>, Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>, Geert Uytterhoeven <geert@...ux-m68k.org>,
Michal Simek <monstr@...str.eu>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Dinh Nguyen <dinguyen@...nel.org>, Jonas Bonn <jonas@...thpole.se>,
Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>, Stafford Horne <shorne@...il.com>,
"James E.J. Bottomley" <James.Bottomley@...senPartnership.com>, Helge Deller <deller@....de>,
Michael Ellerman <mpe@...erman.id.au>, Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>, Naveen N Rao <naveen@...nel.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Rich Felker <dalias@...c.org>, John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
"David S. Miller" <davem@...emloft.net>, Andreas Larsson <andreas@...sler.com>,
Richard Weinberger <richard@....at>, Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Johannes Berg <johannes@...solutions.net>, Chris Zankel <chris@...kel.net>,
Max Filippov <jcmvbkbc@...il.com>, Arnd Bergmann <arnd@...db.de>,
Andrew Morton <akpm@...ux-foundation.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Uladzislau Rezki <urezki@...il.com>,
Christoph Hellwig <hch@...radead.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Mike Rapoport <rppt@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>,
Ard Biesheuvel <ardb@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
linux-csky@...r.kernel.org, linux-hexagon@...r.kernel.org,
loongarch@...ts.linux.dev, linux-m68k@...ts.linux-m68k.org,
linux-mips@...r.kernel.org, linux-openrisc@...r.kernel.org,
linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-sh@...r.kernel.org, sparclinux@...r.kernel.org,
linux-um@...ts.infradead.org, linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
kvm@...r.kernel.org, linux-efi@...r.kernel.org,
Brendan Jackman <jackmanb@...gle.com>
Subject: [PATCH RFC v2 16/29] mm: asi: Map kernel text and static data as nonsensitive
Basically we need to map the kernel code and all its static variables.
Per-CPU variables need to be treated specially as described in the
comments. The cpu_entry_area is similar - this needs to be
nonsensitive so that the CPU can access the GDT etc when handling
a page fault.
Under 5-level paging, most of the kernel memory comes under a single PGD
entry (see Documentation/x86/x86_64/mm.rst. Basically, the mapping is
for this big region is the same as under 4-level, just wrapped in an
outer PGD entry). For that region, the "clone" logic is moved down one
step of the paging hierarchy.
Note that the p4d_alloc in asi_clone_p4d won't actually be used in
practice; the relevant PGD entry will always have been populated by
prior asi_map calls so this code would "work" if we just wrote
p4d_offset (but asi_clone_p4d would be broken if viewed in isolation).
The vmemmap area is not under this single PGD, it has its own 2-PGD
area, so we still use asi_clone_pgd for that one.
Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
---
arch/x86/mm/asi.c | 105 +++++++++++++++++++++++++++++++++++++-
include/asm-generic/vmlinux.lds.h | 11 ++++
2 files changed, 115 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index b951f2100b8bdea5738ded16166255deb29faf57..bc2cf0475a0e7344a66d81453f55034b2fc77eef 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -7,7 +7,6 @@
#include <linux/init.h>
#include <linux/pgtable.h>
-#include <asm/asi.h>
#include <asm/cmdline.h>
#include <asm/cpufeature.h>
#include <asm/page.h>
@@ -186,8 +185,68 @@ void __init asi_check_boottime_disable(void)
pr_info("ASI enablement ignored due to incomplete implementation.\n");
}
+/*
+ * Map data by sharing sub-PGD pagetables with the unrestricted mapping. This is
+ * more efficient than asi_map, but only works when you know the whole top-level
+ * page needs to be mapped in the restricted tables. Note that the size of the
+ * mappings this creates differs between 4 and 5-level paging.
+ */
+static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+ pgd_t *src = pgd_offset_pgd(src_table, addr);
+ pgd_t *dst = pgd_offset_pgd(dst_table, addr);
+
+ if (!pgd_val(*dst))
+ set_pgd(dst, *src);
+ else
+ WARN_ON_ONCE(pgd_val(*dst) != pgd_val(*src));
+}
+
+/*
+ * For 4-level paging this is exactly the same as asi_clone_pgd. For 5-level
+ * paging it clones one level lower. So this always creates a mapping of the
+ * same size.
+ */
+static void asi_clone_p4d(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+ pgd_t *src_pgd = pgd_offset_pgd(src_table, addr);
+ pgd_t *dst_pgd = pgd_offset_pgd(dst_table, addr);
+ p4d_t *src_p4d = p4d_alloc(&init_mm, src_pgd, addr);
+ p4d_t *dst_p4d = p4d_alloc(&init_mm, dst_pgd, addr);
+
+ if (!p4d_val(*dst_p4d))
+ set_p4d(dst_p4d, *src_p4d);
+ else
+ WARN_ON_ONCE(p4d_val(*dst_p4d) != p4d_val(*src_p4d));
+}
+
+/*
+ * percpu_addr is where the linker put the percpu variable. asi_map_percpu finds
+ * the place where the percpu allocator copied the data during boot.
+ *
+ * This is necessary even when the page allocator defaults to
+ * global-nonsensitive, because the percpu allocator uses the memblock allocator
+ * for early allocations.
+ */
+static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len)
+{
+ int cpu, err;
+ void *ptr;
+
+ for_each_possible_cpu(cpu) {
+ ptr = per_cpu_ptr(percpu_addr, cpu);
+ err = asi_map(asi, ptr, len);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int __init asi_global_init(void)
{
+ int err;
+
if (!boot_cpu_has(X86_FEATURE_ASI))
return 0;
@@ -207,6 +266,46 @@ static int __init asi_global_init(void)
VMALLOC_START, VMALLOC_END,
"ASI Global Non-sensitive vmalloc");
+ /* Map all kernel text and static data */
+ err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)__START_KERNEL,
+ (size_t)_end - __START_KERNEL);
+ if (WARN_ON(err))
+ return err;
+ err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)FIXADDR_START,
+ FIXADDR_SIZE);
+ if (WARN_ON(err))
+ return err;
+ /* Map all static percpu data */
+ err = asi_map_percpu(
+ ASI_GLOBAL_NONSENSITIVE,
+ __per_cpu_start, __per_cpu_end - __per_cpu_start);
+ if (WARN_ON(err))
+ return err;
+
+ /*
+ * The next areas are mapped using shared sub-P4D paging structures
+ * (asi_clone_p4d instead of asi_map), since we know the whole P4D will
+ * be mapped.
+ */
+ asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+ CPU_ENTRY_AREA_BASE);
+#ifdef CONFIG_X86_ESPFIX64
+ asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+ ESPFIX_BASE_ADDR);
+#endif
+ /*
+ * The vmemmap area actually _must_ be cloned via shared paging
+ * structures, since mappings can potentially change dynamically when
+ * hugetlbfs pages are created or broken down.
+ *
+ * We always clone 2 PGDs, this is a corrolary of the sizes of struct
+ * page, a page, and the physical address space.
+ */
+ WARN_ON(sizeof(struct page) * MAXMEM / PAGE_SIZE != 2 * (1UL << PGDIR_SHIFT));
+ asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, VMEMMAP_START);
+ asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd,
+ VMEMMAP_START + (1UL << PGDIR_SHIFT));
+
return 0;
}
subsys_initcall(asi_global_init)
@@ -599,6 +698,10 @@ static bool follow_physaddr(
* Map the given range into the ASI page tables. The source of the mapping is
* the regular unrestricted page tables. Can be used to map any kernel memory.
*
+ * In contrast to some internal ASI logic (asi_clone_pgd and asi_clone_p4d) this
+ * never shares pagetables between restricted and unrestricted address spaces,
+ * instead it creates wholly new equivalent mappings.
+ *
* The caller MUST ensure that the source mapping will not change during this
* function. For dynamic kernel memory, this is generally ensured by mapping the
* memory within the allocator.
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index eeadbaeccf88b73af40efe5221760a7cb37058d2..18f6c0448baf5dfbd0721ba9a6d89000fa86f061 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -1022,6 +1022,16 @@
COMMON_DISCARDS \
}
+/*
+ * ASI maps certain sections with certain sensitivity levels, so they need to
+ * have a page-aligned size.
+ */
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ASI_ALIGN() ALIGN(PAGE_SIZE)
+#else
+#define ASI_ALIGN() .
+#endif
+
/**
* PERCPU_INPUT - the percpu input sections
* @cacheline: cacheline size
@@ -1043,6 +1053,7 @@
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
PERCPU_DECRYPTED_SECTION \
+ . = ASI_ALIGN(); \
__per_cpu_end = .;
/**
--
2.47.1.613.gc27f4b7a9f-goog
Powered by blists - more mailing lists