Message-ID: <20250110-asi-rfc-v2-v2-10-8419288bc805@google.com>
Date: Fri, 10 Jan 2025 18:40:36 +0000
From: Brendan Jackman <jackmanb@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Richard Henderson <richard.henderson@...aro.org>, Matt Turner <mattst88@...il.com>,
Vineet Gupta <vgupta@...nel.org>, Russell King <linux@...linux.org.uk>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, Guo Ren <guoren@...nel.org>,
Brian Cain <bcain@...cinc.com>, Huacai Chen <chenhuacai@...nel.org>,
WANG Xuerui <kernel@...0n.name>, Geert Uytterhoeven <geert@...ux-m68k.org>,
Michal Simek <monstr@...str.eu>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Dinh Nguyen <dinguyen@...nel.org>, Jonas Bonn <jonas@...thpole.se>,
Stefan Kristiansson <stefan.kristiansson@...nalahti.fi>, Stafford Horne <shorne@...il.com>,
"James E.J. Bottomley" <James.Bottomley@...senPartnership.com>, Helge Deller <deller@....de>,
Michael Ellerman <mpe@...erman.id.au>, Nicholas Piggin <npiggin@...il.com>,
Christophe Leroy <christophe.leroy@...roup.eu>, Naveen N Rao <naveen@...nel.org>,
Madhavan Srinivasan <maddy@...ux.ibm.com>, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Heiko Carstens <hca@...ux.ibm.com>, Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>, Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Rich Felker <dalias@...c.org>, John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>,
"David S. Miller" <davem@...emloft.net>, Andreas Larsson <andreas@...sler.com>,
Richard Weinberger <richard@....at>, Anton Ivanov <anton.ivanov@...bridgegreys.com>,
Johannes Berg <johannes@...solutions.net>, Chris Zankel <chris@...kel.net>,
Max Filippov <jcmvbkbc@...il.com>, Arnd Bergmann <arnd@...db.de>,
Andrew Morton <akpm@...ux-foundation.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, Uladzislau Rezki <urezki@...il.com>,
Christoph Hellwig <hch@...radead.org>, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Mike Rapoport <rppt@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>, Christoph Lameter <cl@...ux.com>,
Sean Christopherson <seanjc@...gle.com>, Paolo Bonzini <pbonzini@...hat.com>,
Ard Biesheuvel <ardb@...nel.org>, Josh Poimboeuf <jpoimboe@...nel.org>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, linux-alpha@...r.kernel.org,
linux-snps-arc@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
linux-csky@...r.kernel.org, linux-hexagon@...r.kernel.org,
loongarch@...ts.linux.dev, linux-m68k@...ts.linux-m68k.org,
linux-mips@...r.kernel.org, linux-openrisc@...r.kernel.org,
linux-parisc@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-sh@...r.kernel.org, sparclinux@...r.kernel.org,
linux-um@...ts.infradead.org, linux-arch@...r.kernel.org, linux-mm@...ck.org,
linux-trace-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
kvm@...r.kernel.org, linux-efi@...r.kernel.org,
Brendan Jackman <jackmanb@...gle.com>, Ofir Weisse <oweisse@...gle.com>
Subject: [PATCH RFC v2 10/29] mm: asi: asi_exit() on PF, skip handling if
address is accessible
From: Ofir Weisse <oweisse@...gle.com>
On a page fault, call asi_exit(), then check whether the faulting address is
accessible now that we have exited. We do this by refactoring
spurious_kernel_fault() into two parts:
1. Verify that the error code value is something that could arise from a
   lazy TLB update.
2. Walk the page tables and verify permissions; this part becomes
   kernel_access_ok().
We also define PTE_PRESENT() and PMD_PRESENT(), which are suitable for
checking userspace pages. For the purpose of spurious-fault handling,
pte_present() and pmd_present() are only good for kernelspace pages, because
those macros can return true even when the hardware present bit is 0 (which
only happens for userspace pages); see the sketch below.
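As a sketch of why that is: the x86 helpers currently read roughly as follows
(paraphrased from arch/x86/include/asm/pgtable.h, not part of this patch), so
a PROT_NONE mapping or a PMD that is being split still counts as "present"
even though hardware would fault on the access:

  /* Paraphrased: PROTNONE/PSE make these "present" without _PAGE_PRESENT. */
  static inline int pte_present(pte_t a)
  {
  	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
  }

  static inline int pmd_present(pmd_t pmd)
  {
  	return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
  }

The new PTE_PRESENT()/PMD_PRESENT() macros instead check only _PAGE_PRESENT,
i.e. whether the hardware could actually complete the access right now.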
checkpatch.pl VSPRINTF_SPECIFIER_PX: the %px is in a WARN that only fires in
a debug build of the kernel when we hit a disastrous bug, so it seems OK to
leak addresses there.
RFC note: A separate refactoring/prep commit should be split out of this
patch.
Checkpatch-args: --ignore=VSPRINTF_SPECIFIER_PX
Signed-off-by: Ofir Weisse <oweisse@...gle.com>
Signed-off-by: Brendan Jackman <jackmanb@...gle.com>
---
arch/x86/mm/fault.c | 118 +++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 103 insertions(+), 15 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e6c469b323ccb748de22adc7d9f0a16dd195edad..ee8f5417174e2956391d538f41e2475553ca4972 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -948,7 +948,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
}
-static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
+static __always_inline int kernel_protection_ok(unsigned long error_code, pte_t *pte)
{
if ((error_code & X86_PF_WRITE) && !pte_write(*pte))
return 0;
@@ -959,6 +959,8 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte)
return 1;
}
+static int kernel_access_ok(unsigned long error_code, unsigned long address, pgd_t *pgd);
+
/*
* Handle a spurious fault caused by a stale TLB entry.
*
@@ -984,11 +986,6 @@ static noinline int
spurious_kernel_fault(unsigned long error_code, unsigned long address)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
- pte_t *pte;
- int ret;
/*
* Only writes to RO or instruction fetches from NX may cause
@@ -1004,6 +1001,50 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;
pgd = init_mm.pgd + pgd_index(address);
+ return kernel_access_ok(error_code, address, pgd);
+}
+NOKPROBE_SYMBOL(spurious_kernel_fault);
+
+/*
+ * For kernel addresses, pte_present() and pmd_present() are sufficient for
+ * kernel_access_ok(). For user addresses these functions can return true
+ * even though the pte is not actually accessible by hardware (i.e. _PAGE_PRESENT
+ * is not set). This happens when the pages are physically present in
+ * memory but are not made accessible to hardware because they need software
+ * handling first:
+ *
+ * - ptes/pmds with _PAGE_PROTNONE need autonuma balancing (see pte_protnone(),
+ * change_prot_numa(), and do_numa_page()).
+ *
+ * - pmds with _PAGE_PSE & !_PAGE_PRESENT are undergoing splitting (see
+ * split_huge_page()).
+ *
+ * Here, we care about whether the hardware can actually access the page right
+ * now.
+ *
+ * These issues aren't currently present for PUDs, but we also define a custom
+ * PUD_PRESENT() as a layer of future-proofing.
+ */
+#define PUD_PRESENT(pud) (pud_flags(pud) & _PAGE_PRESENT)
+#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT)
+#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT)
+
+/*
+ * Check if an access by the kernel would cause a page fault. The access is
+ * described by a page fault error code (whether it was a write/instruction
+ * fetch) and address. This doesn't check for types of faults that are not
+ * expected to affect the kernel, e.g. PKU. The address can be user or kernel;
+ * for user addresses we assume the access would happen via the uaccess API.
+ */
+static noinstr int
+kernel_access_ok(unsigned long error_code, unsigned long address, pgd_t *pgd)
+{
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+ int ret;
+
if (!pgd_present(*pgd))
return 0;
@@ -1012,27 +1053,27 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
return 0;
if (p4d_leaf(*p4d))
- return spurious_kernel_fault_check(error_code, (pte_t *) p4d);
+ return kernel_protection_ok(error_code, (pte_t *) p4d);
pud = pud_offset(p4d, address);
- if (!pud_present(*pud))
+ if (!PUD_PRESENT(*pud))
return 0;
if (pud_leaf(*pud))
- return spurious_kernel_fault_check(error_code, (pte_t *) pud);
+ return kernel_protection_ok(error_code, (pte_t *) pud);
pmd = pmd_offset(pud, address);
- if (!pmd_present(*pmd))
+ if (!PMD_PRESENT(*pmd))
return 0;
if (pmd_leaf(*pmd))
- return spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ return kernel_protection_ok(error_code, (pte_t *) pmd);
pte = pte_offset_kernel(pmd, address);
- if (!pte_present(*pte))
+ if (!PTE_PRESENT(*pte))
return 0;
- ret = spurious_kernel_fault_check(error_code, pte);
+ ret = kernel_protection_ok(error_code, pte);
if (!ret)
return 0;
@@ -1040,12 +1081,11 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address)
* Make sure we have permissions in PMD.
* If not, then there's a bug in the page tables:
*/
- ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd);
+ ret = kernel_protection_ok(error_code, (pte_t *) pmd);
WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
return ret;
}
-NOKPROBE_SYMBOL(spurious_kernel_fault);
int show_unhandled_signals = 1;
@@ -1490,6 +1530,29 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
}
}
+static __always_inline void warn_if_bad_asi_pf(
+ unsigned long error_code, unsigned long address)
+{
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+ struct asi *target;
+
+ /*
+ * It's a bug to access sensitive data from the "critical section", i.e.
+ * on the path between asi_enter and asi_relax, where untrusted code
+ * gets run. #PF in this state sees asi_intr_nest_depth() as 1 because
+ * #PF increments it. We can't think of a better way to determine if
+ * this has happened than to check the ASI pagetables, hence we can't
+ * really have this check in non-debug builds unfortunately.
+ */
+ VM_WARN_ONCE(
+ (target = asi_get_target(current)) != NULL &&
+ asi_intr_nest_depth() == 1 &&
+ !kernel_access_ok(error_code, address, asi_pgd(target)),
+ "ASI-sensitive data access from critical section, addr=%px error_code=%lx class=%s",
+ (void *) address, error_code, asi_class_name(target->class_id));
+#endif
+}
+
DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{
irqentry_state_t state;
@@ -1497,6 +1560,31 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2();
+ if (static_asi_enabled() && !user_mode(regs)) {
+ pgd_t *pgd;
+
+ /* Can be a NOP even for ASI faults, because of NMIs */
+ asi_exit();
+
+ /*
+ * handle_page_fault() might oops if we run it for a kernel
+ * address in kernel mode. This might be the case if we got here
+ * due to an ASI fault. We avoid this case by checking whether
+ * the address is now, after asi_exit(), accessible by hardware.
+ * If it is - there's nothing to do. Note that this is a bit of
+ * a shotgun; we can also bail early from user-address faults
+	 * here that weren't actually caused by ASI. So we might want to
+	 * move this logic later in the handler. In particular, we might
+	 * be losing some stats here. However, for now this keeps ASI
+ * page faults nice and fast.
+ */
+ pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
+ if (!user_mode(regs) && kernel_access_ok(error_code, address, pgd)) {
+ warn_if_bad_asi_pf(error_code, address);
+ return;
+ }
+ }
+
prefetchw(&current->mm->mmap_lock);
/*
--
2.47.1.613.gc27f4b7a9f-goog