Message-ID: <20200911160053.w66xit3imcqsn33g@treble>
Date:   Fri, 11 Sep 2020 11:00:53 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Logan Gunthorpe <logang@...tatee.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Naresh Kamboju <naresh.kamboju@...aro.org>,
        Borislav Petkov <bp@...en8.de>
Subject: Re: WARNING: Kernel stack regs has bad 'bp' value

On Thu, Sep 10, 2020 at 01:42:21PM -0600, Logan Gunthorpe wrote:
> Hi,
> 
> A couple of times now, I've hit a very rare kernel warning (see below)
> while doing IO to an NVMe drive. I do not have a reliable way to
> reproduce this bug but it seems to have started very roughly around v5.8.
> 
> I've found someone else (Naresh Kamboju) has reported a very similar
> issue here[1] though there were no responses and I can't find the email
> anywhere else but through that link. Naresh mentions a method to
> reproduce the bug which I have not tried.
> 
> After some research on similar occurrences of this warning[2], it seems
> to be caused by assembly code making use of the %rbp register and an
> interrupt calling unwind_stack_frame() at just the wrong time (this
> happens more frequently with KASAN enabled, which is the case on my
> setup). When this happens, the offending function is seen in the stack dump.
> 
> One such function, which is common in all the stack dumps, is
> asm_call_on_stack(). This was introduced in v5.8 and pushes and replaces
> %rbp.
> 
> 931b94145981 ("x86/entry: Provide helpers for executing on the irqstack")
> 
> I'm not sure if this is the cause of the bug but it seems worth looking
> at. A comment in the code suggests that %rbp is saved for the ORC
> unwinder, but perhaps this doesn't play nicely with the Frame Pointer
> unwinder which is printing this warning.

Hi Logan,

Thanks for the bug report.  (Sorry I missed the first one, Naresh.)

The problem is that ret_from_fork() is no longer in .entry.text, so the
following check in the FP unwinder doesn't work when ret_from_fork()
gets interrupted.

	/*
	 * Don't warn if the unwinder got lost due to an interrupt in entry
	 * code or in the C handler before the first frame pointer got set up:
	 */
	if (state->got_irq && in_entry_code(state->ip))
		goto the_end;

If you're able to reproduce it, can you try the following patch?

A combination of a lot of forks and a lot of interrupts should trigger
it.  I'll try to recreate as well.
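
For readers unfamiliar with encoded frame pointers, here is a minimal
userspace sketch of the 64-bit idea used by the patch below: the pt_regs
address is stored with bit 0 set, so an unwinder can tell it apart from
an ordinary (aligned) frame pointer. The names pt_regs_sketch,
encode_frame_pointer64(), decode_frame_pointer64() and
encode_decode_selfcheck() are hypothetical, and the decoder is only an
assumption about how such a value could be recognized, not the kernel's
actual unwinder code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pt_regs; only its address matters here. */
struct pt_regs_sketch {
	unsigned long ip, sp, bp;
};

/* Same arithmetic as the 64-bit helper in the patch: regs address + 1. */
static inline uintptr_t encode_frame_pointer64(const struct pt_regs_sketch *regs)
{
	return (uintptr_t)regs + 1;
}

/*
 * Assumed decoder (illustration only): a value with bit 0 clear is
 * treated as an ordinary frame pointer and yields NULL; otherwise the
 * tag bit is stripped to recover the regs address.
 */
static inline struct pt_regs_sketch *decode_frame_pointer64(uintptr_t bp)
{
	if (!(bp & 0x1))
		return NULL;
	return (struct pt_regs_sketch *)(bp & ~(uintptr_t)0x1);
}

/* Round-trip check: encode, confirm the tag bit, decode back. */
static inline void encode_decode_selfcheck(void)
{
	struct pt_regs_sketch regs = { 0 };
	uintptr_t bp = encode_frame_pointer64(&regs);

	assert(bp & 0x1);
	assert(decode_frame_pointer64(bp) == &regs);
	/* An untagged (aligned) address is not taken for encoded regs. */
	assert(decode_frame_pointer64((uintptr_t)&regs) == NULL);
	/* Neither is the plain 0 that copy_thread() stored before. */
	assert(decode_frame_pointer64(0) == NULL);
}
```

The sketch relies on the regs structure being at least 2-byte aligned, so
that bit 0 of its address is always free to act as the tag.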

diff --git a/arch/x86/include/asm/frame.h b/arch/x86/include/asm/frame.h
index 296b346184b2..fb42659f6e98 100644
--- a/arch/x86/include/asm/frame.h
+++ b/arch/x86/include/asm/frame.h
@@ -60,12 +60,26 @@
 #define FRAME_END "pop %" _ASM_BP "\n"
 
 #ifdef CONFIG_X86_64
+
 #define ENCODE_FRAME_POINTER			\
 	"lea 1(%rsp), %rbp\n\t"
+
+static inline unsigned long encode_frame_pointer(struct pt_regs *regs)
+{
+	return (unsigned long)regs + 1;
+}
+
 #else /* !CONFIG_X86_64 */
+
 #define ENCODE_FRAME_POINTER			\
 	"movl %esp, %ebp\n\t"			\
 	"andl $0x7fffffff, %ebp\n\t"
+
+static inline unsigned long encode_frame_pointer(struct pt_regs *regs)
+{
+	return (unsigned long)regs & 0x7fffffff;
+}
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* __ASSEMBLY__ */
@@ -83,6 +97,11 @@
 
 #define ENCODE_FRAME_POINTER
 
+static inline unsigned long encode_frame_pointer(struct pt_regs *regs)
+{
+	return 0;
+}
+
 #endif
 
 #define FRAME_BEGIN
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 13ce616cc7af..ba4593a913fa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -42,6 +42,7 @@
 #include <asm/spec-ctrl.h>
 #include <asm/io_bitmap.h>
 #include <asm/proto.h>
+#include <asm/frame.h>
 
 #include "process.h"
 
@@ -133,7 +134,7 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
 	fork_frame = container_of(childregs, struct fork_frame, regs);
 	frame = &fork_frame->frame;
 
-	frame->bp = 0;
+	frame->bp = encode_frame_pointer(childregs);
 	frame->ret_addr = (unsigned long) ret_from_fork;
 	p->thread.sp = (unsigned long) fork_frame;
 	p->thread.io_bitmap = NULL;
