lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171230175746.5snazk5sem4ribk6@treble>
Date:   Sat, 30 Dec 2017 11:57:46 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Alexander Tsoy <alexander@...y.me>
Cc:     Andy Lutomirski <luto@...capital.net>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Toralf Förster <toralf.foerster@....de>,
        stable <stable@...r.kernel.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>
Subject: Re: 4.14.9 doesn't boot (regression)

On Sat, Dec 30, 2017 at 11:09:46AM -0600, Josh Poimboeuf wrote:
> On Sat, Dec 30, 2017 at 11:45:13AM +0300, Alexander Tsoy wrote:
> > В Пт, 29/12/2017 в 21:49 -0600, Josh Poimboeuf пишет:
> > > On Fri, Dec 29, 2017 at 05:10:35PM -0700, Andy Lutomirski wrote:
> > > > (Also, Josh, the oops code should have printed the contents of the
> > > > struct pt_regs at the top of the DF stack.  Any idea why it
> > > > didn't?)
> > > 
> > > Looking at one of the dumps:
> > > 
> > >   [  392.774879] NMI backtrace for cpu 0
> > >   [  392.774881] CPU: 0 PID: 1 Comm: init Not tainted 4.14.9-gentoo
> > > #1
> > >   [  392.774881] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> > >   [  392.774882] task: ffff8802368b8000 task.stack: ffffc9000000c000
> > >   [  392.774885] RIP: 0010:double_fault+0x0/0x30
> > >   [  392.774886] RSP: 0000:ffffffffff527fd0 EFLAGS: 00000086
> > >   [  392.774887] RAX: 000000003fc00000 RBX: 0000000000000001 RCX:
> > > 00000000c0000101
> > >   [  392.774887] RDX: 00000000ffff8802 RSI: 0000000000000000 RDI:
> > > ffffffffff527f58
> > >   [  392.774887] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > > 0000000000000000
> > >   [  392.774888] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffffff816ae726
> > >   [  392.774888] R13: 0000000000000000 R14: 0000000000000000 R15:
> > > 0000000000000000
> > >   [  392.774889] FS:  0000000000000000(0000)
> > > GS:ffff88023fc00000(0000) knlGS:0000000000000000
> > >   [  392.774889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >   [  392.774890] CR2: ffffffffff526f08 CR3: 0000000235b48002 CR4:
> > > 00000000001606f0
> > >   [  392.774892] Call Trace:
> > >   [  392.774894]  <#DF>
> > >   [  392.774897]  do_double_fault+0xb/0x140
> > >   [  392.774898]  </#DF>
> > > 
> > > It should have at least printed the #DF iret frame registers, which I
> > > recently added support for in "x86/unwinder: Handle stack overflows
> > > more
> > > gracefully", which is in both 4.14.9 and 4.15-rc5.
> > > 
> > > I think the missing iret regs are due to a bug in
> > > show_trace_log_lvl(),
> > > where if the unwind starts with two regs frames in a row, the second
> > > regs don't get printed.
> > > 
> > > Alexander, would you mind reproducing again with the below patch?  It
> > > should still fail, but this time it should hopefully show another
> > > RIP/RSP/EFLAGS instead of the "do_double_fault+0xb/0x140" line.
> > > 
> > 
> > Yes, it works:
> > 
> > [   23.058064] NMI backtrace for cpu 2
> > [   23.058068] CPU: 2 PID: 1 Comm: init Not tainted 4.15.0-rc5+ #1
> > [   23.058069] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 1.10.2-1.fc27 04/01/2014
> > [   23.058074] RIP: 0010:double_fault+0x0/0x30
> > [   23.058075] RSP: 0000:fffffe800005ffd0 EFLAGS: 00000086
> > [   23.058077] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > 00000000c0000101
> > [   23.058077] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > fffffe800005ff58
> > [   23.058078] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > 0000000000000000
> > [   23.058079] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff92001426
> > [   23.058080] R13: 0000000000000000 R14: 0000000000000000 R15:
> > 0000000000000000
> > [   23.058083] FS:  0000000000000000(0000) GS:ffff96813fd00000(0000)
> > knlGS:0000000000000000
> > [   23.058084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   23.058085] CR2: fffffe800005ef08 CR3: 0000000137a09000 CR4:
> > 00000000000406a0
> > [   23.058089] Call Trace:
> > [   23.058101]  <#DF>
> > [   23.058104] RIP: 0010:do_double_fault+0xb/0x140
> > [   23.058105] RSP: 0000:fffffe800005ef18 EFLAGS: 00010086 ORIG_RAX:
> > 0000000000000000
> > [   23.058106] RAX: 000000003fd00000 RBX: 0000000000000001 RCX:
> > 00000000c0000101
> > [   23.058107] RDX: 00000000ffff9681 RSI: 0000000000000000 RDI:
> > fffffe800005ff58
> > [   23.058107] RBP: 0000000000000000 R08: 0000000000000000 R09:
> > 0000000000000000
> > [   23.058108] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffffff92001426
> > [   23.058108] R13: 0000000000000000 R14: 0000000000000000 R15:
> > 0000000000000000
> > [   23.058111]  </#DF>
> > [   23.058111] Code: 05 00 00 48 89 e7 31 f6 e8 2e 8c 61 ff e9 69 06 00
> > 00 e8 94 05 00 00 48 89 e7 31 f6 e8 1a 8c 61 ff e9 55 06 00 00 0f 1f 44
> > 00 00 <0f> 1f 00 48 83 c4 88 e8 e4 04 00 00 48 89 e7 48 8b 74 24 78 48
> 
> That's better indeed, though still not quite right.  It should have only
> shown a subset of those registers.  One more bug to fix there...

Turns out my previous code to print iret frames was a bit ... misguided,
to put it nicely.  Not sure what I was smoking.

Hopefully the below patch should fix it (in place of the previous
patch).  Would you mind testing again?

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index c1688c2d0a12..1f86e1b0a5cd 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -56,18 +56,27 @@ void unwind_start(struct unwind_state *state, struct task_struct *task,
 
 #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER)
 /*
- * WARNING: The entire pt_regs may not be safe to dereference.  In some cases,
- * only the iret frame registers are accessible.  Use with caution!
+ * If 'partial' returns true, only the iret frame registers are valid.
  */
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	if (unwind_done(state))
 		return NULL;
 
+	if (partial) {
+#ifdef CONFIG_UNWINDER_ORC
+		*partial = !state->full_regs;
+#else
+		*partial = false;
+#endif
+	}
+
 	return state->regs;
 }
 #else
-static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
+static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state,
+						    bool *partial)
 {
 	return NULL;
 }
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 36b17e0febe8..6e49d2e0c243 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -76,12 +76,11 @@ void show_iret_regs(struct pt_regs *regs)
 		regs->sp, regs->flags);
 }
 
-static void show_regs_safe(struct stack_info *info, struct pt_regs *regs)
+static void show_regs_full_or_partial(struct pt_regs *regs, bool partial)
 {
-	if (on_stack(info, regs, sizeof(*regs)))
+	if (!partial)
 		__show_regs(regs, 0);
-	else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET,
-			  IRET_FRAME_SIZE)) {
+	else {
 		/*
 		 * When an interrupt or exception occurs in entry code, the
 		 * full pt_regs might not have been saved yet.  In that case
@@ -98,11 +97,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	struct stack_info stack_info = {0};
 	unsigned long visit_mask = 0;
 	int graph_idx = 0;
+	bool partial;
 
 	printk("%sCall Trace:\n", log_lvl);
 
 	unwind_start(&state, task, regs, stack);
 	stack = stack ? : get_stack_pointer(task, regs);
+	regs = unwind_get_entry_regs(&state, &partial);
 
 	/*
 	 * Iterate through the stacks, starting with the current stack pointer.
@@ -120,7 +121,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 	 * - hardirq stack
 	 * - entry stack
 	 */
-	for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+	for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
 		const char *stack_name;
 
 		if (get_stack_info(stack, task, &stack_info, &visit_mask)) {
@@ -140,7 +141,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			printk("%s <%s>\n", log_lvl, stack_name);
 
 		if (regs)
-			show_regs_safe(&stack_info, regs);
+			show_regs_full_or_partial(regs, partial);
 
 		/*
 		 * Scan the stack, printing any text addresses we find.  At the
@@ -164,7 +165,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 
 			/*
 			 * Don't print regs->ip again if it was already printed
-			 * by show_regs_safe() below.
+			 * by show_regs_full_or_partial() below.
 			 */
 			if (regs && stack == &regs->ip)
 				goto next;
@@ -199,9 +200,9 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs,
 			unwind_next_frame(&state);
 
 			/* if the frame has entry regs, print them */
-			regs = unwind_get_entry_regs(&state);
+			regs = unwind_get_entry_regs(&state, &partial);
 			if (regs)
-				show_regs_safe(&stack_info, regs);
+				show_regs_full_or_partial(regs, partial);
 		}
 
 		if (stack_name)
diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
index 77835bc021c7..7dd0d2a0d142 100644
--- a/arch/x86/kernel/stacktrace.c
+++ b/arch/x86/kernel/stacktrace.c
@@ -102,7 +102,7 @@ __save_stack_trace_reliable(struct stack_trace *trace,
 	for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state);
 	     unwind_next_frame(&state)) {
 
-		regs = unwind_get_entry_regs(&state);
+		regs = unwind_get_entry_regs(&state, NULL);
 		if (regs) {
 			/*
 			 * Kernel mode registers on the stack indicate an

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ