linux-kernel - Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle task stack

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20160817211008.suuq2xrwseijqb7l@treble>
Date:	Wed, 17 Aug 2016 16:10:08 -0500
From:	Josh Poimboeuf <jpoimboe@...hat.com>
To:	Nilay Vaish <nilayvaish@...il.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	"H . Peter Anvin" <hpa@...or.com>, x86 <x86@...nel.org>,
	Linux Kernel list <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Brian Gerst <brgerst@...il.com>,
	Kees Cook <keescook@...omium.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Byungchul Park <byungchul.park@....com>
Subject: Re: [PATCH v3 14/51] x86/asm/head: put real return address on idle
 task stack

On Wed, Aug 17, 2016 at 03:30:55PM -0500, Nilay Vaish wrote:
> On 12 August 2016 at 09:28, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> > The frame at the end of each idle task stack has a zeroed return
> > address.  This is inconsistent with real task stacks, which have a real
> > return address at that spot.  This inconsistency can be confusing for
> > stack unwinders.
> >
> > Make it a real address by using the side effect of a call instruction to
> > push the instruction pointer on the stack.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@...hat.com>
> > ---
> >  arch/x86/kernel/head_64.S | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index 3621ad2..c90f481 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -298,8 +298,9 @@ ENTRY(start_cpu)
> >          *      REX.W + FF /5 JMP m16:64 Jump far, absolute indirect,
> >          *              address given in m16:64.
> >          */
> > -       movq    initial_code(%rip),%rax
> > -       pushq   $0              # fake return address to stop unwinder
> > +       call    1f              # put return address on stack for unwinder
> > +1:     xorq    %rbp, %rbp      # clear frame pointer
> > +       movq    initial_code(%rip), %rax
> >         pushq   $__KERNEL_CS    # set correct cs
> >         pushq   %rax            # target address in negative space
> >         lretq
> 
> 
> Josh,  I have a couple of questions.
> 
> It seems to me that this patch and the patch 16/51 are both aiming at
> the same thing, but are for two different architectures: 32-bit and
> 64-bit versions of x86.  But you have taken slightly different
> approaches in the two patches (for 64-bit, we first jump and then make
> a function call, for 32-bit we directly call the function).  Is there
> any particular reason for this?  May be I missed out on something.

Yes, the 64-bit code is different: it has to use a far return (lretq) in
order to jump to the 64-bit address stored in 'initial_code'.

So instead of calling the initial code directly, I had to use a more
obtuse approach there ("call 1f") which just places a start_cpu()
address at the right place on the stack before the lretq (which
"returns" to the intial code).

> Second, this is for the whole patch series.  If I wanted to test this
> series, how should I go about doing so?

That's a loaded question. :-) I'd recommend doing lots of stack dumps
and traces in various situations and making sure everything looks sane,
on both 32-bit and 64-bit, and with both CONFIG_FRAME_POINTER=n and
CONFIG_FRAME_POINTER=y.  Some easy ones are:

  cat /proc/*/stack
  echo 1 > /proc/sys/kernel/sysrq; echo l > /proc/sysrq-trigger
  echo 1 > /proc/sys/kernel/sysrq; echo t > /proc/sysrq-trigger
  perf record -g <cmd>; perf report

Also, if you have CONFIG_LOCKDEP enabled, it will silently save a bunch
of stacks in the background, though you won't ever see any evidence of
that unless something goes wrong (you can check if the unwinder spit out
any warnings with 'dmesg |grep WARNING').

For my own testing, I also hacked up my kernel to dump the stack in
various scenarios: interrupts, NMIs, page faults, preemption, perf
callchain record, etc.

BTW, I'll be posting v4 soon, probably Thursday morning US Central time.

-- 
Josh