linux-kernel - Re: [PATCH 3/4] x86: Rewrite switch

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160523184406.dwy46gjfwtlvgwnu@treble>
Date:	Mon, 23 May 2016 13:44:06 -0500
From:	Josh Poimboeuf <jpoimboe@...hat.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Brian Gerst <brgerst@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Ingo Molnar <mingo@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...e.de>, "H. Peter Anvin" <hpa@...or.com>,
	X86 ML <x86@...nel.org>, Dave Anderson <anderson@...hat.com>
Subject: Re: [PATCH 3/4] x86: Rewrite switch_to() code

On Mon, May 23, 2016 at 10:03:54AM -0700, Andy Lutomirski wrote:
> On Mon, May 23, 2016 at 9:46 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> > On Mon, May 23, 2016 at 06:49:03AM -0500, Josh Poimboeuf wrote:
> >> On Mon, May 23, 2016 at 06:47:22AM -0500, Josh Poimboeuf wrote:
> >> > On Mon, May 23, 2016 at 07:14:14AM -0400, Brian Gerst wrote:
> >> > > On Sun, May 22, 2016 at 10:34 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >> > > > On Sun, May 22, 2016 at 10:59:38AM -0700, Andy Lutomirski wrote:
> >> > > >> cc: Josh Poimboeuf: do you care about the exact stack layout of the
> >> > > >> bottom of the stack of an inactive task?
> >> > > >
> >> > > > So there's one minor issue with this patch, relating to unwinding the
> >> > > > stack of a newly forked task.  For detecting reliable stacks, the
> >> > > > unwinder needs to unwind all the way to the syscall pt_regs to make sure
> >> > > > the stack is sane.  But for newly forked tasks, that won't be possible
> >> > > > here because the unwinding will stop at the fork_frame instead.
> >> > > >
> >> > > > So from an unwinder standpoint it might be nice for copy_thread_tls() to
> >> > > > place a frame pointer on the stack next to the ret_from_fork return
> >> > > > address, so that it would resemble an actual stack frame.  The frame
> >> > > > pointer could probably just be hard-coded to zero.  And then the first
> >> > > > bp in fork_frame would need to be a pointer to it instead of zero.  That
> >> > > > would make it nicely resemble the stack of any other task.
> >> > > >
> >> > > > Alternatively I could teach the unwinder that if the unwinding starts at
> >> > > > the fork_frame offset from the end of the stack page, and the saved rbp
> >> > > > is zero, it can assume that it's a newly forked task.  But that seems a
> >> > > > little more brittle to me, as it requires the unwinder to understand
> >> > > > more of the internal workings of the fork code.
> >> > > >
> >> > > > But overall I think this patch is a really nice cleanup, and other than
> >> > > > the above minor issue it should be fine with my reliable unwinder, since
> >> > > > rbp is still at the top of the stack.
> >> > >
> >> > > Ok, how about if it pushed RBP first, then we teach get_wchan() to add
> >> > > the fixed offset from thread.sp to get bp?  that way it don't have to
> >> > > push it twice.
> >> >
> >> > In theory I like the idea, and it would work: the unwinder could just
> >> > use the inactive_task_frame struct (as Andy suggested) to find the frame
> >> > pointer.
> >> >
> >> > But I suspect it would break all existing unwinders, both in-tree and
> >> > out-of-tree.  The only out-of-tree one I know of is crash, not sure if
> >> > there are more out there.
> >>
> >> I should mention it would only affect those unwinders which know how to
> >> do sleeping kernel tasks.  So generic tools like gdb wouldn't be
> >> affected.
> >
> > [continuing my conversation with myself...]
> >
> > To clarify, I still think we should do it.  The stack format of a
> > sleeping task isn't exactly an ABI, and I wouldn't expect many tools to
> > rely on it.  I can help with the fixing of in-tree unwinders if needed.
> >
> > Or I could even do the moving of the frame pointer as a separate patch
> > on top of this one, since it might cause breakage elsewhere.
> 
> Do you have any understanding of why there are so many unwinder
> implementations?  Your reliable unwinder seems to be yet another copy
> of more or less the same code.

Yeah, there are way too many instantations of stacktrace_ops and there's
definitely a lot of room for consolidation and simplification.  There
are different requirements needed by all the different codes relying on
dump_trace():

- starting with a given pt_regs
- starting with a given task
- whether to skip sched code functions
- whether to skip the random '?' ktext addresses found on the stack
- whether frame pointers are enabled
- whether the stack is reliable
- output to an arch-independent "struct stack_trace" array

So everybody implements their own callbacks for dump_trace().  It's kind
of a big mess.

> I'd like to see a single, high-quality unwinder implemented as a state
> machine, along the lines of:
> 
> struct unwind_state state;
> unwind_start_inactive_task(&state, ...); or
> unwind_start_pt_regs(&state, regs); or whatever.
> unwind_next_frame(&state);
> 
> where, after unwind_next_frame, state encodes whatever registers are
> known (at least bp and ip, but all the GPRs would be nice and are
> probably mandatory for DWARF) and an indication of whether this is a
> real frame or a guessed frame (the things that currently show up as
> '?').

I like the idea of a state machine.  I'll probably end up doing
something like that before introducing the DWARF unwinder.

-- 
Josh