Message-ID: <20160429212546.t26mvthtvh7543ff@treble>
Date: Fri, 29 Apr 2016 16:25:46 -0500
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: Jessica Yu <jeyu@...hat.com>, Jiri Kosina <jikos@...nel.org>,
Miroslav Benes <mbenes@...e.cz>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Michael Ellerman <mpe@...erman.id.au>,
Heiko Carstens <heiko.carstens@...ibm.com>,
live-patching@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
X86 ML <x86@...nel.org>, linuxppc-dev@...ts.ozlabs.org,
"linux-s390@...r.kernel.org" <linux-s390@...r.kernel.org>,
Vojtech Pavlik <vojtech@...e.com>, Jiri Slaby <jslaby@...e.cz>,
Petr Mladek <pmladek@...e.com>,
Chris J Arges <chris.j.arges@...onical.com>,
Andy Lutomirski <luto@...nel.org>
Subject: Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

On Fri, Apr 29, 2016 at 01:32:53PM -0700, Andy Lutomirski wrote:
> On Fri, Apr 29, 2016 at 1:27 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> > On Fri, Apr 29, 2016 at 01:19:23PM -0700, Andy Lutomirski wrote:
> >> On Fri, Apr 29, 2016 at 1:11 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >> > On Fri, Apr 29, 2016 at 11:06:53AM -0700, Andy Lutomirski wrote:
> >> >> On Thu, Apr 28, 2016 at 1:44 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >> >> > A preempted function might not have had a chance to save the frame
> >> >> > pointer to the stack yet, which can result in its caller getting skipped
> >> >> > on a stack trace.
> >> >> >
> >> >> > Add a flag to indicate when the task has been preempted so that stack
> >> >> > dump code can determine whether the stack trace is reliable.
> >> >>
> >> >> I think I like this, but how do you handle the rather similar case in
> >> >> which a task goes to sleep because it's waiting on IO that happened in
> >> >> response to get_user, put_user, copy_from_user, etc?
> >> >
> >> > Hm, good question. I was thinking that page faults had a dedicated
> >> > stack, but now looking at the entry and traps code, that doesn't seem to
> >> > be the case.
> >> >
> >> > Anyway I think it shouldn't be a problem if we make sure that any kernel
> >> > function which might trigger a valid page fault (e.g.,
> >> > copy_user_generic_string) does the proper frame pointer setup first. Then
> >> > the stack should still be reliable.
> >> >
> >> > In fact I might be able to teach objtool to enforce that: any function
> >> > which uses an exception table should create a stack frame.
> >> >
> >> > Or alternatively, maybe set some kind of flag for page faults, similar
> >> > to what I did with this patch.
> >> >
> >>
> >> How about doing it the other way around: teach the unwinder to detect
> >> when it hits a non-outermost entry (i.e. it lands in idtentry, etc)
> >> and use some reasonable heuristic as to whether it's okay to keep
> >> unwinding. You should be able to handle preemption like that, too --
> >> the unwind process will end up in an IRQ frame.
> >
> > How exactly would the unwinder detect if a text address is in an
> > idtentry? Maybe put all the idt entries in a special ELF section?
> >
>
> Hmm.
>
> What actually happens when you unwind all the way into the entry code?
> Don't you end up in something that isn't in an ELF function? Can you
> detect that?

For entry from user space (e.g., syscalls), it's easy to detect because
there's always a pt_regs at the bottom of the stack. So if the unwinder
reaches the stack address at (thread.sp0 - sizeof(pt_regs)), it knows
it's done.
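
As a rough illustration of that termination check, here is a minimal
user-space sketch. It is not kernel code: `struct fake_pt_regs`,
`user_regs_addr()` and `unwind_reached_user_entry()` are hypothetical
names made up for this example, and the register-frame size is only a
stand-in for the real, architecture-specific struct pt_regs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct pt_regs (assumption: the real layout is arch-specific). */
struct fake_pt_regs {
	unsigned long regs[21];
};

/* Hypothetical helper: address where the user-entry register frame starts,
 * given sp0, the top of the task's kernel stack. */
static inline uintptr_t user_regs_addr(uintptr_t sp0)
{
	return sp0 - sizeof(struct fake_pt_regs);
}

/* True once the unwinder's current stack address has reached the
 * register frame saved on entry from user space. */
static inline bool unwind_reached_user_entry(uintptr_t addr, uintptr_t sp0)
{
	return addr == user_regs_addr(sp0);
}
```

An unwinder loop would call `unwind_reached_user_entry()` after each step
and stop cleanly when it returns true. This only covers entry from user
space; it says nothing about a pt_regs saved in the middle of the stack.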

But for nested entry (e.g., in-kernel irqs/exceptions like preemption and
page faults which don't have dedicated stacks), where the pt_regs is
stored somewhere in the middle of the stack instead of the bottom,
there's no reliable way to detect that.

> Ideally, the unwinder could actually detect that it's
> hit a pt_regs struct and report that. If used for stack dumps, it
> could display some indication of this and then continue its unwinding
> by decoding the pt_regs. If used for patching, it could take some
> other appropriate action.
>
> I would have no objection to annotating all the pt_regs-style entry
> code, whether by putting it in a separate section or by making a table
> of addresses.

I think the easiest way to make it work would be to modify the idtentry
macro to put all the idt entries in a dedicated section. Then the
unwinder could easily detect any calls from that code.
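
A sketch of what the unwinder-side check could look like once the entry
code lives in one contiguous section. This is a user-space model under
stated assumptions: `struct text_range`, `addr_in_range()` and
`first_entry_frame()` are hypothetical names, and in a real kernel the
section bounds would come from linker-script symbols rather than plain
values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds of the dedicated entry section. */
struct text_range {
	uintptr_t start;	/* inclusive */
	uintptr_t end;		/* exclusive */
};

/* True if addr lies inside the given text range. */
static bool addr_in_range(const struct text_range *r, uintptr_t addr)
{
	return addr >= r->start && addr < r->end;
}

/* Scan the return addresses collected by an unwinder and return the index
 * of the first one that points into the entry section, or -1 if none does. */
static int first_entry_frame(const struct text_range *entry,
			     const uintptr_t *ret_addrs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (addr_in_range(entry, ret_addrs[i]))
			return (int)i;
	}
	return -1;
}
```

A non-negative result would tell the caller that the frames beyond that
index sit below a nested entry, so it can flag the trace or go looking
for the saved pt_regs at that point.
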
> There are a couple of nasty cases if NMI or MCE is involved but, as of
> 4.6, outside of NMI, MCE, and vmalloc faults (ugh!), there should
> always be a complete pt_regs on the stack before interrupts get
> enabled for each entry. Of course, finding the thing may be
> nontrivial in case other things were pushed.

NMI, MCE and interrupts aren't a problem because they have dedicated
stacks, which are easy to detect. If the task's stack pointer is on an
exception stack or an irq stack, we consider the stack trace unreliable.

And also, they don't sleep. The stack of any running task (other than
current) is automatically considered unreliable anyway, since the task
could be modifying it while we're reading it.
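
The two rules above can be summarized in a small decision function. This
is only a user-space sketch: `enum stack_type`, `struct task_model` and
`stack_trace_reliable()` are hypothetical names for illustration, not
existing kernel interfaces.

```c
#include <stdbool.h>

/* Which stack the task's stack pointer is currently on (hypothetical model). */
enum stack_type {
	STACK_TASK,		/* the task's normal kernel stack */
	STACK_IRQ,		/* dedicated irq stack */
	STACK_EXCEPTION,	/* dedicated exception stack (NMI, MCE, ...) */
};

struct task_model {
	bool running;		/* currently executing on some CPU */
	bool is_current;	/* the task doing the unwinding */
	enum stack_type stack;
};

/* Apply the two rules: a running task other than current may be changing
 * its stack under us, and a stack pointer on an irq or exception stack
 * means the trace is treated as unreliable. */
static bool stack_trace_reliable(const struct task_model *t)
{
	if (t->running && !t->is_current)
		return false;
	if (t->stack != STACK_TASK)
		return false;
	return true;
}
```

A sleeping task on its normal stack passes both checks; whether its
frames are then trustworthy still depends on the frame pointer concerns
discussed earlier in the thread.
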
> I suppose we could try to rejigger the code so that rbp points to
> pt_regs or similar.

I think we should avoid doing something like that because it would break
gdb and all the other unwinders that don't know about it.

--
Josh