linux-kernel - Re: [PATCH 19/19] x86/dumpstack: print any pt

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 21 Jul 2016 22:13:03 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Josh Poimboeuf <jpoimboe@...hat.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>,
	"H . Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Brian Gerst <brgerst@...il.com>,
	Kees Cook <keescook@...omium.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Byungchul Park <byungchul.park@....com>
Subject: Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack

On Thu, Jul 21, 2016 at 8:30 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> On Thu, Jul 21, 2016 at 03:32:32PM -0700, Andy Lutomirski wrote:
>> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
>> > Now that we can find pt_regs registers in the middle of the stack due to
>> > an interrupt or exception, we can print them.  Here's what it looks
>> > like:
>> >
>> >    ...
>> >    [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
>> >    [<ffffffff8189f558>] async_page_fault+0x28/0x30
>> >   RIP: 0010:[<ffffffff814529e2>]  [<ffffffff814529e2>] __clear_user+0x42/0x70
>> >   RSP: 0018:ffff88007876fd38  EFLAGS: 00010202
>> >   RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
>> >   RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
>> >   RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000
>> >   R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
>> >   R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200
>> >    [<ffffffff814529e2>] ? __clear_user+0x42/0x70
>> >    [<ffffffff814529c3>] ? __clear_user+0x23/0x70
>> >    [<ffffffff81452a7b>] clear_user+0x2b/0x40
>> >    ...
>>
>> This looks wrong.  Here are some theories:
>>
>> (a) __clear_user is a reliable address that is indicated by RIP: ....
>> Then it's found again as an unreliable address as "?
>> __clear_user+0x42/0x70" by scanning the stack.  "?
>> __clear_user+0x23/0x70" is a genuine leftover artifact on the stack.
>> In this case, shouldn't "? __clear_user+0x42/0x70" have been
>> suppressed because it matched a reliable address?
>>
>> (b) You actually intended for all the addresses to be printed, in
>> which case "? __clear_user+0x42/0x70" should have been
>> "__clear_user+0x42/0x70" and you have a bug.  In this case, it's
>> plausible that your state machine got a bit lost leading to "?
>> __clear_user+0x23/0x70" as well (i.e. it's not just an artifact --
>> it's a real frame and you didn't find it).
>>
>> (c) Something else and I'm confused.
>
> So there's a subtle difference between addresses reported by regs->ip
> and normal return addresses.  For example:
>
>    ...
>    [<ffffffff8189ff4d>] smp_apic_timer_interrupt+0x3d/0x50
>    [<ffffffff8189de6e>] apic_timer_interrupt+0x9e/0xb0
>   RIP: 0010:[<ffffffff8129b350>]  [<ffffffff8129b350>] path_init+0x0/0x750
>   RSP: 0018:ffff880036a3fd80  EFLAGS: 00000296
>   RAX: ffff88003691aa40 RBX: ffff880036a3ff08 RCX: ffff880036a3ff08
>   RDX: ffff880036a3ff08 RSI: 0000000000000041 RDI: ffff880036a3fdb0
>   RBP: ffff880036a3fda0 R08: 0000000000000000 R09: 0000000000000010
>   R10: 8080808080808080 R11: fefefefefefefeff R12: ffff880036a3fdb0
>   R13: 0000000000000001 R14: ffff880036a3ff08 R15: 0000000000000000
>    <EOI>
>    [<ffffffff8129b350>] ? lookup_fast+0x3d0/0x3d0
>    [<ffffffff8129c81b>] ? path_lookupat+0x1b/0x120
>    [<ffffffff8129ddd1>] filename_lookup+0xb1/0x180
>    ...
>
> In this case the irq hit right after path_lookupat() called into
> path_init().  So the "path_init+0x0" printed by __show_regs() is right.
>
> Note the backtrace reports the same address, but it instead describes it
> as "lookup_fast+0x3d0", which is the end of lookup_fast().  That's
> because normally, such an address after a call instruction at the end of
> a function would indicate a tail call (e.g., to a noreturn function).
> If that were the case, printing "path_init+0x0" would be completely
> wrong, because path_init() just happens to be the function located
> immediately after lookup_fast().
>
> Maybe I could add some special logic to say: "if this return address was
> from a call, use printk_stack_address(); else if it was from a pt_regs,
> use printk_address()."  (The former prints the preceding function, the
> latter prints the current function.)  Then we could remove the question
> mark.
>
> There's also the question of whether or not the address should be
> printed again, after it's already been printed by __show_regs().  I
> don't have a strong opinion either way.
>

IIRC we don't show the actual faulting function in the call trace, so
we probably shouldn't duplicate the entry after the show_regs.

That being said, I'm still confused by the question marks.  What
exactly is going on?  Is the code really doing the right thing wrt
resuming the unwind?  Is there a git tree with these patches applied
somewhere so I can look at it easily in context?

--Andy