lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140427120820.GC22116@gmail.com>
Date:	Sun, 27 Apr 2014 14:08:20 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Richard Yao <ryao@...too.org>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tejun Heo <tj@...nel.org>, Vineet Gupta <vgupta@...opsys.com>,
	Jesper Nilsson <jesper.nilsson@...s.com>,
	Jiri Slaby <jslaby@...e.cz>, linux-kernel@...r.kernel.org,
	kernel@...too.org, Brian Behlendorf <behlendorf1@...l.gov>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers


* Richard Yao <ryao@...too.org> wrote:

> Stack traces are generated by scanning the stack and interpeting 
> anything that looks like it could be a pointer to something. We do 
> not need to do this when we have frame pointers, but we do it 
> anyway, with the distinction that we use the return pointers to mark 
> actual frames by the absence of a question mark.
> 
> The additional verbosity of stack scanning tends to bombard us with 
> walls of text for no gain in practice, so lets switch to printing 
> only stack frames when frame pointers are available. That we can 
> spend less time reading stack traces and more time looking at code.
> 
> Signed-off-by: Richard Yao <ryao@...too.org>
> ---
>  arch/x86/kernel/dumpstack.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index d9c12d3..94ffe06 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
>  static const struct stacktrace_ops print_trace_ops = {
>  	.stack			= print_trace_stack,
>  	.address		= print_trace_address,
> +#ifdef CONFIG_FRAME_POINTER
> +	.walk_stack		= print_context_stack_bp,
> +#else
>  	.walk_stack		= print_context_stack,
> +#endif
>  };

I don't really like this patch, and this question comes up often, so 
let me try to explain it more verbosely.

Lets take a real-life example of a BP-only trace about a lockup:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<792b9d04>] syscall_call+0x7/0xb

What does this call trace tell us? That we locked up somewhere in the 
ext3 code. That's all that we know.

Now lets take a look at a 'verbose' entry of the same lockup, that 
also outputs other entries that 'look like' kernel text addresses:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>] ? prepare_to_wait+0x18/0x57
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<7914a320>] ? generic_make_request+0x80/0xb5
  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<7914a427>] ? submit_bio+0xd2/0xfb
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
  [<79039086>] ? autoremove_wake_function+0x31/0x31
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<7903f844>] ? lg_local_unlock+0x31/0x47
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<790c3c47>] ? __fput+0x187/0x1a0
  [<792b9d37>] ? restore_all+0xf/0xf
  [<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
  [<792b9d04>] syscall_call+0x7/0xb

Firstly, it's still easy to tell the true backtrace: ignore all the 
'?' entries.

But the '?' entries also tell us something more: that recently this 
CPU submitted IO. That might or might not be suspected from the 
original trace, but from the 'extended' trace it's really obvious that 
IO activity happened before the lockup as well.

There were many other examples in the past where '?' entries gave us 
clues about the nature of a bug: they represent a poor man's trace of 
what happened recently on that CPU.

So it's useful information for hairy bugs and it would be sad to 
remove them.

Having said that, your complaint that '?' entries can make reading of 
back traces more difficult is valid as well - so maybe we can do 
something about that.

For example if we used more horizontal separation in the output 
format:

 Call Trace:
  [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
  [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
  [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
  [<790391e8>]                                   # ? prepare_to_wait+0x18/0x57
  [<790391e8>] prepare_to_wait+0x18/0x57
  [<7914a320>]                                   # ? generic_make_request+0x80/0xb5
  [<790e4f30>]                                   # ? unmap_underlying_metadata+0x2e/0x2e
  [<792b7158>] __wait_on_bit+0x20/0x5f
  [<7914a427>]                                   # ? submit_bio+0xd2/0xfb
  [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
  [<790e4f30>]                                   # ? unmap_underlying_metadata+0x2e/0x2e
  [<79039086>]                                   # ? autoremove_wake_function+0x31/0x31
  [<790e4eff>] __wait_on_buffer+0x1b/0x1e
  [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
  [<7911015f>] ext3_iget+0x45/0x346
  [<79114e03>] ext3_lookup+0x97/0xa6
  [<790c9c0e>] lookup_real+0x20/0x35
  [<790ca1d2>] __lookup_hash+0x2a/0x31
  [<790caa0e>] lookup_slow+0x36/0x8c
  [<790cbdda>] path_lookupat+0xf9/0x5c5
  [<790cc2c5>] filename_lookup+0x1f/0x84
  [<790cde98>] user_path_at_empty+0x3f/0x65
  [<790cdecb>] user_path_at+0xd/0xf
  [<790c6585>] vfs_fstatat+0x40/0x88
  [<7903f844>]                                   # ? lg_local_unlock+0x31/0x47
  [<790c65f8>] vfs_stat+0x13/0x15
  [<790c67e2>] sys_stat64+0x11/0x22
  [<790c3c47>]                                   # ? __fput+0x187/0x1a0
  [<792b9d37>]                                   # ? restore_all+0xf/0xf
  [<79165bb4>]                                   # ? trace_hardirqs_on_thunk+0xc/0x10
  [<792b9d04>] syscall_call+0x7/0xb

then the information would still be there, it would still be in the 
right place, but it would also be much easier to ignore the right side 
column it visually. (The '# ?' prefix would also make it clear that 
this information is not so important as the left hand side column.)

Or something like that. Assuming this doesn't confuse tools and such.

Thanks,

	Ingo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ