Date:   Sat, 12 Jan 2019 12:09:14 +1100
From:   Balbir Singh <bsingharora@...il.com>
To:     Joe Lawrence <joe.lawrence@...hat.com>
Cc:     linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
        live-patching@...r.kernel.org, Torsten Duwe <duwe@....de>,
        Michael Ellerman <mpe@...erman.id.au>,
        Jiri Kosina <jkosina@...e.cz>
Subject: Re: ppc64le reliable stack unwinder and scheduled tasks

On Thu, Jan 10, 2019 at 04:14:00PM -0500, Joe Lawrence wrote:
> Hi all,
> 
> tl;dr: On ppc64le, what is the top-most stack frame for scheduled
>        tasks about?
> 
> I am looking at a bug in which, for ~10% of livepatch tests on RHEL-7
> and RHEL-8 distro kernels, the ppc64le reliable stack unwinder
> consistently reports an unreliable stack for a given task.  This
> condition can eventually resolve itself, but sometimes the state stays
> wedged for hours and I can live-debug the system with crash-utility.
>

In summary, you think the reliable stack tracer is being too
conservative? xmon (the built-in debugger) also allows live debugging
and might make it easier to get some of this done.
 
> I have tried reproducing with upstream 4.20 kernel, but haven't seen
> this exact condition yet.  More on upstream in a bit.
> 
> That said, I have a question about the ppc64 stack frame layout with
> regard to scheduled tasks.  In each sticky "unreliable" stack trace
> instance that I've looked at, the task's stack has an interesting
> frame at the top:
>

Could you please describe the "interesting" frame at the top in a bit
more detail? Usually the topmost return address is in the LR.
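
For reference, here is roughly how the frame slots involved line up, as
doubleword indices from a frame's SP. This is a sketch based on the
ELFv2 ABI and asm/ptrace.h, so treat the values as illustrative rather
than authoritative (the kernel's names are STACK_FRAME_LR_SAVE,
STACK_FRAME_MARKER and STACK_FRAME_REGS_MARKER; the FRAME_*/REGS_MARKER
defines below are just shorthand for the sketch):

/*
 * Sketch of the ppc64 stack frame slots discussed in this thread, as
 * doubleword indices from a frame's SP.  Based on the ELFv2 ABI and
 * asm/ptrace.h; illustrative only, not copied from any particular tree.
 */
#define FRAME_BACKCHAIN	 0	/* sp[0]: back chain to the caller's frame */
#define FRAME_LR_SAVE	 2	/* sp[2], SP+16: caller's saved LR */
#define FRAME_MARKER	12	/* sp[12], SP+96: STACK_FRAME_MARKER slot */
#define REGS_MARKER	0x7265677368657265UL	/* ASCII "regshere" */
/*
 * An exception/interrupt frame stores REGS_MARKER in the FRAME_MARKER
 * slot, with the saved pt_regs following the frame header.  The topmost
 * function's own return address normally still sits in the LR register
 * rather than on the stack, which is why the first frame needs special
 * treatment.
 */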
 
> 
> Example 1 (RHEL-7)
> ==================
> 
> crash> struct task_struct.thread c00000022fd015c0 | grep ksp
>     ksp = 0xc0000000288af9c0
> 
> crash> rd 0xc0000000288af9c0 -e 0xc0000000288b0000
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[0]:
> 
> c0000000288af9c0:  c0000000288afb90 0000000000dd0000   ...(............
> c0000000288af9d0:  c000000000002a94 c000000001c60a00   .*..............
> 
>         crash> sym c000000000002a94
>         c000000000002a94 (T) hardware_interrupt_common+0x114

What does bt look like in this case? What does r1 point to? I would
look at xmon and see what "t" (stack trace) shows; I think it's a more
familiar tool.

> 
> c0000000288af9e0:  c000000001c60a80 0000000000000000   ................
> c0000000288af9f0:  c0000000288afbc0 0000000000dd0000   ...(............
> c0000000288afa00:  c0000000014322e0 c000000001c60a00   ."C.............
> c0000000288afa10:  c0000002303ae380 c0000002303ae380   ..:0......:0....
> c0000000288afa20:  7265677368657265 0000000000002200   erehsger."......
> 
>         Uh-oh...
> 
>         /* Mark stacktraces with exception frames as unreliable. */
>         stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
> 
> c0000000288afa30:  c0000000001240ac c0000000288afcb0   .@.........(....
> c0000000288afa40:  c0000000013e4d00 0000000000000000   .M>.............
> c0000000288afa50:  0000000000000001 0000000000000001   ................
> c0000000288afa60:  c0000000014322e0 0000000000000804   ."C.............
> c0000000288afa70:  c0000000288ac080 c00000022fd015c0   ...(......./....
> c0000000288afa80:  c0000000288afae0 00000000ffffffff   ...(............
> c0000000288afa90:  c0000000288afae0 c000000007b80900   ...(............
> c0000000288afaa0:  c000000000e90a00 c000000000e90a00   ................
> c0000000288afab0:  c0000000001240ac c000000000e90a00   .@..............
> c0000000288afac0:  c000000000e90a00 c000000004790580   ..........y.....
> c0000000288afad0:  0000000000000001 c000000000e90a00   ................
> c0000000288afae0:  0000000000000000 0000000000000004   ................
> c0000000288afaf0:  c00000022fd01ad0 c000000000e73be0   .../.....;......
> c0000000288afb00:  c00000023900f450 c000000001488190   P..9......H.....
> c0000000288afb10:  0000000000ad0000 c00000023900ef40   ........@.......
> c0000000288afb20:  c0000000288ac000 c000000000e73be0   ...(.....;......
> c0000000288afb30:  c00000000001b514 c00000022fd01628   ........(../....
> c0000000288afb40:  c0000000288afbb0 c000000000008800   ...(............
> c0000000288afb50:  c000000000162880 0000000000000000   .(..............
> c0000000288afb60:  00000000240f3022 0000000000000004   "0.$............
> c0000000288afb70:  c000000000e90a00 c00000022fd01a20   ........ ../....
> c0000000288afb80:  c0000000288afbf0 c0000000014322e0   ...(....."C.....
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[1]:
> 
> c0000000288afb90:  c0000000288afbf0 c000000001960a00   ...(............
> c0000000288afba0:  c00000000001b514 0000000000000004   ................
> 
>         crash> sym c00000000001b514
>         c00000000001b514 (T) __switch_to+0x264
> 
> c0000000288afbb0:  c000000000e90a00 0000000000000000   ................
> c0000000288afbc0:  c0000000288ac000 c0000000014322e0   ...(....."C.....
> c0000000288afbd0:  c000000000e90a00 c000000001960a00   ................
> c0000000288afbe0:  c00000022fd015c0 c00000023900ef40   .../....@.......
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[2]:
> 
> c0000000288afbf0:  c0000000288afcd0 c000000000003300   ...(.....3......
> c0000000288afc00:  c000000000a83918 c0000000013e4d00   .9.......M>.....
> 
>         crash> sym c000000000a83918
>         c000000000a83918 (t) __schedule+0x428
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[3]:
> 
> c0000000288afcd0:  c0000000288afd80 0000000000003300   ...(.....3......
> c0000000288afce0:  c0000000001240ac 0000000000000000   .@..............
> 
>         crash> sym c0000000001240ac
>         c0000000001240ac (t) rescuer_thread+0x3ac
> 
> 	   [ ... and so on as we would normally expect ... ]
> 
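Spelling out the arithmetic for Example 1: STACK_FRAME_MARKER is 12
doublewords, so the slot tested for the frame at 0xc0000000288af9c0 is
0xc0000000288af9c0 + 0x60 = 0xc0000000288afa20, which is exactly where
the dump shows 7265677368657265 ("regshere", rendered byte-reversed as
"erehsger" in the little-endian ASCII column). A throwaway userspace
snippet, not kernel code, that double-checks the same thing:

#include <stdio.h>
#include <string.h>

int main(void)
{
	unsigned long sp = 0xc0000000288af9c0UL;     /* sp[0] from Example 1 */
	unsigned long marker = 0x7265677368657265UL; /* value in the dump    */
	char ascii[9] = { 0 };

	/* STACK_FRAME_MARKER is a doubleword index, so offset = 12 * 8 */
	printf("marker slot: 0x%lx\n", sp + 12 * sizeof(unsigned long));

	/* On a little-endian host this prints "erehsger", matching the
	 * ASCII column of the dump; read big-endian it is "regshere". */
	memcpy(ascii, &marker, sizeof(marker));
	printf("in-memory bytes: %s\n", ascii);
	return 0;
}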
> 
> Example 2 - (RHEL-7)
> ====================
> 
> crash> struct task_struct.thread c00000004a660a00 | grep ksp
>     ksp = 0xc0000005e85439c0,
> 
> crash> rd 0xc0000005e8543b90 -e 0xc0000005e8544000
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[0]:
> c0000005e85439c0:  c0000005e8543b90 0000000000000000   .;T.............
> c0000005e85439d0:  c000000000002a94 0000000000000000   .*..............
> 
>         crash> sym c000000000002a94
>         c000000000002a94 (T) hardware_interrupt_common+0x114
> 
> c0000005e8543a20:  7265677368657265 c0000000013e4d00   erehsger.M>.....
> 
>         /* Mark stacktraces with exception frames as unreliable. */
>         stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[1]:
> c0000005e8543b90:  c0000005e8543bf0 c000000001960a00   .;T.............
> c0000005e8543ba0:  c00000000001b514 0000000000000004   ................
> 
>         crash> sym c00000000001b514
>         c00000000001b514 (T) __switch_to+0x264
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[2]:
> c0000005e8543bf0:  c0000005e8543cd0 c000000000003300   .<T......3......
> c0000005e8543c00:  c000000000a83918 c0000000013e4d00   .9.......M>.....
> 
>         crash> sym c000000000a83918
>         c000000000a83918 (t) __schedule+0x428
> 
> 	   [ ... and so on as we would normally expect ... ]
> 
> 
> Example 3 (upstream 4.20)
> =========================
> 
> This is not a repro of the sticky "unreliable" stack trace, but just the
> stack of a workqueue that fired and called msleep.
> 
> 
> crash> struct task_struct.thread c0000002257f0e00 | grep ksp
>     ksp = 0xc00000022715f9b0,
> 
> crash> rd 0xc00000022715f9b0 100
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[0]:
> 
> c00000022715f9b0:  c00000022715fb90 c00000023f4e8280   ...'......N?....
> c00000022715f9c0:  c00000022715f9e0 0000000000000003   ...'............
> 
>         crash> sym c00000022715f9e0
>         c00000022715f9e0 (B) _end+0x22594f9e0

That is weird; can you compare this to the output of "bt" or "t"
from xmon?

> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[1]:
> 
> c00000022715fb90:  c00000022715fbf0 c00000000158634c   ...'....LcX.....
> c00000022715fba0:  c00000000001ebfc c000000001318a60   ........`.1.....
> 
>         crash> sym c00000000001ebfc
> 	c00000000001ebfc (T) __switch_to+0x2dc
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[2]:
> 
> c00000022715fbf0:  c00000022715fcd0 c000000000008800   ...'............
> c00000022715fc00:  c000000000abcf40 c0000000013abf00   @.........:.....
> 
>         crash> sym c000000000abcf40
> 	c000000000abcf40 (t) __schedule+0x3b0
> 
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> 
> sp[3]:
> 
> c00000022715fcd0:  c00000022715fd00 2c50fff000000000   ...'..........P,
> c00000022715fce0:  c000000000abd628 c00000023bff57e8   (........W.;....
> 
>         crash> sym c000000000abd628
> 	c000000000abd628 (T) schedule+0x48
> 
> 			    [ ... etc ... ]
> 
> 
> save_stack_trace_tsk_reliable
> =============================
> 
> arch/powerpc/kernel/stacktrace.c :: save_stack_trace_tsk_reliable() does
> take the first stack frame into account, but only to verify that the
> link register is indeed pointing at a kernel code address.
> 
> Can someone explain what __switch_to is doing with the stack, and
> whether in such circumstances the reliable stack unwinder should skip
> the first frame when verifying frame contents like STACK_FRAME_MARKER,
> etc.?
> 
> I may be on the wrong path in debugging this, but figuring out this
> sp[0] frame state would be helpful.
>
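To make the question concrete, the per-frame checks in question have
roughly the following shape; this is a paraphrased sketch of
arch/powerpc/kernel/stacktrace.c around v4.20, not the verbatim source,
and it elides the bounds/alignment checks, the ftrace handling and the
exact first-frame treatment of the saved LR:

/*
 * Paraphrased shape of the frame-walking loop in
 * save_stack_trace_tsk_reliable(); sketch only, details elided.
 */
static int reliable_trace_sketch(unsigned long sp, unsigned long stack_end)
{
	for (;;) {
		unsigned long *stack = (unsigned long *)sp;
		unsigned long newsp, ip;

		/*
		 * Mark stacktraces with exception frames as unreliable.
		 * Nothing skips this test for the first frame, so a stale
		 * "regshere" inside sp[0]'s frame flags the whole trace.
		 */
		if (stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER)
			return 1;

		newsp = stack[0];			/* back chain        */
		ip = stack[STACK_FRAME_LR_SAVE];	/* caller's saved LR */

		/* The saved LR must point at a kernel code address. */
		if (!__kernel_text_address(ip))
			return 1;

		/* ... record ip in the trace ... */

		if (newsp == stack_end)
			break;
		sp = newsp;
	}
	return 0;
}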

I would compare xmon's output for the unreliable stack frames with the
contents that the stack unwinder sees.

I suspect the patch is stuck trying to transition to the enabled
state; it will be interesting to see whether we are really stuck.
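
For completeness, the connection between the unreliable trace and a
stuck transition is roughly the following; a paraphrase of
kernel/livepatch/transition.c of that era, not the verbatim source,
with the locking and the checks for currently-running tasks elided:

/*
 * Sketch of how the livepatch transition depends on the reliable
 * tracer; paraphrased, not verbatim.
 */
static bool klp_try_switch_task_sketch(struct task_struct *task)
{
	char err_buf[128];	/* the real code sizes this differently */

	/*
	 * klp_check_stack() runs save_stack_trace_tsk_reliable() and then
	 * verifies that no to-be-(un)patched function is on the stack.  If
	 * the tracer reports the stack as unreliable, patch_state is left
	 * alone and the transition stays pending for this task.
	 */
	if (klp_check_stack(task, err_buf))
		return false;

	task->patch_state = klp_target_state;
	return true;
}

If that is what's happening here, the pending state should also show up
in /sys/kernel/livepatch/<patch>/transition and in each task's
/proc/<pid>/patch_state.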

Balbir Singh.
