lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <AC94227C-28D4-4745-9EC8-14EC59766DAD@gmail.com>
Date:	Thu, 15 Oct 2015 22:39:26 +0900
From:	Jungseok Lee <jungseoklee85@...il.com>
To:	AKASHI Takahiro <takahiro.akashi@...aro.org>
Cc:	James Morse <james.morse@....com>, catalin.marinas@....com,
	will.deacon@....com, linux-arm-kernel@...ts.infradead.org,
	mark.rutland@....com, barami97@...il.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 2/2] arm64: Expand the stack trace feature to support IRQ stack

On Oct 15, 2015, at 1:19 PM, AKASHI Takahiro wrote:
> Jungseok,

Hi Akashi,

> On 10/14/2015 09:55 PM, Jungseok Lee wrote:
>> On Oct 14, 2015, at 9:24 PM, Jungseok Lee wrote:
>>> On Oct 14, 2015, at 4:13 PM, AKASHI Takahiro wrote:
>>>> On 10/09/2015 11:24 PM, James Morse wrote:
>>>>> Hi Jungseok,
>>>>> 
>>>>> On 07/10/15 16:28, Jungseok Lee wrote:
>>>>>> Currently, a call trace drops a process stack walk when a separate IRQ
>>>>>> stack is used. It makes a call trace information much less useful when
>>>>>> a system gets paniked in interrupt context.
>>>>> 
>>>>> panicked
>>>>> 
>>>>>> This patch addresses the issue with the following schemes:
>>>>>> 
>>>>>>  - Store aborted stack frame data
>>>>>>  - Decide whether another stack walk is needed or not via current sp
>>>>>>  - Loosen the frame pointer upper bound condition
>>>>> 
>>>>> It may be worth merging this patch with its predecessor - anyone trying to
>>>>> bisect a problem could land between these two patches, and spend time
>>>>> debugging the truncated call traces.
>>>>> 
>>>>> 
>>>>>> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
>>>>>> index 6ea82e8..e5904a1 100644
>>>>>> --- a/arch/arm64/include/asm/irq.h
>>>>>> +++ b/arch/arm64/include/asm/irq.h
>>>>>> @@ -2,13 +2,25 @@
>>>>>> #define __ASM_IRQ_H
>>>>>> 
>>>>>> #include <linux/irqchip/arm-gic-acpi.h>
>>>>>> +#include <asm/stacktrace.h>
>>>>>> 
>>>>>> #include <asm-generic/irq.h>
>>>>>> 
>>>>>> struct irq_stack {
>>>>>> 	void *stack;
>>>>>> +	struct stackframe frame;
>>>>>> };
>>>>>> 
>>>>>> +DECLARE_PER_CPU(struct irq_stack, irq_stacks);
>>>>> 
>>>>> Good idea, storing this in the per-cpu data makes it immune to stack
>>>>> corruption.
>>>> 
>>>> Is this the only reason that you have a dummy stack frame in per-cpu data?
>>>> By placing this frame in an interrupt stack, I think, we will be able to eliminate
>>>> changes in dump_stace(). and
>>>> 
>>>>> 
>>>>>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>>>>>> index 407991b..5124649 100644
>>>>>> --- a/arch/arm64/kernel/stacktrace.c
>>>>>> +++ b/arch/arm64/kernel/stacktrace.c
>>>>>> @@ -43,7 +43,27 @@ int notrace unwind_frame(struct stackframe *frame)
>>>>>> 	low  = frame->sp;
>>>>>> 	high = ALIGN(low, THREAD_SIZE);
>>>>>> 
>>>>>> -	if (fp < low || fp > high - 0x18 || fp & 0xf)
>>>>>> +	/*
>>>>>> +	 * A frame pointer would reach an upper bound if a prologue of the
>>>>>> +	 * first function of call trace looks as follows:
>>>>>> +	 *
>>>>>> +	 *	stp     x29, x30, [sp,#-16]!
>>>>>> +	 *	mov     x29, sp
>>>>>> +	 *
>>>>>> +	 * Thus, the upper bound is (top of stack - 0x20) with consideration
>>>>> 
>>>>> The terms 'top' and 'bottom' of the stack are confusing, your 'top' appears
>>>>> to be the highest address, which is used first, making it the bottom of the
>>>>> stack.
>>>>> 
>>>>> I would try to use the terms low/est and high/est, in keeping with the
>>>>> variable names in use here.
>>>>> 
>>>>> 
>>>>>> +	 * of a 16-byte empty space in THREAD_START_SP.
>>>>>> +	 *
>>>>>> +	 * The value, 0x20, however, does not cover all cases as interrupts
>>>>>> +	 * are handled using a separate stack. That is, a call trace can start
>>>>>> +	 * from elx_irq exception vectors. The symbols could not be promoted
>>>>>> +	 * to candidates for a stack trace under the restriction, 0x20.
>>>>>> +	 *
>>>>>> +	 * The scenario is handled without complexity as 1) considering
>>>>>> +	 * (bottom of stack + THREAD_START_SP) as a dummy frame pointer, the
>>>>>> +	 * content of which is 0, and 2) allowing the case, which changes
>>>>>> +	 * the value to 0x10 from 0x20.
>>>>> 
>>>>> Where has 0x20 come from? The old value was 0x18.
>>>>> 
>>>>> My understanding is the highest part of the stack looks like this:
>>>>> high        [ off-stack ]
>>>>> high - 0x08 [ left free by THREAD_START_SP ]
>>>>> high - 0x10 [ left free by THREAD_START_SP ]
>>>>> high - 0x18 [#1 x30 ]
>>>>> high - 0x20 [#1 x29 ]
>>>>> 
>>>>> So the condition 'fp > high - 0x18' prevents returning either 'left free'
>>>>> address, or off-stack-value as a frame. Changing it to 'fp > high - 0x10'
>>>>> allows the first half of that reserved area to be a valid stack frame.
>>>>> 
>>>>> This change is breaking perf using incantations [0] and [1]:
>>>>> 
>>>>> Before, with just patch 1/2:
>>>>>                  ---__do_softirq
>>>>>                     |
>>>>>                     |--92.95%-- __handle_domain_irq
>>>>>                     |          __irqentry_text_start
>>>>>                     |          el1_irq
>>>>>                     |
>>>>> 
>>>>> After, with both patches:
>>>>>                 ---__do_softirq
>>>>>                    |
>>>>>                    |--83.83%-- __handle_domain_irq
>>>>>                    |          __irqentry_text_start
>>>>>                    |          el1_irq
>>>>>                    |          |
>>>>>                    |          |--99.39%-- 0x400008040d00000c
>>>>>                    |           --0.61%-- [...]
>>>>>                    |
>>>> 
>>>> This also shows that walk_stackframe() doesn't walk through a process stack.
>>>> Now I'm trying the following hack on top of Jungseok's patch.
>>>> (It doesn't traverse from an irq stack to an process stack yet. I need modify
>>>> unwind_frame().)
>>> 
>>> I've got a difference between perf and dump_backtrace() as reviewing perf call
>>> chain operation. Perf relies on walk_stackframe(), but dump_backtrace() does not.
>>> That is, a symbol is printed out *before* unwind_frame() call in case of perf.
>>> By contrast, dump_backtrace() records a symbol *after* unwind_frame(). I think
>>> perf behavior is correct since frame.pc is retrieved from a valid stack frame.
>>> 
>>> So, the following diff is a prerequisite. It looks reasonable to remove dump_mem()
>>> call since frame.sp is calculated incorrectly now. If accepted, dump_backtrace()
>>> could utilize walk_stackframe(), which simplifies the code.
>>> 
>>> ----8<----
>>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>>> index f93aae5..e18be43 100644
>>> --- a/arch/arm64/kernel/traps.c
>>> +++ b/arch/arm64/kernel/traps.c
>>> @@ -103,12 +103,15 @@ static void dump_mem(const char *lvl, const char *str, unsigned long bottom,
>>>        set_fs(fs);
>>> }
>>> 
>>> -static void dump_backtrace_entry(unsigned long where, unsigned long stack)
>>> +static void dump_backtrace_entry(unsigned long where)
>>> {
>>> +       /*
>>> +        * PC has a physical address when MMU is disabled.
>>> +        */
>>> +       if (!kernel_text_address(where))
>>> +               where = (unsigned long)phys_to_virt(where);
>>> +
>>>        print_ip_sym(where);
>>> -       if (in_exception_text(where))
>>> -               dump_mem("", "Exception stack", stack,
>>> -                        stack + sizeof(struct pt_regs), false);
>>> }
>>> 
>>> static void dump_instr(const char *lvl, struct pt_regs *regs)
>>> @@ -172,12 +175,17 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>>>        pr_emerg("Call trace:\n");
>>>        while (1) {
>>>                unsigned long where = frame.pc;
>>> +               unsigned long stack;
>>>                int ret;
>>> 
>>> +               dump_backtrace_entry(where);
>>>                ret = unwind_frame(&frame);
>>>                if (ret < 0)
>>>                        break;
>>> -               dump_backtrace_entry(where, frame.sp);
>>> +               stack = frame.sp;
>>> +               if (in_exception_text(where))
>>> +                       dump_mem("", "Exception stack", stack,
>>> +                                stack + sizeof(struct pt_regs), false);
>>>        }
>>> }
>>> ----8<----
>>> 
>>>> Thanks,
>>>> -Takahiro AKASHI
>>>> ----8<----
>>>> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
>>>> index 650cc05..5fbd1ea 100644
>>>> --- a/arch/arm64/kernel/entry.S
>>>> +++ b/arch/arm64/kernel/entry.S
>>>> @@ -185,14 +185,12 @@ alternative_endif
>>>> 	mov	x23, sp
>>>> 	and	x23, x23, #~(THREAD_SIZE - 1)
>>>> 	cmp	x20, x23			// check irq re-enterance
>>>> +	mov	x19, sp
>>>> 	beq	1f
>>>> -	str	x29, [x19, #IRQ_FRAME_FP]
>>>> -	str	x21, [x19, #IRQ_FRAME_SP]
>>>> -	str	x22, [x19, #IRQ_FRAME_PC]
>>>> -	mov	x29, x24
>>>> -1:	mov	x19, sp
>>>> -	csel	x23, x19, x24, eq		// x24 = top of irq stack
>>>> -	mov	sp, x23
>>>> +	mov	sp, x24				// x24 = top of irq stack
>>>> +	stp	x29, x22, [sp, #-32]!
>>>> +	mov	x29, sp
>>>> +1:
>>>> 	.endm
>>>> 
>>>> 	/*
>>> 
>>> Is it possible to decide which stack is used without aborted SP information?
>> 
>> We could know which stack is used via current SP, but how could we decide
>> a variable 'low' in unwind_frame() when walking a process stack?
> 
> The following patch, replacing your [PATCH 2/2], seems to work nicely,
> traversing from interrupt stack to process stack. I tried James' method as well
> as "echo c > /proc/sysrq-trigger."

Great thanks!

Since I'm favor of your approach, I've played with this patch instead of my one.
A kernel panic is observed when using 'perf record with -g option' and sysrq.
I guess some other changes are on your tree..

Please refer to my analysis.

> The only issue that I have now is that dump_backtrace() does not show
> correct "pt_regs" data on process stack (actually it dumps interrupt stack):
> 
> CPU1: stopping
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D         4.3.0-rc5+ #24
> Hardware name: ARM Arm Versatile Express/Arm Versatile Express, BIOS 11:37:19 Jul 16 2015
> Call trace:
> [<ffffffc00008a7b0>] dump_backtrace+0x0/0x19c
> [<ffffffc00008a968>] show_stack+0x1c/0x28
> [<ffffffc0003936d0>] dump_stack+0x88/0xc8
> [<ffffffc00008fdf8>] handle_IPI+0x258/0x268
> [<ffffffc000082530>] gic_handle_irq+0x88/0xa4
> Exception stack(0xffffffc87b1bffa0 to 0xffffffc87b1c00c0) <== HERE
> ffa0: ffffffc87b18fe30 ffffffc87b1bc000 ffffffc87b18ff50 ffffffc000086ac8
> ffc0: ffffffc87b18c000 afafafafafafafaf ffffffc87b18ff50 ffffffc000086ac8
> ffe0: ffffffc87b18ff50 ffffffc87b18ff50 afafafafafafafaf afafafafafafafaf
> 0000: 0000000000000000 ffffffffffffffff ffffffc87b195c00 0000000200000002
> 0020: 0000000057ac6e9d afafafafafafafaf afafafafafafafaf afafafafafafafaf
> 0040: afafafafafafafaf afafafafafafafaf afafafafafafafaf afafafafafafafaf
> 0060: afafafafafafafaf afafafafafafafaf afafafafafafafaf afafafafafafafaf
> 0080: afafafafafafafaf afafafafafafafaf afafafafafafafaf afafafafafafafaf
> 00a0: afafafafafafafaf afafafafafafafaf afafafafafafafaf afafafafafafafaf
> [<ffffffc0000855e0>] el1_irq+0xa0/0x114
> [<ffffffc000086ac4>] arch_cpu_idle+0x14/0x20
> [<ffffffc0000fc110>] default_idle_call+0x1c/0x34
> [<ffffffc0000fc464>] cpu_startup_entry+0x2cc/0x30c
> [<ffffffc00008f7c4>] secondary_start_kernel+0x120/0x148
> [<ffffffc0000827a8>] secondary_startup+0x8/0x20

My 'dump_backtrace() rework' patch is in your working tree. Right?

> 
> Thanks,
> -Takahiro AKASHI
> 
> ----8<----
> From 1aa8d4e533d44099f69ff761acfa3c1045a00796 Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi@...aro.org>
> Date: Thu, 15 Oct 2015 09:04:10 +0900
> Subject: [PATCH] arm64: revamp unwind_frame for interrupt stack
> 
> This patch allows unwind_frame() to traverse from interrupt stack
> to process stack correctly by having a dummy stack frame for irq_handler
> created at its prologue.
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@...aro.org>
> ---
> arch/arm64/kernel/entry.S      |   22 ++++++++++++++++++++--
> arch/arm64/kernel/stacktrace.c |   14 +++++++++++++-
> 2 files changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 6d4e8c5..25cabd9 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -185,8 +185,26 @@ alternative_endif
> 	and	x23, x23, #~(THREAD_SIZE - 1)
> 	cmp	x20, x23			// check irq re-enterance
> 	mov	x19, sp
> -	csel	x23, x19, x24, eq		// x24 = top of irq stack
> -	mov	sp, x23
> +	beq	1f
> +	mov	sp, x24				// x24 = top of irq stack
> +	stp	x29, x21, [sp, #-16]!		// for sanity check
> +	stp	x29, x22, [sp, #-16]!		// dummy stack frame
> +	mov	x29, sp
> +1:
> +	/*
> +	 * Layout of interrupt stack after this macro is invoked:
> +	 *
> +	 *     |                |
> +	 *-0x20+----------------+ <= dummy stack frame
> +	 *     |      fp        |    : fp on process stack
> +	 *-0x18+----------------+
> +	 *     |      lr        |    : return address
> +	 *-0x10+----------------+
> +	 *     |    fp (copy)   |    : for sanity check
> +	 * -0x8+----------------+
> +	 *     |      sp        |    : sp on process stack
> +	 *  0x0+----------------+
> +	 */
> 	.endm
> 
> 	/*
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 407991b..03611a1 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -43,12 +43,24 @@ int notrace unwind_frame(struct stackframe *frame)
> 	low  = frame->sp;
> 	high = ALIGN(low, THREAD_SIZE);
> 
> -	if (fp < low || fp > high - 0x18 || fp & 0xf)
> +	if (fp < low || fp > high - 0x20 || fp & 0xf)
> 		return -EINVAL;

IMO, this condition should be changes as follows.

	if (fp < low || fp > high - 0x20 || fp & 0xf || !fp)

Please refer to the below for details.

> 
> 	frame->sp = fp + 0x10;
> 	frame->fp = *(unsigned long *)(fp);
> 	/*
> +	 * check whether we are going to walk trough from interrupt stack
> +	 * to process stack
> +	 * If the previous frame is the initial (dummy) stack frame on
> +	 * interrupt stack, frame->sp now points to just below the frame
> +	 * (dummy frame + 0x10).
> +	 * See entry.S
> +	 */
> +#define STACK_LOW(addr) round_down((addr), THREAD_SIZE)
> +	if ((STACK_LOW(frame->sp) != STACK_LOW(frame->fp)) &&
> +			(frame->fp == *(unsigned long *)frame->sp))
> +		frame->sp = *((unsigned long *)(frame->sp + 8));

An original intention seems to catch a stack change from IRQ stack to process one.
Unfortunately, this condition hits when the last of stack frame of swapper is
retrieved. This leads to NULL pointer access due to the following code snippet.

ENTRY(__secondary_switched)
        ldr     x0, [x21]                       // get secondary_data.stack
        mov     sp, x0
        mov     x29, #0
        b       secondary_start_kernel
ENDPROC(__secondary_switched)

This is why x29 should be checked.

Best Regards
Jungseok Lee--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ