linux-kernel - Re: [PATCH v15 4/6] arm64: Introduce stack trace reliability checks in the unwinder

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YrgZkz7BA1U09gUC@FVFF77S0Q05N>
Date:   Sun, 26 Jun 2022 09:32:19 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     madvenka@...ux.microsoft.com
Cc:     broonie@...nel.org, jpoimboe@...hat.com, ardb@...nel.org,
        nobuta.keiya@...itsu.com, sjitindarsingh@...il.com,
        catalin.marinas@....com, will@...nel.org,
        jamorris@...ux.microsoft.com, linux-arm-kernel@...ts.infradead.org,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v15 4/6] arm64: Introduce stack trace reliability checks
 in the unwinder

On Fri, Jun 17, 2022 at 04:07:15PM -0500, madvenka@...ux.microsoft.com wrote:
> From: "Madhavan T. Venkataraman" <madvenka@...ux.microsoft.com>
> 
> There are some kernel features and conditions that make a stack trace
> unreliable. Callers may require the unwinder to detect these cases.
> E.g., livepatch.
> 
> Introduce a new function called unwind_check_reliability() that will
> detect these cases and set a flag in the stack frame. Call
> unwind_check_reliability() for every frame in unwind().
> 
> Introduce the first reliability check in unwind_check_reliability() - If
> a return PC is not a valid kernel text address, consider the stack
> trace unreliable. It could be some generated code. Other reliability checks
> will be added in the future.
> 
> Let unwind() return a boolean to indicate if the stack trace is
> reliable.
> 
> Signed-off-by: Madhavan T. Venkataraman <madvenka@...ux.microsoft.com>
> Reviewed-by: Mark Brown <broonie@...nel.org>
> ---
>  arch/arm64/kernel/stacktrace.c | 31 +++++++++++++++++++++++++++++--
>  1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index c749129aba5a..5ef2ce217324 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -44,6 +44,8 @@
>   * @final_fp:	 Pointer to the final frame.
>   *
>   * @failed:      Unwind failed.
> + *
> + * @reliable:    Stack trace is reliable.
>   */

I would strongly prefer if we could have something like an
unwind_state_is_reliable() helper, and just use that directly, rather than
storing that into the state.

That way, we can opt-into any expensive checks in the reliable unwinder (e.g.
__kernel_text_address), and can use them elsewhere for informative purposes
(e.g. when dumping a stacktrace out to the console).

>  struct unwind_state {
>  	unsigned long fp;
> @@ -57,6 +59,7 @@ struct unwind_state {
>  	struct task_struct *task;
>  	unsigned long final_fp;
>  	bool failed;
> +	bool reliable;
>  };
>  
>  static void unwind_init_common(struct unwind_state *state,
> @@ -80,6 +83,7 @@ static void unwind_init_common(struct unwind_state *state,
>  	state->prev_fp = 0;
>  	state->prev_type = STACK_TYPE_UNKNOWN;
>  	state->failed = false;
> +	state->reliable = true;
>  
>  	/* Stack trace terminates here. */
>  	state->final_fp = (unsigned long)task_pt_regs(task)->stackframe;
> @@ -242,11 +246,34 @@ static void notrace unwind_next(struct unwind_state *state)
>  }
>  NOKPROBE_SYMBOL(unwind_next);
>  
> -static void notrace unwind(struct unwind_state *state,
> +/*
> + * Check the stack frame for conditions that make further unwinding unreliable.
> + */
> +static void unwind_check_reliability(struct unwind_state *state)
> +{
> +	if (state->fp == state->final_fp) {
> +		/* Final frame; no more unwind, no need to check reliability */
> +		return;
> +	}
> +
> +	/*
> +	 * If the PC is not a known kernel text address, then we cannot
> +	 * be sure that a subsequent unwind will be reliable, as we
> +	 * don't know that the code follows our unwind requirements.
> +	 */
> +	if (!__kernel_text_address(state->pc))
> +		state->reliable = false;
> +}

I'd strongly prefer that we split this into two helpers, e.g.

static inline bool unwind_state_is_final(struct unwind_state *state)
{
	return state->fp == state->final_fp;
}

static inline bool unwind_state_is_reliable(struct unwind_state *state)
{
	return __kernel_text_address(state->pc);
}

> +
> +static bool notrace unwind(struct unwind_state *state,
>  			   stack_trace_consume_fn consume_entry, void *cookie)
>  {
> -	while (unwind_continue(state, consume_entry, cookie))
> +	unwind_check_reliability(state);
> +	while (unwind_continue(state, consume_entry, cookie)) {
>  		unwind_next(state);
> +		unwind_check_reliability(state);

This is going to slow down regular unwinds even when the reliablity value is
not consumed (e.g. for KASAN traces on alloc and free), so I don't think this
should live here, and should be intreoduced with arch_stack_walk_reliable().

Thanks,
Mark.

> +	}
> +	return !state->failed && state->reliable;
>  }
>  NOKPROBE_SYMBOL(unwind);
>  
> -- 
> 2.25.1
>