linux-kernel - Re: [RFC PATCH v5 1/2] arm64: Introduce stack trace reliability checks in the unwinder


Message-ID: <d9451984-d3fe-405f-f2e6-6571acd518e9@linux.microsoft.com>
Date:   Fri, 25 Jun 2021 12:05:18 -0500
From:   "Madhavan T. Venkataraman" <madvenka@...ux.microsoft.com>
To:     Mark Brown <broonie@...nel.org>
Cc:     Mark Rutland <mark.rutland@....com>, jpoimboe@...hat.com,
        ardb@...nel.org, nobuta.keiya@...itsu.com, catalin.marinas@....com,
        will@...nel.org, jmorris@...ei.org, pasha.tatashin@...een.com,
        jthierry@...hat.com, linux-arm-kernel@...ts.infradead.org,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH v5 1/2] arm64: Introduce stack trace reliability
 checks in the unwinder



On 6/25/21 10:51 AM, Mark Brown wrote:
> On Fri, Jun 25, 2021 at 10:39:57AM -0500, Madhavan T. Venkataraman wrote:
>> On 6/24/21 9:40 AM, Mark Rutland wrote:
> 
>>> At a high-level, I'm on-board with keeping track of this per unwind
>>> step, but if we do that then I want to be able to use this during
>>> regular unwinds (e.g. so that we can have a backtrace indicate when a
>>> step is not reliable, like x86 does with '?'), and to do that we need to
>>> be a little more accurate.
> 
>> The only consumer of frame->reliable is livepatch. So, in retrospect, my
>> original per-frame reliability flag was overkill. I was just trying to
>> provide extra per-frame debug information which is not really a requirement
>> for livepatch.
> 
> It's not a requirement for livepatch, but if it's there a per-frame
> reliability flag would have other uses - for example, Mark has mentioned
> the way x86 prints a ? next to unreliable entries in oops output;
> that'd be handy for people debugging issues and would have the
> added bonus of ensuring that there's more constant and widespread
> exercising of the reliability stuff than if it's just used for livepatch
> which is a bit niche.
> 

I agree. That is why I introduced the per-frame flag.

So, let us try a different approach.

First, let us get rid of the frame->reliable flag from this patch series. That flag
can be implemented when all of the pieces are in place for per-frame debug and tracking.

For consumers such as livepatch that don't really care about per-frame stuff, let us
solve it more cleanly via the return value of unwind_frame().

Currently, unwind_frame() returns a tri-state value, which is somewhat
confusing:

	0	means continue unwinding
	-error	means stop unwinding. However,
			-ENOENT means successful termination
			Other values mean an error has happened.
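
To make the current convention concrete, here is a minimal userspace sketch of
how a caller has to decode that return value. The names walk_outcome and
classify_unwind_ret() are hypothetical and exist only for illustration; they
are not part of the kernel or of this patch series.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum walk_outcome {
	WALK_MORE,	/* 0: keep unwinding */
	WALK_DONE,	/* -ENOENT: unwinding terminated successfully */
	WALK_FAILED,	/* any other value: an error happened */
};

/* Decode the existing tri-state return convention described above. */
static enum walk_outcome classify_unwind_ret(int ret)
{
	if (ret == 0)
		return WALK_MORE;
	if (ret == -ENOENT)
		return WALK_DONE;
	return WALK_FAILED;
}
```

Every caller that cares about the difference between "finished" and "failed"
needs this kind of special case for -ENOENT.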

Instead, let unwind_frame() return one of 3 values:

enum {
	UNWIND_CONTINUE,
	UNWIND_CONTINUE_WITH_ERRORS,
	UNWIND_STOP,
};

All consumers will stop unwinding upon seeing UNWIND_STOP.

Livepatch type consumers will stop unwinding upon seeing anything other than UNWIND_CONTINUE.

Debug type consumers can choose to continue upon seeing UNWIND_CONTINUE_WITH_ERRORS.

When we eventually implement per-frame stuff, debug consumers can examine the
frame for more information when they see UNWIND_CONTINUE_WITH_ERRORS.
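
Roughly, the two kinds of consumers could look like the sketch below. This is
a standalone userspace mock, not kernel code: the enum tag unwind_rc, struct
mock_unwinder, mock_unwind_frame(), walk_strict() and walk_debug() are all
hypothetical names, and the mock simply replays a scripted list of results
instead of walking a real stack.

```c
#include <stddef.h>

/* The three proposed values; the tag "unwind_rc" is an assumption. */
enum unwind_rc {
	UNWIND_CONTINUE,
	UNWIND_CONTINUE_WITH_ERRORS,
	UNWIND_STOP,
};

/* Hypothetical stand-in for the unwinder: replays scripted results. */
struct mock_unwinder {
	const enum unwind_rc *steps;
	size_t nsteps;
	size_t pos;
};

static enum unwind_rc mock_unwind_frame(struct mock_unwinder *u)
{
	if (u->pos >= u->nsteps)
		return UNWIND_STOP;	/* script exhausted */
	return u->steps[u->pos++];
}

/*
 * Livepatch-type consumer: stop on anything other than UNWIND_CONTINUE.
 * Returns the value that ended the walk; *frames counts clean steps.
 */
static enum unwind_rc walk_strict(struct mock_unwinder *u, size_t *frames)
{
	*frames = 0;
	for (;;) {
		enum unwind_rc rc = mock_unwind_frame(u);

		if (rc != UNWIND_CONTINUE)
			return rc;
		(*frames)++;
	}
}

/*
 * Debug-type consumer: keep going past UNWIND_CONTINUE_WITH_ERRORS and
 * only stop on UNWIND_STOP. *flagged counts the steps reported with errors.
 */
static void walk_debug(struct mock_unwinder *u, size_t *frames, size_t *flagged)
{
	*frames = 0;
	*flagged = 0;
	for (;;) {
		enum unwind_rc rc = mock_unwind_frame(u);

		if (rc == UNWIND_STOP)
			return;
		if (rc == UNWIND_CONTINUE_WITH_ERRORS)
			(*flagged)++;
		(*frames)++;
	}
}
```

With a scripted sequence of CONTINUE, CONTINUE_WITH_ERRORS, CONTINUE, STOP,
the strict walker gives up after the first step, while the debug walker visits
all three steps and notes that one of them was flagged.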

This way, my patch series does not have a dependency on the per-frame enhancements.

>> So, let us separate the two. I will rename frame->reliable to frame->livepatch_safe.
>> This will apply to the whole stacktrace and not to every frame.
> 
> I'd rather keep it as reliable, even with only the livepatch usage I
> think it's clearer.
> 

See suggestion above.

>> Finally, it might be a good idea to perform reliability checks even in
>> start_backtrace() so we don't assume that the starting frame is reliable even
>> if the caller passes livepatch_safe=true. What do you think?
> 
> That makes sense to me.
> 

Thanks.

Madhavan