linux-kernel - Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

Message-ID: <20160307204317.GR6344@twins.programming.kicks-ass.net>
Date:	Mon, 7 Mar 2016 21:43:17 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Chris Metcalf <cmetcalf@...lanox.com>
Cc:	Daniel Thompson <daniel.thompson@...aro.org>,
	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Andrew Morton <akpm@...l.org>,
	linux-kernel@...r.kernel.org, Aaron Tomlin <atomlin@...hat.com>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle
 cpus

On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path. 

So that write is almost guaranteed to be a cacheline miss; those things
hurt and do show up in profiles.

> However, we can certainly adapt
> alternate approaches that stay away from the actual idle code.
> 
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code.  There are two downsides:
> 
> 1. It requires a small amount of per-architecture support.  I've provided
>    the tile support as an example, since that's what I tested.  I expect
>    x86 is a little more complicated since there are more idle paths and
>    they don't currently run the idle instruction(s) at a fixed address, but
>    it's unlikely to be too complicated on any platform.
>    Still, adding anything per-architecture is certainly a downside.
> 
> 2. As proposed, my new alternate solution only handles the non-polling
>    case, so if you are in the polling loop, we won't benefit from having
>    the NMI backtrace code skip over you.  However, my guess is that 99% of
>    the time folks run the default non-polling mode, so this
>    probably still achieves a pretty reasonable outcome.
> 
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT.  Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
> 
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function.  Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running PM code.

So I like the CPUIDLE_TEXT approach, since it has no impact on the
generated code.
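A userspace sketch of the CPUIDLE_TEXT idea follows. It is an analogue, not kernel code: the section name `cpuidle_text`, the macro `CPUIDLE_TEXT_ATTR` and the `demo_*` functions are hypothetical, and instead of a kernel linker script defining `__cpuidle_text_{start,end}` it relies on the `__start_`/`__stop_` symbols that GNU ld and lld generate for sections named like C identifiers. It shows that the "are we idling" test becomes a single range compare on the interrupted PC, and that tagging a function changes where it is placed rather than adding work to the idle path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical tag for low-level idle routines. noinline keeps the body
 * inside the tagged section instead of being copied into a caller that
 * lives in ordinary .text.
 */
#define CPUIDLE_TEXT_ATTR __attribute__((section("cpuidle_text"), noinline))

/* Defined by GNU ld / lld because the section name is a valid C identifier. */
extern const char __start_cpuidle_text[];
extern const char __stop_cpuidle_text[];

static volatile unsigned long idle_entries;
static volatile unsigned long work_items;

/* Stand-in for an arch idle routine; a real one would halt here. */
CPUIDLE_TEXT_ATTR void demo_arch_idle(void)
{
	idle_entries++;
}

/* Ordinary code, placed in the default .text section. */
__attribute__((noinline)) void demo_busy_work(void)
{
	work_items += 2;
}

/* The whole idle test: is the sampled PC inside the tagged section? */
bool pc_in_cpuidle_text(uintptr_t pc)
{
	uintptr_t start = (uintptr_t)__start_cpuidle_text;
	uintptr_t end = (uintptr_t)__stop_cpuidle_text;

	assert(start <= end);
	return pc >= start && pc < end;
}
```

Using function addresses as stand-ins for sampled PCs, `pc_in_cpuidle_text((uintptr_t)&demo_arch_idle)` is true and `pc_in_cpuidle_text((uintptr_t)&demo_busy_work)` is false. The `noinline` matters: if a tagged routine were inlined into a caller outside the section, its halt would execute with a PC outside the range and the CPU would not be recognized as idle.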

Another option would be to inspect the stack: we already take a stack
dump, so you could say that every CPU with cpuidle_enter() in its
callchain is an 'idle' CPU.
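The stack-based alternative amounts to a search of the callchain that the backtrace code already collects. In this sketch, frames are represented as symbol-name strings, innermost first. That representation is an assumption made for illustration, and `callchain_is_idle()` is a hypothetical name. In-kernel code would more likely compare saved return addresses against the bounds of cpuidle_enter() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Treat the CPU as idle if cpuidle_enter() appears anywhere in the
 * captured callchain. Frames are symbol names (an illustrative
 * assumption); the order does not matter for this check.
 */
bool callchain_is_idle(const char *const frames[], size_t nr_frames)
{
	for (size_t i = 0; i < nr_frames; i++) {
		assert(frames[i] != NULL);
		if (strcmp(frames[i], "cpuidle_enter") == 0)
			return true;
	}
	return false;
}
```

One probable blind spot: a CPU that idles through the architecture's default idle routine without a cpuidle driver would not have cpuidle_enter() on its stack, so it would still get a full backtrace.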

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks. The 'obvious' downside is relying on cpuidle,
which I understand isn't supported by everyone.
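The rq->idle_state option reduces to a NULL check, assuming cpuidle sets the field to the state it is entering and clears it again on exit. The structures below are trimmed-down stand-ins that model only that field, and `demo_set_idle_state()` and `rq_cpu_in_cpuidle()` are hypothetical names; this is not the scheduler's actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins; only the field the check needs is modelled. */
struct cpuidle_state {
	const char *name;
	unsigned int exit_latency_us;
};

struct rq {
	struct cpuidle_state *idle_state; /* NULL when not in a cpuidle state */
};

/* Assumed cpuidle behaviour: publish the state on entry, NULL on exit. */
void demo_set_idle_state(struct rq *rq, struct cpuidle_state *state)
{
	rq->idle_state = state;
}

/* NMI-side check: a published state means the CPU is idling in cpuidle. */
bool rq_cpu_in_cpuidle(const struct rq *rq)
{
	assert(rq != NULL);
	return rq->idle_state != NULL;
}
```

On a system without cpuidle nothing would ever set the field, so every CPU would look busy and get a full backtrace, which is the downside mentioned above.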