Date:	Mon, 7 Mar 2016 15:26:37 +0700
From:	Daniel Thompson <daniel.thompson@...aro.org>
To:	Chris Metcalf <cmetcalf@...lanox.com>
Cc:	Russell King <linux@....linux.org.uk>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Andrew Morton <akpm@...l.org>,
	linux-kernel@...r.kernel.org, Aaron Tomlin <atomlin@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On 01/03/16 23:01, Chris Metcalf wrote:
> (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
>
> On 03/01/2016 09:23 AM, Daniel Thompson wrote:
>> On 29/02/16 21:40, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, and most of them are idle,
>>> the output is a little overwhelming and very uninformative. Suppress
>>> messages for cpus that are idling when they are interrupted and
>>> just emit one line, "NMI backtrace for N skipped: idle".
>>
>> I can see this makes the logs more attractive, but this is code for
>> emergency situations.
>>
>> The idle task is responsible for certain power management activities.
>> How can you be sure the system is not wedged because of bugs in that code?
>
> It's a fair point, but as core count increases, you really run the risk
> of losing the valuable data in a sea of data that isn't.  For example,
> for the architecture I maintain, we have the TILE-Gx72, which is a
> 72-core chip.  If each core's register dump and backtrace is 40 lines,
> we're up to around 3,000 lines of console output. Filtering that down by
> a factor of 10x or more (if only a handful of cores happen to be active,
> which is not uncommon) is a substantial usability improvement.

No objections to your use case. The output feels very verbose even with 
"only" eight cores.


> That said, it's true that the original solution I offered (examining
> just is_idle_task() plus interrupt nesting) is imprecise.  It is
> relatively straightforward to add a bit of per-cpu state that is set at
> the same moment we currently do stop/start_critical_timings(), which
> would indicate much more specifically that the cpu was running the
> idling code itself, and not anything more complex.  In that case if the
> flag was set, you would know you were either sitting on a
> processor-specific idle instruction in arch_cpu_idle(), or else polling
> one or two memory locations in a tight loop in cpu_idle_poll(), which
> presumably would offer sufficient precision to feel safe.
>
> Here's an alternative version of the patch which incorporates this
> idea.  Do you think this is preferable?  Thanks!

I prefer the approach taken by the new patch, although I think the
implementation might be buggy...


> commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
> Author: Chris Metcalf <cmetcalf@...hip.com>
> Date:   Mon Feb 29 11:56:32 2016 -0500
>
>      nmi_backtrace: generate one-line reports for idle cpus
>
>      When doing an nmi backtrace of many cores, and most of them are idle,
>      the output is a little overwhelming and very uninformative.  Suppress
>      messages for cpus that are idling when they are interrupted and
>      just emit one line, "NMI backtrace for N skipped: idle".
>
>      Signed-off-by: Chris Metcalf <cmetcalf@...hip.com>
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 786ad32631a6..b8c3c4cf88ad 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct cpuidle_driver *drv,
>   /* kernel/sched/idle.c */
>   extern void sched_idle_set_state(struct cpuidle_state *idle_state);
>   extern void default_idle_call(void);
> +extern bool in_cpu_idle(void);
>
>   #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
>   void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 544a7133cbd1..9aff315f278b 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>   __setup("hlt", cpu_idle_nopoll_setup);
>   #endif
>
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> +    return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running
an interrupt handler (see the sketch after the quoted patch below).
Interrupts are not masked at the point we set cpu_idling to false,
meaning we can easily be preempted before we clear the flag.


> +}
> +
>   static inline int cpu_idle_poll(void)
>   {
>       rcu_idle_enter();
>       trace_cpu_idle_rcuidle(0, smp_processor_id());
>       local_irq_enable();
>       stop_critical_timings();
> +    this_cpu_write(cpu_idling, true);
>       while (!tif_need_resched() &&
>           (cpu_idle_force_poll || tick_check_broadcast_expired()))
>           cpu_relax();
> +    this_cpu_write(cpu_idling, false);
>       start_critical_timings();
>       trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>       rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
>           local_irq_enable();
>       } else {
>           stop_critical_timings();
> +        this_cpu_write(cpu_idling, true);
>           arch_cpu_idle();
> +        this_cpu_write(cpu_idling, false);
>           start_critical_timings();
>       }
>   }
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index db63ac75eba0..75b5eacaa5d3 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,6 +17,7 @@
>   #include <linux/kprobes.h>
>   #include <linux/nmi.h>
>   #include <linux/seq_buf.h>
> +#include <linux/cpuidle.h>
>
>   #ifdef arch_trigger_cpumask_backtrace
>   /* For reliability, we're prepared to waste bits here. */
> @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>
>           /* Replace printk to write into the NMI seq */
>           this_cpu_write(printk_func, nmi_vprintk);
> -        pr_warn("NMI backtrace for cpu %d\n", cpu);
> -        if (regs)
> -            show_regs(regs);
> -        else
> -            dump_stack();
> +        if (in_cpu_idle()) {
> +            pr_warn("NMI backtrace for cpu %d skipped: idle\n",
> +                cpu);
> +        } else {
> +            pr_warn("NMI backtrace for cpu %d\n", cpu);
> +            if (regs)
> +                show_regs(regs);
> +            else
> +                dump_stack();
> +        }
>           this_cpu_write(printk_func, printk_func_save);
>
>           cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>
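
For illustration only, here is a minimal sketch of the kind of check Daniel
is asking for: keep the per-cpu flag from the patch, but treat the cpu as
quietly idle only if nothing beyond the backtrace handler itself is running
in interrupt context. The helper name cpu_quietly_idle() is hypothetical,
and the sketch assumes the cpu_idling flag is made visible outside
kernel/sched/idle.c (the patch above keeps it static); it is not part of
either patch in this thread.

#include <linux/percpu.h>
#include <linux/preempt.h>

/*
 * Set around the low-level idle wait by the patch above (it would need
 * to be non-static there for this to link).
 */
DECLARE_PER_CPU(bool, cpu_idling);

/*
 * Hypothetical replacement for in_cpu_idle() in nmi_cpu_backtrace():
 * true only when the cpu was interrupted inside the bare idle wait,
 * not inside an interrupt handler that preempted it.
 */
static bool cpu_quietly_idle(void)
{
	/* Not between stop_critical_timings()/start_critical_timings(). */
	if (!this_cpu_read(cpu_idling))
		return false;

	/*
	 * The flag stays set while an interrupt preempts the idle loop,
	 * which is the race Daniel points out.  The backtrace handler
	 * itself accounts for one level of hardirq (via nmi_enter() or
	 * irq_enter()), so any deeper hardirq nesting, or a softirq
	 * being served, means the cpu is doing real work and should get
	 * a full backtrace rather than the one-line "skipped: idle".
	 */
	if (hardirq_count() > HARDIRQ_OFFSET || in_serving_softirq())
		return false;

	return true;
}

nmi_cpu_backtrace() would then test cpu_quietly_idle() instead of
in_cpu_idle(). Whether the HARDIRQ_OFFSET accounting above holds depends on
how the backtrace NMI/IPI is delivered on a given architecture, so treat
this as a sketch of the idea rather than a drop-in fix.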
