lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 28 Dec 2018 00:11:12 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Anna-Maria Gleixner <anna-maria@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Grygorii Strashko <grygorii.strashko@...com>
Subject: Re: Fix 80d20d35af1e ("nohz: Fix local_timer_softirq_pending()") may
 have revealed another problem

On 27.12.2018 07:53, Frederic Weisbecker wrote:
> On Mon, Oct 15, 2018 at 10:58:54PM +0200, Heiner Kallweit wrote:
>> On 28.09.2018 15:18, Frederic Weisbecker wrote:
>>> On Thu, Sep 27, 2018 at 06:05:46PM +0200, Thomas Gleixner wrote:
>>>> On Tue, 28 Aug 2018, Frederic Weisbecker wrote:
>>>>> On Fri, Aug 24, 2018 at 07:06:32PM +0200, Heiner Kallweit wrote:
>>>>>> I tested it and Frederic is right, it doesn't help. Can it be somehow related to
>>>>>> the cpu being brought down during suspend? Because I get the warning only during
>>>>>> suspend when the cpu is inactive already (but still online).
>>>>>
>>>>> It's hard to tell, I haven't been able to reproduce on suspend to disk/mem.
>>>>>
>>>>> Does this script eventually trigger it after some time?
>>>>
>>>> Any update to this?
>>>
>>> Heiner? Can you please test the script I sent to you?
>>>
>>> Thanks.
>>>
>> Sorry, took some time .. And yes, running your script triggers the message too.
>>
[...]
> 
> Sorry, I got sidetracked and almost forgot about it.
> 
> So this is triggered by CPU hotplug. At some point the CPU has an
> opportunity to go idle and for some reason the timer softirq is still
> pending. We need to know which timer this is about and why the timer
> softirq keeps pending.
> 
> I'm going to need your help again. Can you please run the following (possibly
> repeat until it triggers the bug) ?
> 
>    echo 1 > /sys/devices/system/cpu/cpu1/online
> 
>    # Pause and reset tracing
>    echo 0 > /sys/kernel/debug/tracing/tracing_on
>    echo > /sys/kernel/debug/tracing/trace
> 
>    # Turn on relevant events
>    echo 1 > /sys/kernel/debug/tracing/events/timer/timer_*/enable
>    echo 1 > /sys/kernel/debug/tracing/events/irq/softirq_*/enable
>    echo 1 > /sys/kernel/debug/tracing/tracing_on
> 
>    # Trigger
>    echo 0 > /sys/devices/system/cpu/cpu1/online
> 
>    echo 0 > /sys/kernel/debug/tracing/tracing_on
> 
> And please apply the following patch before. With all that I'll have the
> relevant informations stored in /sys/kernel/debug/tracing/per_cpu/cpu1/trace
> Please send its content to me. Thanks!
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 69e673b..0e57a3b 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -892,6 +892,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
>  		    (local_softirq_pending() & SOFTIRQ_STOP_IDLE_MASK)) {
>  			pr_warn("NOHZ: local_softirq_pending %02x\n",
>  				(unsigned int) local_softirq_pending());
> +			trace_dump_stack(0);
>  			ratelimit++;
>  		}
>  		return false;
> 
> 

OK, did as you advised and here comes the trace. That's the related dmesg part:

[ 1479.025092] x86: Booting SMP configuration:
[ 1479.025129] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 1479.094715] NOHZ: local_softirq_pending 202
[ 1479.096557] smpboot: CPU 1 is now offline

Hope it helps.
Heiner


# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [001] d.h2  1479.099092: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.099098: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.099106: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099114: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.099120: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.099121: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099134: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.099135: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.099141: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.100094: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.100109: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.100116: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.101091: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.101113: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.101118: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.102094: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.102114: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.102121: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.103091: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.103097: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.103105: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103114: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.103118: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.103119: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103131: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.103132: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.103138: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.105092: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.105115: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.105119: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.106092: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.106112: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.106144: softirq_exit: vec=9 [action=RCU]
         cpuhp/1-13    [001] d..2  1479.106279: timer_cancel: timer=0000000009a25653
          <idle>-0     [001] d.h2  1479.106965: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.106969: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.106974: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106981: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.106984: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.106985: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106994: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.106995: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.106999: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.107996: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.108010: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.108014: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.109009: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.109013: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.109024: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.109028: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.109028: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.109033: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.110013: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.110033: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] ..s2  1479.110040: softirq_exit: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.111011: softirq_raise: vec=1 [action=TIMER]
          <idle>-0     [001] d.h2  1479.111017: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d.h2  1479.111026: softirq_raise: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111035: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.111040: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] ..s2  1479.111040: softirq_entry: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111052: softirq_exit: vec=7 [action=SCHED]
          <idle>-0     [001] ..s2  1479.111052: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.111079: softirq_exit: vec=9 [action=RCU]
         cpuhp/1-13    [001] dNh2  1479.112930: softirq_raise: vec=1 [action=TIMER]
         cpuhp/1-13    [001] dNh2  1479.112935: softirq_raise: vec=9 [action=RCU]
          <idle>-0     [001] d..1  1479.113077: <stack trace>
 => can_stop_idle_tick.isra.14
 => tick_nohz_get_sleep_length
 => menu_select
 => cpuidle_select
 => do_idle
 => cpu_startup_entry
 => start_secondary
 => secondary_startup_64
          <idle>-0     [001] .Ns2  1479.113110: softirq_entry: vec=1 [action=TIMER]
          <idle>-0     [001] .Ns2  1479.113114: softirq_exit: vec=1 [action=TIMER]
          <idle>-0     [001] .Ns2  1479.113115: softirq_entry: vec=9 [action=RCU]
          <idle>-0     [001] .Ns2  1479.113139: softirq_exit: vec=9 [action=RCU]

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ