lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190924171453.GA1804@vmlxhi-102.adit-jv.com>
Date:   Tue, 24 Sep 2019 19:14:53 +0200
From:   Eugeniu Rosca <erosca@...adit-jv.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Balasubramani Vivekanandan 
        <balasubramani_vivekanandan@...tor.com>
CC:     Frederic Weisbecker <fweisbec@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-renesas-soc@...r.kernel.org>,
        Eugeniu Rosca <erosca@...adit-jv.com>,
        Eugeniu Rosca <roscaeugeniu@...il.com>
Subject: Re: [PATCH V1 1/1] tick: broadcast-hrtimer: Fix a race in bc_set_next

On Mon, Sep 23, 2019 at 09:51:54PM +0200, Thomas Gleixner wrote:
> On Wed, 18 Sep 2019, Balasubramani Vivekanandan wrote:
> > 
> > When there are no more cpus subscribed to broadcast, the timer callback
> > might not set the expiry time for hrtimer. Therefore the callback timer
> > function is modified to set the state of broadcast clock to
> > CLOCK_EVT_STATE_ONESHOT_STOPPED which in turn will set the expiry time
> > of hrtimer to KTIME_MAX.
> 
> That's an ugly layering violation, really.
> 
> Aside of that the whole try to cancel logic and the comment that the
> hrtimer cannot be armed from within the callback is wrong. All of this can
> go way.
> 
> Find a completely untested patch below [1]
> 
> Thanks,
> 
> 	tglx
> 

jFTR, for the sake of anyone who lives on older kernels and happen to
deal with the same issue as reported by Balasubramani in this thread,
after enabling cpuidle [2-3] on R-Car H3ULCB (which seems to be a hard
prerequisite for seeing this issue reproduced), I got below [4,5,6]
backtraces on recent v5.3-12025-g4c07e2ddab5b using various intermittent
stress-ng loads. I believe they portray the issue reported by Bala.

I was not able to reproduce the RCU stalls after applying [1].

[1] https://lkml.org/lkml/2019/9/23/803
[2] https://patchwork.kernel.org/patch/10769701/
    ("[v2,1/5] arm64: dts: r8a7795: Add cpuidle support for CA57 cores")
[3] https://patchwork.kernel.org/patch/10769689/
    ("[v2,2/5] arm64: dts: r8a7795: Add cpuidle support for CA53 cores")

[4] RCU stall #1 reproduced on v5.3-12025-g4c07e2ddab5b + patches [2-3]

[  701.628630] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  701.635259] rcu:     1-...!: (0 ticks this GP) idle=5b4/0/0x0 softirq=178953/178953 fqs=0 last_accelerate: c1a9/209c, Nonlazy posted: ..D
[  701.648195] rcu:     3-...!: (667 GPs behind) idle=1a0/0/0x0 softirq=188640/188641 fqs=0 last_accelerate: c1a6/20a0, Nonlazy posted: .LD
[  701.660666] rcu:     5-...!: (0 ticks this GP) idle=674/0/0x0 softirq=39323/39323 fqs=0 last_accelerate: c380/20a0, Nonlazy posted: ..D
[  701.673046] rcu:     6-...!: (0 ticks this GP) idle=628/0/0x0 softirq=27147/27147 fqs=0 last_accelerate: c00c/20a0, Nonlazy posted: ..D
[  701.685419]  (detected by 2, t=25012 jiffies, g=392945, q=968)
[  701.691487] Task dump for CPU 1:
[  701.694873] swapper/1       R  running task        0     0      1 0x00000028
[  701.702227] Call trace:
[  701.704822]  __switch_to+0x2c4/0x2e8
[  701.708578]  num_spec.70129+0x177e7d/0x1d336c
[  701.713130] Task dump for CPU 3:
[  701.716514] swapper/3       R  running task        0     0      1 0x00000028
[  701.723865] Call trace:
[  701.726445]  __switch_to+0x2c4/0x2e8
[  701.730192]  0x3
[  701.732140] Task dump for CPU 5:
[  701.735524] swapper/5       R  running task        0     0      1 0x00000028
[  701.742874] Call trace:
[  701.745455]  __switch_to+0x2c4/0x2e8
[  701.749204]  num_spec.70129+0x177e7d/0x1d336c
[  701.753756] Task dump for CPU 6:
[  701.757139] swapper/6       R  running task        0     0      1 0x00000028
[  701.764489] Call trace:
[  701.767070]  __switch_to+0x2c4/0x2e8
[  701.770819]  num_spec.70129+0x177e7d/0x1d336c
[  701.775391] rcu: rcu_preempt kthread starved for 25068 jiffies! g392945 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=5
[  701.786394] rcu: RCU grace-period kthread stack dump:
[  701.791652] rcu_preempt     I    0    10      2 0x00000028
[  701.797369] Call trace:
[  701.799944]  __switch_to+0x2c4/0x2e8
[  701.803688]  __schedule+0xe1c/0xf58
[  701.807337]  schedule+0xc0/0x104
[  701.810721]  schedule_timeout+0x298/0x6ec
[  701.814911]  rcu_gp_kthread+0xf8c/0xf94
[  701.818923]  kthread+0x1c4/0x1dc
[  701.822304]  ret_from_fork+0x10/0x18
[  701.826061] Kernel panic - not syncing: RCU Stall
[  701.830964] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0+ #18
[  701.837201] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[  701.845503] Call trace:
[  701.848072]  dump_backtrace+0x0/0x1ec
[  701.851893]  show_stack+0x24/0x30
[  701.855361]  dump_stack+0xcc/0x128
[  701.858915]  panic+0x288/0x5a4
[  701.862112]  panic_on_rcu_stall+0x2c/0x34
[  701.866293]  rcu_sched_clock_irq+0xe2c/0x1554
[  701.870835]  update_process_times+0x34/0x60
[  701.875196]  tick_sched_handle+0x80/0x98
[  701.879286]  tick_sched_timer+0x64/0xbc
[  701.883289]  __hrtimer_run_queues+0x5c8/0xb94
[  701.887829]  hrtimer_interrupt+0x1ec/0x454
[  701.892107]  arch_timer_handler_phys+0x40/0x58
[  701.896737]  handle_percpu_devid_irq+0x170/0x6f4
[  701.901548]  generic_handle_irq+0x3c/0x54
[  701.905729]  __handle_domain_irq+0x114/0x118
[  701.910178]  gic_handle_irq+0x70/0xac
[  701.913999]  el1_irq+0xbc/0x180
[  701.917289]  __asan_load8+0x40/0xb0
[  701.920934]  tick_check_broadcast_expired+0x70/0x8c
[  701.926011]  cpu_idle_poll+0x1ec/0x380
[  701.929925]  do_idle+0x188/0x354
[  701.933301]  cpu_startup_entry+0x28/0x2c
[  701.937396]  secondary_start_kernel+0x344/0x374
[  701.942184] SMP: stopping secondary CPUs
[  701.946316] Dumping ftrace buffer:
[  701.950694]    (ftrace buffer empty)
[  701.954445] Kernel Offset: disabled
[  701.958093] CPU features: 0x0002,21006004
[  701.962274] Memory Limit: none
[  701.965552] ---[ end Kernel panic - not syncing: RCU Stall ]---

[5] RCU stall #2 reproduced on v5.3-12025-g4c07e2ddab5b + patches [2-3]

[ 1193.212038] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:max-time=5 run; sleep 5; done                                                                                                                                    
[ 1193.218757] rcu:     4-...!: (0 ticks this GP) idle=17c/0/0x0 softirq=130361/130361 fqs=1 last_accelerate: 9cdc/a0d9, Nonlazy posted: .LD
[ 1193.231958]  (detected by 3, t=25021 jiffies, g=774405, q=58539)
[ 1193.238238] Task dump for CPU 4:
[ 1193.241641] swapper/4       R  running task        0     0      1 0x00000028
[ 1193.249021] Call trace:
[ 1193.251631]  __switch_to+0x2c4/0x2e8
[ 1193.255395]  0x1fffe000db4d0fe4
[ 1193.258720] rcu: rcu_preempt kthread starved for 7974 jiffies! g774405 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
[ 1193.270900] rcu: RCU grace-period kthread stack dump:
[ 1193.276542] rcu_preempt     I    0    10      2 0x00000028
[ 1193.282278] Call trace:
[ 1193.284864]  __switch_to+0x2c4/0x2e8
[ 1193.288619]  __schedule+0xe1c/0xf58
[ 1193.292278]  schedule+0xc0/0x104
[ 1193.295671]  schedule_timeout+0x298/0x6ec
[ 1193.299875]  rcu_gp_kthread+0xf8c/0xf94
[ 1193.303901]  kthread+0x1c4/0x1dc
[ 1193.307295]  ret_from_fork+0x10/0x18
[ 1193.311069] Kernel panic - not syncing: RCU Stall
[ 1193.315986] CPU: 3 PID: 2892 Comm: bash Not tainted 5.3.0+ #18
[ 1193.322056] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[ 1193.330372] Call trace:
[ 1193.332951]  dump_backtrace+0x0/0x1ec
[ 1193.336782]  show_stack+0x24/0x30
[ 1193.340259]  dump_stack+0xcc/0x128
[ 1193.343822]  panic+0x288/0x5a4
[ 1193.347026]  panic_on_rcu_stall+0x2c/0x34
[ 1193.351217]  rcu_sched_clock_irq+0xe2c/0x1554
[ 1193.355770]  update_process_times+0x34/0x60
[ 1193.360141]  tick_sched_handle+0x80/0x98
[ 1193.364240]  tick_sched_timer+0x64/0xbc
[ 1193.368252]  __hrtimer_run_queues+0x5c8/0xb94
[ 1193.372804]  hrtimer_interrupt+0x1ec/0x454
[ 1193.377092]  arch_timer_handler_phys+0x40/0x58
[ 1193.381734]  handle_percpu_devid_irq+0x170/0x6f4
[ 1193.386558]  generic_handle_irq+0x3c/0x54
[ 1193.390749]  __handle_domain_irq+0x114/0x118
[ 1193.395208]  gic_handle_irq+0x70/0xac
[ 1193.399037]  el1_irq+0xbc/0x180
[ 1193.402333]  lock_acquire+0x154/0x39c
[ 1193.406164]  _raw_spin_lock+0x4c/0x88
[ 1193.409997]  set_close_on_exec+0x44/0x1ac
[ 1193.414190]  do_fcntl+0x578/0xbf4
[ 1193.417664]  __arm64_sys_fcntl+0xa4/0xdc
[ 1193.421767]  el0_svc_common.constprop.0+0x158/0x1f8
[ 1193.426854]  el0_svc_handler+0x94/0xa0
[ 1193.430775]  el0_svc+0x8/0xc
[ 1193.433878] SMP: stopping secondary CPUs
[ 1193.438059] Dumping ftrace buffer:
[ 1193.442613]    (ftrace buffer empty)
[ 1193.446373] Kernel Offset: disabled
[ 1193.450028] CPU features: 0x0002,21006004
[ 1193.454217] Memory Limit: none
[ 1193.457508] ---[ end Kernel panic - not syncing: RCU Stall ]---

[6] RCU stall #3 reproduced on v5.3-12025-g4c07e2ddab5b + patches [2-3]

[66194.249069] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[66194.255718] rcu:     7-...!: (0 ticks this GP) idle=b34/0/0x0 softirq=1969905/1969905 fqs=1 last_accelerate: 36d8/7727, Nonlazy posted: .LD
[66194.268853]  (detected by 2, t=25021 jiffies, g=8936349, q=98599)
[66194.275195] Task dump for CPU 7:
[66194.278585] swapper/7       R  running task        0     0      1 0x00000028
[66194.285941] Call trace:
[66194.288537]  __switch_to+0x2c4/0x2e8
[66194.292298]  num_spec.70129+0x177e7d/0x1d336c
[66194.296872] rcu: rcu_preempt kthread starved for 16505 jiffies! g8936349 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=7
[66194.307968] rcu: RCU grace-period kthread stack dump:
[66194.313228] rcu_preempt     I    0    10      2 0x00000028
[66194.318949] Call trace:
[66194.321527]  __switch_to+0x2c4/0x2e8
[66194.325275]  __schedule+0xe1c/0xf58
[66194.328926]  schedule+0xc0/0x104
[66194.332310]  schedule_timeout+0x298/0x6ec
[66194.336502]  rcu_gp_kthread+0xf8c/0xf94
[66194.340516]  kthread+0x1c4/0x1dc
[66194.343900]  ret_from_fork+0x10/0x18
[66194.347664] Kernel panic - not syncing: RCU Stall
[66194.352574] CPU: 2 PID: 17232 Comm: stress-ng-cpu Not tainted 5.3.0+ #18
[66194.359535] Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
[66194.367848] Call trace:
[66194.370428]  dump_backtrace+0x0/0x1ec
[66194.374256]  show_stack+0x24/0x30
[66194.377730]  dump_stack+0xcc/0x128
[66194.381288]  panic+0x288/0x5a4
[66194.384493]  panic_on_rcu_stall+0x2c/0x34
[66194.388680]  rcu_sched_clock_irq+0xe2c/0x1554
[66194.393229]  update_process_times+0x34/0x60
[66194.397595]  tick_sched_handle+0x80/0x98
[66194.401691]  tick_sched_timer+0x64/0xbc
[66194.405700]  __hrtimer_run_queues+0x5c8/0xb94
[66194.410246]  hrtimer_interrupt+0x1ec/0x454
[66194.414528]  arch_timer_handler_phys+0x40/0x58
[66194.419167]  handle_percpu_devid_irq+0x170/0x6f4
[66194.423990]  generic_handle_irq+0x3c/0x54
[66194.428178]  __handle_domain_irq+0x114/0x118
[66194.432630]  gic_handle_irq+0x70/0xac
[66194.436454]  el0_irq_naked+0x50/0x5c
[66194.440261] SMP: stopping secondary CPUs
[66194.444571] Dumping ftrace buffer:
[66194.449132]    (ftrace buffer empty)
[66194.452890] Kernel Offset: disabled
[66194.456538] CPU features: 0x0002,21006004
[66194.460719] Memory Limit: none
[66194.463997] ---[ end Kernel panic - not syncing: RCU Stall ]---

-- 
Best Regards,
Eugeniu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ