linux-kernel - Re: stress-ng --hrtimers hangs system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200605032014.GA4455@paulmck-ThinkPad-P72>
Date:   Thu, 4 Jun 2020 20:20:14 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Vladimir Oltean <olteanv@...il.com>
Cc:     linux-rt-users@...r.kernel.org,
        lkml <linux-kernel@...r.kernel.org>, rcu@...r.kernel.org,
        Colin King <colin.king@...onical.com>
Subject: Re: stress-ng --hrtimers hangs system

On Fri, Jun 05, 2020 at 04:47:51AM +0300, Vladimir Oltean wrote:
> Hi,
> 
> I was testing stress-ng on an ARM64 box and I found that it can be killed instantaneously with a --hrtimers 1 test:
> https://github.com/ColinIanKing/stress-ng/blob/master/stress-hrtimers.c
> The console shell locks up immediately after starting the process, and I get this rcu_preempt splat after 21 seconds,
> letting me know that the grace-periods kernel thread could not run:
> 
> # stress-ng --hrtimers 1
> stress-ng: info:  [320] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
> stress-ng: info:  [320] dispatching hogs: 1 hrtimers
> stress-ng: info:  [320] cache allocate: using defaults, can't determine cache details from sysfs
> [   85.827528] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [   85.833656]  (detected by 1, t=6502 jiffies, g=1789, q=12)
> [   85.839163] rcu: All QSes seen, last rcu_preempt kthread activity 6502 (4294913720-4294907218), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [   85.851647] rcu: rcu_preempt kthread starved for 6502 jiffies! g1789 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> [   85.861859] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [   85.871025] rcu: RCU grace-period kthread stack dump:
> [   85.876094] rcu_preempt     R  running task        0    10      2 0x00000028
> [   85.883173] Call trace:
> [   85.885636]  __switch_to+0xf8/0x148
> [   85.889137]  __schedule+0x2d8/0x808
> [   85.892636]  schedule+0x48/0x100
> [   85.895875]  schedule_timeout+0x1c8/0x420
> [   85.899900]  rcu_gp_kthread+0x738/0x1b78
> [   85.903836]  kthread+0x158/0x168
> [   85.907075]  ret_from_fork+0x10/0x18
> [   93.283548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 33s!
> [   93.291569] BUG: workqueue lockup - pool cpus=0-1 flags=0x4 nice=0 stuck for 33s!
> [   93.299105] Showing busy workqueues and worker pools:
> [   93.304189] workqueue events: flags=0x0
> [   93.308116]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   93.315031]     pending: vmstat_shepherd
> [   93.318990] workqueue events_unbound: flags=0x2
> [   93.323577]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/512 refcnt=3
> [   93.330309]     pending: flush_to_ldisc
> [   93.334161]
> [   93.334164] ======================================================
> [   93.334165] WARNING: possible circular locking dependency detected
> [   93.334167] 5.7.0-08604-g7dcedf8c04c0 #118 Not tainted
> [   93.334168] ------------------------------------------------------
> [   93.334169] stress-ng-hrtim/326 is trying to acquire lock:
> [   93.334171] ffffb236f6aee1a0 (console_owner){-.-.}-{0:0}, at: console_unlock+0x21c/0x678
> [   93.334176]
> [   93.334177] but task is already holding lock:
> [   93.334178] ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [   93.334184]
> [   93.334186] which lock already depends on the new lock.
> [   93.334187]
> [   93.334188]
> [   93.334189] the existing dependency chain (in reverse order) is:
> [   93.334190]
> [   93.334191] -> #2 (&pool->lock/1){-.-.}-{2:2}:
> [   93.334197]        _raw_spin_lock+0x5c/0x78
> [   93.334198]        __queue_work+0x124/0x7c8
> [   93.334199]        queue_work_on+0xd0/0xf0
> [   93.334200]        tty_flip_buffer_push+0x3c/0x48
> [   93.334202]        serial8250_rx_chars+0x74/0x88
> [   93.334203]        fsl8250_handle_irq+0x15c/0x1a0
> [   93.334204]        serial8250_interrupt+0x70/0xb8
> [   93.334206]        __handle_irq_event_percpu+0xe0/0x478
> [   93.334207]        handle_irq_event_percpu+0x40/0x98
> [   93.334208]        handle_irq_event+0x4c/0xd0
> [   93.334209]        handle_fasteoi_irq+0xb4/0x158
> [   93.334211]        generic_handle_irq+0x3c/0x58
> [   93.334212]        __handle_domain_irq+0x68/0xc0
> [   93.334213]        gic_handle_irq+0x6c/0x160
> [   93.334214]        el1_irq+0xbc/0x180
> [   93.334216]        cpuidle_enter_state+0xb4/0x4f8
> [   93.334217]        cpuidle_enter+0x3c/0x50
> [   93.334218]        call_cpuidle+0x44/0x78
> [   93.334219]        do_idle+0x228/0x2c8
> [   93.334220]        cpu_startup_entry+0x2c/0x48
> [   93.334222]        rest_init+0x1ac/0x280
> [   93.334223]        arch_call_rest_init+0x14/0x1c
> [   93.334224]        start_kernel+0x4ec/0x524
> [   93.334225]
> [   93.334226] -> #1 (&port->lock#2){-.-.}-{2:2}:
> [   93.334232]        _raw_spin_lock_irqsave+0x78/0xa0
> [   93.334233]        serial8250_console_write+0x1f4/0x348
> [   93.334234]        univ8250_console_write+0x44/0x58
> [   93.334235]        console_unlock+0x480/0x678
> [   93.334237]        vprintk_emit+0x188/0x370
> [   93.334238]        vprintk_default+0x48/0x58
> [   93.334239]        vprintk_func+0xf0/0x238
> [   93.334240]        printk+0x74/0x94
> [   93.334241]        register_console+0x1a0/0x300
> [   93.334243]        uart_add_one_port+0x4a0/0x4e0
> [   93.334244]        serial8250_register_8250_port+0x2d0/0x4b0
> [   93.334245]        of_platform_serial_probe+0x318/0x650
> [   93.334246]        platform_drv_probe+0x58/0xa8
> [   93.334248]        really_probe+0x10c/0x358
> [   93.334249]        driver_probe_device+0x5c/0x100
> [   93.334250]        device_driver_attach+0x74/0x98
> [   93.334252]        __driver_attach+0x98/0xe0
> [   93.334253]        bus_for_each_dev+0x84/0xd8
> [   93.334254]        driver_attach+0x30/0x40  [   93.334255]        bus_add_driver+0x160/0x208
> [   93.334256]        driver_register+0x64/0x110
> [   93.334258]        __platform_driver_register+0x58/0x68
> [   93.334259]        of_platform_serial_driver_init+0x20/0x28
> [   93.334260]        do_one_initcall+0x94/0x438
> [   93.334261]        kernel_init_freeable+0x2cc/0x338
> [   93.334263]        kernel_init+0x18/0x110  [   93.334264]        ret_from_fork+0x10/0x18  [   93.334265]
> [   93.334266] -> #0 (console_owner){-.-.}-{0:0}:
> [   93.334271]        __lock_acquire+0x1170/0x1510
> [   93.334272]        lock_acquire+0xf8/0x458  [   93.334273]        console_unlock+0x274/0x678
> [   93.334274]        vprintk_emit+0x188/0x370
> [   93.334275]        vprintk_default+0x48/0x58
> [   93.334277]        vprintk_func+0xf0/0x238  [   93.334278]        printk+0x74/0x94
> [   93.334279]        show_pwq+0x80/0x2c0
> [   93.334280]        show_workqueue_state+0x2ac/0x3a8
> [   93.334281]        wq_watchdog_timer_fn+0x194/0x2d8
> [   93.334283]        call_timer_fn+0xcc/0x400
> [   93.334284]        run_timer_softirq+0x250/0x6e0
> [   93.334285]        __do_softirq+0x12c/0x5b4
> [   93.334286]        irq_exit+0x13c/0x148
> [   93.334287]        __handle_domain_irq+0x6c/0xc0
> [   93.334289]        gic_handle_irq+0x6c/0x160
> [   93.334290]        el1_irq+0xbc/0x180
> [   93.334291]        lock_acquire+0x110/0x458
> [   93.334292]        __might_fault+0x84/0xa8  [   93.334293]        __arm64_sys_rt_sigreturn+0x154/0x980
> [   93.334295]        el0_svc_common.constprop.2+0x7c/0x178
> [   93.334296]        do_el0_svc+0x34/0xa0
> [   93.334297]        el0_sync_handler+0x160/0x1f0
> [   93.334298]        el0_sync+0x140/0x180
> [   93.334299]
> [   93.334301] other info that might help us debug this:
> [   93.334301]
> [   93.334303] Chain exists of:
> [   93.334303]   console_owner --> &port->lock#2 --> &pool->lock/1
> [   93.334311]
> [   93.334313]  Possible unsafe locking scenario:
> [   93.334314]
> [   93.334315]        CPU0                    CPU1
> [   93.334316]        ----                    ----
> [   93.334317]   lock(&pool->lock/1);
> [   93.334321]                                lock(&port->lock#2);
> [   93.334325]                                lock(&pool->lock/1);
> [   93.334328]   lock(console_owner);
> [   93.334331]
> [   93.334332]  *** DEADLOCK ***
> [   93.334333]
> [   93.334334] 5 locks held by stress-ng-hrtim/326:
> [   93.334335]  #0: ffff002077cf9a18 (&mm->mmap_sem){++++}-{3:3}, at: __might_fault+0x60/0xa8
> [   93.334341]  #1: ffff800010003de0 ((&wq_watchdog_timer)){+.-.}-{0:0}, at: call_timer_fn+0x0/0x400
> [   93.334347]  #2: ffffb236f6aefd60 (rcu_read_lock){....}-{1:2}, at: show_workqueue_state+0x0/0x3a8
> [   93.334353]  #3: ffff00207ac0d018 (&pool->lock/1){-.-.}-{2:2}, at: show_workqueue_state+0x288/0x3a8
> [   93.334360]  #4: ffffb236f6aee080 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x180/0x370
> [   93.334366]
> [   93.334367] stack backtrace:
> [   93.334369] CPU: 0 PID: 326 Comm: stress-ng-hrtim Not tainted 5.7.0-08604-g7dcedf8c04c0 #118
> [   93.334370] Hardware name: LS1028A RDB Board (DT)
> [   93.334371] Call trace:
> [   93.334372]  dump_backtrace+0x0/0x1a8
> [   93.334373]  show_stack+0x24/0x30
> [   93.334374]  dump_stack+0xe8/0x150
> [   93.334376]  print_circular_bug.isra.41+0x228/0x280
> [   93.334377]  check_noncircular+0x14c/0x1b0
> [   93.334378]  __lock_acquire+0x1170/0x1510  [   93.334379]  lock_acquire+0xf8/0x458
> [   93.334381]  console_unlock+0x274/0x678
> [   93.334382]  vprintk_emit+0x188/0x370
> [   93.334383]  vprintk_default+0x48/0x58
> [   93.334384]  vprintk_func+0xf0/0x238
> [   93.334385]  printk+0x74/0x94
> [   93.334386]  show_pwq+0x80/0x2c0
> [   93.334387]  show_workqueue_state+0x2ac/0x3a8
> [   93.334389]  wq_watchdog_timer_fn+0x194/0x2d8
> [   93.334390]  call_timer_fn+0xcc/0x400
> [   93.334391]  run_timer_softirq+0x250/0x6e0
> [   93.334392]  __do_softirq+0x12c/0x5b4
> [   93.334393]  irq_exit+0x13c/0x148
> [   93.334394]  __handle_domain_irq+0x6c/0xc0
> [   93.334396]  gic_handle_irq+0x6c/0x160
> [   93.334397]  el1_irq+0xbc/0x180
> [   93.334398]  lock_acquire+0x110/0x458
> [   93.334399]  __might_fault+0x84/0xa8
> [   93.334400]  __arm64_sys_rt_sigreturn+0x154/0x980
> [   93.334402]  el0_svc_common.constprop.2+0x7c/0x178
> [   93.334403]  do_el0_svc+0x34/0xa0
> [   93.334404]  el0_sync_handler+0x160/0x1f0  [   93.334405]  el0_sync+0x140/0x180
> [   93.984979] workqueue events_freezable: flags=0x4
> [   93.989705]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   93.996522]     pending: mmc_rescan
> [   94.000032] workqueue events_power_efficient: flags=0x82
> [   94.005369]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=4/256 refcnt=6
> [   94.011748]     pending: phy_state_machine, phy_state_machine, neigh_periodic_work, do_cache_clean
> [   94.020761] workqueue pm: flags=0x4
> [   94.024266]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   94.031081]     pending: pm_runtime_work
> [   94.035025]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [   94.041840]     pending: pm_runtime_work
> [   94.045782] workqueue writeback: flags=0x4a
> [   94.049984]   pwq 4: cpus=0-1 flags=0x4 nice=0 active=1/256 refcnt=3
> [   94.056362]     pending: wb_workfn
> [   94.059784] workqueue kblockd: flags=0x18  [   94.063812]   pwq 3: cpus=1 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
> [   94.070802]     pending: blk_mq_timeout_work
> 
> Looks like there's another issue in fsl8250_handle_irq, but I think that's unrelated so I'll ignore it.
> So it looks like CPU 1 detects that the rcu_preempt kthread isn't scheduled, despite the fact that all
> quiescent states were seen. So it looks like there is no RCU usage bug in here, but a legitimate CPU
> starvation issue as far as I can see.

That agrees with my interpretation of the stall warning message.

On the rest, I must defer to others.

							Thanx, Paul

> I have some noob questions:
> 
> - Is my system (hardware, drivers, kernel config) misconfigured? This can be reproduced at least on the
>   NXP LS1028A running today's master branch with just the arm64 defconfig. On other hardware however
>   (NXP T1040 PowerPC), RT throttling kicks in faster and the system doesn't hang (more below).
> 
> - What is the expected behavior of a system to this hrtimer test? Intuitively I would say the kenel
>   shouldn't hang, but I don't exactly know what are the mechanisms to avoid that. Below are 2 things I tried.
> 
> - I tried to enable CONFIG_RCU_BOOST and had limited success. I could get rid of the RCU stall messages
>   but the system would still hang after a longer while. When it hanged, it didn't even respond to ping
>   from the outside, no shell, only magic sysrq could restart the kernel.
> 
> - I read a little bit of the "Limiting the CPU usage of real-time and deadline processes" chapter from
>   man 7 sched. Is this what I should be doing? My /proc/sys/kernel/sched_rt_period_us and
>   /proc/sys/kernel/sched_rt_runtime_us are set to their default values (RT period 1 second, RT runtime
>   .95 seconds), why isn't that enough? If I reduce the RT runtime to .8 seconds while keeping the period
>   at 1 second, RT throttling (finally!) kicks in and the system does not die.
>   The RCU stall warning does keep, however, getting printed continuously.
> 
> [  385.732074] sched: RT throttling activated
> [  385.883960] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  385.890077]  (detected by 1, t=6503 jiffies, g=1801, q=68)
> [  385.895582] rcu: All QSes seen, last rcu_preempt kthread activity 6504 (4294988738-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [  385.908062] rcu: rcu_preempt kthread starved for 6508 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [  385.918273] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [  385.927437] rcu: RCU grace-period kthread stack dump:
> [  385.932503] rcu_preempt     R  running task        0    11      2 0x00000028
> [  385.939579] Call trace:
> [  385.942040]  __switch_to+0xf8/0x148
> [  385.945539]  __schedule+0x2d8/0x808
> [  385.949036]  schedule+0x48/0x100
> [  385.952273]  schedule_timeout+0x1c8/0x420
> [  385.956295]  rcu_gp_kthread+0x704/0x1870
> [  385.960228]  kthread+0x158/0x168
> [  385.963464]  ret_from_fork+0x10/0x18
> (...)
> [  463.903959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  463.910077]  (detected by 1, t=26008 jiffies, g=1801, q=148)
> [  463.915756] rcu: All QSes seen, last rcu_preempt kthread activity 26009 (4295008243-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [  463.928324] rcu: rcu_preempt kthread starved for 26013 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [  463.938622] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [  463.947786] rcu: RCU grace-period kthread stack dump:
> [  463.952852] rcu_preempt     R  running task        0    11      2 0x00000028
> [  463.959928] Call trace:
> [  463.962389]  __switch_to+0xf8/0x148
> [  463.965888]  __schedule+0x2d8/0x808
> [  463.969385]  schedule+0x48/0x100
> [  463.972622]  schedule_timeout+0x1c8/0x420
> [  463.976644]  rcu_gp_kthread+0x704/0x1870
> [  463.980578]  kthread+0x158/0x168
> [  463.983814]  ret_from_fork+0x10/0x18
> (...)
> [  541.923959] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  541.930076]  (detected by 1, t=45513 jiffies, g=1801, q=230)
> [  541.935754] rcu: All QSes seen, last rcu_preempt kthread activity 45514 (4295027748-4294982234), jiffies_till_next_fqs=1, root ->qsmask 0x0
> [  541.948322] rcu: rcu_preempt kthread starved for 45518 jiffies! g1801 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
> [  541.958620] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> [  541.967784] rcu: RCU grace-period kthread stack dump:
> [  541.972850] rcu_preempt     R  running task        0    11      2 0x00000028
> [  541.979926] Call trace:
> [  541.982387]  __switch_to+0xf8/0x148
> [  541.985887]  __schedule+0x2d8/0x808
> [  541.989384]  schedule+0x48/0x100
> [  541.992621]  schedule_timeout+0x1c8/0x420
> [  541.996643]  rcu_gp_kthread+0x704/0x1870
> [  542.000576]  kthread+0x158/0x168
> [  542.003813]  ret_from_fork+0x10/0x18
> 
> I would really appreciate some pointers.
> 
> Thanks,
> -Vladimir