Message-ID: <20201123185338.GA2263110@elver.google.com>
Date: Mon, 23 Nov 2020 19:53:38 +0100
From: Marco Elver <elver@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
Anders Roxell <anders.roxell@...aro.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Alexander Potapenko <glider@...gle.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
Jann Horn <jannh@...gle.com>,
Mark Rutland <mark.rutland@....com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux-MM <linux-mm@...ck.org>,
kasan-dev <kasan-dev@...glegroups.com>, rcu@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Tejun Heo <tj@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [PATCH] kfence: Avoid stalling work queue task without
allocations
On Mon, Nov 23, 2020 at 11:36AM -0500, Steven Rostedt wrote:
> On Mon, 23 Nov 2020 11:28:12 -0500
> Steven Rostedt <rostedt@...dmis.org> wrote:
>
> > I noticed:
> >
> >
> > [ 237.650900] enabling event benchmark_event
> >
> > In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
> > the issue goes away? That event kicks off a thread that spins in a tight
> > loop for some time and could possibly cause some issues.
> >
> > It still shouldn't break things, but if it is the culprit we can narrow it down.
>
> And it probably is the issue because that thread will never sleep! It runs
> a loop of:
>
>
> static int benchmark_event_kthread(void *arg)
> {
> 	/* sleep a bit to make sure the tracepoint gets activated */
> 	msleep(100);
>
> 	while (!kthread_should_stop()) {
>
> 		trace_do_benchmark();
>
> 		/*
> 		 * We don't go to sleep, but let others run as well.
> 		 * This is basically a "yield()" to let any task that
> 		 * wants to run, schedule in, but if the CPU is idle,
> 		 * we'll keep burning cycles.
> 		 *
> 		 * Note the tasks_rcu_qs() version of cond_resched() will
> 		 * notify synchronize_rcu_tasks() that this thread has
> 		 * passed a quiescent state for rcu_tasks. Otherwise
> 		 * this thread will never voluntarily schedule which would
> 		 * block synchronize_rcu_tasks() indefinitely.
> 		 */
> 		cond_resched_tasks_rcu_qs();
> 	}
>
> 	return 0;
> }
>
>
> Did something change such that "cond_resched_tasks_rcu_qs()" no longer lets
> things progress on ARM64?
>
> I noticed that you have PREEMPT enabled, so this thread will only be
> preempted when its time slice runs out and something else wants to run. How
> would that affect other threads?
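
(For reference, cond_resched_tasks_rcu_qs() should expand to roughly the
below -- a simplified sketch from memory of include/linux/rcupdate.h, so the
exact definition in -next may differ:

	/*
	 * Report a Tasks-RCU quiescent state for the current task, then
	 * offer to reschedule; in the benchmark loop above this is the
	 * only voluntary scheduling point.
	 */
	#define cond_resched_tasks_rcu_qs() \
	do { \
		rcu_tasks_qs(current, false); \
		cond_resched(); \
	} while (0)

so in principle the loop does pass through a quiescent state and a voluntary
scheduling point on every iteration.)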
Bad news: It still crashes with CONFIG_TRACEPOINT_BENCHMARK=n :-/
| [ 118.146638] Testing all events: OK
| [ 169.829624] Running tests again, along with the function tracer
| [ 169.843216] Running tests on all trace events:
| [ 169.855714] Testing all events:
| [ 173.025250] hrtimer: interrupt took 10544064 ns
| [ 253.206548] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [ 253.314369] Showing busy workqueues and worker pools:
| [ 253.324289] workqueue events: flags=0x0
| [ 253.349488] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [ 253.362043] pending: vmstat_shepherd
| [ 253.380223] workqueue events_power_efficient: flags=0x82
| [ 253.417256] pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
| [ 253.428133] in-flight: 61:check_lifetime
| [ 253.446077] pool 2: cpus=0 flags=0x4 nice=0 hung=5s workers=4 idle: 101 99 7
| [ 444.944487] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [ 445.034039] Showing busy workqueues and worker pools:
| [ 445.050842] workqueue events: flags=0x0
| [ 445.056803] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [ 445.067830] pending: vmstat_shepherd
| [ 445.080635] pool 2: cpus=0 flags=0x5 nice=0 hung=11s workers=2 manager: 61 idle: 101
| [ 462.719775] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| [ 462.726014] (detected by 0, t=3752 jiffies, g=2405, q=4)
| [ 462.728031] rcu: All QSes seen, last rcu_preempt kthread activity 3211 (4295005085-4295001874), jiffies_till_next_fqs=1, root ->qsmask 0x0
| [ 462.739092] rcu: rcu_preempt kthread starved for 3211 jiffies! g2405 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
| [ 462.745093] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
| [ 462.750782] rcu: RCU grace-period kthread stack dump:
| [ 462.755426] task:rcu_preempt state:R running task stack: 0 pid: 10 ppid: 2 flags:0x00000428
| [ 462.764703] Call trace:
| [ 462.768560] __switch_to+0x148/0x1f0
| [ 462.772700] __schedule+0x2dc/0x9a8
| [ 462.776802] preempt_schedule_common+0x4c/0x1a8
| [ 462.781214] preempt_schedule+0x38/0x40
| [ 462.785430] _raw_spin_unlock_irqrestore+0x90/0xa0
| [ 462.789928] finish_swait+0x64/0x88
| [ 462.794055] rcu_gp_kthread+0x448/0x1298
| [ 462.798276] kthread+0x13c/0x188
| [ 462.802302] ret_from_fork+0x10/0x34
| [ 462.806485] rcu: Stack dump where RCU GP kthread last ran:
| [ 462.811098] Task dump for CPU 0:
| [ 462.815033] task:kcompactd0 state:S stack: 0 pid: 26 ppid: 2 flags:0x00000428
| [ 462.822011] Call trace:
| [ 462.825784] dump_backtrace+0x0/0x240
| [ 462.829908] show_stack+0x34/0x88
| [ 462.833915] sched_show_task+0x208/0x230
| [ 462.838140] dump_cpu_task+0x4c/0x5c
| [ 462.842246] rcu_check_gp_kthread_starvation+0x240/0x388
| [ 462.846896] rcu_sched_clock_irq+0x14f4/0x17a0
| [ 462.851284] update_process_times+0x6c/0xb8
| [ 462.855578] tick_sched_handle.isra.0+0x58/0x88
| [ 462.859991] tick_sched_timer+0x68/0xe0
| [ 462.864175] __hrtimer_run_queues+0x288/0x730
| [ 462.868582] hrtimer_interrupt+0x114/0x288
| [ 462.872855] arch_timer_handler_virt+0x50/0x70
| [ 462.877243] handle_percpu_devid_irq+0x104/0x4c0
| [ 462.881653] generic_handle_irq+0x54/0x78
| [ 462.885842] __handle_domain_irq+0xac/0x130
| [ 462.890101] gic_handle_irq+0x70/0x108
| [ 462.894233] el1_irq+0xc0/0x180
| [ 462.898224] _raw_spin_unlock_irq+0x50/0x98
| [ 462.902481] finish_task_switch+0xb4/0x398
| [ 462.906768] __schedule+0x2e0/0x9a8
| [ 462.910852] preempt_schedule_notrace+0x70/0x1c0
| [ 462.915285] ftrace_ops_list_func+0x10c/0x258
| [ 462.919609] ftrace_graph_call+0x0/0x4
| [ 462.923792] kthread_should_stop+0x8/0x70
| [ 462.928038] kcompactd+0x314/0x4b8
| [ 462.932105] kthread+0x13c/0x188
| [ 462.936096] ret_from_fork+0x10/0x34
| [ 462.952198]
| [ 462.954584] ================================
| [ 462.957671] WARNING: inconsistent lock state
| [ 462.960977] 5.10.0-rc4-next-20201119-00004-g77838ee21ff6 #22 Not tainted
| [ 462.964810] --------------------------------
| [ 462.967939] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
| [ 462.971578] kcompactd0/26 [HC0[0]:SC0[0]:HE0:SE1] takes:
| [ 462.975088] ffffcb9ccd9f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
| [ 462.983339] {IN-HARDIRQ-W} state was registered at:
| [ 462.986815] __lock_acquire+0xae8/0x1ac8
| [ 462.989968] lock_acquire+0x268/0x508
| [ 462.993015] _raw_spin_lock_irqsave+0x78/0x14c
| [ 462.996267] rcu_sched_clock_irq+0x7c0/0x17a0
| [ 462.999496] update_process_times+0x6c/0xb8
| [ 463.002663] tick_sched_handle.isra.0+0x58/0x88
| [ 463.005926] tick_sched_timer+0x68/0xe0
| [ 463.009016] __hrtimer_run_queues+0x288/0x730
| [ 463.012245] hrtimer_interrupt+0x114/0x288
| [ 463.015401] arch_timer_handler_virt+0x50/0x70
| [ 463.018666] handle_percpu_devid_irq+0x104/0x4c0
| [ 463.021974] generic_handle_irq+0x54/0x78
| [ 463.025112] __handle_domain_irq+0xac/0x130
| [ 463.028279] gic_handle_irq+0x70/0x108
| [ 463.031320] el1_irq+0xc0/0x180
| [ 463.034196] _raw_spin_unlock_irq+0x50/0x98
| [ 463.037365] finish_task_switch+0xb4/0x398
| [ 463.040505] __schedule+0x2e0/0x9a8
| [ 463.043472] preempt_schedule_notrace+0x70/0x1c0
| [ 463.046777] ftrace_ops_list_func+0x10c/0x258
| [ 463.049998] ftrace_graph_call+0x0/0x4
| [ 463.053078] kthread_should_stop+0x8/0x70
| [ 463.056221] kcompactd+0x314/0x4b8
| [ 463.059184] kthread+0x13c/0x188
| [ 463.062074] ret_from_fork+0x10/0x34
| [ 463.065079] irq event stamp: 40276
| [ 463.068061] hardirqs last enabled at (40275): [<ffffcb9ccc826168>] _raw_spin_unlock_irq+0x48/0x98
| [ 463.072691] hardirqs last disabled at (40276): [<ffffcb9ccb4123c0>] el1_irq+0x80/0x180
| [ 463.076954] softirqs last enabled at (40006): [<ffffcb9ccb410b80>] __do_softirq+0x630/0x6b4
| [ 463.081364] softirqs last disabled at (40003): [<ffffcb9ccb4c623c>] irq_exit+0x1cc/0x1e0
| [ 463.085574]
| [ 463.085574] other info that might help us debug this:
| [ 463.089361] Possible unsafe locking scenario:
| [ 463.089361]
| [ 463.092907] CPU0
| [ 463.095454] ----
| [ 463.097991] lock(rcu_node_0);
| [ 463.102493] <Interrupt>
| [ 463.105092] lock(rcu_node_0);
| [ 463.109675]
| [ 463.109675] *** DEADLOCK ***
| [ 463.109675]
| [ 463.113299] 1 lock held by kcompactd0/26:
| [ 463.116335] #0: ffffcb9ccd9f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c0/0x17a0
| [ 463.125315]
| [ 463.125315] stack backtrace:
| [ 463.128700] CPU: 0 PID: 26 Comm: kcompactd0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6 #22
| [ 463.133364] Hardware name: linux,dummy-virt (DT)
| [ 463.136648] Call trace:
| [ 463.139294] dump_backtrace+0x0/0x240
| [ 463.142270] show_stack+0x34/0x88
| [ 463.145148] dump_stack+0x140/0x1bc
| [ 463.148098] print_usage_bug+0x2a0/0x2f0
| [ 463.151167] mark_lock.part.0+0x438/0x4e8
| [ 463.154249] mark_held_locks+0x54/0x90
| [ 463.157269] lockdep_hardirqs_on_prepare+0xe0/0x290
| [ 463.160613] trace_hardirqs_on+0x90/0x370
| [ 463.163699] el1_irq+0xdc/0x180
| [ 463.166559] _raw_spin_unlock_irq+0x50/0x98
| [ 463.169717] finish_task_switch+0xb4/0x398
| [ 463.172841] __schedule+0x2e0/0x9a8
| [ 463.175812] preempt_schedule_notrace+0x70/0x1c0
| [ 463.179109] ftrace_ops_list_func+0x10c/0x258
| [ 463.182317] ftrace_graph_call+0x0/0x4
| [ 463.185338] kthread_should_stop+0x8/0x70
| [ 463.188421] kcompactd+0x314/0x4b8
| [ 463.191349] kthread+0x13c/0x188
| [ 463.194230] ret_from_fork+0x10/0x34
| [ 463.199912] BUG: scheduling while atomic: kcompactd0/26/0x00000002
| [ 463.211543] INFO: lockdep is turned off.
| [ 463.219382] Modules linked in:
| [ 463.231386] Preemption disabled at:
| [ 463.235427] [<ffffcb9ccb6233dc>] ftrace_ops_list_func+0x10c/0x258
| [ 463.250883] CPU: 0 PID: 26 Comm: kcompactd0 Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6 #22
| [ 463.255542] Hardware name: linux,dummy-virt (DT)
| [ 463.258751] Call trace:
| [ 463.261423] dump_backtrace+0x0/0x240
| [ 463.264426] show_stack+0x34/0x88
| [ 463.267350] dump_stack+0x140/0x1bc
| [ 463.270308] __schedule_bug+0xcc/0xe0
| [ 463.273317] __schedule+0x888/0x9a8
| [ 463.276295] preempt_schedule_notrace+0x70/0x1c0
| [ 463.279608] ftrace_ops_list_func+0x10c/0x258
| [ 463.282809] ftrace_graph_call+0x0/0x4
| [ 463.285835] kthread_should_stop+0x8/0x70
| [ 463.288933] kcompactd+0x314/0x4b8
| [ 463.291871] kthread+0x13c/0x188
| [ 463.294746] ret_from_fork+0x10/0x34
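
To decode the "inconsistent lock state" splat above: the {IN-HARDIRQ-W} ->
{HARDIRQ-ON-W} complaint corresponds to the classic pattern below. This is a
generic illustration only (the demo_* names are made up), not the actual
rcu_node code:

	#include <linux/interrupt.h>
	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(demo_lock);

	/* The lock is taken from hardirq context ... */
	static irqreturn_t demo_irq_handler(int irq, void *dev_id)
	{
		spin_lock(&demo_lock);		/* lockdep: IN-HARDIRQ-W */
		spin_unlock(&demo_lock);
		return IRQ_HANDLED;
	}

	/* ... so acquiring it with interrupts enabled can self-deadlock. */
	static void demo_task_context(void)
	{
		spin_lock(&demo_lock);		/* lockdep: HARDIRQ-ON-W */
		/*
		 * An interrupt arriving here re-enters demo_irq_handler(),
		 * which then spins on demo_lock forever. Using
		 * spin_lock_irqsave()/spin_unlock_irqrestore() here would
		 * avoid that.
		 */
		spin_unlock(&demo_lock);
	}

In the report above it looks like hardirqs were re-enabled (on the el1_irq
return path, under the ftrace list func) while rcu_node_0 was still held,
which is what trips the check.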
In case you're interested in reproducing, the qemu cmdline I use here is
(config attached, disk image not required):
$ qemu-system-aarch64 -kernel $KERNEL_WORKTREE/arch/arm64/boot/Image \
	-append "console=ttyAMA0 root=/dev/sda debug earlycon earlyprintk=serial workqueue.watchdog_thresh=10" \
	-nographic -smp 1 -machine virt -cpu cortex-a57 -m 2G
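
(FWIW, the Image itself is built the usual way; roughly the below, assuming
an aarch64 cross toolchain -- the toolchain prefix and paths are only
examples:

	$ cp .config $KERNEL_WORKTREE/.config
	$ make -C $KERNEL_WORKTREE ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
		olddefconfig Image -j$(nproc)
)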
Thanks,
-- Marco
[Attachment: ".config", text/plain, 231865 bytes]