[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220819022906.GM2125313@paulmck-ThinkPad-P17-Gen-1>
Date: Thu, 18 Aug 2022 19:29:06 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: bpf <bpf@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
Kernel Team <Kernel-team@...com>, rcu@...r.kernel.org,
Shigeru Yoshida <syoshida@...hat.com>
Subject: Re: kernel splat during boot
On Thu, Aug 18, 2022 at 05:10:47PM -0700, Alexei Starovoitov wrote:
> Hi Paul,
>
> I see the following splat just booting net-next or bpf-next trees.
> I have lockdep and kasan on (if that matters).
> Is this a known issue?
>
> [ 3.011826] cblist_init_generic: Setting adjustable number of
> callback queues.
> [ 3.011880]
> [ 3.011883] =============================
> [ 3.011885] [ BUG: Invalid wait context ]
> [ 3.011889] 5.19.0-14019-g75179e2b7f9a #4201 Not tainted
> [ 3.011893] -----------------------------
> [ 3.011896] swapper/0/1 is trying to lock:
> [ 3.011899] ffffffff85be94b8 (&port_lock_key){....}-{3:3}, at:
> serial8250_console_write+0x5fc/0x640
> [ 3.011929] other info that might help us debug this:
> [ 3.011931] context-{5:5}
> [ 3.011934] 3 locks held by swapper/0/1:
> [ 3.011938] #0: ffffffff8404dae0
> (rcu_tasks_rude.cbs_gbl_lock){....}-{2:2}, at:
> cblist_init_generic+0x27/0x340
> [ 3.011964] #1: ffffffff84041d80 (console_lock){+.+.}-{0:0}, at:
> vprintk_emit+0xda/0x2e0
> [ 3.011984] #2: ffffffff839416e0 (console_owner){....}-{0:0}, at:
> console_emit_next_record.constprop.37+0x1c9/0x4d0
> [ 3.012004] stack backtrace:
> [ 3.012007] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 5.19.0-14019-g75179e2b7f9a #4201
> [ 3.012011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> [ 3.012011] Call Trace:
> [ 3.012011] <TASK>
> [ 3.012011] dump_stack_lvl+0x44/0x57
> [ 3.012011] __lock_acquire.cold.73+0xc7/0x31b
> [ 3.012011] ? lockdep_hardirqs_on_prepare+0x1f0/0x1f0
> [ 3.012011] ? rcu_read_lock_sched_held+0x91/0xc0
> [ 3.012011] ? rcu_read_lock_bh_held+0xa0/0xa0
> [ 3.012011] lock_acquire+0x133/0x380
> [ 3.012011] ? serial8250_console_write+0x5fc/0x640
> [ 3.012011] ? lock_release+0x3b0/0x3b0
> [ 3.012011] _raw_spin_lock_irqsave+0x35/0x50
> [ 3.012011] ? serial8250_console_write+0x5fc/0x640
> [ 3.012011] serial8250_console_write+0x5fc/0x640
> [ 3.012011] ? rcu_read_lock_bh_held+0xa0/0xa0
> [ 3.012011] ? serial8250_default_handle_irq+0x80/0x80
> [ 3.012011] ? lock_release+0x3b0/0x3b0
> [ 3.012011] ? do_raw_spin_lock+0x107/0x1c0
> [ 3.012011] ? rwlock_bug.part.2+0x60/0x60
> [ 3.012011] ? prb_final_commit+0x50/0x50
> [ 3.012011] console_emit_next_record.constprop.37+0x271/0x4d0
> [ 3.012011] ? info_print_ext_header.constprop.38+0x110/0x110
> [ 3.012011] ? rcu_read_lock_sched_held+0x91/0xc0
> [ 3.012011] console_unlock+0x1d8/0x2c0
> [ 3.012011] ? devkmsg_open+0x170/0x170
> [ 3.012011] ? vprintk_emit+0xda/0x2e0
> [ 3.012011] vprintk_emit+0xe3/0x2e0
> [ 3.012011] _printk+0x96/0xb0
> [ 3.012011] ? pm_suspend.cold.6+0x2e3/0x2e3
> [ 3.012011] ? _raw_write_unlock+0x1f/0x30
> [ 3.012011] ? do_raw_spin_lock+0x107/0x1c0
> [ 3.012011] cblist_init_generic.cold.36+0x24/0x32
> [ 3.012011] rcu_init_tasks_generic+0x23/0xf0
> [ 3.012011] kernel_init_freeable+0x1d3/0x38a
> [ 3.012011] ? _raw_spin_unlock_irq+0x24/0x30
> [ 3.012011] ? rest_init+0x1d0/0x1d0
> [ 3.012011] kernel_init+0x18/0x130
> [ 3.012011] ? rest_init+0x1d0/0x1d0
> [ 3.012011] ret_from_fork+0x1f/0x30
> [ 3.012011] </TASK>
> [ 3.012029] cblist_init_generic: Setting shift to 3 and lim to 1.
> [ 3.013301] cblist_init_generic: Setting shift to 3 and lim to 1.
> [ 3.014203] Running RCU-tasks wait API self tests
Does the the patch from Shigeru Yoshida shown below help?
This patch is in my view a workaround for printf() losing the ability
to print from interrupts-diabled sections of code, but maybe it will be
of help to you.
Thanx, Paul
------------------------------------------------------------------------
>From SRS0=JFiA=YG=redhat.com=syoshida@...nel.org Tue Aug 2 09:31:25 2022
From: Shigeru Yoshida <syoshida@...hat.com>
To: paulmck@...nel.org,
frederic@...nel.org,
neeraju@...cinc.com,
josh@...htriplett.org
Cc: rostedt@...dmis.org,
mathieu.desnoyers@...icios.com,
jiangshanlai@...il.com,
joel@...lfernandes.org,
rcu@...r.kernel.org,
linux-kernel@...r.kernel.org,
Shigeru Yoshida <syoshida@...hat.com>
Subject: [PATCH] rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
Date: Wed, 3 Aug 2022 01:22:05 +0900
Message-Id: <20220802162205.817796-1-syoshida@...hat.com>
pr_info() is called with rtp->cbs_gbl_lock spin lock locked. Because
pr_info() calls printk() that might sleep, this will result in BUG
like below:
[ 0.206455] cblist_init_generic: Setting adjustable number of callback queues.
[ 0.206463]
[ 0.206464] =============================
[ 0.206464] [ BUG: Invalid wait context ]
[ 0.206465] 5.19.0-00428-g9de1f9c8ca51 #5 Not tainted
[ 0.206466] -----------------------------
[ 0.206466] swapper/0/1 is trying to lock:
[ 0.206467] ffffffffa0167a58 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x327/0x4a0
[ 0.206473] other info that might help us debug this:
[ 0.206473] context-{5:5}
[ 0.206474] 3 locks held by swapper/0/1:
[ 0.206474] #0: ffffffff9eb597e0 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1f0
[ 0.206478] #1: ffffffff9eb579c0 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x7e
[ 0.206482] #2: ffffffff9ea77780 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.0+0x111/0x330
[ 0.206485] stack backtrace:
[ 0.206486] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00428-g9de1f9c8ca51 #5
[ 0.206488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 0.206489] Call Trace:
[ 0.206490] <TASK>
[ 0.206491] dump_stack_lvl+0x6a/0x9f
[ 0.206493] __lock_acquire.cold+0x2d7/0x2fe
[ 0.206496] ? stack_trace_save+0x46/0x70
[ 0.206497] lock_acquire+0xd1/0x2f0
[ 0.206499] ? serial8250_console_write+0x327/0x4a0
[ 0.206500] ? __lock_acquire+0x5c7/0x2720
[ 0.206502] _raw_spin_lock_irqsave+0x3d/0x90
[ 0.206504] ? serial8250_console_write+0x327/0x4a0
[ 0.206506] serial8250_console_write+0x327/0x4a0
[ 0.206508] console_emit_next_record.constprop.0+0x180/0x330
[ 0.206511] console_unlock+0xf7/0x1f0
[ 0.206512] vprintk_emit+0xf7/0x330
[ 0.206514] _printk+0x63/0x7e
[ 0.206516] cblist_init_generic.constprop.0.cold+0x24/0x32
[ 0.206518] rcu_init_tasks_generic+0x5/0xd9
[ 0.206522] kernel_init_freeable+0x15b/0x2a2
[ 0.206523] ? rest_init+0x160/0x160
[ 0.206526] kernel_init+0x11/0x120
[ 0.206527] ret_from_fork+0x1f/0x30
[ 0.206530] </TASK>
[ 0.207018] cblist_init_generic: Setting shift to 1 and lim to 1.
This patch moves pr_info() so that it is called without
rtp->cbs_gbl_lock locked.
Signed-off-by: Shigeru Yoshida <syoshida@...hat.com>
---
kernel/rcu/tasks.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 3925e32159b5..d46dd970bf22 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -227,7 +227,6 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
if (rcu_task_enqueue_lim < 0) {
rcu_task_enqueue_lim = 1;
rcu_task_cb_adjust = true;
- pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
} else if (rcu_task_enqueue_lim == 0) {
rcu_task_enqueue_lim = 1;
}
@@ -256,6 +255,10 @@ static void cblist_init_generic(struct rcu_tasks *rtp)
raw_spin_unlock_rcu_node(rtpcp); // irqs remain disabled.
}
raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags);
+
+ if (rcu_task_cb_adjust)
+ pr_info("%s: Setting adjustable number of callback queues.\n", __func__);
+
pr_info("%s: Setting shift to %d and lim to %d.\n", __func__, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim));
}
--
2.37.1
Powered by blists - more mailing lists