Message-ID: <Zx8ip5avDVafVhtL@dread.disaster.area>
Date: Mon, 28 Oct 2024 16:35:35 +1100
From: Dave Chinner <david@...morbit.com>
To: John Garry <john.g.garry@...cle.com>
Cc: Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
viro@...iv.linux.org.uk, brauner@...nel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: v6.12-rc workqueue lockups
On Thu, Oct 24, 2024 at 11:23:17PM +0100, John Garry wrote:
> On 24/10/2024 22:13, Dave Chinner wrote:
> > > > BTW, can you please share logs which would contain full stacktraces that
> > > > this softlockup reports produce? The attached dmesg is just from fresh
> > > > boot... Thanks!
> > > >
> > > thanks for getting back to me.
> > >
> > > So I think that enabling /proc/sys/kernel/softlockup_all_cpu_backtrace is
> > > required there. Unfortunately my VM often just locks up without any sign of
> > > life.
> > Attach a "serial" console to the vm - add "console=ttyS0,115200" to
> > the kernel command line and add "-serial pty" to the qemu command
> > line. You can then attach something like minicom to the /dev/pts/X
> > device that qemu creates for the console output and capture
> > everything from initial boot right through to the softlockup traces
> > that are emitted...
>
> I am using an OCI instance, so I can't change the qemu command line (as far
> as I know).
>
> For this issue, the Cloud Shell locks up also. There are other console
> connection methods, which I can try.
>
> BTW, earlier today I got this once when trying to recreate this issue:
>
> [ 1549.241972] ------------[ cut here ]------------
> [ 1609.240236] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1609.240243] rcu: 5-...!: (0 ticks this GP) idle=a8f4/1/0x4000000000000000 softirq=71287/71287 fqs=1
> [ 1609.240249] rcu: (detected by 2, t=60004 jiffies, g=168077, q=10823 ncpus=16)
> [ 1609.240252] Sending NMI from CPU 2 to CPUs 5:
> [ 1609.240277] NMI backtrace for cpu 5
> [ 1609.240281] CPU: 5 UID: 1002 PID: 8250 Comm: mysqld Tainted: G W 6.12.0-rc4-g556c97f2ecbf #40
> [ 1609.240286] Tainted: [W]=WARN
> [ 1609.240288] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.5.1 06/16/2021
> [ 1609.240289] RIP: 0010:native_halt+0xe/0x20
> [ 1609.240296] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 23 f1 17 01 f4 <e9> 28 c3 05 01 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90
> [ 1609.240298] RSP: 0018:ffffc0c8c71dbd20 EFLAGS: 00000046
> [ 1609.240301] RAX: 0000000000000003 RBX: ffff9ff73fab6580 RCX: 0000000000000008
> [ 1609.240303] RDX: ffff9ff7bffaf740 RSI: 0000000000000003 RDI: ffff9ff73fab6580
> [ 1609.240304] RBP: ffff9ff73f8b7440 R08: 0000000000000008 R09: 0000000000000074
> [ 1609.240306] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000
> [ 1609.240307] R13: 0000000000000001 R14: 0000000000000100 R15: 0000000000180000
> [ 1609.240311] FS: 00007f9e12600700(0000) GS:ffff9ff73f880000(0000) knlGS:0000000000000000
> [ 1609.240313] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1609.240315] CR2: 00007f9d63e00004 CR3: 0000001a0bc04005 CR4: 0000000000770ef0
> [ 1609.240319] PKRU: 55555554
> [ 1609.240320] Call Trace:
> [ 1609.240322] <NMI>
> [ 1609.240325] ? nmi_cpu_backtrace+0x98/0x110
> [ 1609.240330] ? nmi_cpu_backtrace_handler+0x11/0x20
> [ 1609.240334] ? nmi_handle+0x5c/0x150
> [ 1609.240339] ? default_do_nmi+0x4e/0x120
> [ 1609.240343] ? exc_nmi+0x137/0x1d0
> [ 1609.240347] ? end_repeat_nmi+0xf/0x53
> [ 1609.240354] ? native_halt+0xe/0x20
> [ 1609.240357] ? native_halt+0xe/0x20
> [ 1609.240360] ? native_halt+0xe/0x20
> [ 1609.240363] </NMI>
> [ 1609.240364] <TASK>
> [ 1609.240366] kvm_wait+0x47/0x60
> [ 1609.240368] __pv_queued_spin_lock_slowpath+0x255/0x370
> [ 1609.240373] _raw_spin_lock+0x29/0x30
> [ 1609.240376] raw_spin_rq_lock_nested+0x1c/0x80
> [ 1609.240381] __task_rq_lock+0x3f/0xe0
> [ 1609.240384] try_to_wake_up+0x3cf/0x640
> [ 1609.240387] ? plist_del+0x63/0xc0
> [ 1609.240391] wake_up_q+0x4d/0x90
> [ 1609.240394] futex_wake+0x154/0x180
> [ 1609.240400] do_futex+0xf8/0x1d0
> [ 1609.240404] __x64_sys_futex+0x68/0x1c0
> [ 1609.240407] ? restore_fpregs_from_fpstate+0x3c/0xa0
> [ 1609.240411] do_syscall_64+0x62/0x170
> [ 1609.240416] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Yup, I'm seeing random RCU stalls as well when running a 64p
VM under hard concurrent fstests load. The serial console output is
occasionally tripping RCU stall warnings, too.
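
For anyone who does have access to the console pty, here is a rough sketch of
the capture side, as an alternative to minicom. It copies console output to a
log file line by line, prefixes each line with a host-side timestamp, and
counts lines that look like stall or lockup reports. The function name, the
patterns and the script name in the usage comment are illustrative
assumptions, not anything from an existing tool, and the pts path is whatever
qemu reports for your VM.

```python
import io
import re
import time

# Illustrative patterns: the RCU stall header quoted in this thread plus the
# kernel's "soft lockup" watchdog message. Extend as needed.
STALL_RE = re.compile(rb"rcu: INFO: .* detected stalls|soft lockup")


def capture_console(src, log, clock=time.time):
    """Copy console output line by line from src to log.

    src and log are binary file objects. Each line is prefixed with a
    host-side timestamp so ordering is preserved even if the guest hangs.
    Returns the number of lines that matched STALL_RE.
    """
    hits = 0
    for line in iter(src.readline, b""):
        stamp = ("%.3f " % clock()).encode()
        log.write(stamp + line)
        log.flush()  # keep the log current in case the capture is killed
        if STALL_RE.search(line):
            hits += 1
    return hits


# Hypothetical usage against a qemu-created pty (path is a placeholder):
#
#   with open("/dev/pts/X", "rb", buffering=0) as src, \
#        open("console.log", "ab") as log:
#       capture_console(src, log)
```

Reading unbuffered means each line reaches the log as soon as the guest emits
it, which matters when the VM dies shortly after printing a trace. A read
error once qemu closes its end of the pty is not handled here.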
-Dave.
--
Dave Chinner
david@...morbit.com