linux-kernel - rcu: INFO: rcu_sched self-detected stall on CPU: Workqueue: xfs-conv/md0 xfs_end

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <1b07e849-cffd-db1f-f01b-2b8b45ce8c36@molgen.mpg.de>
Date:   Fri, 5 Feb 2021 08:29:06 +0100
From:   Paul Menzel <pmenzel@...gen.mpg.de>
To:     "Darrick J. Wong" <darrick.wong@...cle.com>,
        linux-xfs@...r.kernel.org, "Paul E. McKenney" <paulmck@...nel.org>,
        Josh Triplett <josh@...htriplett.org>
Cc:     rcu@...r.kernel.org, it+linux-rcu@...gen.mpg.de,
        LKML <linux-kernel@...r.kernel.org>
Subject: rcu: INFO: rcu_sched self-detected stall on CPU: Workqueue:
 xfs-conv/md0 xfs_end_io

Dear Linux folks,


On a Dell PowerEdge T630/0NT78X, BIOS 2.8.0 05/23/2018 with Linux 
5.4.57, we twice saw a self-detected stall on a CPU (October 27th, 2020, 
January 18th, 2021).

Both times, the workqueue is `xfs-conv/md0 xfs_end_io`.

```
[    0.000000] Linux version 5.4.57.mx64.340 
(root@...internet.molgen.mpg.de) (gcc version 7.5.0 (GCC)) #1 SMP Tue 
Aug 11 13:20:33 CEST 2020
[…]
[48962.981257] rcu: INFO: rcu_sched self-detected stall on CPU
[48962.987511] rcu: 	4-....: (20999 ticks this GP) 
idle=fe6/1/0x4000000000000002 softirq=3630188/3630188 fqs=4696
[48962.998805] 	(t=21017 jiffies g=14529009 q=32263)
[48963.004074] Task dump for CPU 4:
[48963.007689] kworker/4:2     R  running task        0 25587      2 
0x80004008
[48963.015591] Workqueue: xfs-conv/md0 xfs_end_io
[48963.020570] Call Trace:
[48963.023311]  <IRQ>
[48963.025560]  sched_show_task+0x11e/0x150
[48963.029957]  rcu_dump_cpu_stacks+0x70/0xa0
[48963.034545]  rcu_sched_clock_irq+0x502/0x770
[48963.039322]  ? tick_sched_do_timer+0x60/0x60
[48963.044106]  update_process_times+0x24/0x60
[48963.048791]  tick_sched_timer+0x37/0x70
[48963.053089]  __hrtimer_run_queues+0x11f/0x2b0
[48963.057960]  ? recalibrate_cpu_khz+0x10/0x10
[48963.062744]  hrtimer_interrupt+0xe5/0x240
[48963.067235]  smp_apic_timer_interrupt+0x6f/0x130
[48963.072407]  apic_timer_interrupt+0xf/0x20
[48963.076994]  </IRQ>
[48963.079347] RIP: 0010:_raw_spin_unlock_irqrestore+0xa/0x10
[48963.085491] Code: f3 90 83 e8 01 75 e8 65 8b 3d 42 0f 56 7e e8 ed ea 
5e ff 48 29 e8 4c 39 e8 76 cf 80 0b 08 eb 8c 0f 1f 44 00 00 c6 07 00 56 
9d <c3> 0f 1f 44 00 00 0f 1f 44 00 00 b8 00 fe ff ff f0 0f c1 07 56 9d
[48963.106524] RSP: 0018:ffffc9000738fd40 EFLAGS: 00000202 ORIG_RAX: 
ffffffffffffff13
[48963.115003] RAX: ffffffff82407588 RBX: ffffffff82407580 RCX: 
ffffffff82407588
[48963.122994] RDX: ffffffff82407588 RSI: 0000000000000202 RDI: 
ffffffff82407580
[48963.130989] RBP: 0000000000000202 R08: ffffffff8203ea00 R09: 
0000000000000001
[48963.138982] R10: ffffc9000738fbb8 R11: 0000000000000001 R12: 
ffffffff82407588
[48963.146976] R13: ffffea005e7ae600 R14: ffff8897b7e5a040 R15: 
ffffea005e7ae600
[48963.154971]  wake_up_page_bit+0xe0/0x100
[48963.159366]  xfs_destroy_ioend+0xce/0x1c0
[48963.163857]  xfs_end_ioend+0xcf/0x1a0
[48963.167958]  xfs_end_io+0xa4/0xd0
[48963.171672]  process_one_work+0x1e5/0x410
[48963.176163]  worker_thread+0x2d/0x3c0
[48963.180265]  ? cancel_delayed_work+0x90/0x90
[48963.185048]  kthread+0x117/0x130
[48963.188663]  ? kthread_create_worker_on_cpu+0x70/0x70
[48963.194321]  ret_from_fork+0x35/0x40
```

As it’s just log level INFO, is there anything what should be done, or 
was the system probably just “overloaded”?


Kind regards,

Paul

View attachment "salvadorthegunzerker-linux-5.4.57-messages-dmesg.txt" of type "text/plain" (157686 bytes)