Message-ID: <aYR2-Y-Fe2Chh5if@dread.disaster.area>
Date: Thu, 5 Feb 2026 21:54:49 +1100
From: Dave Chinner <david@...morbit.com>
To: alexjlzheng@...il.com
Cc: cem@...nel.org, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Jinliang Zheng <alexjlzheng@...cent.com>
Subject: Re: [PATCH 1/2] xfs: take a breath in xlog_ioend_work()
On Thu, Feb 05, 2026 at 04:26:20PM +0800, alexjlzheng@...il.com wrote:
> From: Jinliang Zheng <alexjlzheng@...cent.com>
>
> The xlog_ioend_work() function contains several nested loops with
> fairly complex operations, which may lead to soft lockups such as:
>
> PID: 2604722 TASK: ffff88c08306b1c0 CPU: 263 COMMAND: "kworker/263:0H"
> #0 [ffffc9001cbf8d58] machine_kexec at ffffffff9d086081
> #1 [ffffc9001cbf8db8] __crash_kexec at ffffffff9d20817a
> #2 [ffffc9001cbf8e78] panic at ffffffff9d107d8f
> #3 [ffffc9001cbf8ef8] watchdog_timer_fn at ffffffff9d243511
> #4 [ffffc9001cbf8f28] __hrtimer_run_queues at ffffffff9d1e62ff
> #5 [ffffc9001cbf8f80] hrtimer_interrupt at ffffffff9d1e73d4
> #6 [ffffc9001cbf8fd8] __sysvec_apic_timer_interrupt at ffffffff9d07bb29
> #7 [ffffc9001cbf8ff0] sysvec_apic_timer_interrupt at ffffffff9dd689f9
> --- <IRQ stack> ---
> #8 [ffffc900460a7c28] asm_sysvec_apic_timer_interrupt at ffffffff9de00e86
> [exception RIP: slab_free_freelist_hook.constprop.0+107]
> RIP: ffffffff9d3ef74b RSP: ffffc900460a7cd0 RFLAGS: 00000286
> RAX: ffff89ea4de06b00 RBX: ffff89ea4de06a00 RCX: ffff89ea4de06a00
> RDX: 0000000000000100 RSI: ffffc900460a7d28 RDI: ffff888100044c80
> RBP: ffff888100044c80 R8: 0000000000000000 R9: ffffffffc21e8500
> R10: ffff88c867e93200 R11: 0000000000000001 R12: ffff89ea4de06a00
> R13: ffffc900460a7d28 R14: ffff89ea4de06a00 R15: ffffc900460a7d30
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffffc900460a7d18] __kmem_cache_free at ffffffff9d3f65a0
> #10 [ffffc900460a7d70] xlog_cil_committed at ffffffffc21e85af [xfs]
> #11 [ffffc900460a7da0] xlog_cil_process_committed at ffffffffc21e9747 [xfs]
> #12 [ffffc900460a7dd0] xlog_state_do_iclog_callbacks at ffffffffc21e41eb [xfs]
> #13 [ffffc900460a7e28] xlog_state_do_callback at ffffffffc21e436f [xfs]
> #14 [ffffc900460a7e50] xlog_ioend_work at ffffffffc21e6e1c [xfs]
> #15 [ffffc900460a7e70] process_one_work at ffffffff9d12de69
> #16 [ffffc900460a7ea8] worker_thread at ffffffff9d12e79b
> #17 [ffffc900460a7ef8] kthread at ffffffff9d1378fc
> #18 [ffffc900460a7f30] ret_from_fork at ffffffff9d042dd0
> #19 [ffffc900460a7f50] ret_from_fork_asm at ffffffff9d007e2b
>
> This patch adds cond_resched() calls to avoid soft lockups like the
> one shown above.
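[Editorial note: the quoted patch body is not shown above. The following is a minimal userspace sketch of the pattern the patch describes, not the actual xlog_ioend_work() code: a long completion loop that periodically offers to yield the CPU. cond_resched() is stubbed with a counter, and process_iclog_callbacks() is an illustrative name.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's cond_resched(), which yields the CPU if a
 * reschedule is pending; here a counter records each opportunity. */
static int resched_calls;
static void cond_resched(void)
{
	resched_calls++;
}

/* Illustrative loop mirroring the patch's idea: after completing each
 * log item, take a breath so the soft lockup watchdog is not tripped
 * by tens of thousands of back-to-back completions. */
static void process_iclog_callbacks(size_t nr_items)
{
	for (size_t i = 0; i < nr_items; i++) {
		/* ... free/complete one committed log item ... */
		cond_resched();
	}
}
```

In the real kernel, cond_resched() is cheap when no reschedule is pending, so calling it once per item (or once per batch) bounds the time the worker can hold the CPU without rescheduling.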
You've elided the soft lockup messages that tell us how long this
task was holding the CPU. What is the soft lockup timer set to?
What workload causes this to happen? How do we reproduce it?
FWIW, yes, there might be several tens of thousands of objects to
move to the AIL in this journal IO completion path, but if this
takes more than a couple of hundred milliseconds of processing time
then there is something else going wrong....
-Dave.
--
Dave Chinner
david@...morbit.com