Message-ID: <aYR2-Y-Fe2Chh5if@dread.disaster.area>
Date: Thu, 5 Feb 2026 21:54:49 +1100
From: Dave Chinner <david@...morbit.com>
To: alexjlzheng@...il.com
Cc: cem@...nel.org, linux-xfs@...r.kernel.org, linux-kernel@...r.kernel.org,
Jinliang Zheng <alexjlzheng@...cent.com>
Subject: Re: [PATCH 1/2] xfs: take a breath in xlog_ioend_work()
On Thu, Feb 05, 2026 at 04:26:20PM +0800, alexjlzheng@...il.com wrote:
> From: Jinliang Zheng <alexjlzheng@...cent.com>
>
> The xlog_ioend_work() function contains several nested loops with
> fairly complex operations, which may lead to soft lockups such as:
>
> PID: 2604722 TASK: ffff88c08306b1c0 CPU: 263 COMMAND: "kworker/263:0H"
> #0 [ffffc9001cbf8d58] machine_kexec at ffffffff9d086081
> #1 [ffffc9001cbf8db8] __crash_kexec at ffffffff9d20817a
> #2 [ffffc9001cbf8e78] panic at ffffffff9d107d8f
> #3 [ffffc9001cbf8ef8] watchdog_timer_fn at ffffffff9d243511
> #4 [ffffc9001cbf8f28] __hrtimer_run_queues at ffffffff9d1e62ff
> #5 [ffffc9001cbf8f80] hrtimer_interrupt at ffffffff9d1e73d4
> #6 [ffffc9001cbf8fd8] __sysvec_apic_timer_interrupt at ffffffff9d07bb29
> #7 [ffffc9001cbf8ff0] sysvec_apic_timer_interrupt at ffffffff9dd689f9
> --- <IRQ stack> ---
> #8 [ffffc900460a7c28] asm_sysvec_apic_timer_interrupt at ffffffff9de00e86
> [exception RIP: slab_free_freelist_hook.constprop.0+107]
> RIP: ffffffff9d3ef74b RSP: ffffc900460a7cd0 RFLAGS: 00000286
> RAX: ffff89ea4de06b00 RBX: ffff89ea4de06a00 RCX: ffff89ea4de06a00
> RDX: 0000000000000100 RSI: ffffc900460a7d28 RDI: ffff888100044c80
> RBP: ffff888100044c80 R8: 0000000000000000 R9: ffffffffc21e8500
> R10: ffff88c867e93200 R11: 0000000000000001 R12: ffff89ea4de06a00
> R13: ffffc900460a7d28 R14: ffff89ea4de06a00 R15: ffffc900460a7d30
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #9 [ffffc900460a7d18] __kmem_cache_free at ffffffff9d3f65a0
> #10 [ffffc900460a7d70] xlog_cil_committed at ffffffffc21e85af [xfs]
> #11 [ffffc900460a7da0] xlog_cil_process_committed at ffffffffc21e9747 [xfs]
> #12 [ffffc900460a7dd0] xlog_state_do_iclog_callbacks at ffffffffc21e41eb [xfs]
> #13 [ffffc900460a7e28] xlog_state_do_callback at ffffffffc21e436f [xfs]
> #14 [ffffc900460a7e50] xlog_ioend_work at ffffffffc21e6e1c [xfs]
> #15 [ffffc900460a7e70] process_one_work at ffffffff9d12de69
> #16 [ffffc900460a7ea8] worker_thread at ffffffff9d12e79b
> #17 [ffffc900460a7ef8] kthread at ffffffff9d1378fc
> #18 [ffffc900460a7f30] ret_from_fork at ffffffff9d042dd0
> #19 [ffffc900460a7f50] ret_from_fork_asm at ffffffff9d007e2b
>
> This patch adds cond_resched() calls to avoid soft lockups like the
> one shown above.
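[Editorial note: the quoted patch body is not shown above. The following is a minimal userspace sketch of the pattern the patch describes, not the actual xlog_ioend_work() code: a long completion loop that periodically offers to yield the CPU. cond_resched() is stubbed with a counter, and process_iclog_callbacks() is an illustrative name.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's cond_resched(), which yields the CPU if a
 * reschedule is pending; here a counter records each opportunity. */
static int resched_calls;
static void cond_resched(void)
{
	resched_calls++;
}

/* Illustrative loop mirroring the patch's idea: after completing each
 * log item, take a breath so the soft lockup watchdog is not tripped
 * by tens of thousands of back-to-back completions. */
static void process_iclog_callbacks(size_t nr_items)
{
	for (size_t i = 0; i < nr_items; i++) {
		/* ... free/complete one committed log item ... */
		cond_resched();
	}
}
```

In the real kernel, cond_resched() is cheap when no reschedule is pending, so calling it once per item (or once per batch) bounds the time the worker can hold the CPU without rescheduling.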
You've elided the soft lockup messages that tell us how long this
task was holding the CPU. What is the soft lockup timer set to?
What workload causes this to happen? How do we reproduce it?
FWIW, yes, there might be several tens of thousands of objects to
move to the AIL in this journal IO completion path, but if this
takes more than a couple of hundred milliseconds of processing time
then there is something else going wrong....
-Dave.
--
Dave Chinner
david@...morbit.com