Message-ID: <20260205125000.2324010-1-alexjlzheng@tencent.com>
Date: Thu,  5 Feb 2026 20:49:59 +0800
From: Jinliang Zheng <alexjlzheng@...il.com>
To: david@...morbit.com
Cc: alexjlzheng@...il.com,
	alexjlzheng@...cent.com,
	cem@...nel.org,
	linux-kernel@...r.kernel.org,
	linux-xfs@...r.kernel.org
Subject: Re: [PATCH 2/2] xfs: take a breath in xfsaild()

On Thu, 5 Feb 2026 22:44:51 +1100, david@...morbit.com wrote:
> On Thu, Feb 05, 2026 at 04:26:21PM +0800, alexjlzheng@...il.com wrote:
> > From: Jinliang Zheng <alexjlzheng@...cent.com>
> > 
> > We noticed a softlockup like:
> > 
> >   crash> bt
> >   PID: 5153     TASK: ffff8960a7ca0000  CPU: 115  COMMAND: "xfsaild/dm-4"
> >    #0 [ffffc9001b1d4d58] machine_kexec at ffffffff9b086081
> >    #1 [ffffc9001b1d4db8] __crash_kexec at ffffffff9b20817a
> >    #2 [ffffc9001b1d4e78] panic at ffffffff9b107d8f
> >    #3 [ffffc9001b1d4ef8] watchdog_timer_fn at ffffffff9b243511
> >    #4 [ffffc9001b1d4f28] __hrtimer_run_queues at ffffffff9b1e62ff
> >    #5 [ffffc9001b1d4f80] hrtimer_interrupt at ffffffff9b1e73d4
> >    #6 [ffffc9001b1d4fd8] __sysvec_apic_timer_interrupt at ffffffff9b07bb29
> >    #7 [ffffc9001b1d4ff0] sysvec_apic_timer_interrupt at ffffffff9bd689f9
> >   --- <IRQ stack> ---
> >    #8 [ffffc90031cd3a18] asm_sysvec_apic_timer_interrupt at ffffffff9be00e86
> >       [exception RIP: part_in_flight+47]
> >       RIP: ffffffff9b67960f  RSP: ffffc90031cd3ac8  RFLAGS: 00000282
> >       RAX: 00000000000000a9  RBX: 00000000000c4645  RCX: 00000000000000f5
> >       RDX: ffffe89fffa36fe0  RSI: 0000000000000180  RDI: ffffffff9d1ae260
> >       RBP: ffff898083d30000   R8: 00000000000000a8   R9: 0000000000000000
> >       R10: ffff89808277d800  R11: 0000000000001000  R12: 0000000101a7d5be
> >       R13: 0000000000000000  R14: 0000000000001001  R15: 0000000000001001
> >       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> >    #9 [ffffc90031cd3ad8] update_io_ticks at ffffffff9b6602e4
> >   #10 [ffffc90031cd3b00] bdev_start_io_acct at ffffffff9b66031b
> >   #11 [ffffc90031cd3b20] dm_io_acct at ffffffffc18d7f98 [dm_mod]
> >   #12 [ffffc90031cd3b50] dm_submit_bio_remap at ffffffffc18d8195 [dm_mod]
> >   #13 [ffffc90031cd3b70] dm_split_and_process_bio at ffffffffc18d9799 [dm_mod]
> >   #14 [ffffc90031cd3be0] dm_submit_bio at ffffffffc18d9b07 [dm_mod]
> >   #15 [ffffc90031cd3c20] __submit_bio at ffffffff9b65f61c
> >   #16 [ffffc90031cd3c38] __submit_bio_noacct at ffffffff9b65f73e
> >   #17 [ffffc90031cd3c80] xfs_buf_ioapply_map at ffffffffc23df4ea [xfs]
> 
> This isn't from a TOT kernel. xfs_buf_ioapply_map() went away a year
> ago. What kernel is this occurring on?

Thanks for your reply. :)

It's based on v6.6.

> 
> >   #18 [ffffc90031cd3ce0] _xfs_buf_ioapply at ffffffffc23df64f [xfs]
> >   #19 [ffffc90031cd3d50] __xfs_buf_submit at ffffffffc23df7b8 [xfs]
> >   #20 [ffffc90031cd3d70] xfs_buf_delwri_submit_buffers at ffffffffc23dffbd [xfs]
> >   #21 [ffffc90031cd3df8] xfsaild_push at ffffffffc24268e5 [xfs]
> >   #22 [ffffc90031cd3eb8] xfsaild at ffffffffc2426f88 [xfs]
> >   #23 [ffffc90031cd3ef8] kthread at ffffffff9b1378fc
> >   #24 [ffffc90031cd3f30] ret_from_fork at ffffffff9b042dd0
> >   #25 [ffffc90031cd3f50] ret_from_fork_asm at ffffffff9b007e2b
> > 
> > This patch adds cond_resched() to avoid softlockups similar to the one
> > described above.
> 
> Again: how does this softlockup occur?

[28089.641309] watchdog: BUG: soft lockup - CPU#115 stuck for 26s! [xfsaild/dm-4:5153] 
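
For context, the change being proposed is just an explicit reschedule point in
the push loop. Below is a minimal sketch of the idea, not the actual patch body
(which isn't quoted in this reply); the loop shape and variable names are a
paraphrase of the v6.6 code, and the real xfsaild() in fs/xfs/xfs_trans_ail.c
also handles freezing, idle sleeps and stop requests, all omitted here:

  /* Sketch only: AIL push kthread with an explicit reschedule point. */
  static int
  xfsaild(
          void            *data)
  {
          struct xfs_ail  *ailp = data;
          long            tout = 0;

          while (!kthread_should_stop()) {
                  /*
                   * Push up to ~1000 log items for IO per pass.  If the
                   * AIL stays busy and nothing in here ever blocks, the
                   * loop can hold the CPU long enough to trip the soft
                   * lockup watchdog.
                   */
                  tout = xfsaild_push(ailp);

                  /*
                   * Yield to other runnable tasks (and let the soft
                   * lockup detector see a context switch) before the
                   * next push.
                   */
                  cond_resched();
          }
          return 0;
  }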

> 
> xfsaild_push() pushes at most 1000 items at a time for IO.  It would
> have to be a fairly fast device not to block on the request queues
> filling as we submit batches of 1000 buffers at a time.
> 
> Then the higher level AIL traversal loop would also have to be
> making continuous progress without blocking. Hence it must not hit
> the end of the AIL, nor ever hit pinned, stale, flushing or locked
> items in the AIL for as long as it takes for the soft lockup timer
> to fire.  This seems ... highly unlikely.
> 
> IOWs, if we are looping in this path without giving up the CPU for
> seconds at a time, then it is not behaving as I'd expect it to
> behave. We need to understand why this code is apparently behaving
> in an unexpected way, not just silence the warning....
> 
> Can you please explain how the softlockup timer is being hit here so we
> can try to understand the root cause of the problem? Workload,

Again, the workload is a test suite combining stress-ng, LTP, and fio, all executed concurrently.

> hardware, filesystem config, storage stack, etc all matter here,


================================= CPU ======================================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           45 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  384
On-line CPU(s) list:                     0-383

================================= MEM ======================================
[root@...alhost ~]# free -h
               total        used        free      shared  buff/cache   available
Mem:           1.5Ti       479Gi       1.0Ti       2.0Gi       6.2Gi       1.0Ti
Swap:             0B          0B          0B


================================= XFS ======================================
[root@...alhost ~]# xfs_info /dev/ts/home 
meta-data=/dev/mapper/ts-home    isize=512    agcount=4, agsize=45875200 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
data     =                       bsize=4096   blocks=183500800, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=89600, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


================================= IOS ======================================
sdb                                                                                                
├─sdb1        vfat        FAT32              EE62-ACF4                               591.3M     1% /boot/efi
├─sdb2        xfs                            95e0a12a-ed33-45b7-abe1-640c4e334468      1.6G    17% /boot
└─sdb3        LVM2_member LVM2 001           fSTN7x-BNcZ-w67a-gZBf-Vlps-7gtv-KAreS7                
  ├─ts-root   xfs                            f78c147c-86d7-4675-b9bd-ed0d512f37f4     84.5G    50% /
  └─ts-home   xfs                            01677289-32f2-41b0-ab59-3c5d14a1eefa    668.6G     4% /home

> because they all play a part in these paths never blocking on
> a lock, a full queue, a pinned buffer, etc, whilst processing
> hundreds of thousands of dirty objects for IO.

And there is another, similar softlockup:

watchdog: BUG: soft lockup - CPU#342 stuck for 22s! [xfsaild/dm-4:5045]

  crash> bt
  PID: 5045     TASK: ffff89e0a0150000  CPU: 342  COMMAND: "xfsaild/dm-4"
   #0 [ffffc9001d98cd58] machine_kexec at ffffffffa8086081
   #1 [ffffc9001d98cdb8] __crash_kexec at ffffffffa820817a
   #2 [ffffc9001d98ce78] panic at ffffffffa8107d8f
   #3 [ffffc9001d98cef8] watchdog_timer_fn at ffffffffa8243511
   #4 [ffffc9001d98cf28] __hrtimer_run_queues at ffffffffa81e62ff
   #5 [ffffc9001d98cf80] hrtimer_interrupt at ffffffffa81e73d4
   #6 [ffffc9001d98cfd8] __sysvec_apic_timer_interrupt at ffffffffa807bb29
   #7 [ffffc9001d98cff0] sysvec_apic_timer_interrupt at ffffffffa8d689f9
  --- <IRQ stack> ---
   #8 [ffffc900351efa48] asm_sysvec_apic_timer_interrupt at ffffffffa8e00e86
      [exception RIP: kernel_fpu_begin_mask+66]
      RIP: ffffffffa8044f52  RSP: ffffc900351efaf8  RFLAGS: 00000206
      RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000000
      RDX: 0000000000000000  RSI: ffff899f8f9ee000  RDI: ffff89e0a0150000
      RBP: 0000000000001000   R8: 0000000000000000   R9: 0000000000001000
      R10: 0000000000000000  R11: ffff899f8f9ee030  R12: ffffc900351efb38
      R13: ffff899f8f9ee000  R14: ffff88e084624158  R15: ffff88e083828000
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #9 [ffffc900351efb10] crc32c_pcl_intel_update at ffffffffc199b390 [crc32c_intel]
  #10 [ffffc900351efb30] crc32c at ffffffffc196d03f [libcrc32c]
  #11 [ffffc900351efcb8] xfs_dir3_data_write_verify at ffffffffc250cfd9 [xfs]
  #12 [ffffc900351efce0] _xfs_buf_ioapply at ffffffffc253d5cd [xfs]
  #13 [ffffc900351efd50] __xfs_buf_submit at ffffffffc253d7b8 [xfs]
  #14 [ffffc900351efd70] xfs_buf_delwri_submit_buffers at ffffffffc253dfbd [xfs]
  #15 [ffffc900351efdf8] xfsaild_push at ffffffffc25848e5 [xfs]
  #16 [ffffc900351efeb8] xfsaild at ffffffffc2584f88 [xfs]
  #17 [ffffc900351efef8] kthread at ffffffffa81378fc
  #18 [ffffc900351eff30] ret_from_fork at ffffffffa8042dd0
  #19 [ffffc900351eff50] ret_from_fork_asm at ffffffffa8007e2b

Thanks,
Jinliang Zheng. :)

> 
> At least, I'm assuming we're talking about hundreds of thousands of
> objects, because I know the AIL can push a hundred thousand dirty
> buffers to disk every second when it is close to being CPU bound. So
> if it's not giving up the CPU for long enough to fire the soft
> lockup timer, we must be talking about processing millions of
> objects without blocking even once....
> 
> -Dave.
> -- 
> Dave Chinner
> david@...morbit.com
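
As a back-of-the-envelope check of that estimate, combining the ~100,000
buffers/s push rate above with the 26s stall from the first lockup (my
arithmetic, not a measurement):

  100,000 objects/s * 26 s ~= 2,600,000 objects pushed without a single reschedule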
