Message-ID: <20251112180549.GD196366@frogsfrogsfrogs>
Date: Wed, 12 Nov 2025 10:05:49 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Christoph Hellwig <hch@...radead.org>
Cc: zlang@...hat.com, fstests@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 6/7] generic/774: reduce file size
On Tue, Nov 11, 2025 at 01:31:06AM -0800, Christoph Hellwig wrote:
> On Mon, Nov 10, 2025 at 10:27:35AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@...nel.org>
> >
> > We've gotten complaints about this test taking hours to run and
> > producing stall warnings on test VMs with a large number of CPU cores. I
>
> Btw, we should still find a way to avoid stalls for anything a user
> can easily trigger, independent of fixing the test case. What were
> these stall warnings?
The ones I see look like this:
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 oci-mtr49 6.18.0-rc5-xfsx #rc5 SMP PREEMPT_DYNAMIC Sun Nov 9 20:11:20 PST 2025
MKFS_OPTIONS -- -f -m metadir=1,autofsck=1,uquota,gquota,pquota, -b size=32768, /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /opt
and dmesg does this:
run fstests generic/774 at 2025-11-10 13:37:52
XFS (sda3): EXPERIMENTAL large block size feature enabled. Use at your own risk!
XFS (sda3): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk!
XFS (sda3): Mounting V5 Filesystem 0280a424-6e91-401b-8c47-864532b77ceb
XFS (sda3): Ending clean mount
XFS (sda4): EXPERIMENTAL large block size feature enabled. Use at your own risk!
XFS (sda4): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk!
XFS (sda4): Mounting V5 Filesystem f09b8bdf-3518-44e5-97cb-e0ab36808488
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
XFS (sda4): Unmounting Filesystem f09b8bdf-3518-44e5-97cb-e0ab36808488
XFS (sda4): EXPERIMENTAL large block size feature enabled. Use at your own risk!
XFS (sda4): EXPERIMENTAL metadata directory tree feature enabled. Use at your own risk!
XFS (sda4): Mounting V5 Filesystem 11615837-dfa0-47b8-a078-21f9f1f7525b
XFS (sda4): Ending clean mount
XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Quotacheck: Done.
workqueue: iomap_dio_complete_work hogged CPU for >10666us 4 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 5 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 7 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 11 times, consider switching to WQ_UNBOUND
workqueue: iomap_dio_complete_work hogged CPU for >10666us 19 times, consider switching to WQ_UNBOUND
INFO: task 3:1:2248454 blocked for more than 61 seconds.
Tainted: G W 6.18.0-rc5-xfsx #rc5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:3:1 state:D stack:12744 pid:2248454 tgid:2248454 ppid:2 task_flags:0x4248060 flags:0x00080000
Workqueue: dio/sda4 iomap_dio_complete_work
Call Trace:
<TASK>
__schedule+0x4cb/0x1a70
? check_preempt_wakeup_fair+0x162/0x2a0
? wakeup_preempt+0x66/0x70
schedule+0x2a/0xe0
schedule_preempt_disabled+0x18/0x30
rwsem_down_write_slowpath+0x2c5/0x780
down_write+0x6e/0x70
xfs_reflink_end_atomic_cow+0x133/0x200 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
? finish_task_switch.isra.0+0x9b/0x2b0
xfs_dio_write_end_io+0x117/0x320 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
iomap_dio_complete+0x41/0x240
? aio_fsync_work+0x120/0x120
iomap_dio_complete_work+0x17/0x30
process_one_work+0x196/0x3e0
worker_thread+0x264/0x380
? _raw_spin_unlock_irqrestore+0x1e/0x40
? rescuer_thread+0x4f0/0x4f0
kthread+0x117/0x270
? kthread_complete_and_exit+0x20/0x20
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0xa4/0xe0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
INFO: task 3:1:2248454 <writer> blocked on an rw-semaphore likely owned by task 3:13:2308404 <writer>
INFO: task 3:13:2308404 blocked for more than 61 seconds.
Tainted: G W 6.18.0-rc5-xfsx #rc5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:3:13 state:D stack:12184 pid:2308404 tgid:2308404 ppid:2 task_flags:0x4248060 flags:0x00080000
Workqueue: dio/sda4 iomap_dio_complete_work
Call Trace:
<TASK>
__schedule+0x4cb/0x1a70
? do_raw_spin_unlock+0x49/0xb0
? _raw_spin_unlock_irqrestore+0x1e/0x40
schedule+0x2a/0xe0
xlog_grant_head_wait+0x5c/0x2a0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
? kmem_cache_free+0x540/0x5b0
xlog_grant_head_check+0x10e/0x180 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_log_regrant+0xc2/0x1e0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_trans_roll+0x90/0xc0 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_defer_trans_roll+0x73/0x120 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_defer_finish_noroll+0x2a3/0x520 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_trans_commit+0x3d/0x70 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
xfs_reflink_end_atomic_cow+0x19c/0x200 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
? finish_task_switch.isra.0+0x9b/0x2b0
xfs_dio_write_end_io+0x117/0x320 [xfs 28bfed63550f2a1085614a29851734fab813b29a]
iomap_dio_complete+0x41/0x240
? aio_fsync_work+0x120/0x120
iomap_dio_complete_work+0x17/0x30
process_one_work+0x196/0x3e0
worker_thread+0x264/0x380
? _raw_spin_unlock_irqrestore+0x1e/0x40
? rescuer_thread+0x4f0/0x4f0
kthread+0x117/0x270
? kthread_complete_and_exit+0x20/0x20
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0xa4/0xe0
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
I think this is an ABBA deadlock: nearly all of the threads hold a log
grant reservation and are trying to obtain the ILOCK, while one thread,
in the middle of an atomic COW write ioend transaction chain, has run
out of permanent reservation during xfs_trans_roll and therefore holds
the ILOCK while trying to reserve more log grant space.
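
To make the inversion concrete, here's a userspace toy model of what I
think is happening; the semaphore stands in for log grant space and the
mutex for the ILOCK, and every name in it is made up for illustration,
not the real XFS call chain.  With more grant-holding workers than
grant units, it usually wedges the same way the traces above do:

	#include <pthread.h>
	#include <semaphore.h>
	#include <stdio.h>

	static sem_t grant;		/* stand-in for log grant space */
	static pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;

	/* Most ioend workers: reserve grant space, then take the ILOCK. */
	static void *end_io_worker(void *arg)
	{
		sem_wait(&grant);		/* A: hold a reservation */
		pthread_mutex_lock(&ilock);	/* B: blocks behind the roller */
		pthread_mutex_unlock(&ilock);
		sem_post(&grant);
		return arg;
	}

	/* One worker mid-chain: holds the ILOCK, must regrant to roll. */
	static void *rolling_worker(void *arg)
	{
		pthread_mutex_lock(&ilock);	/* B: held across the chain */
		sem_wait(&grant);		/* A: blocks once space is gone */
		sem_post(&grant);
		pthread_mutex_unlock(&ilock);
		return arg;
	}

	int main(void)
	{
		pthread_t t[4];
		int i;

		sem_init(&grant, 0, 2);		/* only two units of grant space */
		pthread_create(&t[0], NULL, rolling_worker, NULL);
		for (i = 1; i < 4; i++)
			pthread_create(&t[i], NULL, end_io_worker, NULL);
		for (i = 0; i < 4; i++)
			pthread_join(&t[i], NULL);
		puts("lucky scheduling, no hang this run");
		return 0;
	}

Whether it hangs is timing-dependent, but once two workers are sitting
on grant units waiting for the ILOCK while the roller waits for a grant
unit, nobody can make progress.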
I also suspect this could happen with regular ioend remapping of shared
space after a COW. Perhaps for these ioend types we should chain ioends
together like we do for direct writes to zoned storage. Remapping isn't
going to be low-latency like pure overwrites anyway.
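
As a rough userspace sketch of that chaining idea (all names
illustrative, not the zoned XFS implementation): completions get
pushed onto a per-inode list, and whichever completion finds the list
empty kicks a single drainer that remaps the whole batch under one
ILOCK cycle, so at most one completion worker ever contends for the
inode lock and the grant space:

	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	struct ioend {
		struct ioend *next;
		long offset, len;	/* file region to remap */
	};

	static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct ioend *pending;	/* per-inode completion chain */
	static pthread_mutex_t ilock = PTHREAD_MUTEX_INITIALIZER;

	/* Called from each dio completion; cheap, never touches the ILOCK. */
	static int queue_ioend(struct ioend *io)
	{
		int need_worker;

		pthread_mutex_lock(&list_lock);
		need_worker = (pending == NULL);
		io->next = pending;
		pending = io;
		pthread_mutex_unlock(&list_lock);
		return need_worker;	/* caller schedules the drainer */
	}

	/* One drainer remaps every queued ioend under a single ILOCK cycle. */
	static void drain_ioends(void)
	{
		struct ioend *list, *io;

		pthread_mutex_lock(&list_lock);
		list = pending;
		pending = NULL;
		pthread_mutex_unlock(&list_lock);

		pthread_mutex_lock(&ilock);
		while ((io = list) != NULL) {
			list = io->next;
			printf("remap %ld+%ld\n", io->offset, io->len);
			free(io);
		}
		pthread_mutex_unlock(&ilock);
	}

	int main(void)
	{
		long i;

		for (i = 0; i < 4; i++) {
			struct ioend *io = malloc(sizeof(*io));

			io->offset = i * 64;
			io->len = 64;
			if (queue_ioend(io))
				drain_ioends();	/* really a workqueue item */
		}
		drain_ioends();			/* flush stragglers */
		return 0;
	}

That trades a little per-ioend completion latency for a bounded number
of ILOCK/grant acquisitions, which seems like the right tradeoff given
that remapping is slow anyway.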
> The patch looks good:
>
> Reviewed-by: Christoph Hellwig <hch@....de>
Thanks!
--D