linux-ext4 - task hung in ext4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231017033725.r6pfo5a4ayqisct7@awork3.anarazel.de>
Date:   Mon, 16 Oct 2023 20:37:25 -0700
From:   Andres Freund <andres@...razel.de>
To:     Theodore Ts'o <tytso@....edu>,
        Thorsten Leemhuis <regressions@...mhuis.info>
Cc:     Shreeya Patel <shreeya.patel@...labora.com>,
        linux-ext4@...r.kernel.org,
        Ricardo Cañuelo <ricardo.canuelo@...labora.com>,
        gustavo.padovan@...labora.com, zsm@...gle.com, garrick@...gle.com,
        Linux regressions mailing list <regressions@...ts.linux.dev>
Subject: task hung in ext4_fallocate #2

Hi,

As previously reported in https://lore.kernel.org/all/20231004004247.zkswbseowwwc6vvk@alap3.anarazel.de/
I found some hangs below ext4_fallocate(), in 6.6-rc*.  As it looks like my
issue was unrelated to the thread I had responded to, I was asked to start
this new thread.

I just was able to reproduce the issue, after upgrading to 6.6-rc6 - this time
it took ~55min of high load (io_uring using branch of postgres, running a
write heavy transactional workload concurrently with concurrent bulk data
load) to trigger the issue.

For now I have left the system running, in case there's something you would
like me to check while the system is hung.

The first hanging task that I observed:

cat /proc/57606/stack
[<0>] inode_dio_wait+0xd5/0x100
[<0>] ext4_fallocate+0x12f/0x1040
[<0>] vfs_fallocate+0x135/0x360
[<0>] __x64_sys_fallocate+0x42/0x70
[<0>] do_syscall_64+0x38/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

[ 3194.579039] INFO: task postgres:57606 blocked for more than 122 seconds.
[ 3194.579057]       Not tainted 6.6.0-rc6-andres-00001-g01edcfe38260 #77
[ 3194.579064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3194.579069] task:postgres        state:D stack:0     pid:57606 ppid:52606  flags:0x00020002
[ 3194.579083] Call Trace:
[ 3194.579089]  <TASK>
[ 3194.579098]  __schedule+0x388/0x13e0
[ 3194.579121]  schedule+0x5f/0xe0
[ 3194.579133]  inode_dio_wait+0xd5/0x100
[ 3194.579148]  ? membarrier_register_private_expedited+0xa0/0xa0
[ 3194.579160]  ext4_fallocate+0x12f/0x1040
[ 3194.579171]  ? timerqueue_add+0x90/0xb0
[ 3194.579183]  ? rcu_accelerate_cbs+0x1d/0x70
[ 3194.579194]  vfs_fallocate+0x135/0x360
[ 3194.579204]  __x64_sys_fallocate+0x42/0x70
[ 3194.579212]  do_syscall_64+0x38/0x80
[ 3194.579224]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 3194.579236] RIP: 0033:0x7fb8db658f82
[ 3194.579245] RSP: 002b:00007fff86eda9c8 EFLAGS: 00000246 ORIG_RAX: 000000000000011d
[ 3194.579255] RAX: ffffffffffffffda RBX: 0000000000200000 RCX: 00007fb8db658f82
[ 3194.579263] RDX: 000000003fd9e000 RSI: 0000000000000000 RDI: 0000000000000072
[ 3194.579269] RBP: 000000003fd9e000 R08: 000000003fd9e000 R09: 0000559d2f22b530
[ 3194.579275] R10: 0000000000200000 R11: 0000000000000246 R12: 0000000000001538
[ 3194.579281] R13: 000000000a000013 R14: 0000000000000061 R15: 0000559d2dced070
[ 3194.579291]  </TASK>

In contrast to earlier iterations, I also got "task blocked" errors for
io_uring workers:

[ 3194.579297] INFO: task iou-wrk-58004:58874 blocked for more than 122 seconds.
[ 3194.579304]       Not tainted 6.6.0-rc6-andres-00001-g01edcfe38260 #77
[ 3194.579310] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3194.579314] task:iou-wrk-58004   state:D stack:0     pid:58874 ppid:52606  flags:0x00004000
[ 3194.579325] Call Trace:
[ 3194.579329]  <TASK>
[ 3194.579334]  __schedule+0x388/0x13e0
[ 3194.579349]  schedule+0x5f/0xe0
[ 3194.579361]  schedule_preempt_disabled+0x15/0x20
[ 3194.579374]  rwsem_down_read_slowpath+0x26e/0x4c0
[ 3194.579385]  down_read+0x44/0xa0
[ 3194.579393]  ext4_file_write_iter+0x432/0xa80
[ 3194.579407]  io_write+0x129/0x420
[ 3194.579421]  ? set_next_entity+0xc8/0x180
[ 3194.579433]  ? __switch_to_asm+0x3a/0x60
[ 3194.579443]  ? finish_task_switch.isra.0+0x87/0x2a0
[ 3194.579457]  ? __schedule+0x390/0x13e0
[ 3194.579469]  io_issue_sqe+0x60/0x3c0
[ 3194.579482]  io_wq_submit_work+0x88/0x2c0
[ 3194.579494]  io_worker_handle_work+0x156/0x560
[ 3194.579509]  io_wq_worker+0xf5/0x360
[ 3194.579522]  ? finish_task_switch.isra.0+0x87/0x2a0
[ 3194.579534]  ? io_worker_handle_work+0x560/0x560
[ 3194.579546]  ret_from_fork+0x2d/0x50
[ 3194.579557]  ? io_worker_handle_work+0x560/0x560
[ 3194.579569]  ret_from_fork_asm+0x11/0x20
[ 3194.579581]  </TASK>

This is repeated for several other identical workers with ppid 52606 with identical
stacks.

pid:58874 is a io uring worker thread for pid 58004, which has the following
kernel stack:
sudo cat /proc/58004/stack
[<0>] __do_sys_io_uring_enter+0x7bd/0xbf0
[<0>] do_syscall_64+0x38/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0


As before, there are no busy IOs according to /sys/kernel/debug/block/nvme1n1/hctx*/busy
nor are there IO errors or such in the kernel log.


Once I hear that you don't want me to test something out on the running
system, I think a sensible next step could be to compile with lockdep and see
if that finds a problem?

Greetings,

Andres Freund

View attachment ".config" of type "text/plain" (209410 bytes)