Message-Id: <CCFA457F-E115-47F0-87F1-F64A51BDE96C@flyingcircus.io>
Date: Fri, 11 Oct 2024 15:50:12 +0200
From: Christian Theune <ct@...ingcircus.io>
To: Chris Mason <clm@...a.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Dave Chinner <david@...morbit.com>,
 Matthew Wilcox <willy@...radead.org>,
 Jens Axboe <axboe@...nel.dk>,
 linux-mm@...ck.org,
 "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
 linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org,
 Daniel Dao <dqminh@...udflare.com>,
 regressions@...ts.linux.dev,
 regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)

Hi,

> On 11. Oct 2024, at 15:06, Chris Mason <clm@...a.com> wrote:
> 
> - It's actually taking the IO a long time to finish.  We can poke at the
> pending requests, how does the device look in the VM?  (virtio, scsi etc).

I _think_ that’s not it. This is a QEMU guest with virtio-block on a Ceph stack with 2x10G networking, fully SSD-backed. Over the last 24 hours operation latency has stayed below 0.016ms, and Ceph’s slow request warning (30s limit) has not triggered in that window.
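
(For what it’s worth, the in-guest view of pending requests can be sampled from sysfs; a rough sketch of the kind of check I mean, assuming the virtio disks show up as /dev/vd*:)

#!/usr/bin/env python3
# Rough sketch: sample in-flight request counts for the virtio disks as
# seen inside the guest. /sys/block/<dev>/inflight reports
# "<reads> <writes>" currently queued or in flight.
import glob
import time

def sample():
    for path in sorted(glob.glob("/sys/block/vd*/inflight")):
        dev = path.split("/")[3]
        with open(path) as f:
            reads, writes = f.read().split()
        print(f"{dev}: {reads} reads, {writes} writes in flight")

while True:
    sample()
    print("---")
    time.sleep(1)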

Also, aside from a VM that was exhausting its QEMU I/O throttling for a minute (and was stuck in completely different tracebacks), the only blocked-task reports from the last 48 hours were for this specific process.

If the storage were misbehaving, I’d expect to see a lot more reports about I/O issues from multiple VMs and multiple workloads at the same time. We did see that in the long, long past with older Ceph versions and spinning rust, so I’m pretty confident (at the moment) that this isn’t a storage issue per se.

Incidentally, this reminds me of a different (or maybe not so different?) issue that I’ve been trying to track down with mdraid/xfs:
https://marc.info/?l=linux-raid&m=172295385102939&w=2

So far this has only been seen on an older kernel (5.15.138), and we ended up seeing IOPS stuck in the md device but not below it. MD isn’t involved here, however; I made the connection because the original traceback also shows the task stuck in “wait_on_page_writeback”, but maybe that’s a red herring:

[Aug 6 09:35] INFO: task .backy-wrapped:2615 blocked for more than 122 seconds.
[ +0.008130] Not tainted 5.15.138 #1-NixOS
[ +0.005194] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.008895] task:.backy-wrapped state:D stack: 0 pid: 2615 ppid: 1 flags:0x00000002
[ +0.000005] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] __schedule+0x373/0x1580
[ +0.000009] ? xlog_cil_commit+0x559/0x880 [xfs]
[ +0.000041] schedule+0x5b/0xe0
[ +0.000001] io_schedule+0x42/0x70
[ +0.000001] wait_on_page_bit_common+0x119/0x380
[ +0.000005] ? __page_cache_alloc+0x80/0x80
[ +0.000002] wait_on_page_writeback+0x22/0x70
[ +0.000001] truncate_inode_pages_range+0x26f/0x6d0
[ +0.000006] evict+0x15f/0x180
[ +0.000003] __dentry_kill+0xde/0x170
[ +0.000001] dput+0x15b/0x330
[ +0.000002] do_renameat2+0x34e/0x5b0
[ +0.000003] __x64_sys_rename+0x3f/0x50
[ +0.000002] do_syscall_64+0x3a/0x90
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x62/0xcc
[ +0.000003] RIP: 0033:0x7fdd1885275b
[ +0.000002] RSP: 002b:00007ffde643ad18 EFLAGS: 00000246 ORIG_RAX: 0000000000000052
[ +0.000002] RAX: ffffffffffffffda RBX: 00007ffde643adb0 RCX: 00007fdd1885275b
[ +0.000001] RDX: 0000000000000000 RSI: 00007fdd09a3d3d0 RDI: 00007fdd098549d0
[ +0.000001] RBP: 00007ffde643ad60 R08: 00000000ffffffff R09: 0000000000000000
[ +0.000001] R10: 00007ffde643af90 R11: 0000000000000246 R12: 00000000ffffff9c
[ +0.000000] R13: 00000000ffffff9c R14: 000000000183cab0 R15: 00007fdd0b128810
[ +0.000001] </TASK>
[ +0.000011] INFO: task kworker/u64:0:2380262 blocked for more than 122 seconds.
[ +0.008309] Not tainted 5.15.138 #1-NixOS
[ +0.005190] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.008895] task:kworker/u64:0 state:D stack: 0 pid:2380262 ppid: 2 flags:0x00004000
[ +0.000004] Workqueue: kcryptd/253:4 kcryptd_crypt [dm_crypt]
[ +0.000006] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] __schedule+0x373/0x1580
[ +0.000003] schedule+0x5b/0xe0
[ +0.000001] md_bitmap_startwrite+0x177/0x1e0
[ +0.000004] ? finish_wait+0x90/0x90
[ +0.000004] add_stripe_bio+0x449/0x770 [raid456]
[ +0.000005] raid5_make_request+0x1cf/0xbd0 [raid456]
[ +0.000003] ? kmem_cache_alloc_node_trace+0x391/0x3e0
[ +0.000004] ? linear_map+0x44/0x90 [dm_mod]
[ +0.000005] ? finish_wait+0x90/0x90
[ +0.000001] ? __blk_queue_split+0x516/0x580
[ +0.000003] md_handle_request+0x122/0x1b0
[ +0.000003] md_submit_bio+0x6e/0xb0
[ +0.000001] __submit_bio+0x18f/0x220
[ +0.000002] ? crypt_page_alloc+0x46/0x60 [dm_crypt]
[ +0.000002] submit_bio_noacct+0xbe/0x2d0
[ +0.000001] kcryptd_crypt+0x392/0x550 [dm_crypt]
[ +0.000002] process_one_work+0x1d6/0x360
[ +0.000003] worker_thread+0x4d/0x3b0
[ +0.000002] ? process_one_work+0x360/0x360
[ +0.000001] kthread+0x118/0x140
[ +0.000001] ? set_kthread_struct+0x50/0x50
[ +0.000001] ret_from_fork+0x22/0x30
[ +0.000004] </TASK>
…(more md kworker tasks pile up here)
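
(Aside: to catch tasks parked in these paths without waiting for the hung-task detector, one option is to walk /proc for D-state tasks and dump their kernel stacks. A rough sketch, assuming root and CONFIG_STACKTRACE so that /proc/<pid>/stack is readable:)

#!/usr/bin/env python3
# Rough sketch: list uninterruptible (D-state) tasks and their kernel
# stacks, to spot anything parked in wait_on_page_writeback or
# md_bitmap_startwrite. Needs root; /proc/<pid>/stack requires
# CONFIG_STACKTRACE.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as f:
            # state is the field right after the ")" that closes the comm
            state = f.read().rsplit(")", 1)[1].split()[0]
        if state != "D":
            continue
        with open(f"/proc/{pid}/comm") as f:
            comm = f.read().strip()
        with open(f"/proc/{pid}/stack") as f:
            stack = f.read()
        print(f"pid {pid} ({comm}) in D state:")
        print(stack)
    except (OSError, IndexError):
        continue  # task exited or file not accessible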

Christian

-- 
Christian Theune · ct@...ingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

