Message-ID: <ZEnb7KuOWmu5P+V9@ovpn-8-24.pek2.redhat.com>
Date: Thu, 27 Apr 2023 10:20:28 +0800
From: Ming Lei <ming.lei@...hat.com>
To: Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org
Cc: ming.lei@...hat.com, Andreas Dilger <adilger.kernel@...ger.ca>,
linux-block@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
Dave Chinner <dchinner@...hat.com>,
Eric Sandeen <sandeen@...hat.com>,
Christoph Hellwig <hch@....de>, Zhang Yi <yi.zhang@...hat.com>
Subject: [ext4 io hang] buffered write io hang in balance_dirty_pages
Hello Guys,
I got a report in which buffered write IO hangs in balance_dirty_pages
after an NVMe block device is physically unplugged; umount can't succeed
afterwards.
It turns out to be a long-standing issue that can be triggered on at
least v5.14 through the latest v6.3.
The issue can be reproduced reliably in a KVM guest:
1) run the following script inside the guest:
mkfs.ext4 -F /dev/nvme0n1
mount /dev/nvme0n1 /mnt
dd if=/dev/zero of=/mnt/z.img&
sleep 10
echo 1 > /sys/block/nvme0n1/device/device/remove
2) the dd hang is observed, and /dev/nvme0n1 is actually gone (a small
watch script to confirm the stuck state follows the output below):
[root@...st-09 ~]# ps -ax | grep dd
1348 pts/0 D 0:33 dd if=/dev/zero of=/mnt/z.img
1365 pts/0 S+ 0:00 grep --color=auto dd
[root@...st-09 ~]# cat /proc/1348/stack
[<0>] balance_dirty_pages+0x649/0x2500
[<0>] balance_dirty_pages_ratelimited_flags+0x4c6/0x5d0
[<0>] generic_perform_write+0x310/0x4c0
[<0>] ext4_buffered_write_iter+0x130/0x2c0 [ext4]
[<0>] new_sync_write+0x28e/0x4a0
[<0>] vfs_write+0x62a/0x920
[<0>] ksys_write+0xf9/0x1d0
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[root@...st-09 ~]# lsblk | grep nvme
[root@...st-09 ~]#
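For anyone reproducing this, one way to confirm the writer is stuck on
dirty throttling rather than in the driver is to watch the dirty
counters; in this hang they stay pinned instead of draining. A minimal
sketch, assuming debugfs is mounted at /sys/kernel/debug:

# watch global and per-BDI dirty/writeback counters; with the disk gone
# they should drain towards zero once writeback completes, but here they
# stay put
while sleep 5; do
	grep -E '^(Dirty|Writeback):' /proc/meminfo
	grep -iE 'dirty|writeback' /sys/kernel/debug/bdi/*/stats 2>/dev/null
done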
BTW, my VM has 2GB of RAM, and the NVMe disk size is 40GB.
So far I have only observed this on ext4, not on XFS. I guess it isn't
related to the disk type; I haven't tried the test on other types of
disks yet, but will do.
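One easy way to cover another disk type is scsi_debug, which supports
removing the device from sysfs. A hypothetical variant of the repro (the
sdX name depends on which node scsi_debug registers):

modprobe scsi_debug dev_size_mb=4096
mkfs.ext4 -F /dev/sdX
mount /dev/sdX /mnt
dd if=/dev/zero of=/mnt/z.img &
sleep 10
# forcibly delete the scsi_debug disk while dd is still writing
echo 1 > /sys/block/sdX/device/delete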
It seems like dirty pages aren't cleaned up after the ext4 bio fails in
this situation?
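If that is what happens, syncfs on the mount should block the same way;
an untested sketch to check:

# if dirty pages are never cleaned, sync -f (syncfs) should also hang,
# and its stack should show where writeback is stuck
sync -f /mnt &
sleep 5
cat /proc/$!/stack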
Thanks,
Ming