Message-ID: <ZEnb7KuOWmu5P+V9@ovpn-8-24.pek2.redhat.com>
Date:   Thu, 27 Apr 2023 10:20:28 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org
Cc:     ming.lei@...hat.com, Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-block@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Zhang Yi <yi.zhang@...hat.com>
Subject: [ext4 io hang] buffered write io hang in balance_dirty_pages

Hello Guys,

I got a report in which buffered write IO hangs in balance_dirty_pages
after an nvme block device is physically unplugged; umount can't
succeed afterwards either.

Turns out it is a long-standing issue: it can be triggered at least
since v5.14 and is still present in the latest v6.3.

And the issue can be reproduced reliably in a KVM guest:

1) run the following script inside the guest:

# format and mount a fresh ext4 filesystem
mkfs.ext4 -F /dev/nvme0n1
mount /dev/nvme0n1 /mnt
# start an endless buffered write in the background
dd if=/dev/zero of=/mnt/z.img &
sleep 10
# simulate a physical unplug by removing the underlying PCI device
echo 1 > /sys/block/nvme0n1/device/device/remove

2) dd is observed to hang even though /dev/nvme0n1 is actually gone:

[root@...st-09 ~]# ps -ax | grep dd
   1348 pts/0    D      0:33 dd if=/dev/zero of=/mnt/z.img
   1365 pts/0    S+     0:00 grep --color=auto dd

[root@...st-09 ~]# cat /proc/1348/stack
[<0>] balance_dirty_pages+0x649/0x2500
[<0>] balance_dirty_pages_ratelimited_flags+0x4c6/0x5d0
[<0>] generic_perform_write+0x310/0x4c0
[<0>] ext4_buffered_write_iter+0x130/0x2c0 [ext4]
[<0>] new_sync_write+0x28e/0x4a0
[<0>] vfs_write+0x62a/0x920
[<0>] ksys_write+0xf9/0x1d0
[<0>] do_syscall_64+0x59/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd


[root@...st-09 ~]# lsblk | grep nvme
[root@...st-09 ~]#

BTW, the VM has 2GB of RAM, and the nvme disk size is 40GB.
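
For reference, a rough way to estimate the dirty throttling threshold in
the guest (assuming the default vm.dirty_ratio of 20; the kernel actually
computes the limit from dirtyable memory, so this is only an upper bound):

# rough upper bound for the dirty limit: MemTotal * vm.dirty_ratio / 100
ratio=$(sysctl -n vm.dirty_ratio)          # 20 by default
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "dirty limit ~ $((mem_kb * ratio / 100)) kB"

With 2GB of RAM that works out to roughly 400MB, which the dd above
dirties within a few seconds, so the writer is already being throttled
in balance_dirty_pages by the time the device is removed.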

So far I have only observed this on ext4; I don't see it on XFS. I guess
it isn't related to the disk type, but I haven't tried the test on other
types of disks yet and will do so.

It seems like the dirty pages aren't cleaned after the ext4 bio fails in
this situation?
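
One way to check that hypothesis is to watch the global dirty/writeback
counters after the unplug; if the failed writeback never cleans the
pages, these counters stay high and never drain, so balance_dirty_pages
can never make progress:

# watch dirty/writeback accounting after the unplug; if the failed
# writeback never cleans the pages, these values stay high forever
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'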


Thanks,
Ming
