Message-ID: <ZFPWeOg5xJ7CbCD0@kbusch-mbp.dhcp.thefacebook.com>
Date:   Thu, 4 May 2023 09:59:52 -0600
From:   Keith Busch <kbusch@...nel.org>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-block@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Zhang Yi <yi.zhang@...hat.com>
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> Hello Guys,
> 
> I got a report in which buffered write IO hangs in balance_dirty_pages
> after an nvme block device is physically unplugged; umount then can't
> succeed.
> 
> It turns out to be a long-standing issue that can be triggered at least
> from v5.14 through the latest v6.3.
> 
> The issue can be reproduced reliably in a KVM guest:
> 
> 1) run the following script inside guest:
> 
> mkfs.ext4 -F /dev/nvme0n1
> mount /dev/nvme0n1 /mnt
> dd if=/dev/zero of=/mnt/z.img&
> sleep 10
> echo 1 > /sys/block/nvme0n1/device/device/remove
> 
> 2) dd hangs, even though /dev/nvme0n1 is actually gone

Sorry to jump in so late.

For an ungraceful nvme removal, like a surprise hot unplug, the driver
sets the capacity to 0, which effectively ends any dirty page writers
that could stall forward progress on the removal. That 0 capacity should
also cause 'dd' to exit.
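
Roughly, the forced path amounts to something like the sketch below
(using mainline block layer helpers; the function name and structure
are only illustrative, not the literal nvme removal code):

#include <linux/blkdev.h>

/* Hypothetical sketch of the ungraceful-removal behavior described
 * above: mark the disk dead and shrink it to zero so in-flight and
 * future writers get errors instead of waiting forever.  Not the
 * actual nvme driver code. */
static void example_surprise_removal(struct gendisk *disk)
{
	blk_mark_disk_dead(disk);		/* fail new and queued I/O */
	set_capacity_and_notify(disk, 0);	/* writers see a zero-size device */
}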

But this is not an ungraceful removal, so we're not getting that forced
behavior. Could we use the same capacity trick here after flushing any
outstanding dirty pages?
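
For the graceful case, that could look roughly like the sketch below
(again only illustrative names, not a tested patch):

/* Hypothetical sketch of the suggestion above: write back what we can,
 * then apply the same capacity trick so any writer still stuck in
 * balance_dirty_pages stops generating dirty pages and exits. */
static void example_graceful_removal(struct gendisk *disk)
{
	sync_blockdev(disk->part0);		/* flush outstanding dirty pages */
	set_capacity_and_notify(disk, 0);	/* remaining writers see a zero-size device */
	del_gendisk(disk);			/* continue with normal teardown */
}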
