linux-ext4 - Re: [ext4 io hang] buffered write io hang in balance_dirty

Message-ID: <ZEpH+GEj33aUGoAD@ovpn-8-26.pek2.redhat.com>
Date:   Thu, 27 Apr 2023 18:01:28 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     Baokun Li <libaokun1@...wei.com>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-block@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        Dave Chinner <dchinner@...hat.com>,
        Eric Sandeen <sandeen@...hat.com>,
        Christoph Hellwig <hch@....de>, Zhang Yi <yi.zhang@...hat.com>,
        yangerkun <yangerkun@...wei.com>, ming.lei@...hat.com
Subject: Re: [ext4 io hang] buffered write io hang in balance_dirty_pages

On Thu, Apr 27, 2023 at 02:36:51PM +0800, Baokun Li wrote:
> On 2023/4/27 12:50, Ming Lei wrote:
> > Hello Matthew,
> > 
> > On Thu, Apr 27, 2023 at 04:58:36AM +0100, Matthew Wilcox wrote:
> > > On Thu, Apr 27, 2023 at 10:20:28AM +0800, Ming Lei wrote:
> > > > Hello Guys,
> > > > 
> > > > I got one report in which buffered write IO hangs in balance_dirty_pages,
> > > > after one nvme block device is unplugged physically, then umount can't
> > > > succeed.
> > > That's a feature, not a bug ... the dd should continue indefinitely?
> > Can you explain what the feature is? I don't see such an 'issue' or
> > 'feature' on xfs.
> > 
> > The device is gone, so IMO it is reasonable to expect FS buffered write IO
> > to fail. Actually dmesg has shown 'EXT4-fs (nvme0n1): Remounting
> > filesystem read-only'. These things may confuse users.
> 
> 
> The reason for this difference is that ext4 and xfs handle errors
> differently.
> 
> ext4 remounts the filesystem as read-only or even just continues, and
> vfs_write does not check for either.

vfs_write may not find anything wrong, but the ext4 remount path could see
that the disk is gone. However, the disk might also go away during or after
the remount.

> 
> xfs shuts down the filesystem, so it returns a failure at
> xfs_file_write_iter when it finds an error.
> 
> 
> ``` ext4
> ksys_write
>  vfs_write
>   ext4_file_write_iter
>    ext4_buffered_write_iter
>     ext4_write_checks
>      file_modified
>       file_modified_flags
>        __file_update_time
>         inode_update_time
>          generic_update_time
>           __mark_inode_dirty
>            ext4_dirty_inode ---> 2. void func, errors are not propagated out
>             __ext4_journal_start_sb
>              ext4_journal_check_start ---> 1. Error found, remount-ro
>     generic_perform_write ---> 3. No error sensed, continue
>      balance_dirty_pages_ratelimited
>       balance_dirty_pages_ratelimited_flags
>        balance_dirty_pages
>         // 4. Sleeping waiting for dirty pages to be freed
>         __set_current_state(TASK_KILLABLE)
>         io_schedule_timeout(pause);
> ```
> 
> ``` xfs
> ksys_write
>  vfs_write
>   xfs_file_write_iter
>    if (xfs_is_shutdown(ip->i_mount))
>      return -EIO;    ---> dd fail
> ```

Thanks for the info, which really helps me understand the problem.
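
The difference between the two traces above can be sketched in plain
userspace C. This is only a toy model under stated assumptions: struct
toy_fs and every toy_* function are hypothetical names, not kernel code.
It shows why an error noticed inside a void callback never reaches the
writer, while an explicit shutdown check at the top of the write path does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-filesystem state; not a kernel structure. */
struct toy_fs {
	bool device_gone;	/* backing device has disappeared */
	bool read_only;		/* ext4-like reaction: remount read-only */
	bool shut_down;		/* xfs-like reaction: filesystem shut down */
};

/*
 * ext4-like: the error is noticed inside a void callback (compare
 * ext4_dirty_inode in the trace), so nothing is returned to the caller.
 */
static void toy_dirty_inode(struct toy_fs *fs)
{
	assert(fs != NULL);
	if (fs->device_gone)
		fs->read_only = true;	/* handled locally, not reported */
}

/* Returns the number of bytes accepted, or a negative errno value. */
static long toy_write_ext4_like(struct toy_fs *fs, size_t len)
{
	toy_dirty_inode(fs);	/* cannot fail from the caller's point of view */
	return (long)len;	/* data is still accepted as dirty pages */
}

/* xfs-like: the write path checks the shutdown flag before doing anything. */
static long toy_write_xfs_like(struct toy_fs *fs, size_t len)
{
	assert(fs != NULL);
	if (fs->shut_down)
		return -EIO;	/* the writer (e.g. dd) sees the failure */
	return (long)len;
}
```

In the ext4-like path the writer keeps producing dirty data that can never
be written back, which is what eventually leaves it sleeping in
balance_dirty_pages().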

> > > balance_dirty_pages() is sleeping in KILLABLE state, so kill -9 of
> > > the dd process should succeed.
> > Yeah, dd can be killed; however, it could be any application :-)
> > 
> > Fortunately it won't cause trouble during reboot/power off, given
> > userspace will be killed at that time.
> > 
> > 
> > 
> > Thanks,
> > Ming
> > 
> Don't worry about that, we always set the current thread to TASK_KILLABLE
> while waiting in balance_dirty_pages().

I have another concern: if 'dd' isn't killed, the dirty pages won't be
cleaned, and this (large amount of) memory becomes unusable. A typical
scenario would be a USB HDD being unplugged.
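
One rough way to watch for this from userspace is to look at the "Dirty"
and "Writeback" fields of /proc/meminfo after the device is removed. The
helper names below (meminfo_field_kb, read_meminfo) are made up for this
sketch; only the /proc/meminfo field names are real.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value in kB of one field (e.g. "Dirty") from text in
 * /proc/meminfo format, or -1 if the field is missing or malformed.
 */
static long meminfo_field_kb(const char *text, const char *field)
{
	assert(text != NULL && field != NULL);
	size_t flen = strlen(field);
	const char *line = text;

	while (*line) {
		/* Require the ':' so "Writeback" does not match "WritebackTmp". */
		if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
			long kb;

			if (sscanf(line + flen + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		const char *nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	return -1;
}

/* Read /proc/meminfo into buf; returns 0 on success, -1 on failure. */
static int read_meminfo(char *buf, size_t size)
{
	assert(buf != NULL && size > 0);
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	size_t n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

If "Dirty" stays high and does not drain long after the unplug, that is
consistent with the pages being stuck as described above, though it is
not proof on its own.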


thanks,
Ming