linux-kernel - Re: [PATCH] f2fs: use flush command instead of FUA for zoned device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YmF2Nqu8Rtc4cx52@google.com>
Date:   Thu, 21 Apr 2022 08:20:22 -0700
From:   Jaegeuk Kim <jaegeuk@...nel.org>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>
Cc:     linux-kernel@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: [PATCH] f2fs: use flush command instead of FUA for zoned device

On 04/21, Damien Le Moal wrote:
> On 4/20/22 06:57, Jaegeuk Kim wrote:
> > The block layer for zoned disk can reorder the FUA'ed IOs. Let's use flush
> > command to keep the write order.
> 
> Stricktly speaking, for a request that has data, the problem is triggered
> by REQ_PREFLUSH since in this case the request does not go through the
> scheduler and is processed through the blk-flush machinery. REQ_FUA on its
> own should not matter if the device supports it. If the device does not
> support FUA, then the same problem can happen due to POSTFLUSH (again no
> scheduler).

I think the problem is a piggy-backed data along with flush or fua whatever,
but this made me use a separate flush command.

> 
> Bypassing the scheduler leads to the write not write-locking the zone,
> which leads to reordering... Completely overlooked that case when the zone
> write locking was implemented.
> 
> Ideally, the FS should not have to care about this. blk-flush machinery
> should be a little more intelligent and process the write phase of the
> request using the scheduler. Need to look into that.

Please. I'm okay to revert this, once the block layer supports.

> 
> > 
> > Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
> > ---
> >  fs/f2fs/file.c | 4 +++-
> >  fs/f2fs/node.c | 2 +-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index f08e6208e183..2aef0632f35b 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -372,7 +372,9 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
> >  	f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
> >  	clear_inode_flag(inode, FI_APPEND_WRITE);
> >  flush_out:
> > -	if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER)
> > +	if ((!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) ||
> > +			(atomic && !test_opt(sbi, NOBARRIER) &&
> > +					f2fs_sb_has_blkzoned(sbi)))
> 
> Aligning the conditions and not breaking the second line would make this a
> lot easier to read...

Sure.

> 
> >  		ret = f2fs_issue_flush(sbi, inode->i_ino);
> >  	if (!ret) {
> >  		f2fs_remove_ino_entry(sbi, ino, UPDATE_INO);
> > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > index c280f482c741..7224a980056f 100644
> > --- a/fs/f2fs/node.c
> > +++ b/fs/f2fs/node.c
> > @@ -1633,7 +1633,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted,
> >  		goto redirty_out;
> >  	}
> >  
> > -	if (atomic && !test_opt(sbi, NOBARRIER))
> > +	if (atomic && !test_opt(sbi, NOBARRIER) && !f2fs_sb_has_blkzoned(sbi))
> >  		fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
> 
> Is this really OK to do ? flush + write as different operations may not
> lead to the same result as a preflush+fua write.
> 
> Until the block layer is fixed to properly handle this, a simpler fix for
> f2fs would be to force enable the NOBARRIER option for zoned drives ? That
> would avoid these changes no ?

No, it will hurt the stability of FS metadata consistency.

> 
> Also, with all the testing we do on SMR disks and f2fs (smaller, older SMR
> disks due to the 16TB limit), we never have triggered this problem. How
> did you trigger it ?

This happens in Android only, since atomic_write for sqlite is taking this path.

> 
> >  
> >  	/* should add to global list before clearing PAGECACHE status */
> 
> 
> -- 
> Damien Le Moal
> Western Digital Research