linux-kernel - Re: [PATCH] f2fs: use flush command instead of FUA for zoned device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <42e10758-e50a-7aaa-dfa9-dcf6338ebaff@opensource.wdc.com>
Date:   Thu, 21 Apr 2022 17:43:55 +0900
From:   Damien Le Moal <damien.lemoal@...nsource.wdc.com>
To:     Jaegeuk Kim <jaegeuk@...nel.org>, linux-kernel@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net
Subject: Re: [PATCH] f2fs: use flush command instead of FUA for zoned device

On 4/20/22 06:57, Jaegeuk Kim wrote:
> The block layer for zoned disk can reorder the FUA'ed IOs. Let's use flush
> command to keep the write order.

Stricktly speaking, for a request that has data, the problem is triggered
by REQ_PREFLUSH since in this case the request does not go through the
scheduler and is processed through the blk-flush machinery. REQ_FUA on its
own should not matter if the device supports it. If the device does not
support FUA, then the same problem can happen due to POSTFLUSH (again no
scheduler).

Bypassing the scheduler leads to the write not write-locking the zone,
which leads to reordering... Completely overlooked that case when the zone
write locking was implemented.

Ideally, the FS should not have to care about this. blk-flush machinery
should be a little more intelligent and process the write phase of the
request using the scheduler. Need to look into that.

> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@...nel.org>
> ---
>  fs/f2fs/file.c | 4 +++-
>  fs/f2fs/node.c | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f08e6208e183..2aef0632f35b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -372,7 +372,9 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>  	f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
>  	clear_inode_flag(inode, FI_APPEND_WRITE);
>  flush_out:
> -	if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER)
> +	if ((!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER) ||
> +			(atomic && !test_opt(sbi, NOBARRIER) &&
> +					f2fs_sb_has_blkzoned(sbi)))

Aligning the conditions and not breaking the second line would make this a
lot easier to read...

>  		ret = f2fs_issue_flush(sbi, inode->i_ino);
>  	if (!ret) {
>  		f2fs_remove_ino_entry(sbi, ino, UPDATE_INO);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index c280f482c741..7224a980056f 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1633,7 +1633,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted,
>  		goto redirty_out;
>  	}
>  
> -	if (atomic && !test_opt(sbi, NOBARRIER))
> +	if (atomic && !test_opt(sbi, NOBARRIER) && !f2fs_sb_has_blkzoned(sbi))
>  		fio.op_flags |= REQ_PREFLUSH | REQ_FUA;

Is this really OK to do ? flush + write as different operations may not
lead to the same result as a preflush+fua write.

Until the block layer is fixed to properly handle this, a simpler fix for
f2fs would be to force enable the NOBARRIER option for zoned drives ? That
would avoid these changes no ?

Also, with all the testing we do on SMR disks and f2fs (smaller, older SMR
disks due to the 16TB limit), we never have triggered this problem. How
did you trigger it ?

>  
>  	/* should add to global list before clearing PAGECACHE status */

-- 
Damien Le Moal
Western Digital Research