linux-kernel - Re: [PATCH 1/2] fs: New zonefs file system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BYAPR04MB5816D17D0A14D5651E37F700E7500@BYAPR04MB5816.namprd04.prod.outlook.com>
Date:   Tue, 17 Dec 2019 00:20:07 +0000
From:   Damien Le Moal <Damien.LeMoal@....com>
To:     Hannes Reinecke <hare@...e.de>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Johannes Thumshirn <jth@...nel.org>,
        Naohiro Aota <Naohiro.Aota@....com>,
        "Darrick J . Wong" <darrick.wong@...cle.com>
Subject: Re: [PATCH 1/2] fs: New zonefs file system

On 2019/12/16 17:36, Hannes Reinecke wrote:
[...]
>> +static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>> +			      unsigned int flags, struct iomap *iomap,
>> +			      struct iomap *srcmap)
>> +{
>> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>> +	loff_t max_isize = zi->i_max_size;
>> +	loff_t isize;
>> +
>> +	/*
>> +	 * For sequential zones, enforce direct IO writes. This is already
>> +	 * checked when writes are issued, so warn about this here if we
>> +	 * get buffered write to a sequential file inode.
>> +	 */
>> +	if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
>> +			 (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
>> +		return -EIO;
>> +
>> +	/*
>> +	 * For all zones, all blocks are always mapped. For sequential zones,
>> +	 * all blocks after the write pointer (inode size) are always unwritten.
>> +	 */
>> +	mutex_lock(&zi->i_truncate_mutex);
>> +	isize = i_size_read(inode);
>> +	if (offset >= isize) {
>> +		length = min(length, max_isize - offset);
>> +		if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
>> +			iomap->type = IOMAP_MAPPED;
>> +		else
>> +			iomap->type = IOMAP_UNWRITTEN;
>> +	} else {
>> +		length = min(length, isize - offset);
>> +		iomap->type = IOMAP_MAPPED;
>> +	}
>> +	mutex_unlock(&zi->i_truncate_mutex);
>> +
>> +	iomap->offset = offset & (~sbi->s_blocksize_mask);
>> +	iomap->length = ((offset + length + sbi->s_blocksize_mask) &
>> +			 (~sbi->s_blocksize_mask)) - iomap->offset;
>> +	iomap->bdev = inode->i_sb->s_bdev;
>> +	iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct iomap_ops zonefs_iomap_ops = {
>> +	.iomap_begin	= zonefs_iomap_begin,
>> +};
>> +
> This probably shows my complete ignorance, but what is the effect on 
> enforcing the direct I/O writes on the pagecache?
> IE what happens for buffered reads? Will the pages be invalidated when a 
> write has been issued?

Yes, a direct write issued to a file range that has cached pages result
in these pages to be invalidated. But note that in the case of zonefs,
this can happen only in the case of conventional zones. For sequential
zones, this does not happen: reads can be buffered and cache pages but
only for pages below the write pointer. And writes can only be issued at
the write pointer. So there is never any possible overlap between
buffered reads and direct writes.

> Or do we simply rely on upper layers to ensure no concurrent buffered 
> and direct I/O is being made?

Nope. VFS, or the file system specific implementation, takes care of
that. See generic_file_direct_write() and its call to
invalidate_inode_pages2_range().

> 
> [ .. ]
>> +
>> +static int zonefs_seq_file_truncate(struct inode *inode, loff_t isize)
>> +{
>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>> +	loff_t old_isize;
>> +	enum req_opf op;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * For sequential zone files, we can only allow truncating to 0 size,
>> +	 * which is equivalent to a zone reset, or to the maximum file size,
>> +	 * which is equivalent toa zone finish.
> 
> Spelling: to a

Good catch. Will fix it. Thanks.

> 
> [ .. ]
> 
> Other than that:
> Reviewed-by: Hannes Reinecke <hare@...e.de>
> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research