lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 17 Dec 2019 00:20:07 +0000
From:   Damien Le Moal <Damien.LeMoal@....com>
To:     Hannes Reinecke <hare@...e.de>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
CC:     Johannes Thumshirn <jth@...nel.org>,
        Naohiro Aota <Naohiro.Aota@....com>,
        "Darrick J . Wong" <darrick.wong@...cle.com>
Subject: Re: [PATCH 1/2] fs: New zonefs file system

On 2019/12/16 17:36, Hannes Reinecke wrote:
[...]
>> +static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>> +			      unsigned int flags, struct iomap *iomap,
>> +			      struct iomap *srcmap)
>> +{
>> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>> +	loff_t max_isize = zi->i_max_size;
>> +	loff_t isize;
>> +
>> +	/*
>> +	 * For sequential zones, enforce direct IO writes. This is already
>> +	 * checked when writes are issued, so warn about this here if we
>> +	 * get buffered write to a sequential file inode.
>> +	 */
>> +	if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
>> +			 (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
>> +		return -EIO;
>> +
>> +	/*
>> +	 * For all zones, all blocks are always mapped. For sequential zones,
>> +	 * all blocks after the write pointer (inode size) are always unwritten.
>> +	 */
>> +	mutex_lock(&zi->i_truncate_mutex);
>> +	isize = i_size_read(inode);
>> +	if (offset >= isize) {
>> +		length = min(length, max_isize - offset);
>> +		if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
>> +			iomap->type = IOMAP_MAPPED;
>> +		else
>> +			iomap->type = IOMAP_UNWRITTEN;
>> +	} else {
>> +		length = min(length, isize - offset);
>> +		iomap->type = IOMAP_MAPPED;
>> +	}
>> +	mutex_unlock(&zi->i_truncate_mutex);
>> +
>> +	iomap->offset = offset & (~sbi->s_blocksize_mask);
>> +	iomap->length = ((offset + length + sbi->s_blocksize_mask) &
>> +			 (~sbi->s_blocksize_mask)) - iomap->offset;
>> +	iomap->bdev = inode->i_sb->s_bdev;
>> +	iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct iomap_ops zonefs_iomap_ops = {
>> +	.iomap_begin	= zonefs_iomap_begin,
>> +};
>> +
> This probably shows my complete ignorance, but what is the effect on 
> enforcing the direct I/O writes on the pagecache?
> IE what happens for buffered reads? Will the pages be invalidated when a 
> write has been issued?

Yes, a direct write issued to a file range that has cached pages result
in these pages to be invalidated. But note that in the case of zonefs,
this can happen only in the case of conventional zones. For sequential
zones, this does not happen: reads can be buffered and cache pages but
only for pages below the write pointer. And writes can only be issued at
the write pointer. So there is never any possible overlap between
buffered reads and direct writes.

> Or do we simply rely on upper layers to ensure no concurrent buffered 
> and direct I/O is being made?

Nope. VFS, or the file system specific implementation, takes care of
that. See generic_file_direct_write() and its call to
invalidate_inode_pages2_range().

> 
> [ .. ]
>> +
>> +static int zonefs_seq_file_truncate(struct inode *inode, loff_t isize)
>> +{
>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>> +	loff_t old_isize;
>> +	enum req_opf op;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * For sequential zone files, we can only allow truncating to 0 size,
>> +	 * which is equivalent to a zone reset, or to the maximum file size,
>> +	 * which is equivalent toa zone finish.
> 
> Spelling: to a

Good catch. Will fix it. Thanks.

> 
> [ .. ]
> 
> Other than that:
> Reviewed-by: Hannes Reinecke <hare@...e.de>
> 
> Cheers,
> 
> Hannes
> 


-- 
Damien Le Moal
Western Digital Research

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ