linux-kernel - Re: [PATCH 1/2] fs: New zonefs file system

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <32e3418b-727e-3018-1b8a-0530608fb34d@suse.de>
Date:   Tue, 17 Dec 2019 08:28:33 +0100
From:   Hannes Reinecke <hare@...e.de>
To:     Damien Le Moal <Damien.LeMoal@....com>,
        "linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
        "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Johannes Thumshirn <jth@...nel.org>,
        Naohiro Aota <Naohiro.Aota@....com>,
        "Darrick J . Wong" <darrick.wong@...cle.com>
Subject: Re: [PATCH 1/2] fs: New zonefs file system

On 12/17/19 1:20 AM, Damien Le Moal wrote:
> On 2019/12/16 17:36, Hannes Reinecke wrote:
> [...]
>>> +static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>>> +			      unsigned int flags, struct iomap *iomap,
>>> +			      struct iomap *srcmap)
>>> +{
>>> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
>>> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
>>> +	loff_t max_isize = zi->i_max_size;
>>> +	loff_t isize;
>>> +
>>> +	/*
>>> +	 * For sequential zones, enforce direct IO writes. This is already
>>> +	 * checked when writes are issued, so warn about this here if we
>>> +	 * get buffered write to a sequential file inode.
>>> +	 */
>>> +	if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
>>> +			 (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
>>> +		return -EIO;
>>> +
>>> +	/*
>>> +	 * For all zones, all blocks are always mapped. For sequential zones,
>>> +	 * all blocks after the write pointer (inode size) are always unwritten.
>>> +	 */
>>> +	mutex_lock(&zi->i_truncate_mutex);
>>> +	isize = i_size_read(inode);
>>> +	if (offset >= isize) {
>>> +		length = min(length, max_isize - offset);
>>> +		if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
>>> +			iomap->type = IOMAP_MAPPED;
>>> +		else
>>> +			iomap->type = IOMAP_UNWRITTEN;
>>> +	} else {
>>> +		length = min(length, isize - offset);
>>> +		iomap->type = IOMAP_MAPPED;
>>> +	}
>>> +	mutex_unlock(&zi->i_truncate_mutex);
>>> +
>>> +	iomap->offset = offset & (~sbi->s_blocksize_mask);
>>> +	iomap->length = ((offset + length + sbi->s_blocksize_mask) &
>>> +			 (~sbi->s_blocksize_mask)) - iomap->offset;
>>> +	iomap->bdev = inode->i_sb->s_bdev;
>>> +	iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static const struct iomap_ops zonefs_iomap_ops = {
>>> +	.iomap_begin	= zonefs_iomap_begin,
>>> +};
>>> +
>> This probably shows my complete ignorance, but what is the effect on
>> enforcing the direct I/O writes on the pagecache?
>> IE what happens for buffered reads? Will the pages be invalidated when a
>> write has been issued?
> 
> Yes, a direct write issued to a file range that has cached pages result
> in these pages to be invalidated. But note that in the case of zonefs,
> this can happen only in the case of conventional zones. For sequential
> zones, this does not happen: reads can be buffered and cache pages but
> only for pages below the write pointer. And writes can only be issued at
> the write pointer. So there is never any possible overlap between
> buffered reads and direct writes.
> 
Oh, indeed, you are correct. That's indeed easy then.

>> Or do we simply rely on upper layers to ensure no concurrent buffered
>> and direct I/O is being made?
> 
> Nope. VFS, or the file system specific implementation, takes care of
> that. See generic_file_direct_write() and its call to
> invalidate_inode_pages2_range().
> 
Of course.
One could even say: not applicable, as it won't happen.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@...e.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer