lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <96bbdb25-b420-67b1-d4c4-b838a5c70f9f@huaweicloud.com>
Date: Mon, 6 May 2024 19:44:44 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, tytso@....edu,
 adilger.kernel@...ger.ca, jack@...e.cz, ritesh.list@...il.com,
 hch@...radead.org, djwong@...nel.org, willy@...radead.org,
 zokeefe@...gle.com, yi.zhang@...wei.com, chengzhihao1@...wei.com,
 yukuai3@...wei.com, wangkefeng.wang@...wei.com
Subject: Re: [RFC PATCH v4 24/34] ext4: implement buffered write iomap path

On 2024/5/1 16:33, Dave Chinner wrote:
> On Wed, May 01, 2024 at 06:11:13PM +1000, Dave Chinner wrote:
>> On Wed, Apr 10, 2024 at 10:29:38PM +0800, Zhang Yi wrote:
>>> From: Zhang Yi <yi.zhang@...wei.com>
>>>
>>> Implement buffered write iomap path, use ext4_da_map_blocks() to map
>>> delalloc extents and add ext4_iomap_get_blocks() to allocate blocks if
>>> delalloc is disabled or free space is about to run out.
>>>
>>> Note that we always allocate unwritten extents for new blocks in the
>>> iomap write path, this means that the allocation type is no longer
>>> controlled by the dioread_nolock mount option. After that, we could
>>> postpone the i_disksize updating to the writeback path, and drop journal
>>> handle in the buffered dealloc write path completely.
> .....
>>> +/*
>>> + * Drop the staled delayed allocation range from the write failure,
>>> + * including both start and end blocks. If not, we could leave a range
>>> + * of delayed extents covered by a clean folio, it could lead to
>>> + * inaccurate space reservation.
>>> + */
>>> +static int ext4_iomap_punch_delalloc(struct inode *inode, loff_t offset,
>>> +				     loff_t length)
>>> +{
>>> +	ext4_es_remove_extent(inode, offset >> inode->i_blkbits,
>>> +			DIV_ROUND_UP_ULL(length, EXT4_BLOCK_SIZE(inode->i_sb)));
>>>  	return 0;
>>>  }
>>>  
>>> +static int ext4_iomap_buffered_write_end(struct inode *inode, loff_t offset,
>>> +					 loff_t length, ssize_t written,
>>> +					 unsigned int flags,
>>> +					 struct iomap *iomap)
>>> +{
>>> +	handle_t *handle;
>>> +	loff_t end;
>>> +	int ret = 0, ret2;
>>> +
>>> +	/* delalloc */
>>> +	if (iomap->flags & IOMAP_F_EXT4_DELALLOC) {
>>> +		ret = iomap_file_buffered_write_punch_delalloc(inode, iomap,
>>> +			offset, length, written, ext4_iomap_punch_delalloc);
>>> +		if (ret)
>>> +			ext4_warning(inode->i_sb,
>>> +			     "Failed to clean up delalloc for inode %lu, %d",
>>> +			     inode->i_ino, ret);
>>> +		return ret;
>>> +	}
>>
>> Why are you creating a delalloc extent for the write operation and
>> then immediately deleting it from the extent tree once the write
>> operation is done?
> 
> Ignore this, I mixed up the ext4_iomap_punch_delalloc() code
> directly above with iomap_file_buffered_write_punch_delalloc().
> 
> In hindsight, iomap_file_buffered_write_punch_delalloc() is poorly
> named, as it is handling a short write situation which requires
> newly allocated delalloc blocks to be punched.
> iomap_file_buffered_write_finish() would probably be a better name
> for it....
> 
>> Also, why do you need IOMAP_F_EXT4_DELALLOC? Isn't a delalloc iomap
>> set up with iomap->type = IOMAP_DELALLOC? Why can't that be used?
> 
> But this still stands - the first thing
> iomap_file_buffered_write_punch_delalloc() is:
> 
> 	if (iomap->type != IOMAP_DELALLOC)
>                 return 0;
> 

Thanks for the suggestion, the delalloc and non-delalloc write paths
share the same ->iomap_end() now (i.e. ext4_iomap_buffered_write_end()),
I use the IOMAP_F_EXT4_DELALLOC to identify the write path. For
non-delalloc path, If we have allocated more blocks and copied less, we
should truncate extra blocks that newly allocated by ->iomap_begin().
If we use IOMAP_DELALLOC, we can't tell if the blocks are pre-existing
or newly allocated, we can't truncate the pre-existing blocks, so I have
to introduce IOMAP_F_EXT4_DELALLOC. But if we split the delalloc and
non-delalloc handler, we could drop IOMAP_F_EXT4_DELALLOC.

I also checked xfs, IIUC, xfs doesn't free the extra blocks beyond EOF
in xfs_buffered_write_iomap_end() for non-delalloc case since they will
be freed by xfs_free_eofblocks in some other inactive paths, like
xfs_release()/xfs_inactive()/..., is that right?

Thanks,
Yi.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ