lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ff8abd75-b325-4f93-a838-0020e97e0b25@huaweicloud.com>
Date: Wed, 18 Dec 2024 15:10:36 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
 jack@...e.cz, yi.zhang@...wei.com, chengzhihao1@...wei.com,
 yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v4 03/10] ext4: don't write back data before punch hole in
 nojournal mode

On 2024/12/17 22:31, Ojaswin Mujoo wrote:
> On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@...wei.com>
>>
>> There is no need to write back all data before punching a hole in
>> non-journaled mode since it will be dropped soon after removing space.
>> Therefore, the call to filemap_write_and_wait_range() can be eliminated.
> 
> Hi, sorry I'm a bit late to this however following the discussion here
> [1], I believe the initial concern was that we don't in PATCH v1 01/10 
> was that after truncating the pagecache, the ext4_alloc_file_blocks()
> call might fail with errors like EIO, ENOMEM etc leading to inconsistent
> data. 
> 
> Is my understanding correct that  we realised that these are very rare
> cases and are not worth the performance penalty of writeback? In which
> case, is it really okay to just let the scope for corruption exist even
> though its rare. There might be some other error cases we might be
> missing which might be more easier to hit. For eg I think we can also
> fail ext4_alloc_file_blocks() with ENOSPC in case there is a written to
> unwritten extent conversion causing an extent split leading to  extent
> tree node allocation. (Maybe can be avoided by using PRE_IO with
> EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first ext4_alloc_file_blocks() call)
> 
> So does it make sense to retain the writeback behavior or am I just
> being paranoid :) 
> 

Hi, Ojaswin!

Yeah, from my point of view, ENOSPC could happen, and it may be more
likely to happen if we intentionally create conditions for it. However,
all the efforts we can make at this point are merely best efforts and
reduce the probability. We cannot 100% guarantee it will not happen,
even if we write back the whole range before manipulating extents and
blocks. This is because we do not accurately reserve space for extents
split. Additionally, In ext4_punch_hole(), we have used 'nofail' flag
while freeing blocks to reduce the possibility of ENOSPC. So I suppose
it's fine by now, but we may need to implement additional measures if
we truly want to resolve the issue completely.

Thanks,
Yi.

> 
>> Besides, similar to ext4_zero_range(), we must address the case of
>> partially punched folios when block size < page size. It is essential to
>> remove writable userspace mappings to ensure that the folio can be
>> faulted again during subsequent mmap write access.
>>
>> In journaled mode, we need to write dirty pages out before discarding
>> page cache in case of crash before committing the freeing data
>> transaction, which could expose old, stale data, even if synchronization
>> has been performed.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>> ---
>>  fs/ext4/inode.c | 18 +++++-------------
>>  1 file changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index bf735d06b621..a5ba2b71d508 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>  
>>  	trace_ext4_punch_hole(inode, offset, length, 0);
>>  
>> -	/*
>> -	 * Write out all dirty pages to avoid race conditions
>> -	 * Then release them.
>> -	 */
>> -	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
>> -		ret = filemap_write_and_wait_range(mapping, offset,
>> -						   offset + length - 1);
>> -		if (ret)
>> -			return ret;
>> -	}
>> -
>>  	inode_lock(inode);
>>  
>>  	/* No need to punch hole beyond i_size */
>> @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
>>  		ret = ext4_update_disksize_before_punch(inode, offset, length);
>>  		if (ret)
>>  			goto out_dio;
>> -		truncate_pagecache_range(inode, first_block_offset,
>> -					 last_block_offset);
>> +
>> +		ret = ext4_truncate_page_cache_block_range(inode,
>> +				first_block_offset, last_block_offset + 1);
>> +		if (ret)
>> +			goto out_dio;
>>  	}
>>  
>>  	if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
>> -- 
>> 2.46.1
>>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ