Message-ID: <Z2GPszLGfwG/ujl2@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date: Tue, 17 Dec 2024 20:20:27 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, tytso@....edu, adilger.kernel@...ger.ca,
jack@...e.cz, yi.zhang@...wei.com, chengzhihao1@...wei.com,
yukuai3@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v4 03/10] ext4: don't write back data before punch hole in nojournal mode
On Tue, Dec 17, 2024 at 08:01:26PM +0530, Ojaswin Mujoo wrote:
> On Mon, Dec 16, 2024 at 09:39:08AM +0800, Zhang Yi wrote:
> > From: Zhang Yi <yi.zhang@...wei.com>
> >
> > There is no need to write back all data before punching a hole in
> > non-journaled mode since it will be dropped soon after removing space.
> > Therefore, the call to filemap_write_and_wait_range() can be eliminated.
>
> Hi, sorry I'm a bit late to this. Following the discussion here [1], I
> believe the initial concern with PATCH v1 01/10 was that, after
> truncating the pagecache, the ext4_alloc_file_blocks() call might fail
> with errors like EIO, ENOMEM etc., leading to inconsistent data.
>
> Is my understanding correct that we realised that these are very rare
> cases and are not worth the performance penalty of writeback? In which
> case, is it really okay to just let the scope for corruption exist even
> though it's rare? There might be some other error cases we are missing
> which might be easier to hit. For example, I think we can also fail
> ext4_alloc_file_blocks() with ENOSPC in case there is a written to
> unwritten extent conversion causing an extent split, leading to an
> extent tree node allocation. (Maybe this can be avoided by using PRE_IO
> with EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT in the first
> ext4_alloc_file_blocks() call.)
>
> So does it make sense to retain the writeback behavior, or am I just
> being paranoid? :)
>
> Regards,
> ojaswin
[1] https://lore.kernel.org/linux-ext4/20240917165007.j5dywaekvnirfffm@quack3/
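
For anyone following along, the user-visible behaviour the commit message
relies on (data in the punched range is discarded, so writing it back first
buys nothing for the reader) can be observed from userspace. This is only a
rough sketch: the helper name, file size and hole range are made up for
illustration, and it says nothing about the error paths discussed above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sizes only: a 16 KiB file with a 4 KiB hole at offset 4 KiB. */
enum { DEMO_FILE_SIZE = 16384, DEMO_HOLE_OFF = 4096, DEMO_HOLE_LEN = 4096 };

/*
 * Hypothetical helper, not part of ext4 or any library.
 * Returns  0 if the punched range reads back as zeros and the rest is intact,
 *          1 if the demo could not run here (I/O failed or punching a hole
 *            is not supported by the filesystem),
 *         -1 if the file contents are not what we expected.
 */
static int punch_hole_reads_zero(const char *path)
{
	static unsigned char buf[DEMO_FILE_SIZE];
	int result = 1;
	int fd;

	assert(path != NULL);
	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
	if (fd < 0)
		return 1;

	memset(buf, 0xAB, sizeof(buf));
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	/* No fsync() here: the data is most likely still dirty in page cache. */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      DEMO_HOLE_OFF, DEMO_HOLE_LEN) != 0)
		goto out;	/* e.g. EOPNOTSUPP on this filesystem */

	if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		goto out;

	result = 0;
	for (int i = 0; i < DEMO_FILE_SIZE; i++) {
		int in_hole = i >= DEMO_HOLE_OFF &&
			      i < DEMO_HOLE_OFF + DEMO_HOLE_LEN;
		unsigned char want = in_hole ? 0x00 : 0xAB;

		if (buf[i] != want) {
			result = -1;
			break;
		}
	}
out:
	close(fd);
	unlink(path);
	return result;
}
```

Calling punch_hole_reads_zero() on a scratch file should return 0 on a
filesystem that supports FALLOC_FL_PUNCH_HOLE and 1 where it does not.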
>
> > Besides, similar to ext4_zero_range(), we must address the case of
> > partially punched folios when block size < page size. It is essential to
> > remove writable userspace mappings to ensure that the folio can be
> > faulted again during subsequent mmap write access.
> >
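
To make the block size < page size case concrete, here is a small userspace
sketch of the rounding arithmetic. I am assuming the block range is obtained
by rounding the start of the hole up and the end down to the block size, which
is what first_block_offset and last_block_offset + 1 appear to represent in
the hunk below. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the rounding arithmetic is the point here. */
struct punch_range {
	int64_t first_block_offset;	/* start of first fully covered block */
	int64_t end_offset;		/* one past the last fully covered block */
	int has_full_blocks;		/* 0 if the hole covers no whole block */
};

/* blocksize must be a power of two, as filesystem block sizes are. */
static struct punch_range punch_block_range(int64_t offset, int64_t length,
					    int64_t blocksize)
{
	struct punch_range r;
	int64_t end = offset + length;

	assert(blocksize > 0 && (blocksize & (blocksize - 1)) == 0);
	assert(offset >= 0 && length > 0);

	/* Round the start up and the (exclusive) end down to a block boundary. */
	r.first_block_offset = (offset + blocksize - 1) & ~(blocksize - 1);
	r.end_offset = end & ~(blocksize - 1);
	r.has_full_blocks = r.end_offset > r.first_block_offset;
	return r;
}
```

With 1 KiB blocks and 4 KiB pages, punching offset 1000, length 10000 gives
the block range [1024, 10240). That range covers only part of the page at
[0, 4096) and part of the page at [8192, 12288), which is exactly the
partially punched folio case described above.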
> > In journaled mode, we need to write dirty pages out before discarding
> > page cache in case of crash before committing the freeing data
> > transaction, which could expose old, stale data, even if synchronization
> > has been performed.
> >
> > Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> > ---
> > fs/ext4/inode.c | 18 +++++-------------
> > 1 file changed, 5 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index bf735d06b621..a5ba2b71d508 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -4018,17 +4018,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> >
> > trace_ext4_punch_hole(inode, offset, length, 0);
> >
> > - /*
> > - * Write out all dirty pages to avoid race conditions
> > - * Then release them.
> > - */
> > - if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
> > - ret = filemap_write_and_wait_range(mapping, offset,
> > - offset + length - 1);
> > - if (ret)
> > - return ret;
> > - }
> > -
> > inode_lock(inode);
> >
> > /* No need to punch hole beyond i_size */
> > @@ -4090,8 +4079,11 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
> > ret = ext4_update_disksize_before_punch(inode, offset, length);
> > if (ret)
> > goto out_dio;
> > - truncate_pagecache_range(inode, first_block_offset,
> > - last_block_offset);
> > +
> > + ret = ext4_truncate_page_cache_block_range(inode,
> > + first_block_offset, last_block_offset + 1);
> > + if (ret)
> > + goto out_dio;
> > }
> >
> > if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> > --
> > 2.46.1
> >